Performance
Released in December 2012, Fluids v.3 is the fastest, large scale, open source, GPU-based fluid simulator around. At least until next year anyway.
Interactive SPH fluid simulations in academic papers and commercial products are typically reported as a number of particles and a frame rate (frames per second) on a given piece of hardware. Offline simulations for motion pictures are typically reported as the total simulation time required for a number of frames.
Using these measures, Fluids v.3 achieves the following:
– 8,388,608 particles at 1/4 fps on a GeForce GTX460M (192 cores), or 2880 frames (2 mins actual time) simulated in 3.5 hours
– 4,194,304 particles at 1/2 fps on a GeForce GTX460M, or 2880 frames (2 mins actual) in 1.5 hours
– 1,048,576 particles at 4.2 fps, GTX460M, or 2880 frames (2 mins actual) in 11 minutes.
– 262,144 particles at 23 fps, GTX460M, or 2880 frames (2 mins actual) in 2 minutes.
– 65,536 particles at 113 fps, and 16,384 particles at 434 fps.
UPDATE: I recently tested Fluids v.3 on a newer GeForce GTX670, with 1344 cores, with the following results:
– 4,194,304 particles at 4 fps, or 2880 frames (2 mins actual) in 11 minutes
– 1,048,576 particles at 20 fps
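The offline times in the lists above follow directly from the frame rate: simulating a fixed number of frames at a given fps takes frames / fps seconds. Here is a minimal sketch of that arithmetic, assuming the 2880 frames are played back at 24 fps (which is what "2 mins actual" implies). The function names are illustrative only and are not part of Fluids v.3.

```python
def simulation_seconds(frames: int, sim_fps: float) -> float:
    """Wall-clock seconds needed to simulate `frames` frames at `sim_fps`."""
    if sim_fps <= 0:
        raise ValueError("sim_fps must be positive")
    return frames / sim_fps


def playback_seconds(frames: int, playback_fps: float = 24.0) -> float:
    """Length of the finished animation; the 24 fps playback rate is an assumption."""
    return frames / playback_fps


if __name__ == "__main__":
    frames = 2880
    print(f"animation length: {playback_seconds(frames) / 60:.1f} min")
    # (particles, simulation fps) pairs quoted for the GTX460M.
    for particles, fps in [(8_388_608, 0.25), (4_194_304, 0.5),
                           (1_048_576, 4.2), (262_144, 23.0)]:
        hours = simulation_seconds(frames, fps) / 3600
        print(f"{particles:>9,} particles: {hours:.2f} h")
```

The computed times come out close to the rounded figures quoted in the lists.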
I. Hardware Algorithm Efficiency
While the number of particles and the frames per second give us a picture of real-world performance in an application, it is more useful to compare pure algorithm efficiency regardless of the number of particles. This can be found by multiplying the number of particles by the frames per second, which gives the average number of particles simulated per second on a given hardware platform.
H.E. (in particles per second) = # Particles * fps
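As a quick illustration of the formula, here is a small sketch that applies it to two of the GTX460M measurements quoted earlier on this page. The function name is hypothetical.

```python
def hardware_efficiency(num_particles: int, fps: float) -> float:
    """H.E. in particles per second: particle count times frames per second."""
    return num_particles * fps


if __name__ == "__main__":
    # GTX460M measurements quoted earlier: 1,048,576 at 4.2 fps, 262,144 at 23 fps.
    for particles, fps in [(1_048_576, 4.2), (262_144, 23)]:
        pps = hardware_efficiency(particles, fps)
        print(f"{particles:>9,} particles at {fps} fps: {pps:,.0f} pps")
```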
Figure 1. Algorithm performance for various SPH fluid simulators, measured using Hardware Efficiency metric (pps), relative to the number of particles. Orange curve is the current Fluids v.3 simulator, on a GeForce GTX460M. Blue line is NVIDIA’s PhysX, measured from Spaete’s Fluid Sandbox. Green and Purple lines are recent academic results by Pajarola (2010), and Fang Chao (2010).
Figure 1 shows the hardware-based algorithm efficiency for various SPH fluid simulators. These results were calculated by running each SPH simulator on a GeForce GTX460M, disabling any advanced rendering, and measuring pure simulation efficiency as a frame rate for a given number of particles. The hardware efficiency, H.E., is computed from this.
# Particles | msec / frame | Hardware Efficiency (particles per second) |
Fluids v.3 (GPU) | ||
4,096 | 0.68 | 6,113,432 |
8,192 | 1.30 | 6,301,538 |
16,384 | 2.30 | 7,123,478 |
32,767 | 4.20 | 7,801,666 |
65,536 | 8.80 | 7,447,272 |
131,072 | 18.21 | 7,197,803 |
262,144 | 42.30 | 6,197,257 |
524,288 | 98.00 | 5,349,877 |
1,048,576 | 234.00 | 4,481,094 |
2,900,800 | 1085.00 | 2,673,548 |
8,388,608 | 4433.00 | 1,892,309 |
Fluid Sandbox, NVIDIA PhysX (GPU) | ||
26,000 | 41.67 | 624,000 |
74,140 | 60.75 | 1,220,344 |
102,440 | 68.92 | 1,486,404 |
200,000 | 104.17 | 1,920,000 |
Fang Chao (OpenCL) | ||
16,384 | 7.69 | 2,129,920 |
65,536 | 28.57 | 2,293,760 |
Pajarola, 2010 | ||
16,128 | 8.13 | 1,983,744 |
75,200 | 38.46 | 1,955,200 |
129,024 | 58.82 | 2,193,408 |
255,600 | 100.00 | 2,556,000 |
RealFlow 2012 ** (times below are total simulation times, not msec / frame) | ||
3,987 | 20 secs | 22,327 |
27,865 | 203 secs | 13,726 |
158,778 | 755 secs | 7,781 |
1,200,000 | 8 hours | 8,571 |
2,700,000 | 14 hours, 53 min | 13,303 |
** RealFlow 2012 uses an adaptive time step, and hybrid fluid-grid methods for increased realism. Thus comparisons should be taken lightly. Measurements based on simulation experiments reported on youtube.
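Because the table reports frame time in milliseconds rather than fps, the efficiency column is the particle count divided by the frame time in seconds. The sketch below recomputes a few of the Fluids v.3 rows and picks out the row with the highest efficiency. The helper names and the choice of rows are mine, for illustration only.

```python
def pps_from_frame_time(num_particles: int, ms_per_frame: float) -> float:
    """Particles per second, given the time to simulate one frame in milliseconds."""
    return num_particles / (ms_per_frame / 1000.0)


# (particles, msec per frame) copied from some Fluids v.3 rows of the table.
FLUIDS_V3_ROWS = [
    (8_192, 1.30),
    (16_384, 2.30),
    (32_767, 4.20),
    (65_536, 8.80),
    (1_048_576, 234.00),
    (8_388_608, 4433.00),
]


def peak_row(rows):
    """Return the (particles, ms) row with the highest particles per second."""
    return max(rows, key=lambda row: pps_from_frame_time(*row))


if __name__ == "__main__":
    for particles, ms in FLUIDS_V3_ROWS:
        print(f"{particles:>9,} | {ms:>8.2f} | {pps_from_frame_time(particles, ms):>12,.0f}")
    print("peak efficiency at", peak_row(FLUIDS_V3_ROWS)[0], "particles")
```

On these rows the efficiency peaks in the tens of thousands of particles and falls off at both ends, which matches the shape described for the Fluids v.3 curve in Figure 1.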
II. Algorithm Efficiency
Hardware efficiency, above, is independent of number of particles, but still depends on the capabilities of the GPU. A better measure would report pure algorithm efficiency normalized for different hardware. This can be accomplished by dividing by the peak GFlop rating of the GPU.
A.E. = particles per second per Gflop
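A minimal sketch of this normalization, using the hardware efficiency and peak Gflop ratings reported on this page for the two tested GPUs. The function name is hypothetical, and the peak rating is taken as given rather than measured.

```python
def algorithm_efficiency(pps: float, peak_gflops: float) -> float:
    """A.E. in particles per second per Gflop of peak GPU throughput."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return pps / peak_gflops


if __name__ == "__main__":
    # (label, H.E. in pps at 1,048,576 particles, peak Gflops) as reported on this page.
    measurements = [
        ("GTX 460M", 4_481_094, 518.4),
        ("GTX 670", 12_000_000, 2460.0),
    ]
    for label, pps, gflops in measurements:
        print(f"{label}: {algorithm_efficiency(pps, gflops):.0f} pps per Gflop")
```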
Measuring pure Algorithm Efficiency requires us to run the SPH simulation on a number of different hardware devices. At present, I have measured Fluids v.3 on a Fermi GeForce GTX460M (192 cores) and a Kepler GeForce GTX670 (1344 cores).
Initial results are as follows:
# Particles | H.E. (pps) | Hardware | A.E. (pps per Gflop) |
1,048,576 | 4,481,094 | GTX 460M, 192 core, 518.4 Gflops | 8644 |
1,048,576 | 12,000,000 | GTX 670, 1344 core, 2460 Gflops | 4878 |
More tests are needed (see Development page). What this shows is that simulation efficiency varies both with the number of particles and with the underlying hardware. While a jump in hardware from 518 Gflops to 2460 Gflops should result in a 4.75x increase, the actual increase is only 2.67x. The reasons for this are subtle. More measurements, for different numbers of particles and different hardware, should provide a clearer picture. Overall, Fluids v.3 achieves 4,400,000 pps on a GTX460M, running 4 million particles at 1/2 fps, and 12,000,000 pps on a GeForce GTX 670, allowing simulations of 4 million particles at 4 fps.
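The gap between expected and observed scaling can be made explicit by comparing the ratio of peak Gflops to the ratio of measured particles per second. The sketch below does this for the two GPUs in the table above. The function name and the "fraction of ideal scaling" figure are my own framing, not a metric defined by Fluids v.3.

```python
def scaling_report(pps_old: float, gflops_old: float,
                   pps_new: float, gflops_new: float):
    """Return (expected speedup, actual speedup, fraction of ideal scaling)."""
    expected = gflops_new / gflops_old  # speedup if pps scaled with peak Gflops
    actual = pps_new / pps_old          # speedup actually measured
    return expected, actual, actual / expected


if __name__ == "__main__":
    # GTX 460M -> GTX 670 at 1,048,576 particles, values from the table above.
    expected, actual, fraction = scaling_report(4_481_094, 518.4,
                                                12_000_000, 2460.0)
    print(f"expected {expected:.2f}x, actual {actual:.2f}x, "
          f"{fraction:.0%} of ideal scaling")
```

In this comparison the newer card delivers a bit more than half of the speedup that its peak rating alone would suggest.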