Shkset
ZEUS-3D Benchmarks
for Problem Shkset

Cray Y-MP
The data below were obtained for ZEUS-3D version 3.2.1. The Y-MP data is used below as a standard of comparison for other machines. A few CALMATH library BLAS routines boost performance by several percent. All routines were compiled with: cft77 -ez
Processors tused(s) Zone-Cycles/sec MFLOPS YMPs Speedup Speedup/Processor
1 3.6212 183144 178.72 1.00 1.00 1.00
For 6400 zones, the Y-MP performed 190293 zone-cycles/sec, or 190.18 MFLOPS.

Cray C90
The data below were obtained for ZEUS-3D version 3.2.1. All routines were compiled with: cft77 -ez
Processors tused(s) Zone-Cycles/sec MFLOPS YMPs Speedup Speedup/Processor
1 1.6012 414189 403.40 2.26 1.00 1.00
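
The YMPs column appears to express a machine's MFLOPS rate as a multiple of the Y-MP rate. The sketch below checks that reading against the two Cray rows quoted so far. The helper name `ymp_units` is hypothetical, and the choice of 178.72 MFLOPS as the reference is inferred from the numbers rather than stated on this page.

```python
# Y-MP single-processor rate from the Cray Y-MP table (MFLOPS).
YMP_MFLOPS = 178.72

def ymp_units(mflops, reference=YMP_MFLOPS):
    """Return an MFLOPS rate in units of the Y-MP rate (hypothetical helper)."""
    return mflops / reference

# Single-processor MFLOPS figures quoted in the two Cray tables.
rates = {"Cray Y-MP": 178.72, "Cray C90": 403.40}
for machine, mflops in rates.items():
    print(f"{machine}: {ymp_units(mflops):.2f} YMPs")
```

The 6400-zone SGI tables seem to be normalized by the 6400-zone Y-MP figure instead: `ymp_units(87.40, reference=190.18)` gives roughly 0.46, which matches the first SGI Power Challenge row.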

SGI Power Challenge
The data below was obtained with a version of ZEUS-3D modified by Robert Fiedler to run in parallel on this shared-memory machine. The performance analyzer recognizes the parallelism in the algorithm at the loop level and automatically inserts the appropriate directives. These runs were performed in multi-user mode under a relatively light system load. Single-user mode data should be similar, except for improved performance when the job requires nearly all available processors. Compiled: f77 -c -O3 -w1 -g3 -pfa list -WK,-ROUNDOFF=3,-SO=3,-AS=L
1) FIXED PROBLEM SIZE
Processors Zones tused(s) Zone-Cycles/sec MFLOPS YMPs Speedup Speedup/Processor
1 6400 7.8889 89239 87.40 0.46 1.00 1.00
2 6400 4.6746 150603 150.10 0.79 1.68 0.84
3 6400 3.4797 202318 201.40 1.06 2.26 0.75
4 6400 3.1080 226510 226.10 1.19 2.53 0.63
5 6400 2.6594 264717 264.10 1.39 2.96 0.59
6 6400 2.4133 291722 290.70 1.53 3.26 0.54
7 6400 2.2889 307571 305.90 1.61 3.44 0.49
8 6400 2.2591 311623 309.70 1.63 3.49 0.43
9 6400 2.1876 321808 321.10 1.69 3.60 0.40
10 6400 2.3504 299523 298.30 1.57 3.35 0.33
11 6400 2.3048 305455 304.00 1.60 3.42 0.31
12 6400 2.4835 283466 281.20 1.48 3.17 0.26
13 6400 2.4182 291124 288.80 1.52 3.26 0.25
14 6400 2.5368 277519 275.50 1.45 3.10 0.22
15 6400 4.4041 159851 159.60 0.84 1.79 0.11
16 6400 4.7874 147051 146.30 0.77 1.64 0.10
2) FIXED AMOUNT OF WORK PER PROCESSOR
The numerical solution varies wildly with the number of zones (but not the number of processors). Perhaps the MoC algorithm should be used only with 64-bit arithmetic. However, on the SGI Challenge, the solution converges as expected when the number of zones increases.
Processors Zones tused(s) Zone-Cycles/sec MFLOPS YMPs Speedup Speedup/Processor
1 400 2.6958 62764 60.80 0.32 1.00 1.00
2 800 6.7306 98535 96.90 0.51 1.56 0.78
3 1200 20.006 134061 133.00 0.70 2.13 0.71
4 1600 20.003 167174 165.30 0.87 2.66 0.66
5 2000 20.007 199928 199.50 1.05 3.18 0.63
6 2400 1.3618 158614 157.70 0.83 2.52 0.42
7 2800 20.008 258473 256.50 1.35 4.11 0.58
8 3200 11.714 265806 264.10 1.39 4.23 0.52
9 3600 1.4390 197639 195.70 1.03 3.14 0.34
10 4000 3.7464 239165 237.50 1.25 3.81 0.38
11 4400 1.7949 220625 218.50 1.15 3.51 0.31
12 4800 3.6850 257910 256.50 1.35 4.10 0.34
13 5200 20.007 287986 286.90 1.51 4.58 0.35
14 5600 20.009 274557 273.60 1.44 4.37 0.31
15 6000 20.003 176970 174.80 0.92 2.81 0.18
16 6400 4.8083 146412 144.40 0.76 2.3 0.14

SGI Challenge
The data below was obtained with a version of ZEUS-3D modified by Robert Fiedler to run in parallel on this shared-memory machine. The performance analyzer recognizes the parallelism in the algorithm at the loop level and automatically inserts the appropriate directives. These runs were performed in single-user mode. Multi-user mode data obtained under a relatively light system load is essentially the same. Compiled: f77 -c -O2 -pfa list, -WK,-roundoff=2,-AS=L -w1 -g3 -Nq9999
1) FIXED PROBLEM SIZE
Processors Zones MFLOPS Y-MPs Speedup Speedup/Processor
1 6400 20.16 0.11 1.00 1.00
2 6400 38.17 0.20 1.89 0.95
3 6400 57.77 0.30 2.87 0.96
4 6400 74.69 0.39 3.71 0.93
5 6400 87.14 0.46 4.32 0.86
6 6400 100.93 0.53 5.01 0.83
7 6400 122.49 0.64 6.08 0.87
8 6400 126.73 0.67 6.29 0.79
9 6400 137.07 0.72 6.80 0.76
10 6400 156.54 0.82 7.77 0.78
11 6400 161.82 0.85 8.03 0.73
12 6400 160.28 0.84 7.95 0.66
13 6400 170.71 0.90 8.47 0.65
14 6400 190.21 1.00 9.44 0.67
15 6400 198.94 1.05 9.87 0.66
16 6400 203.11 1.07 10.08 0.63
17 6400 197.50 1.04 9.80 0.58
18 6400 214.53 1.13 10.64 0.59
19 6400 220.55 1.16 10.94 0.58
20 6400 207.26 1.09 10.28 0.51
21 6400 224.74 1.18 11.15 0.53
22 6400 211.21 1.11 10.48 0.48
23 6400 222.64 1.17 11.05 0.48
24 6400 240.34 1.26 11.92 0.50
25 6400 217.03 1.14 10.77 0.43
26 6400 247.21 1.30 12.26 0.47
27 6400 239.24 1.26 11.87 0.44
28 6400 238.62 1.25 11.84 0.42
29 6400 227.63 1.20 11.29 0.39
30 6400 236.36 1.24 11.73 0.39
31 6400 233.87 1.23 11.60 0.37
32 6400 225.93 1.19 11.21 0.35
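
The Speedup and Speedup/Processor columns can be approximately reproduced from the MFLOPS column alone. The sketch below assumes Speedup is the throughput ratio against the 1-processor row and Speedup/Processor is that ratio divided by the processor count (parallel efficiency). This reading is inferred from the table, not stated by the page, and the function name `scaling` is illustrative.

```python
# (processors, MFLOPS) pairs copied from the fixed-problem-size table above.
rows = [(1, 20.16), (2, 38.17), (8, 126.73), (16, 203.11), (32, 225.93)]

def scaling(rows):
    """Return (processors, speedup, efficiency) relative to the 1-processor row."""
    base = dict(rows)[1]                 # MFLOPS on a single processor
    result = []
    for procs, mflops in rows:
        speedup = mflops / base          # throughput ratio vs. one processor
        efficiency = speedup / procs     # the Speedup/Processor column
        result.append((procs, speedup, efficiency))
    return result

for procs, speedup, efficiency in scaling(rows):
    print(f"{procs:2d} {speedup:6.2f} {efficiency:5.2f}")
```

The published values were presumably computed from unrounded timings, so a few entries differ from this estimate by about 0.01 in the last digit.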
2) FIXED AMOUNT OF WORK PER PROCESSOR
Processors Zones MFLOPS Y-MPs Speedup Speedup/Processor
1 200 18.38 0.10 1.00 1.00
2 400 32.03 0.17 1.74 0.87
3 600 46.00 0.24 2.50 0.83
4 800 62.04 0.33 3.38 0.84
5 1000 64.48 0.34 3.51 0.70
6 1200 82.99 0.44 4.52 0.75
7 1400 93.13 0.49 5.07 0.72
8 1600 104.57 0.55 5.69 0.71
9 1800 115.70 0.61 6.30 0.70
10 2000 117.83 0.62 6.41 0.64
11 2200 128.20 0.67 6.98 0.63
12 2400 134.71 0.71 7.33 0.61
13 2600 114.63 0.60 6.24 0.48
14 2800 157.43 0.83 8.57 0.61
15 3000 161.24 0.85 8.77 0.58
16 3200 172.30 0.91 9.38 0.59
17 3400 181.08 0.95 9.85 0.58
18 3600 175.22 0.92 9.54 0.53
19 3800 136.40 0.72 7.42 0.39
20 4000 182.99 0.96 9.96 0.50
21 4200 141.61 0.74 7.71 0.37
22 4400 117.05 0.62 6.37 0.29
23 4600 214.03 1.13 11.65 0.51
24 4800 213.75 1.12 11.63 0.48
25 5000 190.21 1.00 10.35 0.41
26 5200 158.60 0.83 8.63 0.33
27 5400 153.34 0.81 8.34 0.31
28 5600 223.41 1.17 12.16 0.43
29 5800 173.34 0.91 9.43 0.33
30 6000 226.62 1.19 12.33 0.41
31 6200 123.13 0.65 6.70 0.22
32 6400 242.22 1.27 13.18 0.41

SGI Indigo 2 Extreme
The data below was obtained with ZEUS-3D version 3.2.1. All routines were compiled with: f77 -c -O2 -w1 -g3 -Nq9999
Processors tused(s) Zone-Cycles/sec MFLOPS YMPs Speedup Speedup/Processor
1 27.760 23891 23.31 0.13 1.00 1.00

HP 715/80
The data below was obtained with ZEUS-3D version 3.2.1 ported to HP-UX. All routines were compiled with: f77 -c +O3
Processors tused(s) Zone-Cycles/sec MFLOPS YMPs Speedup Speedup/Processor
1 26.090 25420 24.81 0.14 1.00 1.00

Convex C3880
The data below was obtained with ZEUS-3D version 3.2.1. All routines were compiled with: fc -c -fi -O2 -nw -or none -db
Processors tused(s) Zone-Cycles/sec MFLOPS YMPs Speedup Speedup/Processor
1 8.2914 79986 78.05 0.44 1.00 1.00
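
For the single-processor runs, tused multiplied by Zone-Cycles/sec gives nearly the same total on every machine. This suggests each run performs the same fixed amount of work and that the rate column is that work divided by the elapsed time. The sketch below is a consistency check inferred from the published numbers, not a description of how ZEUS-3D reports its timings, and the variable names are illustrative.

```python
# (tused in seconds, zone-cycles/sec) for the single-processor runs on this page.
runs = {
    "Cray Y-MP": (3.6212, 183144),
    "Cray C90": (1.6012, 414189),
    "SGI Indigo 2 Extreme": (27.760, 23891),
    "HP 715/80": (26.090, 25420),
    "Convex C3880": (8.2914, 79986),
}

# Total work implied by each run: elapsed time times throughput.
totals = {name: t * rate for name, (t, rate) in runs.items()}
for name, total in totals.items():
    print(f"{name}: {total:.0f} zone-cycles")

# Relative spread between the largest and smallest implied totals.
spread = (max(totals.values()) - min(totals.values())) / min(totals.values())
print(f"relative spread: {spread:.1e}")
```

All five totals come out close to 663,200 zone-cycles, so comparing machines by Zone-Cycles/sec is equivalent to comparing the inverse of their run times on this problem.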