HP/Convex Exemplar SPP-1200
GRID: 32 x 32 x 32 per processor (tile)
ZEUS-MP (10 Steps)
Processors Layout Wall Clock(s) tused(s) Zone-Cycles/sec MFLOPS C90s Speedup Speedup/Processor
1 1x1x1 13.23 12.90 25247 23.11 .21 1.00 1.00
2 2x1x1 13.15 12.64 51553 47.20 .43 2.04 1.02
2 1x2x1 13.01 12.56 51903 47.52 .43 2.06 1.03
2 1x1x2 12.89 12.59 51753 47.38 .43 2.05 1.02
4 2x2x1 13.57 12.75 102294 93.66 .85 4.05 1.01
4 2x1x2 13.47 12.65 103073 94.37 .86 4.08 1.02
4 1x2x2 13.75 12.93 100848 92.33 .84 3.99 1.00
8 2x2x2 15.45 13.65 190981 174.86 1.59 7.56 .95
12 3x2x2 16.98 15.23 253011 231.65 2.11 10.02 .84
12 2x3x2 17.88 16.26 236404 216.44 1.97 9.36 .78
12 2x2x3 16.54 15.12 256747 235.07 2.14 10.17 .85
16 4x2x2 16.26 14.58 355321 325.32 2.96 14.07 .88
16 2x4x2 16.54 14.27 363743 333.03 3.03 14.41 .90
16 2x2x4 16.41 14.10 369519 338.32 3.08 14.64 .91
24 4x3x2 19.80 17.90 431783 395.33 3.59 17.10 .71
24 4x2x3 20.61 18.78 410831 376.14 3.42 16.27 .68
24 3x4x2 20.23 17.78 433639 397.03 3.61 17.18 .72
24 3x2x4 19.59 16.84 449868 411.88 3.74 17.82 .74
24 2x4x3 19.42 17.25 449888 411.90 3.74 17.82 .74
24 2x3x4 19.73 17.62 441053 403.81 3.67 17.47 .73
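The Speedup and Speedup/Processor columns appear to follow from the Zone-Cycles/sec column: dividing each row's rate by the 1-processor rate reproduces the tabulated speedup, and dividing that by the processor count reproduces the per-processor figure. The sketch below checks this for a few rows of the table above. The function names are illustrative and not part of ZEUS-MP, and the relationship is inferred from the numbers rather than stated in the source.

```python
# Zone-Cycles/sec values copied from the 32 x 32 x 32 ZEUS-MP table above.
# Keys are (processors, layout).
zone_cycles_per_sec = {
    (1, "1x1x1"): 25247,
    (2, "2x1x1"): 51553,
    (8, "2x2x2"): 190981,
    (24, "2x4x3"): 449888,
}


def speedup(rate, baseline_rate):
    """Ratio of a run's Zone-Cycles/sec to the 1-processor rate (assumed definition)."""
    return rate / baseline_rate


def speedup_per_processor(rate, baseline_rate, processors):
    """Speedup divided by the processor count, i.e. a parallel efficiency."""
    return speedup(rate, baseline_rate) / processors


baseline = zone_cycles_per_sec[(1, "1x1x1")]
for (procs, layout), rate in zone_cycles_per_sec.items():
    s = speedup(rate, baseline)
    e = speedup_per_processor(rate, baseline, procs)
    print(f"{procs:2d} {layout} speedup={s:.2f} speedup/processor={e:.2f}")
```

Because each processor holds a fixed-size tile in these runs, the total work grows with the processor count, so a Speedup/Processor near 1.00 means the time per step stayed roughly flat as processors were added.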
ZEUS-3D (10 Steps)
Processors Wall Clock(s) tused(s) Zone-Cycles/sec MFLOPS C90s Speedup Speedup/Processor
1 21.54 19.42 16872 14.75 .08 1.00 1.00
2 20.63 17.74 36954 31.52 .12 2.19 1.10
4 24.17 19.90 65874 55.07 .19 3.90 .98
8 34.30 27.33 95911 78.49 .27 5.68 .71
12 40.04 31.81 123633 108.08 .30 7.33 .61
16 49.79 35.90 146053 118.26 .33 8.66 .54
GRID: 64 x 64 x 64 per processor (tile)
ZEUS-MP (10 Steps)
Processors Layout Wall Clock(s) tused(s) Zone-Cycles/sec MFLOPS C90s Speedup Speedup/Processor
1 1x1x1 106.25 104.52 24980 19.85 .13 1.00 1.00
2 2x1x1 106.92 104.57 49944 39.69 .26 2.00 1.00
2 1x2x1 106.09 103.77 50321 39.99 .26 2.01 1.01
2 1x1x2 105.64 103.86 50259 39.94 .26 2.01 1.01
4 2x2x1 112.14 108.39 96210 76.45 .50 3.85 .96
4 2x1x2 124.16 121.02 86254 68.54 .45 3.45 .86
4 1x2x2 112.23 109.14 95720 76.06 .50 3.83 .96
8 2x2x2 133.61 125.29 166092 131.99 .87 6.65 .83
12 3x2x2 162.52 155.99 198564 157.79 1.04 7.95 .66
12 2x3x2 130.79 124.72 248235 197.26 1.30 9.94 .83
12 2x2x3 157.01 149.75 205874 163.60 1.08 8.24 .69
16 4x2x2 158.87 150.34 277229 220.30 1.45 11.10 .69
16 2x4x2 146.82 137.35 303961 241.54 1.59 12.17 .76
16 2x2x4 164.91 155.49 268922 213.70 1.41 10.77 .67
24 4x3x2 169.09 170.45 363080 288.52 1.90 14.53 .61
24 4x2x3 162.37 151.38 405550 322.27 2.12 16.23 .68
24 3x4x2 160.57 150.35 410881 326.51 2.15 16.45 .69
24 3x2x4 168.81 158.54 393492 312.69 2.06 15.75 .66
24 2x4x3 150.04 141.05 443702 352.59 2.32 17.76 .74
24 2x3x4 179.96 167.63 365722 290.62 1.91 14.64 .61
ZEUS-3D (10 Steps)
Processors Wall Clock(s) tused(s) Zone-Cycles/sec MFLOPS C90s Speedup Speedup/Processor
1 160.30 154.34 16985 13.90 .05 1.00 1.00
2 149.66 137.18 38218 30.95 .09 2.25 1.13
4 185.32 165.53 63347 51.29 .14 3.73 .93
8 259.98 217.56 96395 78.05 .22 5.68 .71
12 315.24 255.90 122927 100.60 .28 7.24 .60
16 348.87 285.02 147156 119.16 .33 8.66 .54
GRID: 128 x 64 x 64 per processor (tile)
COMMENT: For 16 Processors (512 x 128 x 128), ZEUS-3D requires practically all
available global shared memory allowed in the dedicated batch queue (1536 MB),
and consequently runs extremely slowly.
ZEUS-MP (10 Steps)
Processors Layout Wall Clock(s) tused(s) Zone-Cycles/sec MFLOPS C90s Speedup Speedup/Processor
1 1x1x1 180.37 177.20 29465 22.89 .14 1.00 1.00
2 2x1x1 184.40 180.38 57894 44.98 .28 1.96 .98
2 1x2x1 183.32 178.76 58418 45.39 .28 1.98 .99
2 1x1x2 185.76 182.05 57355 44.56 .27 1.95 .97
4 2x2x1 191.42 184.98 112802 87.64 .54 3.83 .96
4 2x1x2 190.85 185.37 112522 87.42 .54 3.82 .95
4 1x2x2 239.22 232.97 89528 69.56 .43 3.04 .76
8 2x2x2 277.92 273.16 152968 118.85 .73 5.19 .65
12 3x2x2 274.88 264.13 235129 182.68 1.12 7.98 .66
12 2x3x2 282.95 270.50 227820 177.00 1.09 7.73 .64
12 2x2x3 303.91 290.85 211659 164.45 1.01 7.18 .60
16 4x2x2 311.14 289.58 282359 219.38 1.35 9.58 .60
16 2x4x2 280.57 261.56 315084 244.80 1.50 10.69 .67
16 2x2x4 289.44 272.04 305104 237.05 1.45 10.35 .65
24 4x3x2 321.47 306.12 409177 317.91 1.95 13.89 .58
24 4x2x3 301.44 281.84 437455 339.88 2.09 14.85 .62
24 3x4x2 279.58 263.18 473942 368.23 2.26 16.08 .67
24 3x2x4 315.52 295.54 418151 324.88 1.99 14.19 .59
24 2x4x3 299.11 281.96 443263 344.39 2.11 15.04 .63
24 2x3x4 286.94 267.67 461444 358.52 2.20 15.66 .65
ZEUS-3D (10 Steps)
Processors Wall Clock(s) tused(s) Zone-Cycles/sec MFLOPS C90s Speedup Speedup/Processor
1 302.59 287.39 18243 14.77 .04 1.00 1.00
2 326.27 301.93 34729 28.12 .08 1.90 .95
4 465.30 424.18 49440 40.03 .11 2.71 .68
8 618.57 540.37 77619 62.85 .18 4.25 .53
12 742.10 625.32 100612 81.47 .23 5.52 .46
WORK IS CONSTANT
GRID: Tile size adjusted to make the full mesh 128 x 128 x 128
ZEUS-MP (10 steps)
Processors Layout Wall Clock(s) tused(s) Zone-Cycles/sec MFLOPS C90s Speedup Speedup/Processor
1 1x1x1 776.79 764.66 27324 21.23 .13 1.00 1.00
2 1x1x2 405.45 398.34 52456 40.76 .25 1.92 .96
4 1x2x2 191.16 185.20 112779 87.62 .54 4.13 1.03
8 2x2x2 126.93 118.20 176414 140.19 .92 6.50 .81
16 2x2x4 57.16 51.95 400037 310.81 1.91 14.64 .92
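In the constant-work runs the tile size is adjusted so the full mesh stays at 128 x 128 x 128. A minimal sketch of that adjustment follows, assuming each layout entry divides the matching mesh axis evenly and in the same order. The source does not list the per-layout tile dimensions, so the axis mapping and the helper name `tile_size` are assumptions.

```python
def tile_size(mesh, layout):
    """Zones per tile along each axis when `mesh` is split by `layout`.

    Assumes layout[i] processors divide mesh[i] evenly (hypothetical mapping).
    """
    tile = []
    for zones, procs in zip(mesh, layout):
        if zones % procs != 0:
            raise ValueError(f"{zones} zones do not divide evenly over {procs} processors")
        tile.append(zones // procs)
    return tuple(tile)


mesh = (128, 128, 128)
# Layouts taken from the constant-work ZEUS-MP table above.
for layout in [(1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2), (2, 2, 4)]:
    print(layout, "->", tile_size(mesh, layout))
```

Under this assumption the 16-processor 2x2x4 layout would give each processor a 64 x 64 x 32 tile, one sixteenth of the zones handled by the single-processor run.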
ZEUS-3D (10 steps)
Processors Wall Clock(s) tused(s) Zone-Cycles/sec MFLOPS C90s Speedup Speedup/Processor
1 1063.78 1021.20 16430 13.30 .04 1.00 1.00
2 696.51 658.44 31851 25.79 .07 1.94 .97
3 496.41 462.76 45318 36.70 .10 2.76 .92
4 390.13 354.02 59238 47.97 .13 3.61 .90
5 338.13 303.39 69124 55.97 .16 4.21 .84
6 300.06 267.19 78489 63.55 .18 4.78 .80
7 274.28 241.24 86931 70.39 .20 5.29 .76
8 254.72 220.12 95272 77.14 .21 5.80 .72
9 239.49 203.01 103302 83.65 .23 6.29 .70
10 234.77 178.46 117517 95.16 .27 7.15 .72
11 243.52 166.28 126125 102.13 .28 7.68 .70
12 257.49 159.27 131675 106.62 .30 8.01 .67
13 264.45 141.05 148686 120.39 .34 9.05 .70
14 237.86 136.99 153090 123.96 .35 9.32 .67
15 266.59 151.92 138044 111.78 .31 8.40 .56
16 235.00 143.43 146214 118.39 .33 8.90 .56