Cray T3D
These runs were performed with a version of ZEUS-MP tuned for the T3D by
unrolling loops four times (by hand or with a compiler directive) in the
subroutines that perform advection in 1-D sweeps. The "i" loop is always the
one unrolled, even when it is not the inner loop. Because i is the first
(stride-1) index of the Fortran arrays, consecutive i values are adjacent in
memory, so unrolling by 4 uses all 4 words of each cache line before the line
is overwritten. This reduces cache misses.
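The cache argument can be illustrated with a toy model. The Python sketch below is hypothetical and is not taken from ZEUS-MP: it simulates a direct-mapped cache with 8-byte words and 32-byte lines (4 words per line, as stated above) and 256 lines. The cache size, the array dimensions nx = 128 and ny = 32, and all function names are assumptions chosen for illustration. It counts misses for a sweep along j over a column-major array q(i, j), first with j as the inner loop, then with the outer i loop unrolled by 4 (unroll-and-jam).

```python
# Toy direct-mapped cache model (all parameters are assumptions, not T3D specs).
WORD_BYTES = 8
LINE_BYTES = 32      # 4 words per cache line, as in the text
CACHE_LINES = 256    # assumed 8 KB direct-mapped cache


def count_misses(word_addresses):
    """Count misses for a sequence of word addresses in a direct-mapped cache."""
    tags = [None] * CACHE_LINES
    misses = 0
    for w in word_addresses:
        line = (w * WORD_BYTES) // LINE_BYTES  # memory line holding this word
        slot = line % CACHE_LINES              # direct-mapped: exactly one slot
        if tags[slot] != line:
            tags[slot] = line                  # evict whatever was there
            misses += 1
    return misses


def addr(i, j, nx):
    """Column-major (Fortran) word offset of q(i, j): i varies fastest."""
    return i + j * nx


def sweep_plain(nx, ny):
    """Sweep along j with j innermost: each step jumps nx words ahead."""
    for i in range(nx):
        for j in range(ny):
            yield addr(i, j, nx)


def sweep_unrolled(nx, ny):
    """Same sweep with the outer i loop unrolled by 4 (unroll-and-jam).

    Assumes nx is a multiple of 4; real code would add a remainder loop.
    """
    if nx % 4:
        raise ValueError("nx must be a multiple of 4 in this sketch")
    for i in range(0, nx, 4):
        for j in range(ny):
            # i..i+3 share one 4-word cache line, so three of these four hit.
            yield addr(i, j, nx)
            yield addr(i + 1, j, nx)
            yield addr(i + 2, j, nx)
            yield addr(i + 3, j, nx)


if __name__ == "__main__":
    print(count_misses(sweep_plain(128, 32)), count_misses(sweep_unrolled(128, 32)))
```

With these assumed sizes the stride between successive j is 128 words (32 lines), so for a fixed i the 32 lines touched map onto only 8 of the 256 cache slots. The plain sweep therefore evicts each line before coming back for the next i and misses on every access (4096 misses), while the unrolled sweep misses once per line (1024 misses). Real miss counts depend on the actual array dimensions, alignment and cache.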
All routines were compiled with:

cf77 -c -C cray-t3d -I/usr/local/mpp/include -Wf"-o unroll1"

All timings are wall-clock seconds, since the T3D lacks the CPU timer "second". These tests were run under the NQS batch system.
GRID: 32 x 32 x 32 zones per processor (tile), 10 steps

Processors  Layout  Wall Clock (s)  tused (s)  Zone-Cycles/sec   MFLOPS   C90s  Speedup  Speedup/Processor
         1  1x1x1            13.27      13.19            24689    22.60   0.21     1.00               1.00
         2  1x1x2            13.32      13.24            49192    45.04   0.41     1.99               1.00
         4  1x2x2            13.42      13.34            97636    89.39   0.81     3.95               0.99
         8  2x2x2            13.53      13.44           193785   177.42   1.61     7.85               0.98
        16  2x2x4            13.56      13.47           386625   353.98   3.22    15.66               0.98
        32  2x4x4            13.66      13.56           767387   702.59   6.39    31.08               0.97
        64  4x4x4            13.74      13.63          1526440  1397.56  12.71    61.83               0.97
       128  4x4x8            13.98      13.85          3000890  2747.52  24.98   121.55               0.95
       256  4x8x8            13.99      13.86          5996400  5490.11  49.91   242.88               0.95
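The derived columns can be cross-checked with a little arithmetic. The Python sketch below assumes definitions that are inferred from the numbers rather than stated on this page: Zone-Cycles/sec is (zones per tile x processors x steps) / wall clock, Speedup is the Zone-Cycles/sec rate relative to the 1-processor run, and Speedup/Processor is Speedup divided by the number of processors. Under these assumptions the tabulated values are reproduced to within rounding. Because every processor keeps a 32 x 32 x 32 tile, the global grid grows with the layout (weak scaling), so Speedup/Processor reduces to T(1)/T(P), e.g. 13.27/13.99 ≈ 0.95 at 256 processors. The function names are hypothetical.

```python
# Hypothetical helpers for checking the scaling table (names are illustrative).
# Assumed definitions, inferred from the tabulated numbers:
#   Zone-Cycles/sec   = zones per tile * processors * steps / wall-clock seconds
#   Speedup           = Zone-Cycles/sec(P) / Zone-Cycles/sec(1)
#   Speedup/Processor = Speedup / P

TILE = (32, 32, 32)  # zones per processor, from the GRID line
STEPS = 10           # time steps per run, from the GRID line


def global_grid(layout, tile=TILE):
    """Global zone count along each axis for a processor layout like (4, 8, 8)."""
    return tuple(p * n for p, n in zip(layout, tile))


def zone_cycles_per_sec(procs, wall_seconds, steps=STEPS, tile=TILE):
    """Aggregate zone updates per second for a weak-scaling run."""
    nx, ny, nz = tile
    return nx * ny * nz * procs * steps / wall_seconds


def speedup(rate_p, rate_1):
    """Scaled speedup: aggregate rate relative to the 1-processor run."""
    return rate_p / rate_1


def efficiency(rate_p, rate_1, procs):
    """Speedup per processor; 1.0 would be perfect weak scaling."""
    return speedup(rate_p, rate_1) / procs


if __name__ == "__main__":
    print(global_grid((4, 8, 8)))                     # (128, 256, 256)
    print(round(zone_cycles_per_sec(256, 13.99)))     # close to the tabulated 5996400
    print(round(speedup(5996400, 24689), 2))          # 242.88
    print(round(efficiency(5996400, 24689, 256), 2))  # 0.95
```

The small gap between the recomputed 256-processor rate and the tabulated 5996400 is consistent with the wall-clock column being rounded to two decimals.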