Cray T90
Cray UNICOS machines have a hardware performance monitor (hpm), which reports
the number of floating-point operations per CPU second (FLOPS) performed
by a given process. All routines were compiled with: cft77 -ez
ZEUS-MP (10 steps)
GRID: 32 x 32 x 32 per processor (tile) (10 steps)
Processors  tused(s)  Zone-Cycles/sec  MFLOPS  C90s  Speedup  Speedup/Processor
         1      1.89           171032  156.40  1.42     1.00               1.00
GRID: 64 x 64 x 64 per processor (tile) (10 steps)
Processors  tused(s)  Zone-Cycles/sec  MFLOPS  C90s  Speedup  Speedup/Processor
         1      9.15           280357  222.54  1.47     1.00               1.00
GRID: 128 x 64 x 64 per processor (tile) (10 steps)
Processors  tused(s)  Zone-Cycles/sec  MFLOPS  C90s  Speedup  Speedup/Processor
         1     16.37           312858  243.52  1.49     1.00               1.00
ZEUS-3D (10 steps)
GRID: 32 x 32 x 32 per processor (tile) (10 steps)
Processors  tused(s)  Zone-Cycles/sec  MFLOPS  C90s  Speedup  Speedup/Processor
         1      1.14           287836  274.04  1.48     1.00               1.00
GRID: 64 x 64 x 64 is full mesh (10 steps)
Processors  tused(s)  Zone-Cycles/sec  MFLOPS  C90s  Speedup  Speedup/Processor
         1      4.90           640854  457.68  1.88     1.00               1.00
GRID: 128 x 64 x 64 per processor (tile) (10 steps)
Processors  tused(s)  Zone-Cycles/sec  MFLOPS  C90s  Speedup  Speedup/Processor
         1      7.44           704890  592.11  1.68     1.00               1.00
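Because hpm reports floating-point operations per second and the tables above also list zone-cycles per second, dividing one by the other gives a rough count of floating-point operations per zone-cycle. The Python sketch below does that for the single-processor rows above. The helper name flops_per_zone_cycle is hypothetical, and the ratio is a derived figure that the benchmark itself does not report.

```python
# Single-processor rows copied from the tables above:
# (code, grid, zone-cycles per second, MFLOPS)
RUNS = [
    ("ZEUS-MP", "32x32x32", 171032, 156.40),
    ("ZEUS-MP", "64x64x64", 280357, 222.54),
    ("ZEUS-MP", "128x64x64", 312858, 243.52),
    ("ZEUS-3D", "32x32x32", 287836, 274.04),
    ("ZEUS-3D", "64x64x64", 640854, 457.68),
    ("ZEUS-3D", "128x64x64", 704890, 592.11),
]


def flops_per_zone_cycle(zone_cycles_per_sec, mflops):
    """Derived metric: floating-point operations per zone per time step."""
    # MFLOPS is millions of operations per second, so scale by 1e6
    # before dividing by the zone-cycle rate.
    return mflops * 1.0e6 / zone_cycles_per_sec


if __name__ == "__main__":
    for code, grid, rate, mflops in RUNS:
        per_zone = flops_per_zone_cycle(rate, mflops)
        print(f"{code:8s} {grid:>10s}: {per_zone:7.1f} flops per zone-cycle")
```

A lower value for the same code on a larger grid would suggest that fixed per-step overhead is being spread over more zones, though the tables alone do not confirm that reading.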
COMMENT: The following scaling study is for fixed total work. The ZEUS-3D
EDITOR preprocessor inserted the parallelization directives, and the Fortran
routines were compiled with cf77 -Zp -Wf"-ez". The system was lightly
loaded. The speedup is the average concurrency reported by the job
accounting utility "ja".
Processors  Wall Clock(s)  tused(s)  Zone-Cycles/sec   MFLOPS  C90s  Speedup  Speedup/Processor
         1           7.48    7.4594           703057   590.15  1.68     1.00               1.00
         2           4.05    7.6290          1328777  1115.38  3.17     1.89               0.95
         3           3.43    7.5399          1553755  1304.23  3.71     2.21               0.74
         4           2.64    7.6126          2038865  1711.44  4.87     2.90               0.73
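The Speedup/Processor column is the speedup divided by the processor count. As a cross-check, the Python sketch below recomputes speedup as the ratio of each run's MFLOPS to the one-processor MFLOPS and then divides by the processor count. The table's own speedup comes from "ja", so the MFLOPS ratio is only an assumed stand-in here, and the function and variable names are illustrative rather than part of any ZEUS or UNICOS tooling.

```python
# Rows from the fixed-total-work scaling table:
# (processors, MFLOPS, speedup as reported in the table)
ROWS = [
    (1, 590.15, 1.00),
    (2, 1115.38, 1.89),
    (3, 1304.23, 2.21),
    (4, 1711.44, 2.90),
]


def scaling_summary(rows):
    """Return (processors, speedup from MFLOPS ratio, speedup per processor)."""
    base_mflops = rows[0][1]  # the one-processor run is the reference
    summary = []
    for nproc, mflops, _reported in rows:
        speedup = mflops / base_mflops
        summary.append((nproc, speedup, speedup / nproc))
    return summary


if __name__ == "__main__":
    for nproc, speedup, per_proc in scaling_summary(ROWS):
        print(f"{nproc:2d} processors: speedup {speedup:.2f}, "
              f"per processor {per_proc:.3f}")
```

For these four rows the MFLOPS ratio agrees with the tabulated speedup to within rounding. The per-processor value falls as processors are added, which matches the flat tused column: total CPU time stays near 7.5 s while wall-clock time shrinks less than proportionally.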