
Cray C90

by streeter last modified 2007-03-30 04:37
Cray UNICOS machines have a hardware performance monitor (hpm) that reports the number of floating point operations per CPU second (FLOPS) performed by a given process. The FLOPS for other machines are estimated by scaling the C90 FLOPS by the ratio of that machine's Zone-Cycles/sec to the C90's Zone-Cycles/sec.
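
As a rough illustration of how a C90 FLOPS figure and a Zone-Cycles/sec ratio combine, the sketch below rescales an MFLOPS measurement. The function name and the second machine's rate are hypothetical; only the C90 numbers (the 64-cubed ZEUS-MP run) are taken from this page.

```python
def scaled_mflops(c90_mflops, c90_zone_cycles_per_sec, other_zone_cycles_per_sec):
    """Estimate MFLOPS on another machine from the C90 hpm measurement.

    Assumes the number of floating point operations per zone-cycle is the
    same on both machines, so MFLOPS scales with Zone-Cycles/sec.
    """
    return c90_mflops * other_zone_cycles_per_sec / c90_zone_cycles_per_sec


# C90 values for the 64 x 64 x 64 ZEUS-MP run on this page.
C90_MFLOPS = 150.96
C90_ZCS = 190408

# Hypothetical machine that reaches half the C90 zone-cycle rate.
other_zcs = 95204
print(f"{scaled_mflops(C90_MFLOPS, C90_ZCS, other_zcs):.2f} MFLOPS")
```

The estimate is only as good as the assumption that both machines execute the same operation count per zone-cycle.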

The FLOPS increases somewhat with problem size, e.g. by 13 percent in going from a 32-cubed problem to a 64-cubed problem. The FLOPS for a 128 x 64 x 64 problem is only about 2 percent greater than it is for a 64-cubed run. For ZEUS-MP, a 128-cubed problem cannot be run interactively due to memory restrictions, so I have used the FLOPS/(Zone-Cycles/sec) conversion factor from the 64-cubed run.
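
The FLOPS/(Zone-Cycles/sec) conversion factor can be worked out directly from the ZEUS-MP tables on this page. The sketch below does that arithmetic; the dictionary and function names are illustrative only, and the numbers are the single-processor C90 rows as tabulated.

```python
# ZEUS-MP single-processor C90 rows from this page:
# grid label -> (MFLOPS, Zone-Cycles/sec)
ZEUS_MP_C90 = {
    "32x32x32": (109.54, 120144),
    "64x64x64": (150.96, 190408),
    "128x64x64": (163.65, 210263),
}


def flops_per_zone_cycle(mflops, zone_cycles_per_sec):
    """Conversion factor: floating point operations per zone-cycle."""
    return mflops * 1.0e6 / zone_cycles_per_sec


factors = {
    grid: flops_per_zone_cycle(mflops, zcs)
    for grid, (mflops, zcs) in ZEUS_MP_C90.items()
}

previous = None
for grid, factor in factors.items():
    line = f"{grid:>10}: {factor:7.1f} flops per zone-cycle"
    if previous is not None:
        # Relative change with respect to the next smaller grid.
        line += f" ({100.0 * (factor / previous - 1.0):+.1f}%)"
    print(line)
    previous = factor
```

With the tabulated values the factor comes out near 912, 793 and 778 operations per zone-cycle, so it changes by about 13 percent between the 32-cubed and 64-cubed runs and by about 2 percent between the 64-cubed and 128 x 64 x 64 runs.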

Note that although ZEUS-MP uses algorithms that are conceptually the same as those in ZEUS-3D, and practically every loop vectorizes, the C90 runs ZEUS-3D roughly twice as fast as it runs the current version of ZEUS-MP. Remarkably, the CPU time spent in a loop with a stride greater than 1 is more than double the time spent in a similar loop with stride 1 (the stride depends on the direction of the sweep in the advection substeps).
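
To make the dependence of stride on sweep direction concrete, here is a minimal sketch of the memory strides for a Fortran-style (column-major) 3-D array. It assumes an array dimensioned exactly (nx, ny, nz) with no boundary zones, which is a simplification and not necessarily how the ZEUS arrays are declared; the function names are hypothetical.

```python
def linear_index(i, j, k, nx, ny):
    """Zero-based memory offset of element (i, j, k) in column-major order."""
    return i + nx * (j + ny * k)


def sweep_strides(nx, ny, nz):
    """Element stride between neighbouring zones along each sweep direction.

    nz does not affect the strides; it is kept so the call mirrors the
    array dimensions.
    """
    return {"i": 1, "j": nx, "k": nx * ny}


# A 64-cubed tile: only the i-direction sweep walks memory with stride 1.
for direction, stride in sweep_strides(64, 64, 64).items():
    print(f"sweep along {direction}: stride {stride}")
```

Under these assumptions only a sweep along the first index is unit-stride; sweeps along the second and third indices step through memory by nx and nx*ny elements respectively.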

All routines were compiled with: cft77 -ez
ZEUS-MP (10 steps) 
 
GRID: 32 x 32 x 32 per processor (tile) 
 Processors tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Speedup/Processor 
    1         2.69       120144        109.54   1.00     1.00         1.00 
 
GRID: 64 x 64 x 64 per processor (tile) (10 steps) 
 Processors tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Speedup/Processor 
    1        13.50       190408        150.96   1.00     1.00         1.00 
 
GRID:128 x 64 x 64 per processor (tile) (10 steps) 
 Processors tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Speedup/Processor 
    1        24.41       210263        163.65   1.00     1.00         1.00 
 
 
 
ZEUS-3D (10 steps) 
 
GRID: 32 x 32 x 32 per processor (tile) 
 Processors tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Speedup/Processor 
    1         1.68       195137        184.42   1.00     1.00         1.00 
 
GRID: 64 x 64 x 64 per processor (tile) (10 steps) 
 Processors tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Speedup/Processor 
    1         7.70       340437        289.53   1.00     1.00         1.00 
 
GRID:128 x 64 x 64 per processor (tile) (10 steps) 
 Processors tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Speedup/Processor 
    1        12.52       418737        350.14   1.00     1.00         1.00 
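
The ZEUS-3D versus ZEUS-MP comparison made in the text can be checked against the two sets of tables above. The sketch below simply divides the tabulated Zone-Cycles/sec values; the variable names are illustrative.

```python
# Zone-Cycles/sec on one C90 processor, copied from the tables above.
ZEUS_MP = {"32x32x32": 120144, "64x64x64": 190408, "128x64x64": 210263}
ZEUS_3D = {"32x32x32": 195137, "64x64x64": 340437, "128x64x64": 418737}

# Ratio > 1 means ZEUS-3D processes more zone-cycles per second.
ratios = {grid: ZEUS_3D[grid] / ZEUS_MP[grid] for grid in ZEUS_MP}

for grid, ratio in ratios.items():
    print(f"{grid:>10}: ZEUS-3D / ZEUS-MP = {ratio:.2f}")
```

On these numbers the ratio grows with problem size, from about 1.6 for the 32-cubed grid to about 2.0 for the 128 x 64 x 64 grid.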
