
ZEUS-3D Benchmarks for Problem Blast w/MoC Scheme

by streeter last modified 2007-03-30 04:54
  • CODE: ZEUS-3D version 3.4 or 3.4.1, as noted.
  • PROBLEM: Blast -- the expansion of a hot sphere of plasma into an initially uniform magnetic medium.
  • GEOMETRY: Cartesian XYZ
  • GRID: 32 to 128 zones in each direction centered on the origin. The ratio of neighboring zone dimensions is 1.02. The smallest zones are near the origin.
  • ALGORITHM: van Leer advection, Method of Characteristics scheme to evolve magnetic fields
  • PRECISION: Single precision on Crays (64 bits), double precision on others.
  • DATA: In the table below, "tused" is the number of CPU seconds used by the master thread in computing the evolution (some system and ZEUS-3D overhead is excluded). The Zone-Cycles/sec is the number of mesh zones times the number of time steps divided by tused.
  • LIST: This list has performance data tables and graphs (on multiprocessors) for each machine. A direct comparison of the two SGI multiprocessors and a 4-HYPERnode Convex Exemplar SPP-1000 is available separately.
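
The Zone-Cycles/sec definition in the DATA item can be sketched as a small Python function. The function name and arguments are illustrative, not part of ZEUS-3D. The number of time steps is not listed on this page; the value 4 used below is an assumption inferred from the Cray C90 32 x 32 x 32 row (59125 x 2.2169 / 32768 is very nearly 4).

```python
def zone_cycles_per_sec(nx, ny, nz, nsteps, tused):
    """Zone-Cycles/sec = (mesh zones * time steps) / tused.

    nx, ny, nz : zones in each direction
    nsteps     : number of time steps taken (not listed in the tables)
    tused      : CPU seconds used by the master thread
    """
    if tused <= 0:
        raise ValueError("tused must be positive")
    return nx * ny * nz * nsteps / tused


# Cray C90, 32 x 32 x 32 grid, tused = 2.2169 s.
# nsteps = 4 is an inferred assumption, not a value stated in the source.
rate = zone_cycles_per_sec(32, 32, 32, nsteps=4, tused=2.2169)
print(f"{rate:.0f} zone-cycles/sec")
```

Under that assumption the result lands within a few zone-cycles/sec of the 59125 listed for the C90; the small difference is consistent with tused being rounded in the table.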
Cray C90


Cray UNICOS machines have a hardware performance monitor (hpm), which gives the number of floating point operations per CPU second (FLOPS) performed by a given process. The FLOPS for other machines are determined from C90 FLOPS and the ratios of the Zone-Cycles/sec.
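
As a rough illustration of the scaling described above, the sketch below estimates MFLOPS for another machine from the C90 figures for the same grid. The function names are hypothetical. Reading the "C90s" column as the ratio of a machine's Zone-Cycles/sec to the single-processor C90 value is also an assumption; the text does not define that column.

```python
def estimate_mflops(zc_per_sec, c90_zc_per_sec, c90_mflops):
    """Scale the measured C90 MFLOPS by the ratio of Zone-Cycles/sec."""
    return c90_mflops * zc_per_sec / c90_zc_per_sec


def relative_to_c90(zc_per_sec, c90_zc_per_sec):
    """Assumed meaning of the 'C90s' column: performance in units of one C90."""
    return zc_per_sec / c90_zc_per_sec


# C90 reference for the 32 x 32 x 32 grid: 59125 zone-cycles/sec, 137.24 MFLOPS.
# SGI Power Challenge, 1 processor, same grid: 21772 zone-cycles/sec.
mflops = estimate_mflops(21772, 59125, 137.24)
c90s = relative_to_c90(21772, 59125)
print(f"estimated MFLOPS: {mflops:.2f}, C90s: {c90s:.2f}")
```

For this row the estimate comes out near the tabulated 50.45 MFLOPS, and the ratio rounds to the listed 0.37. Some rows, notably those obtained with ZEUS-3D version 3.4, agree less closely, so the tables may have been built from slightly different reference values.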

All routines were compiled with: cf77 -c
 
 Processors tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Speedup/Processor 
GRID: 32 x 32 x 32 
    1        2.2169       59125        137.24   1.00     1.00         1.00 
GRID: 64 x 64 x 64 
    1        16.716      109773        252.08   1.00     1.00         1.00 
GRID:128 x 64 x 64 
    1        46.931      122886        288.00   1.00     1.00         1.00 
GRID: 64 x 64 x128 
    1        49.685      116074        268.00   1.00     1.00         1.00 


SGI Power Challenge

  • The EDITOR preprocessor was used to automatically insert parallelization directives above each loop nest in ZEUS-3D.
  • These runs were performed in dedicated mode.
  • Wall clock time resolution is 1 second.
  • Routines DISPLAYR and DISPLAYI were compiled with f77 -c -w1 -g3
  • Routine NMLSTS and the NAMELIST library were compiled with f77 -O2 -w1 -g3
  • All other routines were compiled with f77 -c -O3 -w1 -g3 -pfa list -WK,-ro=3,-so=3,-o=5,-as=l,-chs=16
 
GRID: 32 x 32 x 32 

                                                                        Speedup/
 Processors  Wall Clock tused(s)Zone-Cycles/sec MFLOPS   C90s   Speedup Processor
      1         7.00     6.02        21772       50.45   0.37     1.00    1.00 
      2         5.00     3.44        38102       88.29   0.64     1.75    0.88 
      3         4.00     2.88        45523      105.48   0.77     2.09    0.70 
      4         3.00     2.33        56267      130.38   0.95     2.58    0.65 
      5         3.00     2.17        60478      140.13   1.02     2.78    0.56 
      6         3.00     1.99        66013      152.96   1.12     3.03    0.51 
      7         2.00     1.87        69964      162.12   1.18     3.21    0.46 
      8         3.00     1.65        79204      183.53   1.34     3.64    0.45 
      9         3.00     1.68        77948      180.62   1.32     3.58    0.40 
     10         3.00     1.52        86112      199.53   1.46     3.96    0.40 
     11         4.00     1.54        84935      196.80   1.44     3.90    0.35 
     12         3.00     1.54        85246      197.53   1.44     3.92    0.33 
     13         3.00     1.57        83225      192.84   1.41     3.82    0.29 
     14         3.00     1.44        90975      210.80   1.54     4.18    0.30 
     15         3.00     1.39        94043      217.91   1.59     4.32    0.29 
     16         3.00     1.39        94573      219.14   1.60     4.34    0.27 
 
GRID: 64 x 64 x 64 
                                                                         Speedup/
Processors  Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS   C90s  Speedup Processor
      1       105.00      96.61        18994       43.60   0.17    1.00    1.00 
      2        57.00      47.74        38440       88.24   0.35    2.02    1.01 
      3        42.00      33.19        55290      126.93   0.50    2.91    0.97 
      4        34.00      23.98        76521      175.67   0.70    4.03    1.01 
      5        34.00      20.35        90180      207.02   0.82    4.75    0.95 
      6        31.00      17.07       107486      246.75   0.98    5.66    0.94 
      7        26.00      15.64       117315      269.31   1.07    6.18    0.88 
      8        28.00      14.18       129412      297.08   1.18    6.81    0.85 
      9        27.00      13.20       139049      319.21   1.27    7.32    0.81 
     10        22.00      12.15       150982      346.60   1.38    7.95    0.79 
     11        25.00      11.30       162364      372.73   1.48    8.55    0.78 
     12        21.00      11.45       160236      367.85   1.46    8.44    0.70 
     13        25.00      10.41       176237      404.58   1.61    9.28    0.71 
     14        25.00      10.41       176325      404.78   1.61    9.28    0.66 
     15        22.00      10.57       173666      398.68   1.58    9.14    0.61 
     16        24.00      10.18       180243      413.77   1.64    9.49    0.59 
 
GRID: 128 x 64 x 64 
                                                                          Speedup/
 Processors  Wall Clock  tused(s) Zone-Cycles/sec MFLOPS   C90s   Speedup Processor
      1        309.00     291.57       19780       46.36   0.16     1.00    1.00 
      2        166.00     148.29       38892       91.15   0.32     1.97    0.98 
      3        125.00     102.48       56275      131.89   0.46     2.85    0.95 
      4         93.00      74.89       77007      180.48   0.63     3.89    0.97 
      5         79.00      61.80       93325      218.72   0.76     4.72    0.94 
      6         72.00      51.42      112161      262.86   0.91     5.67    0.95 
      7         65.00      46.74      123377      289.15   1.00     6.24    0.89 
      8         59.00      41.08      140371      328.98   1.14     7.10    0.89 
      9         56.00      38.65      149235      349.75   1.21     7.54    0.84 
     10         52.00      34.84      165525      387.93   1.35     8.37    0.84 
     11         53.00      32.27      178710      418.83   1.45     9.04    0.82 
     12         50.00      31.36      183911      431.02   1.50     9.30    0.77 
     13         46.00      27.97      206201      483.26   1.68    10.42    0.80 
     14         46.00      28.05      205601      481.85   1.67    10.39    0.74 
     15         47.00      28.79      200349      469.55   1.63    10.13    0.68 
     16         49.00      28.04      205706      482.10   1.67    10.40    0.65 
 
GRID: 64 x 64 x 128 
                                                                         Speedup/
 Processors  Wall Clock  tused(s) Zone-Cycles/sec MFLOPS  C90s   Speedup Processor
      1        331.00     313.05      18422       42.53   0.16     1.00    1.00 
      2        177.00     159.02      36266       83.73   0.31     1.97    0.98 
      3        128.00     109.13      52848      122.02   0.46     2.87    0.96 
      4        102.00      80.25      71860      165.92   0.62     3.90    0.98 
      5         87.00      65.47      88092      203.39   0.76     4.78    0.96 
      6         76.00      55.66     103617      239.24   0.89     5.62    0.94 
      7         68.00      49.94     115472      266.61   0.99     6.27    0.90 
      8         63.00      44.28     130249      300.73   1.12     7.07    0.88 
      9         60.00      40.88     141064      325.70   1.22     7.66    0.85 
     10         59.00      37.41     154164      355.94   1.33     8.37    0.84 
     11         53.00      34.39     167699      387.20   1.44     9.10    0.83 
     12         52.00      32.97     174920      403.87   1.51     9.49    0.79 
     13         50.00      29.98     192373      444.16   1.66    10.44    0.80 
     14         52.00      30.72     187723      433.43   1.62    10.19    0.73 
     15         50.00      29.71     194122      448.20   1.67    10.54    0.70 
     16         49.00      29.77     193712      447.26   1.67    10.52    0.66 
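
The Speedup and Speedup/Processor columns are not defined in the text, but the listed values are consistent with speedup being the single-processor tused divided by the N-processor tused, and Speedup/Processor being that value divided by N. The sketch below makes that inferred relationship explicit; the function name is illustrative.

```python
def speedup_table(tused_by_procs):
    """Return {N: (speedup, speedup_per_processor)} from {N: tused seconds}.

    Assumes speedup = tused(1) / tused(N), inferred from the tables
    rather than stated in the source.
    """
    base = tused_by_procs[1]
    result = {}
    for nprocs, tused in sorted(tused_by_procs.items()):
        speedup = base / tused
        result[nprocs] = (speedup, speedup / nprocs)
    return result


# First four rows of the SGI Power Challenge 32 x 32 x 32 table.
rows = speedup_table({1: 6.02, 2: 3.44, 3: 2.88, 4: 2.33})
for nprocs, (speedup, per_proc) in rows.items():
    print(f"{nprocs:2d}  {speedup:5.2f}  {per_proc:5.2f}")
```

Because tused is the CPU time of the master thread, a speedup computed this way can differ from one based on the wall clock column, which has only 1-second resolution.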


SGI Challenge

  • The data below were obtained with version 3.4.1 of ZEUS-3D.
  • The EDITOR preprocessor was used to automatically insert parallelization directives above each loop nest in ZEUS-3D.
  • These runs were performed in dedicated mode.
  • Wall clock time resolution is 1 second.
  • Compile: f77 -c -O2 -w1 -g3 -Nq9999 -pfa list -WK,-ro=3,-so=3,-as=l,-chs=16
 
GRID: 32 x 32 x 32 
                                                                          Speedup/
 Processors  Wall Clock   tused(s) Zone-Cycles/sec  MFLOPS  C90s  Speedup Processor
      1         27.00      25.76         5087       11.79   0.09    1.00    1.00 
      2         14.00      13.36         9810       22.73   0.17    1.93    0.96 
      3         12.00      11.12        11790       27.32   0.20    2.32    0.77 
      4         10.00       8.02        16351       37.89   0.28    3.21    0.80 
      5          9.00       7.72        16971       39.32   0.29    3.34    0.67 
      6          8.00       6.64        19732       45.72   0.33    3.88    0.65 
      7          8.00       6.11        21453       49.71   0.36    4.22    0.60 
      8          7.00       5.48        23927       55.44   0.40    4.70    0.59 
      9          7.00       5.44        24073       55.78   0.41    4.73    0.53 
     10          7.00       4.89        26829       62.16   0.45    5.27    0.53 
     11          6.00       4.23        30977       71.78   0.52    6.09    0.55 
     12          6.00       4.24        30929       71.67   0.52    6.08    0.51 
 
 
GRID: 64 x 64 x 64 
                                                                         Speedup/
 Processors  Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS  C90s  Speedup Processor
      1         358.00     348.31       5268       12.09   0.05    1.00    1.00 
      2         190.00     179.60      10217       23.45   0.09    1.94    0.97 
      3         143.00     132.70      13828       31.74   0.13    2.62    0.87 
      4         109.00      98.10      18705       42.94   0.17    3.55    0.89 
      5          93.00      82.63      22208       50.98   0.20    4.22    0.84 
      6          81.00      69.66      26342       60.47   0.24    5.00    0.83 
      7          73.00      62.65      29290       67.24   0.27    5.56    0.79 
      8          67.00      55.97      32785       75.26   0.30    6.22    0.78 
      9          63.00      51.91      35352       81.16   0.32    6.71    0.75 
     10          57.00      45.94      39940       91.69   0.36    7.58    0.76 
     11          54.00      42.86      42814       98.29   0.39    8.13    0.74 
     12          52.00      40.92      44839      102.93   0.41    8.51    0.71 
 
GRID: 128 x 64 x 64 
                                                                         Speedup/
 Processors  Wall Clock  tused(s) Zone-Cycles/sec MFLOPS   C90s  Speedup Processor 
      1        1064.00    1047.00       5508       12.91   0.04    1.00    1.00 
      2         559.00     538.31      10714       25.11   0.09    1.94    0.97 
      3         407.00     388.48      14846       34.79   0.12    2.70    0.90 
      4         306.00     282.84      20390       47.79   0.17    3.70    0.93 
      5         261.00     240.09      24020       56.30   0.20    4.36    0.87 
      6         233.00     208.49      27661       64.83   0.23    5.02    0.84 
      7         234.00     186.47      30929       72.49   0.25    5.61    0.80 
      8         159.33     161.62      35683       83.63   0.29    6.48    0.81 
      9         164.00     146.75      39298       92.10   0.32    7.13    0.79 
     10         152.00     133.76      43115      101.05   0.35    7.83    0.78 
     11         139.00     120.92      47695      111.78   0.39    8.66    0.79 
     12         134.00     116.09      49680      116.43   0.40    9.02    0.75 
 
GRID: 64 x 64 x 128 
                                                                        Speedup/
 Processors  Wall Clock  tused(s) Zone-Cycles/sec MFLOPS  C90s  Speedup Processor
      1        1017.00    998.87         5249     12.12   0.05    1.00    1.00 
      2         571.00    549.92        10487     24.21   0.09    1.82    0.91 
      3         411.00    389.34        14813     34.20   0.13    2.57    0.86 
      4         308.00    289.83        19898     45.94   0.17    3.45    0.86 
      5         259.00    241.47        23884     55.15   0.21    4.14    0.83 
      6         232.00    209.46        27533     63.57   0.24    4.77    0.79 
      7         203.00    180.58        31938     73.74   0.28    5.53    0.79 
      8         180.00    157.39        36643     84.60   0.32    6.35    0.79 
      9         165.00    143.30        40245     92.92   0.35    6.97    0.77 
     10         158.00    135.36        42607     98.37   0.37    7.38    0.74 
     11         143.00    122.18        47201    108.98   0.41    8.18    0.74 
     12         133.00    112.10        51447    118.79   0.44    8.91    0.74 


Convex Exemplar SPP-1000

  • The data below were obtained with version 3.4.1 of ZEUS-3D, which calls system routines that time a single thread. Isom Crawford's Fortran-callable interface to the thread timing routines is available here.
  • This 4-HYPERnode system was configured with one HYPERnode devoted to processing one batch job at a time.
  • For the 128 x 64 x 64 grid, the job was run on up to 16 threads. These data are plotted in the comparison with the SGI multiprocessors.
  • All routines were compiled with: fc -c -nw -O3 -or none.
  • Parallelization directives were placed in the code just before many loop nests to ensure that they run concurrently.
 
GRID: 32 x 32 x 32 
                                                                       Speedup/
 Processors Wall Clock  tused(s)Zone-Cycles/sec MFLOPS   C90s Speedup  Processor 
      1      28.34      25.44        5153       11.94    .09   1.00     1.00 
      2      15.47      13.16        9959       23.08    .17   1.93      .97 
      3      11.49       9.26       14161       32.81    .24   2.75      .92 
      4       9.62       6.93       18913       43.82    .32   3.67      .92 
      5       8.56       6.25       20957       48.56    .35   4.07      .81 
      6       7.89       5.45       24054       55.74    .41   4.67      .78 
      7       7.79       5.33       24570       56.93    .42   4.77      .68 
      8       7.76       4.84       27069       62.72    .46   5.25      .66 

GRID: 64 x 64 x 64 
                                                                          Speedup/
 Processors  Wall Clock  tused(s) Zone-Cycles/sec   MFLOPS   C90s Speedup Processor
      1        379.23     355.74         5158        11.84    .05    1.00      1.00
      2        205.73     183.35        10008        22.98    .09    1.94       .97
      3        149.87     128.45        14286        32.80    .13    2.77       .92
      4        117.44      96.15        19085        43.81    .17    3.70       .92
      5        101.14      79.68        23030        52.87    .21    4.46       .89
      6         88.87      67.31        27264        62.59    .25    5.29       .88
      7         82.77      60.30        30432        69.86    .28    5.90       .84
      8         74.49      51.84        35399        81.26    .32    6.86       .86

GRID: 128 x 64 x 64 
                                                                          Speedup/
 Processors  Wall Clock  tused(s) Zone-Cycles/sec   MFLOPS   C90s Speedup Processor
      1       1027.65     984.32         5859        13.73    .05    1.00      1.00
      2        541.09     500.90        11514        26.98    .09    1.97       .98
      3        386.95     347.09        16616        38.94    .14    2.84       .95
      4        298.43     259.71        22206        52.04    .18    3.79       .95
      5        251.12     212.82        27099        63.51    .22    4.63       .93
      6        219.82     181.82        31720        74.34    .26    5.41       .90
      7        203.91     163.76        35217        82.54    .29    6.01       .86
      8        183.57     139.85        41238        96.65    .34    7.04       .88
      9        212.36     165.06        34941        81.89    .28    6.00       .67
     10        205.04     146.80        39286        92.07    .32    7.08       .71
     11        171.90     128.96        44722       104.81    .36    7.68       .70
     12        176.18     128.26        44965       105.38    .37    7.72       .64
     13        174.11     108.90        52959       124.12    .43    9.54       .73
     14        183.47     121.12        47614       111.59    .39    8.58       .61
     15        174.43     110.76        52068       122.03    .42    9.38       .63
     16        170.88     107.25        53771       126.02    .44    9.68       .61

GRID: 64 x 64 x 128 
                                                                          Speedup/
 Processors  Wall Clock  tused(s) Zone-Cycles/sec   MFLOPS   C90s Speedup Processor
      1       1085.98    1041.40         5538        12.79    .05    1.00      1.00
      2        569.62     526.17        10961        25.31    .09    1.98       .99
      3        404.27     363.49        15866        36.63    .14    2.87       .96
      4        313.72     274.23        21030        48.56    .18    3.80       .95
      5        264.40     225.66        25557        59.01    .22    4.61       .92
      6        234.73     194.69        29622        68.39    .26    5.35       .89
      7        209.53     170.40        33845        78.14    .29    6.11       .87
      8        189.91     149.48        38581        89.08    .33    6.97       .87


HP 715/80

  • The data below were obtained with ZEUS-3D version 3.4. This workstation has 32 MB of memory and can handle up to about 64 x 32 x 32 zones without much paging to disk.
  • All routines were compiled with: f77 -c +O3
 
GRID: 32 x 32 x 32 
 Processors tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Speedup/Processor 
    1        23.480        5582         12.33   0.09     1.00         1.00 
Convex C3880

  • The data below were obtained with ZEUS-3D version 3.4.
  • All routines were compiled with: fc -c -fi -O2 -nw -or none -cxdb
 
 Processors tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Speedup/Processor 
GRID: 32 x 32 x 32 
    1        10.270       12762         28.77   0.21     1.00         1.00 
GRID: 64 x 64 x 64 
    1        131.54       13950         30.24   0.12     1.00         1.00 
GRID:128 x 64 x 64 
    1        453.65       12712         28.80   0.10     1.00         1.00 
GRID: 64 x 64 x128 
    1        407.86       14140         32.16   0.12     1.00         1.00 
IBM RS/6000 Model 530 (128MB)

  • The data below were obtained with ZEUS-3D version 3.4.1.
  • All routines were compiled with: xlf -c -O3
 
GRID: 32 x 32 x 32 
 Processors tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Speedup/Processor 
    1         49.45        2651          6.14   0.04     1.00         1.00 

