
Shkset

by streeter last modified 2007-03-30 04:35
ZEUS-3D Benchmarks for Problem Shkset
  • Problem: Shkset -- 1-D MHD Sod shock tube
  • Geometry: Cartesian XYZ
  • Grid: 800 (except as noted) zones uniformly spaced in the X direction. The other two directions are treated as symmetry axes.
  • Algorithm: van Leer advection, original MoC scheme to evolve magnetic fields
  • Precision: Native single precision was used on all machines (64 bits on the Crays, 32 bits on the rest). The job can be run in double precision on most 32-bit machines simply by using the appropriate compiler flags. Using 64-bit arithmetic on such machines typically slows the calculation by about 10 percent.
  • Data: In the tables below, "tused" is the number of CPU seconds used by the master thread in computing the evolution (some ZEUS-3D overhead is excluded). Zone-Cycles/sec is the number of mesh zones times the number of time steps, divided by tused.
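
The Zone-Cycles/sec definition above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the page does not state the number of time steps, so the second function simply inverts the definition to estimate it from a table row.

```python
def zone_cycles_per_sec(zones, steps, tused):
    """Mesh zones times time steps, divided by CPU seconds (tused)."""
    return zones * steps / tused


def implied_steps(zone_cycles_sec, tused, zones):
    """Invert the definition to estimate the number of time steps.

    Assumption: the step count is not given on this page; this is
    only an estimate recovered from the tabulated values.
    """
    return zone_cycles_sec * tused / zones


# Cray Y-MP row: 800 zones, tused = 3.6212 s, 183144 zone-cycles/sec
steps = implied_steps(183144, 3.6212, 800)
print(round(steps))
```

Applied to the Y-MP row, the inversion suggests a run of roughly 829 time steps, though that figure is inferred rather than quoted from the source.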

Cray Y-MP

The data below were obtained for ZEUS-3D version 3.2.1. The Y-MP data is used below as a standard of comparison for other machines. A few CALMATH library BLAS routines boost performance by several percent. All routines were compiled with: cft77 -ez
                                                               Speedup/
 Processors tused(s) Zone-Cycles/sec   MFLOPS   YMPs   Speedup Processor 
    1        3.6212      183144        178.72   1.00     1.00    1.00 
For 6400 zones, the Y-MP performed 190293 zone-cycles/sec, or 190.18 MFLOPS.


Cray C90

The data below were obtained for ZEUS-3D version 3.2.1. All routines were compiled with: cft77 -ez
                                                               Speedup/
 Processors tused(s) Zone-Cycles/sec   MFLOPS   YMPs   Speedup Processor 
    1        1.6012      414189        403.40   2.26     1.00    1.00 
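
The YMPs column is not defined explicitly on this page. The tabulated values appear consistent with dividing a machine's MFLOPS by the Y-MP's MFLOPS at the same zone count (178.72 for 800 zones, 190.18 for 6400 zones). A minimal sketch under that assumption, with a hypothetical function name:

```python
# Y-MP reference rates quoted in the Cray Y-MP section, keyed by zone count.
YMP_MFLOPS = {800: 178.72, 6400: 190.18}


def ymps(mflops, zones):
    """Performance in units of one Cray Y-MP processor.

    Assumption: YMPs = MFLOPS / (Y-MP MFLOPS at the same zone count).
    The page does not spell this out.
    """
    return mflops / YMP_MFLOPS[zones]


# Cray C90 row: 403.40 MFLOPS at 800 zones
print(round(ymps(403.40, 800), 2))
```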

SGI Power Challenge

The data below was obtained with a version of ZEUS-3D modified by Robert Fiedler to run in parallel on this shared-memory machine. The performance analyzer recognizes the parallelism in the algorithm at the loop level and automatically inserts the appropriate directives. These runs were performed in multi-user mode under a relatively light system load. Single-user mode data should be similar, except for improved performance when the job requires nearly all available processors.

Compiled: f77 -c -O3 -w1 -g3 -pfa list -WK,-ROUNDOFF=3,-SO=3,-AS=L

1) FIXED PROBLEM SIZE

                                                                Speedup/
Procs Zones tused(s) Zone-Cycles/sec   MFLOPS   YMPs   Speedup  Processor 
  1   6400   7.8889       89239         87.40   0.46     1.00     1.00 
  2   6400   4.6746      150603        150.10   0.79     1.68     0.84 
  3   6400   3.4797      202318        201.40   1.06     2.26     0.75 
  4   6400   3.1080      226510        226.10   1.19     2.53     0.63 
  5   6400   2.6594      264717        264.10   1.39     2.96     0.59 
  6   6400   2.4133      291722        290.70   1.53     3.26     0.54 
  7   6400   2.2889      307571        305.90   1.61     3.44     0.49 
  8   6400   2.2591      311623        309.70   1.63     3.49     0.43 
  9   6400   2.1876      321808        321.10   1.69     3.60     0.40 
 10   6400   2.3504      299523        298.30   1.57     3.35     0.33 
 11   6400   2.3048      305455        304.00   1.60     3.42     0.31 
 12   6400   2.4835      283466        281.20   1.48     3.17     0.26 
 13   6400   2.4182      291124        288.80   1.52     3.26     0.25 
 14   6400   2.5368      277519        275.50   1.45     3.10     0.22 
 15   6400   4.4041      159851        159.60   0.84     1.79     0.11 
 16   6400   4.7874      147051        146.30   0.77     1.64     0.10 
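
The Speedup and Speedup/Processor columns are likewise not defined on this page. The sketch below assumes that Speedup is the single-processor tused divided by the N-processor tused, and that Speedup/Processor is that ratio divided by N. The function names are hypothetical. The tabulated values agree with this to about 0.01, and the table appears to truncate rather than round.

```python
def speedup(tused_1, tused_n):
    """Assumed definition: single-processor time over N-processor time."""
    return tused_1 / tused_n


def speedup_per_processor(tused_1, tused_n, procs):
    """Parallel efficiency: speedup divided by the processor count."""
    return speedup(tused_1, tused_n) / procs


# Fixed problem size, 6400 zones: 1 processor vs. 2 processors
t1, t2 = 7.8889, 4.6746
print(f"{speedup(t1, t2):.3f} {speedup_per_processor(t1, t2, 2):.3f}")
```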
2) FIXED AMOUNT OF WORK PER PROCESSOR

The numerical solution varies wildly with the number of zones (but not the number of processors). Perhaps the MoC algorithm should be used only with 64-bit arithmetic. However, on the SGI Challenge, the solution converges as expected when the number of zones increases.
                                                                Speedup/
Procs Zones tused(s) Zone-Cycles/sec   MFLOPS   YMPs   Speedup  Processor 
  1    400   2.6958       62764         60.80   0.32     1.00     1.00 
  2    800   6.7306       98535         96.90   0.51     1.56     0.78 
  3   1200   20.006      134061        133.00   0.70     2.13     0.71 
  4   1600   20.003      167174        165.30   0.87     2.66     0.66 
  5   2000   20.007      199928        199.50   1.05     3.18     0.63 
  6   2400   1.3618      158614        157.70   0.83     2.52     0.42 
  7   2800   20.008      258473        256.50   1.35     4.11     0.58 
  8   3200   11.714      265806        264.10   1.39     4.23     0.52 
  9   3600   1.4390      197639        195.70   1.03     3.14     0.34 
 10   4000   3.7464      239165        237.50   1.25     3.81     0.38 
 11   4400   1.7949      220625        218.50   1.15     3.51     0.31 
 12   4800   3.6850      257910        256.50   1.35     4.10     0.34 
 13   5200   20.007      287986        286.90   1.51     4.58     0.35 
 14   5600   20.009      274557        273.60   1.44     4.37     0.31 
 15   6000   20.003      176970        174.80   0.92     2.81     0.18 
 16   6400   4.8083      146412        144.40   0.76     2.3      0.14 

SGI Challenge

The data below was obtained with a version of ZEUS-3D modified by Robert Fiedler to run in parallel on this shared-memory machine. The performance analyzer recognizes the parallelism in the algorithm at the loop level and automatically inserts the appropriate directives. These runs were performed in single-user mode. Multi-user mode data obtained under a relatively light system load is essentially similar.

Compiled: f77 -c -O2 -pfa list, -WK,-roundoff=2,-AS=L -w1 -g3 -Nq9999

1) FIXED PROBLEM SIZE
 
 Processors   Zones   MFLOPS    Y-MPs   Speedup   Speedup/Processor 
     1         6400    20.16     0.11     1.00          1.00 
     2         6400    38.17     0.20     1.89          0.95 
     3         6400    57.77     0.30     2.87          0.96 
     4         6400    74.69     0.39     3.71          0.93 
     5         6400    87.14     0.46     4.32          0.86 
     6         6400   100.93     0.53     5.01          0.83 
     7         6400   122.49     0.64     6.08          0.87 
     8         6400   126.73     0.67     6.29          0.79 
     9         6400   137.07     0.72     6.80          0.76 
    10         6400   156.54     0.82     7.77          0.78 
    11         6400   161.82     0.85     8.03          0.73 
    12         6400   160.28     0.84     7.95          0.66 
    13         6400   170.71     0.90     8.47          0.65 
    14         6400   190.21     1.00     9.44          0.67 
    15         6400   198.94     1.05     9.87          0.66 
    16         6400   203.11     1.07    10.08          0.63 
    17         6400   197.50     1.04     9.80          0.58 
    18         6400   214.53     1.13    10.64          0.59 
    19         6400   220.55     1.16    10.94          0.58 
    20         6400   207.26     1.09    10.28          0.51 
    21         6400   224.74     1.18    11.15          0.53 
    22         6400   211.21     1.11    10.48          0.48 
    23         6400   222.64     1.17    11.05          0.48 
    24         6400   240.34     1.26    11.92          0.50 
    25         6400   217.03     1.14    10.77          0.43 
    26         6400   247.21     1.30    12.26          0.47 
    27         6400   239.24     1.26    11.87          0.44 
    28         6400   238.62     1.25    11.84          0.42 
    29         6400   227.63     1.20    11.29          0.39 
    30         6400   236.36     1.24    11.73          0.39 
    31         6400   233.87     1.23    11.60          0.37 
    32         6400   225.93     1.19    11.21          0.35 
     
2) FIXED AMOUNT OF WORK PER PROCESSOR

 
 Processors   Zones   MFLOPS    Y-MPs   Speedup   Speedup/Processor 
     1          200    18.38     0.10     1.00          1.00 
     2          400    32.03     0.17     1.74          0.87 
     3          600    46.00     0.24     2.50          0.83 
     4          800    62.04     0.33     3.38          0.84 
     5         1000    64.48     0.34     3.51          0.70 
     6         1200    82.99     0.44     4.52          0.75 
     7         1400    93.13     0.49     5.07          0.72 
     8         1600   104.57     0.55     5.69          0.71 
     9         1800   115.70     0.61     6.30          0.70 
    10         2000   117.83     0.62     6.41          0.64 
    11         2200   128.20     0.67     6.98          0.63 
    12         2400   134.71     0.71     7.33          0.61 
    13         2600   114.63     0.60     6.24          0.48 
    14         2800   157.43     0.83     8.57          0.61 
    15         3000   161.24     0.85     8.77          0.58 
    16         3200   172.30     0.91     9.38          0.59 
    17         3400   181.08     0.95     9.85          0.58 
    18         3600   175.22     0.92     9.54          0.53 
    19         3800   136.40     0.72     7.42          0.39 
    20         4000   182.99     0.96     9.96          0.50 
    21         4200   141.61     0.74     7.71          0.37 
    22         4400   117.05     0.62     6.37          0.29 
    23         4600   214.03     1.13    11.65          0.51 
    24         4800   213.75     1.12    11.63          0.48 
    25         5000   190.21     1.00    10.35          0.41 
    26         5200   158.60     0.83     8.63          0.33 
    27         5400   153.34     0.81     8.34          0.31 
    28         5600   223.41     1.17    12.16          0.43 
    29         5800   173.34     0.91     9.43          0.33 
    30         6000   226.62     1.19    12.33          0.41 
    31         6200   123.13     0.65     6.70          0.22 
    32         6400   242.22     1.27    13.18          0.41 
     

SGI Indigo 2 Extreme

The data below was obtained with ZEUS-3D version 3.2.1.

All routines were compiled with: f77 -c -O2 -w1 -g3 -Nq9999
                                                                Speedup/
 Processors tused(s) Zone-Cycles/sec   MFLOPS   YMPs   Speedup  Processor 
    1        27.760       23891         23.31   0.13     1.00     1.00 

HP 715/80

The data below was obtained with ZEUS-3D version 3.2.1 ported to HP-UX. All routines were compiled with: f77 -c +O3
                                                                Speedup/
 Processors tused(s) Zone-Cycles/sec   MFLOPS   YMPs   Speedup  Processor 
    1        26.090       25420         24.81   0.14     1.00     1.00 

Convex C3880

The data below was obtained with ZEUS-3D version 3.2.1.

All routines were compiled with: fc -c -fi -O2 -nw -or none -db
                                                                Speedup/
 Processors tused(s) Zone-Cycles/sec   MFLOPS   YMPs   Speedup  Processor 
    1        8.2914       79986         78.05   0.44     1.00     1.00 


