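The PathScale EKOPath Compiler Suite is the highest-performance 64-bit compiler for Linux systems running on AMD Opteron processors. The benchmark comparisons below make our performance advantage plain to see.
Results obtained from your own source code will of course vary, but we believe the PathScale C, C++, and Fortran compilers will deliver immediate benefits to AMD Opteron and Athlon64 users.
Do you have benchmark results from the PathScale EKOPath compilers that you would like to share with other users? Don't forget to tell us about them through the PathScale Benchmark Bounty Program.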
| AM2 atmosphere model code |
 |
This code is used at the University of Utah's Meteorology Department for climate research. It consists of several closely coupled modules, is parallelized with MPI, and is written with Fortran 95 constructs.
The benchmark was run independently at the University of Utah, and the results are published with their permission.
Run time (lower is better)
| Compiler | 1 CPU | 2 CPUs | 4 CPUs |
| --- | --- | --- | --- |
| PathScale v1.2 | 368.89 sec | 201.88 sec | 99.11 sec |
| PGI v5.2 | 483.45 sec | 253.38 sec | 135.53 sec |
| % Faster for PathScale | 31.1% | 25.5% | 36.7% |
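The "% Faster for PathScale" row appears consistent with dividing the competitor's run time by the PathScale run time and subtracting one. The page does not state the formula, so the sketch below treats it as an assumption and simply recomputes the row from the published AM2 timings; the function name `percent_faster_by_time` is illustrative only.

```python
def percent_faster_by_time(baseline_sec: float, candidate_sec: float) -> float:
    """Percent by which the candidate beats the baseline when lower time is better.

    Assumed formula (not stated in the source): (baseline / candidate - 1) * 100.
    """
    if baseline_sec <= 0 or candidate_sec <= 0:
        raise ValueError("times must be positive")
    return (baseline_sec / candidate_sec - 1.0) * 100.0


# Run times in seconds, copied from the AM2 results (keyed by CPU count).
pgi = {1: 483.45, 2: 253.38, 4: 135.53}
pathscale = {1: 368.89, 2: 201.88, 4: 99.11}

for ncpu in sorted(pgi):
    gain = percent_faster_by_time(pgi[ncpu], pathscale[ncpu])
    print(f"{ncpu} CPU: {gain:.1f}% faster")
```

Under this assumption the recomputed values round to the published 31.1%, 25.5%, and 36.7%.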
| HimenoBMT "The Performance Evaluation" |
 |
The PathScale EKOPath Compiler Suite produces excellent single-CPU and 4-CPU OpenMP results on the popular Himeno benchmark:
http://w3cic.riken.go.jp/E/HPC_e/HimenoBMT_e/index_e.html
Serial results on Opteron 2.2 GHz, PC3200 (MFLOPS)
| Compiler | F77 | F90 | C |
| --- | --- | --- | --- |
| PathScale EKOPath 2.0 | 1584 | 1189 | 267 |
| 64-bit Commercial Compiler | 1419 | 1125 | 141 |
| GNU compilers (3.4.3 & g95) | 1002 | 588 | 213 |

EKOPath Advantage
| Compared with | F77 | F90 | C |
| --- | --- | --- | --- |
| Commercial 64-bit compiler | +12% | +6% | +89% |
| GNU compilers (3.4.3 & g95) | +58% | +102% | +25% |
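For throughput figures such as MFLOPS, where higher is better, the "EKOPath Advantage" rows appear to be the ratio of the EKOPath rate to the other compiler's rate, minus one. That formula is an assumption (the page does not give it), and the helper name `percent_faster_by_rate` is hypothetical. A minimal sketch that recomputes the serial Himeno rows:

```python
def percent_faster_by_rate(candidate_rate: float, baseline_rate: float) -> float:
    """Percent advantage of the candidate when a higher rate is better.

    Assumed formula (not stated in the source): (candidate / baseline - 1) * 100.
    """
    if candidate_rate <= 0 or baseline_rate <= 0:
        raise ValueError("rates must be positive")
    return (candidate_rate / baseline_rate - 1.0) * 100.0


# Serial Himeno results in MFLOPS, copied from the table.
ekopath = {"F77": 1584, "F90": 1189, "C": 267}
others = {
    "Commercial 64-bit compiler": {"F77": 1419, "F90": 1125, "C": 141},
    "GNU compilers (3.4.3 & g95)": {"F77": 1002, "F90": 588, "C": 213},
}

for name, rates in others.items():
    cells = [
        f"{lang} +{percent_faster_by_rate(ekopath[lang], rates[lang]):.0f}%"
        for lang in ("F77", "F90", "C")
    ]
    print(f"{name}: " + ", ".join(cells))
```

Note the direction of the ratio: for run times the competitor's figure goes in the numerator, while for rates the EKOPath figure does.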
4-thread OpenMP results on a 4-CPU (Microway) 2.2 GHz Opteron, PC3200 server
| Code | Compiler | 4-thread MFLOPS | EKOPath Advantage |
| --- | --- | --- | --- |
| Original Himeno F77 OpenMP code | EKOPath 2.0 | 1696 | |
| Original Himeno F77 OpenMP code | Commercial 64-bit compiler | 1691 | +16% |
| PathScale-modified* Himeno F77 OpenMP code | EKOPath 2.0 | 5155 | |
| PathScale-modified* Himeno F77 OpenMP code | Commercial 64-bit compiler | 4309 | +20% |
* System, compiler flag, and source code details
SPEC® and the benchmark names SPECfp® and SPECint® are registered trademarks of the Standard Performance Evaluation Corporation. AMD, AMD Opteron, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Linux is a registered trademark of Linus Torvalds. All other trademarks and company names mentioned are the property of their respective owners.
| POLYHEDRON |
 |
The PathScale EKOPath Compiler Suite produces the fastest and most accurate results for AMD Opteron systems for the Polyhedron 2004 Fortran 90 and Fortran 77 benchmarks.
PathScale is the only compiler vendor with no red squares for 64-bit results (meaning that 64-bit EKOPath is never 50% or more slower than the fastest compiler on any benchmark code).
Here are some comparisons obtained from those web pages:
Polyhedron 2004 F77 Benchmarks, 64-bit Compiler Comparisons
| Compiler | Geometric mean time in seconds | PathScale % Faster |
| --- | --- | --- |
| PathScale EKOPath 2.1 | 19.16 | |
| Commercial Compiler A | 25.54 | +33% |
| Commercial Compiler B | 22.75 | +19% |
| Commercial Compiler C | 29.26 | +53% |
Polyhedron 2004 F90 Benchmarks, 64-bit Compiler Comparisons
| Compiler | Geometric mean time in seconds | PathScale % Faster |
| --- | --- | --- |
| PathScale EKOPath 2.1 | 22.76 | |
| Commercial Compiler A | 26.25 | +15% |
| Commercial Compiler B | 28.41 | +25% |
| Commercial Compiler C | 35.30 | +55% |
EKOPath 2.1 64-bit optimization flags:
F77: -O3 -LNO:fu=9 -OPT:div_split:fast_math:fast_sqrt -IPA:plimit=3500
F90: -Ofast -OPT:fast_math=on -WOPT:if_conv=off -LNO:fu=9:full_unroll_size=7000
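The Polyhedron tables summarize each compiler with a geometric mean of run times, which keeps one long-running benchmark from dominating the average. As a rough illustration of that statistic, the sketch below computes it for two made-up sets of timings. The per-benchmark numbers are hypothetical placeholders, not Polyhedron data, and the function name is illustrative.

```python
import math


def geometric_mean(times_sec):
    """Geometric mean of positive run times: exp of the mean of the logs."""
    times = list(times_sec)
    if not times:
        raise ValueError("need at least one timing")
    if any(t <= 0 for t in times):
        raise ValueError("timings must be positive")
    return math.exp(sum(math.log(t) for t in times) / len(times))


# HYPOTHETICAL per-benchmark run times in seconds (not from the source).
compiler_x = [10.0, 40.0, 2.5]
compiler_y = [20.0, 40.0, 5.0]

gm_x = geometric_mean(compiler_x)
gm_y = geometric_mean(compiler_y)
print(f"compiler X geometric mean: {gm_x:.2f} s")
print(f"compiler Y geometric mean: {gm_y:.2f} s")
print(f"X is {(gm_y / gm_x - 1.0) * 100.0:.0f}% faster than Y")

# The published means combine the same way, e.g. F77 Compiler A vs. EKOPath 2.1:
print(f"{(25.54 / 19.16 - 1.0) * 100.0:.0f}%")
```

Because the geometric mean works on ratios, the percentage gap between two compilers does not depend on which benchmark happens to run longest.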
| QUANTUM MONTE CARLO |
 |
Monte Carlo methods are extremely important in computational physics and related applied fields, and have many diverse applications. PathScale compilers do particularly well in Monte Carlo codes.
The benchmark was run independently at Los Alamos National Laboratory, and the results are published with their permission.
| Compiler | Time (lower is better) | PathScale % Faster |
| --- | --- | --- |
| PathScale v1.0 | 78.08 sec | |
| PGI v5.1 | 135.80 sec | 73.93% |
| GCC v3.4.0 | 111.01 sec | 42.20% |
Compiler settings for Quantum Monte Carlo C++ application
PathScale v1.0 : pathCC -64 -ansiE -Ofast -ffast-math
PGI v5.1 : pgCC -Kieee -fastsse -O3 -Minline=levels:10 -Msafeptr=global -Mvect=sse -Mvect=assoc -Mvect=cachesize:1048576 -Mvect=prefetch
GCC v3.4.0 : g++ -O3 -ffast-math -mtune=opteron -mfpmath=sse,387 -mieee-fp -m64
For more information on this benchmark, go to: http://sourceforge.net/projects/qmcbeaver
| SPEC® CPU2000 |
 |
The PathScale EKOPath Compiler Suite enables the highest performance results for both integer and floating point SPEC CPU2000 speed benchmarks for any AMD64-based Linux® system. The best evidence for this is that from October 2004 through August 26, 2005, on AMD processors and Linux operating systems, 186 CPU2000 results were published at www.spec.org using PathScale compilers and none with other compilers.
Since no recent results with competitive compilers have been published on the SPEC web site, we ran our own comparison against a competitive compiler, using the latest version of each, with the following results:
| Compiler | SPECint®2000 | SPECfp®2000 |
| --- | --- | --- |
| PathScale EKOPath™ v2.2.1 | 1598 | 1984 |
| PGI® Workstation 6.0-5 | 1269 | 1779 |
| % Faster for PathScale | +26% | +12% |
Benchmarks were run on a 2.2 GHz 1-CPU system with DDR400/PC3200 memory. Full details on the compiler flags and configuration used are available here. If anyone can provide us with improved base or peak optimization flags for the competitive compiler, we will be happy to use them and update these results.
Results Published by Our Partners Using the PathScale EKOPath Compiler Suite -- Including Dual-Core Opteron Results
Recently, AMD and HP have chosen to submit SPEC CPU2000 results for dual-core Opteron (for example, Opteron Models 275 and 875) systems with PathScale EKOPath Compilers. Also, IBM, HP, Fujitsu-Siemens, Sun, and AMD continue to choose the PathScale EKOPath Compiler Suite to get the highest level of performance from their AMD64-based Linux® systems.
| SPEC® OMP2001 |
 |
"Sun tested 2-way Sun Fire V20z and 4-way Sun Fire V40z servers using multiple SPEC benchmarks, including the SPEC® ompM2001 suite of OpenMP® benchmarks. The PathScale EKOPath Compiler Suite helped Sun's AMD® Opteron® processor-based servers set world records for SPEC ompM2001 on two-processor and four-processor systems. The Sun/PathScale two-processor results were 29 percent faster (footnote 1) than previous-best Linux ompM2001 benchmarks using non-PathScale Fortran and C compilers. This 29 percent advantage, enabled in large part by PathScale compilers, far exceeds the eight percent faster clock rate of the newer Sun systems."
Footnote 1:
(1) About the SPEC OMPM2001 Results Reported Above:
Two-Processor Results: The Sun V40z server with PathScale EKOPath Compiler Suite and 2.6 GHz AMD Opteron CPUs achieved a result of 6486 on a system with two cores, two chips and two threads. This comparison is based on the best performing two-processor Linux servers currently shipping, including previous results with competitor's compiler on a 2.4 GHz Sun Java Workstation W2100z system [SPECompM2001 5085, two cores, two chips, two threads].
Four-Processor Results: The Sun V40z server with PathScale EKOPath Compiler Suite and 2.6 GHz AMD Opteron CPUs achieved a result of 11223 on a system with four cores, four chips and four threads. This comparison is based on the best performing four-processor Linux servers currently shipping, including previous results on a 2.4 GHz Sun V40z system with a non-PathScale compiler [SPECompM2001 8694, four cores, four chips, four threads].
| STREAM |
 |
The PathScale EKOPath Compiler Suite produces the highest single-CPU and OpenMP parallel STREAM results for any system powered by AMD CPUs.
OpenMP (higher is better)
| Machine ID | ncpus | COPY | SCALE | ADD | TRIAD |
| --- | --- | --- | --- | --- | --- |
| AMD_Opteron_848 (EKOPath 2.0) | 4 | 15378 | 15845 | 15618 | 15921 |
| PathScale EKOPath 2.2 | 4 | 16872 | 16932 | 16543 | 16545 |
| % Faster for PathScale 2.2 vs. 2.0 | | +8% | +7% | +6% | +4% |
Single CPU (higher is better)
| Machine ID | ncpus | COPY | SCALE | ADD | TRIAD |
| --- | --- | --- | --- | --- | --- |
| ASUS_SK8N_Opteron248 (EKOPath 2.0) | 1 | 4811 | 4782 | 4685 | 4682 |
| ASUS_SK8N_Opteron248 (Comm'l 64bit Compiler) | 1 | 4304 | 4251 | 4497 | 4458 |
| PathScale EKOPath 2.2 | 1 | 4902 | 4871 | 4979 | 4987 |
| % Faster for PathScale | | +14% | +15% | +11% | +12% |
The above results are for STREAM benchmarks run on Opteron 248 (2.2 GHz) machines with DDR400 memory and are posted at http://www.cs.virginia.edu/stream/standard/Bandwidth.html. Results for PathScale EKOPath and for our competitor are identified as 'ASUS_SK8N_Opteron248' and 'ASUS_SK8N_Opteron248 (1 CPU)'. Click on the data link at the right of those lines for more details on the submission.
* Results with 'PathScale EKOPath 2.1' are on the same system as the 2.0 results on the STREAM web site and use the following optimization flags:
OpenMP: pathf90/pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp
Serial: pathf90/pathcc -O3 -CG:use_prefetchnta
|
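For readers unfamiliar with the four STREAM columns, the sketch below spells out what each kernel computes and how a sustained-bandwidth figure follows from the array length and the elapsed time. It is a plain-Python illustration of the arithmetic only, not the benchmark itself. The helper names, the array length, and the timing in the example are hypothetical, and the accounting assumes 8-byte doubles with 1 MB taken as 10^6 bytes.

```python
BYTES_PER_DOUBLE = 8

# Arrays read plus arrays written per loop iteration for each kernel.
ARRAYS_TOUCHED = {"COPY": 2, "SCALE": 2, "ADD": 3, "TRIAD": 3}


def stream_copy(a):
    return list(a)                               # c[j] = a[j]


def stream_scale(c, q):
    return [q * x for x in c]                    # b[j] = q * c[j]


def stream_add(a, b):
    return [x + y for x, y in zip(a, b)]         # c[j] = a[j] + b[j]


def stream_triad(b, c, q):
    return [x + q * y for x, y in zip(b, c)]     # a[j] = b[j] + q * c[j]


def bandwidth_mb_per_s(kernel: str, n_elements: int, seconds: float) -> float:
    """Sustained bandwidth in MB/s (1 MB = 1e6 bytes) for one kernel pass."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * n_elements
    return 1.0e-6 * bytes_moved / seconds


# HYPOTHETICAL example: 2,000,000-element arrays, one TRIAD pass in 0.01 s.
print(f"{bandwidth_mb_per_s('TRIAD', 2_000_000, 0.01):.0f} MB/s")
print(stream_triad([1.0, 2.0], [10.0, 20.0], 3.0))
```

ADD and TRIAD touch three arrays per iteration while COPY and SCALE touch two, so a given elapsed time corresponds to 1.5 times the bandwidth for the three-array kernels.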