TSUBAME2.5 to 3.0 and Convergence with Extreme Big Data Satoshi Matsuoka Professor Global Scientific Information and Computing (GSIC) Center Tokyo Institute of Technology Fellow, Association for Computing Machinery (ACM) Rakuten Technology Conference 2013 2013/10/26 Tokyo, Japan
Supercomputers from the Past
Fast, Big, Special, Inefficient, Evil device to conquer the world…
Let us go back to the mid '70s: birth of "microcomputers" and arrival of commodity computing (start of my career)
• Commodity 8-bit CPUs…
• Led to hobbyist computing…
  – Evaluation boards: Intel SDK-80, Motorola MEK6800D2, MOS Tech. KIM-1, (in Japan) NEC TK-80, Fujitsu Lkit-8, …
  – System kits: MITS Altair 8800/680b, IMSAI 8080, Proc. Tech. SOL-20, SWTPC 6800, …
• … and led to early personal computers
  – Commodore PET, Tandy TRS-80, Apple II
  – (in Japan): Hitachi Basic Master, NEC CompoBS / PC8001, Fujitsu FM-8, …
Supercomputing vs. Personal Computing in the late 1970s
• Hitachi Basic Master (1978)
  – "The first PC in Japan"
  – Motorola 6802 @ 1 MHz, 16KB ROM, 16KB RAM
  – Linpack in BASIC: approx. 70-80 FLOPS (1/1,000,000 of a Cray-1)
• We got "simulation" done (in assembly language)
  – Nintendo NES (1983)
    • MOS Technology 6502 @ 1 MHz (same as the Apple II)
  – "Pinball" by Matsuoka & Iwata (Iwata is now CEO of Nintendo)
    • Realtime dynamics + collision + lots of shortcuts
    • Average ~a few KFLOPS
• Cf. Cray-1 (1976): 80-90 MFlops (est.) running Linpack
Then things got accelerated from the mid '80s to the mid '90s (rapid commoditization towards what we use now)
• PC CPUs: Intel 8086/286/386/486/Pentium (superscalar & fast FP x86), Motorola 68000/020/030/040, … to Xeons, GPUs, Xeon Phis
  – C.f. RISCs: SPARC, MIPS, PA-RISC, IBM Power, DEC Alpha, …
• Storage evolution: cassettes and floppies to HDDs, optical disks, and now Flash
• Network evolution: RS-232C to Ethernet, now to FDR InfiniBand
• PC (incl. I/O): IBM PC "clones" and Macintoshes: ISA to VLB to PCIe
• Software evolution: CP/M to MS-DOS to Windows, Linux, …
• WAN evolution: RS-232 + modem + BBS, to modem + Internet, to ISDN/ADSL/FTTH broadband, DWDM backbones, LTE, …
• Internet evolution: email + ftp to Web, Java, Ruby, …
• Then clusters, Grid/Clouds, 3-D gaming, and the Top500 all started in the mid '90s(!), and commoditized supercomputing
Modern Day Supercomputers
Now supercomputers "look like" IDC servers
• High-end COTS dominate
• Linux-based machines with a standard + HPC OSS software stack
TSUBAME Wins Awards…
• "Greenest Production Supercomputer in the World", the Green 500, Nov. 2010 & June 2011 (#4 on the Top500, Nov. 2010); 3 times more power efficient than a laptop!
• ACM Gordon Bell Prize 2011, 2.0 Petaflops Dendrite Simulation
  – Special Achievements in Scalability and Time-to-Solution: "Peta-Scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer"
• Commendation for Science and Technology by the Ministry of Education 2012 (文部科学大臣表彰), Prize for Science & Technology, Development Category: Development of the Greenest Production Peta-scale Supercomputer
  – Satoshi Matsuoka, Toshio Endo, Takayuki Aoki
• Precise Bloodflow Simulation of an Artery on TSUBAME2.0
• Checkpointing, fault prediction, hybrid algorithms
• Scientific "Extreme" Big Data: ultra-fast I/O, Hadoop acceleration, large graphs
• New memory systems: pushing the envelope of low power vs. capacity vs. BW; exploit the deep hierarchy with new algorithms to decrease Bytes/Flops
• Post-petascale programming: OpenACC and other many-core programming substrates, task parallelism
• Scalable algorithms for many-core: apps/system/HW co-design
Bayesian fusion of model and measurements
• Bayes model and prior distributions (the model-estimated execution time $x_i$ serves as the prior mean):
  – $y_i \sim N(\mu_i, \sigma_i^2)$ (measured execution-time data)
  – $\mu_i \mid \sigma_i^2 \sim N(x_i, \sigma_i^2/n_0)$ (execution time estimated by the model)
  – $\sigma_i^2 \sim \text{Inv-}\chi^2(\nu_0, \sigma_0^2)$
• Posterior predictive distribution after $n$ measurements $y_{i1}, \ldots, y_{in}$:
  – $\mu_i \mid (y_{i1}, \ldots, y_{in}) \sim t_{\nu_n}(\mu_n, \sigma_n^2/n_n)$
  – $n_n = n_0 + n$, $\nu_n = \nu_0 + n$, $\mu_n = (n_0 x_i + n \bar{y}_i)/(n_0 + n)$
  – $\nu_n \sigma_n^2 = \nu_0 \sigma_0^2 + \sum_{m=1}^{n} (y_{im} - \bar{y}_i)^2 + \frac{n_0 n}{n_0 + n}(\bar{y}_i - x_i)^2$
  – $\bar{y}_i = \frac{1}{n} \sum_{m=1}^{n} y_{im}$
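As a hypothetical numeric illustration of this fusion (all values below are made up): the model estimate acts as a prior mean with pseudo-count n0, and the posterior mean after n measurements is the weighted average of the model prediction and the measured mean.

```python
import statistics

# Hypothetical numbers illustrating the Bayesian fusion of a cost-model
# estimate with measured execution times (all values are made up).
x_model = 12.0                       # execution time predicted by the model (s)
n0 = 4                               # prior strength in pseudo-observations (assumed)
y = [10.1, 9.8, 10.4, 10.0, 9.9]     # measured execution times (s)

n = len(y)
y_bar = statistics.fmean(y)          # sample mean of the measurements
# Posterior mean: weighted average of model estimate and measured mean.
mu_n = (n0 * x_model + n * y_bar) / (n0 + n)
print(f"model = {x_model:.2f} s, measured mean = {y_bar:.2f} s, fused = {mu_n:.2f} s")
```

With few measurements the fused estimate stays close to the model prediction; as n grows, the measurements dominate.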
ABCLibScript: algorithm selection

!ABCLib$ static select region start
!ABCLib$ parameter (in CacheS, in NB, in NPrc)
!ABCLib$ select sub region start
!ABCLib$ according estimated
!ABCLib$ (2.0d0*CacheS*NB)/(3.0d0*NPrc)
      Target 1 (Algorithm 1)
!ABCLib$ select sub region end
!ABCLib$ select sub region start
!ABCLib$ according estimated
!ABCLib$ (4.0d0*CacheS*dlog(NB))/(2.0d0*NPrc)
      Target 2 (Algorithm 2)
!ABCLib$ select sub region end
!ABCLib$ static select region end

• "static select region": specifies auto-tuning before execution and the algorithm-selection process
• Input variables used in the cost-definition functions: CacheS, NB, NPrc
• Cost-definition functions: the expressions after "according estimated"
• Target regions 1 and 2: the candidate algorithms
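The selection these directives express can be mimicked in plain code: evaluate each cost-definition function with the input variables (CacheS, NB, NPrc) and run the algorithm with the lower estimated cost. A minimal sketch in Python; the function names and sample values are hypothetical:

```python
import math

# Hypothetical sketch of the algorithm selection that the ABCLibScript
# directives express: evaluate each cost-definition function and pick
# the algorithm with the lower estimated cost.
def cost_alg1(CacheS, NB, NPrc):
    # (2.0d0*CacheS*NB)/(3.0d0*NPrc) from the first sub region
    return (2.0 * CacheS * NB) / (3.0 * NPrc)

def cost_alg2(CacheS, NB, NPrc):
    # (4.0d0*CacheS*dlog(NB))/(2.0d0*NPrc) from the second sub region
    return (4.0 * CacheS * math.log(NB)) / (2.0 * NPrc)

def select_algorithm(CacheS, NB, NPrc):
    c1 = cost_alg1(CacheS, NB, NPrc)
    c2 = cost_alg2(CacheS, NB, NPrc)
    return ("algorithm1", c1) if c1 <= c2 else ("algorithm2", c2)

# Sample input variables (made-up values):
print(select_algorithm(CacheS=256.0, NB=64, NPrc=16))
```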
JST-CREST "Ultra Low Power (ULP)-HPC" Project 2007-2012
[Diagram: power optimization using novel components in HPC (MRAM/PRAM/Flash, ultra multi-core: slow & parallel & ULP, ULP-HPC SIMD-vector such as GPGPU, ULP-HPC networks) combined with power-aware and optimizable applications, performance models & algorithms, and auto-tuning for performance & power; power-vs-performance plot marking the optimization point: x10 power efficiency, x1000 improvement in 10 years]
Aggressive Power Saving in HPC: Methodologies

Methodology                               | Enterprise/Business Clouds  | HPC
Server Consolidation                      | Good                        | NG!
DVFS (Dynamic Voltage/Frequency Scaling)  | Good                        | Poor
New Devices                               | Poor (Cost & Continuity)    | Good
New HW & SW Architecture                  | Poor (Cost & Continuity)    | Good
Novel Cooling                             | Limited (Cost & Continuity) | Good (high thermal density)
How do we achieve x1000?
Process shrink: x100
× Many-core GPU usage: x5
× DVFS & other low-power SW: x1.5
× Efficient cooling: x1.4
≈ x1000 !!!
ULP-HPC Project 2007-12 → Ultra Green Supercomputing Project 2011-15
Statistical Power Modeling of GPUs [IEEE IGCC10]
• Estimates GPU power consumption statistically from GPU performance counters
• Linear regression model using performance counters as explanatory variables: estimated power $= \sum_{i=1}^{n} c_i p_i$, where the $p_i$ are counter values and the $c_i$ regression coefficients
• Prevents overtraining by ridge regression
• Determines optimal parameters by cross validation
• Trained against average power consumption from a high-resolution power meter
• High accuracy (avg. err. 4.7%); accurate even with DVFS; a linear model shows sufficient accuracy
• Future: model-based power optimization; possibility of optimizing Exascale systems with O(10^8) processors
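A minimal sketch of such a model, with synthetic data standing in for real GPU performance counters (all sizes and coefficients below are made up): fit the linear model power = Σ cᵢ·pᵢ by closed-form ridge regression and check the average error.

```python
import numpy as np

# Synthetic stand-in for GPU performance-counter data (made-up values):
# each row is one measurement interval, each column a normalized counter rate.
rng = np.random.default_rng(0)
n_samples, n_counters = 200, 8
X = rng.random((n_samples, n_counters))
true_c = np.array([30, 12, 8, 5, 3, 2, 1, 0.5])        # hypothetical coefficients (W)
power = X @ true_c + rng.normal(0, 1.0, n_samples)      # "measured" power + meter noise

# Closed-form ridge regression: c = (X^T X + lambda I)^-1 X^T y.
# The penalty lambda guards against overtraining, as on the slide.
lam = 0.1
c_hat = np.linalg.solve(X.T @ X + lam * np.eye(n_counters), X.T @ power)

pred = X @ c_hat
avg_err = np.mean(np.abs(pred - power)) / np.mean(power) * 100
print(f"average relative error: {avg_err:.1f}%")
```

In practice the counters would come from a profiler and the target from a high-resolution power meter; the point of the sketch is that a plain linear model recovers the coefficients well.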
Power Efficiency in Dendrite Applications, on TSUBAME1.0 through the JST-CREST ULP-HPC prototype, running the Gordon Bell Dendrite App
Infrastructure Technologies Towards Yottabyte/Year
Principal Investigator: Satoshi Matsuoka
Global Scientific Information and Computing Center, Tokyo Institute of Technology
The current "Big Data" are not really that big…
• Typical "real" definition: "mining people's privacy data to make money"
• Corporate data usually sit in data-warehouse silos → limited volume: gigabytes to terabytes, seldom petabytes
• Processing involves simple O(n) algorithms, or those that can be accelerated with DB-inherited indexing algorithms
• Executed on re-purposed commodity "web" servers linked with 1Gbps networks running Hadoop/HDFS
• Vicious cycle of stagnation in innovations…
• NEW: breaking down of silos ⇒ convergence of supercomputing with Extreme Big Data
But "Extreme Big Data" will change everything
• "Breaking down of silos" (Rajeeb Hazra, Intel VP of Technical Computing)
• Already happening in science & engineering due to the Open Data movement
• More complex analysis algorithms: O(n log n), O(m × n), …
• Will become the NORM for competitiveness reasons
We will have tons of unknown genes
• Metagenome analysis: directly sequencing uncultured microbiomes obtained from the target environment and analyzing the sequence data
  – Finding novel genes from unculturable microorganisms
  – Elucidating the composition of species/genes in environments
• Examples of microbiomes: human body (gut microbiome), sea, soil
[Slide courtesy Yutaka Akiyama @ Tokyo Tech]
Results from the Akiyama group @ Tokyo Tech: ultra high-sensitive "big data" metagenome sequence analysis of the human oral microbiome
[Figure: metabolic pathway map; samples from inside the tooth row, outside the tooth row, and dental plaque]
• Required > 1 million node-hours on the K computer
• World's most sensitive sequence analysis (based on an amino-acid similarity matrix)
• Discovered at least three microbiome clusters with functional differences (integrating 422 experiment samples taken from 9 different oral parts)
• 572.8 M reads/hour on 82,944 nodes (663,552 cores) of the K computer (2012)
Extreme Big Data in Genomics
• Impact of new-generation sequencers: sequencing data (bp)/$ grows x4000 every 5 years; c.f. HPC: x33 in 5 years
• Lincoln Stein, Genome Biology, vol. 11(5), 2010
[Slide courtesy Yutaka Akiyama @ Tokyo Tech]
Extremely "Big" Graphs
• Large-scale graphs in various fields
  – US road network: 58 million edges
  – Twitter follow-ship: 1.47 billion edges
  – Neuronal network: 100 trillion edges
Towards Continuous Billion-Scale Social Simulation with Real-Time Streaming Data (Toyotaro Suzumura / IBM & Tokyo Tech)
• Application target area: the planet (Open Street Map), 7 billion people
• Input data
  – Road network (Open Street Map) for the planet: 300 GB (XML)
  – Trip data for 7 billion people: 10 KB (1 trip) × 7 billion = 70 TB
  – Real-time streaming data (e.g. social sensors, physical data)
• Simulated output for 1 iteration: 700 TB
Graph500 "Big Data" Benchmark
• Kronecker graph BFS (breadth-first search) problem
• Generator parameters A: 0.57, B: 0.19, C: 0.19, D: 0.05
• November 15, 2010, "Graph 500 Takes Aim at a New Kind of HPC", Richard Murphy (Sandia NL, now Micron): "I expect that this ranking may at times look very different from the TOP500 list. Cloud architectures will almost certainly dominate a major chunk of the list."
• Reality: Top500 supercomputers dominate; no cloud IDCs at all
  – TSUBAME2.0: #3 (Nov. 2011), #4 (Jun. 2012)
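The Kronecker generator behind the benchmark can be sketched in a few lines: each edge is drawn by recursively picking one of the four adjacency-matrix quadrants with the probabilities above. This is a simplified illustration in the spirit of the Graph500 R-MAT-style generator, not the reference implementation; the edgefactor of 16 matches the benchmark's default.

```python
import random

# Graph500-style Kronecker initiator probabilities from the slide.
A, B, C, D = 0.57, 0.19, 0.19, 0.05

def kronecker_edge(scale, rng=random):
    """Draw one edge (u, v) in a 2^scale-vertex graph by choosing a
    quadrant of the adjacency matrix at each of `scale` recursion levels."""
    u = v = 0
    for _ in range(scale):
        r = rng.random()
        u <<= 1
        v <<= 1
        if r < A:              # top-left quadrant: neither bit set
            pass
        elif r < A + B:        # top-right: destination bit set
            v |= 1
        elif r < A + B + C:    # bottom-left: source bit set
            u |= 1
        else:                  # bottom-right: both bits set
            u |= 1
            v |= 1
    return u, v

# Graph500 default edgefactor is 16: edges = 16 * 2^scale.
scale, edgefactor = 10, 16
edges = [kronecker_edge(scale) for _ in range(edgefactor * 2**scale)]
print(len(edges), "edges in a", 2**scale, "vertex graph")
```

The skew in A produces the heavy-tailed degree distribution that makes BFS on these graphs a very different workload from dense linear algebra.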
A Major Northern Japanese Cloud Datacenter (2013)
[Network diagram: 2× Juniper MX480 edge routers facing the Internet (10GbE, LACP), 2× Juniper EX8208 core zone switches (Virtual Chassis), and Juniper EX4200 switches per zone of 700 nodes, linked by 10GbE]
• 8 zones, 5600 nodes total; injection 1 Gbps/node; bisection 160 Gbps
• TSUBAME2.5: #1 in Japan, 17 Petaflops SFP
  – Template for future supercomputers and IDC machines
• TSUBAME3.0 in early 2016
  – New supercomputing leadership
  – Tremendous power efficiency, extreme big data, extremely high reliability
• Lots of background R&D for TSUBAME3.0 and towards Exascale
  – Green computing: ULP-HPC & TSUBAME-KFC
  – Extreme Big Data: convergence of HPC and IDC!
  – Exascale resilience
  – Programming with millions of cores
  – …