- 1. COSMIC: Middleware for Xeon Phi Servers and Clusters
(Pre-commercialization, name subject to change)
S Cadambi, G Coviello, C Li, K Rao, M Sankaradas, S Chakradhar
Computing Systems Architecture, NEC Laboratories America, Princeton, NJ
January 2014
www.nec-labs.com

2. The Xeon Phi Coprocessor (MIC)
Launched by Intel at ISC 2012. An x86-based coprocessor with 60+ cores, attached to a multicore host over PCIe.
60+ cores, 240+ threads. 512-bit vector units. 8+ GB memory (7120P). Supports OpenMP. Runs Linux, which allows multi-processing and memory management. Good for scientific applications.

3. Xeon Phi Servers and Clusters
Fast ramp-up: many hardware vendors, and many clusters already commissioned. NEC also offers a Xeon Phi server, the Express5800/HR120b-1: 1U form factor with 2 Xeon Phi coprocessors. Some very high performance ones too: Top500 #1 Tianhe-2, Top500 #7 Stampede.

4. Managing Xeon Phi Clusters
Most clusters follow an exclusive allocation policy for the Xeon Phi: one Phi is dedicated to one user until that user's job completes.
[Diagram: a 4-node cluster, each node a host with one Xeon Phi (60 cores, 8 GB). Active users Amy and Charlie: one needs 1 Xeon Phi, the other needs 3 Xeon Phis. Bob needs 1 Xeon Phi and has to wait for a Phi to become available.]

5. Why the Conservative Policy?
Avoids resource oversubscription.

6. What is
Resource Oversubscription?
Say Amy and Bob each want to run a program that uses a single Xeon Phi intermittently (coprocessor offload model). Do they each need a device, or can they share?
[Diagram: Amy's program and Bob's program each run from Begin to End, alternating between host processor and Xeon Phi coprocessor phases. Can the Xeon Phi phases share one device?]

7. What is Resource Oversubscription?
First problem of sharing a Phi: the programs together oversubscribe hardware threads. This can cause a 2-3x slowdown!

8. What is Resource Oversubscription?
Second problem of sharing a Phi: the programs can oversubscribe physical device memory. This causes random crashes.

9. Why the Conservative Policy?
Avoids resource oversubscription. Safe: no crashes. Easier management. BUT...

10. Downsides of Conservative Policy
Poorly utilized Xeon Phi coprocessors. Dynamic utilization averages around 40%! Only 40% of cores are doing useful work on average, due to factors such as intermittent use and the conservative scheduling policy.

11. Downsides of Conservative Policy
Need a larger cluster than necessary. THIS CAN GET EXPENSIVE! Capital cost, power, maintenance, administration.

12. Downsides of Conservative Policy
Long wait times if all Xeon Phis are busy. Annoyed users: they have to wait even if their jobs are short. Cannot pre-empt running jobs. Even though Phis may be underutilized or intermittently used, users must wait.
[Diagram: Xeon Phi cluster; running programs have occupied all Xeon Phis in the cluster.]

13. COSMIC
Middleware that allows safe Xeon Phi sharing. Transparently discovers resource requirements and schedules jobs to maximally share Xeon Phis.
[Diagram: software stack spanning user and kernel space. Labels: applications; COSMIC (invisible to apps, kernel); Linux; MPSS: modified Linux + drivers +; host processor; Xeon Phi coprocessor.]

14. COSMIC lets users share the Phi
Instead of making Amy's and Bob's programs wait for each other, COSMIC co-runs them by interspersing their host and Phi portions. Device sharing: users don't wait, and utilization is better.
[Diagram: Amy's program and Bob's program, each running from Begin to End as alternating host and Xeon Phi portions.]

15. COSMIC also resolves conflicting user directives
Without COSMIC: user-specified core affinity may conflict during sharing.
With COSMIC: COSMIC transparently resolves conflicts and spreads the Phi load across cores.
[Diagram: User 1's and User 2's Xeon Phi portions placed on the Xeon Phi cores, without and with COSMIC.]

16. Utilization:
1-device server
[Plot: average utilization (%) over time, axis from 0 to 100. With COSMIC (black): average utilization 70.6%. Without COSMIC (blue): average utilization 41.7%.]

17. Performance: 2-device server
64 jobs, randomly arriving.
[Charts: average latency (s), makespan (s), and average core utilization, each without COSMIC and with COSMIC. Data labels: 10991193144123819.9%56.9%]
Major improvements through device sharing and load balancing.

18. COSMIC Demo

19. Easy to Use on Clusters
Easy to interface with third-party software. Optional COSMIC cluster component for even better utilization. Up to 50% footprint reduction by Phi sharing!
[Diagram: third-party cluster management software and the COSMIC cluster component, above four nodes, each a host running COSMIC with a Xeon Phi (60 cores, 8 GB).]

20. COSMIC Summary
We are ready to engage with beta customers. Do you manage Xeon Phi servers or clusters? Do you use off-the-shelf cluster management software with exclusive allocation policies? If so, you will likely benefit from COSMIC:
Improves Xeon Phi utilization by sharing. Transparent to users. Transparent to underlying system software. Easy to add on to third-party cluster tools.

21. How to Get More Info
Contact us:
NEC Japan: Y Hirotani, [email protected]
NEC Labs America: S Cadambi, [email protected]
We make onsite presentations and demos. If you are interested in evaluating COSMIC, just ask us.
See our demo online: http://www.nec-labs.com/research/system/systems_arch-website/cosmic.php
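
Illustrative sketch (not part of the original deck): the two oversubscription problems named in slides 7 and 8, hardware threads and device memory, can be pictured as a simple admission check before two offloads share one Xeon Phi. This is not COSMIC's actual algorithm, which the slides do not describe. The names `Offload`, `can_co_run`, and `admit` are hypothetical, and the capacities (240 threads, 8 GB) are simply the figures quoted in the slides.

```python
from dataclasses import dataclass

# Device capacities as quoted in the slides: 240 hardware threads, 8 GB memory.
DEVICE_THREADS = 240
DEVICE_MEMORY_GB = 8.0


@dataclass
class Offload:
    """Hypothetical description of one job's Xeon Phi portion."""
    owner: str
    threads: int
    memory_gb: float


def can_co_run(running, candidate,
               max_threads=DEVICE_THREADS, max_memory_gb=DEVICE_MEMORY_GB):
    """Return True if adding `candidate` oversubscribes neither resource."""
    threads = sum(o.threads for o in running) + candidate.threads
    memory = sum(o.memory_gb for o in running) + candidate.memory_gb
    return threads <= max_threads and memory <= max_memory_gb


def admit(running, waiting):
    """Greedily admit waiting offloads that fit; return (running, still_waiting)."""
    running = list(running)
    still_waiting = []
    for job in waiting:
        if can_co_run(running, job):
            running.append(job)
        else:
            still_waiting.append(job)
    return running, still_waiting


if __name__ == "__main__":
    amy = Offload("amy", threads=120, memory_gb=3.0)
    bob = Offload("bob", threads=120, memory_gb=4.0)
    charlie = Offload("charlie", threads=60, memory_gb=2.0)
    running, waiting = admit([], [amy, bob, charlie])
    print([o.owner for o in running])   # amy and bob fit on one device together
    print([o.owner for o in waiting])   # charlie would exceed both budgets
```

Under these assumed requirements, Amy and Bob share one device instead of each holding a Phi exclusively, while Charlie waits only because admitting him would exceed both the thread and the memory budget.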