Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome, Kathy Yelick
U.C. Berkeley and LBNL

Global Address Space Languages
• Global address space languages support
  • Global pointers and distributed arrays
  • User controls layout of data across nodes
  • Direct read and write to remote memory
• Single Program Multiple Data (SPMD) control
  • Similar to using threads, but with remote accesses
  • Global synchronization, barriers
• Languages: UPC, Co-Array Fortran, Titanium
• GASNet: a common communication system tailored for global address space languages

[Figure: Distributed Data Structures]

GASNet Goals
• Language-independence: compatibility with several global address space languages and compilers
  • UPC, Titanium, Co-Array Fortran, possibly others
  • Hide language- or compiler-specific details, such as shared-pointer representation
• Hardware-independence: variety of parallel architectures and OSs
  • SMP: Linux/UNIX SMPs, Origin 2000, etc.
  • Clusters of uniprocessors or SMPs: IBM SP, Compaq AlphaServer, Linux/UNIX clusters, etc.
  • Support many high-performance networks: MPI, Myrinet/GM, Quadrics/elan, IBM/LAPI, Infiniband
• Ease of implementation on new hardware
  • Allow quick prototype implementations
  • Implementations can leverage performance features of hardware
• Provide both portability and performance

[Figure: software layers, in the order shown: Compiler-generated code, Compiler-specific runtime system, GASNet Extended API, GASNet Core API, Network Hardware]

GASNet Extended API
• Wider interface that includes more complicated operations
• We provide a reference implementation of the extended API in terms of the core API
• Implementors can choose to directly implement any subset for performance, leveraging hardware support for higher-level operations

GASNet Core API
• Most basic required network primitives
• Implemented directly on each platform
• Minimal set of network functions needed to support a working implementation
• General enough to implement everything else
• Based heavily on active messages paradigm
• Provides powerful extensibility mechanism

Latency Performance
[Figure: GASNet Put/Get Latency (min over msg sz). Y axis: microseconds. Series: put_nb, get_nb. Bars: mpi-refext, elan-refext, elan-elan for quadrics-falcon and quadrics-lemieux; mpi-refext, gm-gm for myrinet-millennium and myrinet-alvarez.]
[Figure: GASNet Myrinet/GM Latency. X axis: size (bytes). Y axis: microseconds. Series: get (blocking), get (non-blocking), put (blocking), put (non-blocking).]
[Figure: GASNet Quadrics/elan Latency. X axis: size (bytes). Y axis: microseconds. Series: get (blocking), get (non-blocking), put (blocking), put (non-blocking).]

Bandwidth Performance
[Figure: GASNet Put/Get Bulk Bandwidth (max over msg sz). Y axis: MB/sec. Series: put_nb_bulk, get_nb_bulk. Bars: mpi-refext, elan-refext, elan-elan for quadrics-falcon and quadrics-lemieux; mpi-refext, gm-gm for myrinet-millennium and myrinet-alvarez.]
[Figure: GASNet Myrinet/GM Bandwidth. X axis: size (bytes). Y axis: MB/sec. Series: get_bulk (blocking), get_bulk (non-blocking), put_bulk (blocking), put_bulk (non-blocking).]
[Figure: GASNet Quadrics/elan Bandwidth. X axis: size (bytes). Y axis: MB/sec. Series: get_bulk (blocking), get_bulk (non-blocking), put_bulk (blocking), put_bulk (non-blocking).]

http://upc.nersc.gov
[email protected]
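
The split between a small active-message core and a wider extended API built on top of it can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions: it is not GASNet code and does not use GASNet's actual function names or signatures. `ToyCore`, `ToyExtended`, `am_request`, `put_nb`, `get_nb`, `wait`, and the handler indices are all hypothetical names chosen for illustration. The "network" is an in-memory queue per node, and the extended-layer put and get are written purely in terms of core-layer active messages, in the spirit of a reference implementation.

```python
from collections import deque

class Node:
    """One simulated process: a memory segment plus an inbox of pending messages."""
    def __init__(self, rank, size):
        self.rank = rank
        self.segment = bytearray(size)
        self.inbox = deque()

class ToyCore:
    """Hypothetical core layer: handler registration, active-message send, polling."""
    def __init__(self, nnodes, segment_size):
        self.nodes = [Node(r, segment_size) for r in range(nnodes)]
        self.handlers = {}

    def register(self, index, fn):
        self.handlers[index] = fn

    def am_request(self, src, dest, index, *args):
        # "Send" an active message by queuing it on the destination node.
        self.nodes[dest].inbox.append((src, index, args))

    def poll(self, rank):
        # Run the registered handler for every message waiting on this node.
        node = self.nodes[rank]
        while node.inbox:
            src, index, args = node.inbox.popleft()
            self.handlers[index](node, src, *args)

# Hypothetical handler indices used by the extended layer.
H_PUT, H_ACK, H_GET, H_GET_REPLY = range(4)

class ToyExtended:
    """Hypothetical extended layer: non-blocking put/get built only on ToyCore."""
    def __init__(self, core):
        self.core = core
        self.done = {}      # handle -> completion flag
        self.get_bufs = {}  # handle -> local buffer awaiting get data
        core.register(H_PUT, self._on_put)
        core.register(H_ACK, self._on_ack)
        core.register(H_GET, self._on_get)
        core.register(H_GET_REPLY, self._on_get_reply)

    def _new_handle(self):
        handle = len(self.done)
        self.done[handle] = False
        return handle

    def put_nb(self, src, dest, offset, data):
        handle = self._new_handle()
        self.core.am_request(src, dest, H_PUT, handle, offset, bytes(data))
        return handle

    def get_nb(self, src, dest, offset, nbytes, out):
        handle = self._new_handle()
        self.get_bufs[handle] = out
        self.core.am_request(src, dest, H_GET, handle, offset, nbytes)
        return handle

    def wait(self, handle):
        # Blocking sync: keep polling every simulated node until completion.
        while not self.done[handle]:
            for node in self.core.nodes:
                self.core.poll(node.rank)

    # Handlers run on the node that received the message.
    def _on_put(self, node, src, handle, offset, data):
        node.segment[offset:offset + len(data)] = data
        self.core.am_request(node.rank, src, H_ACK, handle)

    def _on_ack(self, node, src, handle):
        self.done[handle] = True

    def _on_get(self, node, src, handle, offset, nbytes):
        data = bytes(node.segment[offset:offset + nbytes])
        self.core.am_request(node.rank, src, H_GET_REPLY, handle, data)

    def _on_get_reply(self, node, src, handle, data):
        self.get_bufs.pop(handle)[:] = data
        self.done[handle] = True

# Usage: node 0 writes three bytes into node 1's segment, then reads them back.
core = ToyCore(nnodes=2, segment_size=16)
ext = ToyExtended(core)
ext.wait(ext.put_nb(src=0, dest=1, offset=4, data=b"GAS"))
buf = bytearray(3)
ext.wait(ext.get_nb(src=0, dest=1, offset=4, nbytes=3, out=buf))
```

In this sketch, a platform with hardware support for remote writes could replace `put_nb` with a direct implementation and leave `get_nb` on the active-message path, which mirrors the idea of directly implementing only a subset of the extended API.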