A Thread-Parallel Geant4 with Shared Geometry

Gene Cooperman and Xin Dong
College of Computer and Information Science
Northeastern University
360 Huntington Avenue
Boston, MA 02115 USA
{gene, xindong}@ccs.neu.edu

Joint work with the Geant4 team: John Apostolakis
Supported by the Openlab program: Sverre Jarp


Jan 18, 2016





Outline

• Concept

• Methodology

• Implementation


Memory layout for multiple threads

• TLS: thread-local storage
• At compile time, for any static data declared using __thread, the compiler reserves space in the TLS of each new thread that is created.
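As a minimal sketch of this layout (assuming GCC or Clang on Linux, where __thread is supported; the names tlsCounter, BumpLocal and ObserveInNewThread are hypothetical), each thread below increments its own copy of one __thread variable, and that copy lives at a different address from the main thread's copy:

```cpp
#include <cassert>
#include <thread>

// Static-storage variable placed in TLS: every thread gets its own copy,
// initialized to 0 when the thread starts.
__thread int tlsCounter = 0;

// What one thread observed about its own copy of tlsCounter.
struct Observation {
  int value;            // final value of this thread's copy
  const void* address;  // address of this thread's copy
};

// Increment the calling thread's copy n times and return the result.
int BumpLocal(int n) {
  for (int i = 0; i < n; ++i) ++tlsCounter;
  return tlsCounter;
}

// Start a new thread, let it bump its own copy n times, and report what it saw.
Observation ObserveInNewThread(int n) {
  Observation obs{0, nullptr};
  std::thread t([&obs, n] {
    obs.value = BumpLocal(n);
    obs.address = &tlsCounter;  // the worker's copy, not the main thread's
  });
  t.join();
  return obs;
}
```

Calling ObserveInNewThread(3) and then ObserveInNewThread(5) returns 3 and 5: the second thread starts from a fresh copy initialized to 0, and the main thread's tlsCounter stays 0.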


TLS syntax and effect

• static type variable -> static __thread type variable
• (global) type variable -> (global) __thread type variable
• extern type variable -> extern __thread type variable

Each thread initializes and holds its own copy of the data.
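The three declaration forms can be sketched in one file (GCC/Clang __thread; all variable and function names are hypothetical). A thread created after the main thread has modified its copies still starts from the initializers:

```cpp
#include <cassert>
#include <thread>

// extern type variable -> extern __thread type variable
// (as it would appear in a header; the definition follows below)
extern __thread int gEventCount;

// (global) type variable -> (global) __thread type variable
__thread int gEventCount = 0;

// static type variable -> static __thread type variable
static __thread double sLastEnergy = -1.0;

// A function-local static can be made thread-local the same way.
int CountCalls() {
  static __thread int calls = 0;
  return ++calls;
}

// Report the values a brand-new thread sees, regardless of what the
// calling thread has done to its own copies.
void SnapshotFreshThread(int* events, double* energy, int* calls) {
  std::thread t([=] {
    *events = gEventCount;
    *energy = sLastEnergy;
    *calls = CountCalls();
  });
  t.join();
}
```

After the main thread sets gEventCount to 42 and calls CountCalls() twice, a new thread still reads 0, -1.0 and a call count of 1, while the main thread's copies keep their values.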


First implementation: data replicated for each thread

• The image size is huge because every thread holds its own copy of the data.


Outline

• Concept

• Methodology

• Implementation


Multi-threaded Geant4: current implementation

• Data that is not changed by ProcessOneEvent should be shared.
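A back-of-the-envelope comparison shows why sharing pays off. Assume N worker threads, S_ro bytes of data that ProcessOneEvent never modifies, and S_rw bytes that it does modify; these symbols are illustrative, not measurements, and code, stacks and the master's own private data are ignored:

```latex
\begin{aligned}
M_{\text{replicated}} &= N\,(S_{\text{ro}} + S_{\text{rw}}) \\
M_{\text{shared}} &= S_{\text{ro}} + N\,S_{\text{rw}} \\
M_{\text{replicated}} - M_{\text{shared}} &= (N - 1)\,S_{\text{ro}}
\end{aligned}
```

The saving grows linearly with the number of threads and is largest when most of the data is read-only during event processing.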


Three questions for the shared data model

1. Which data can be safely shared?
   – Data is initialized dynamically.
   – Geant4 source code does not explicitly declare which data is shared.

2. How do we share the data?
   – Each instance may contain read-only data members (sharable) and read-write data members (unshared).
   – For read-write data members, C++ does not allow __thread if the data member is non-static.

3. What is the correct way to initialize the worker thread?
   – Shared data is allocated and initialized by the main (master) thread.
   – Workers make thread-private copies of read-write data members.
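Question 2 is the crux. A static __thread data member is legal, but it is one slot per thread for the whole class, not one per instance, so it cannot hold per-instance read-write state such as a replica's copy number. A minimal sketch with a hypothetical Replica class (not Geant4 code):

```cpp
#include <cassert>

// Hypothetical replica-like class with one read-only member
// and one read-write member.
class Replica {
 public:
  explicit Replica(int nReplicas) : nReplicas_(nReplicas) {}
  int NReplicas() const { return nReplicas_; }  // read-only: safe to share
  void SetCopyNo(int c) { copyNo_ = c; }        // read-write
  int GetCopyNo() const { return copyNo_; }

 private:
  const int nReplicas_;
  // "__thread int copyNo_;" is rejected: a non-static member cannot be thread-local.
  // A static __thread member compiles, but it gives one slot per thread that is
  // shared by every instance of the class.
  static __thread int copyNo_;
};

__thread int Replica::copyNo_ = 0;
```

Setting the copy number on one instance also changes what every other instance reports in the same thread, which is why per-instance read-write members need a different mechanism.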


1. Which data can be safely shared?

• Statically expanding ProcessOneEvent down to each variable access is not feasible:
   – complicated inheritance relationships
   – virtual methods

• Use valgrind to check memory accesses dynamically at runtime.
   – valgrind --tool=helgrind a.out checks for data races.
   – If two threads change the same variable without adequate locking, this tool issues an error message.
   – In the case of fullCMS, it is not practical to check how much data is changed by ProcessOneEvent.
   – Use unit tests for each module of Geant4, especially for geometry and navigation.


2. How do we share the data?

An example – class G4PVReplica

[Figure: Physical volumes of one G4PVReplica instance (copyno 0, 1, 2, 3, 4, 5, …), accessed by thread worker 1 and thread worker 2; copyno 1 and 4 are marked.]


Multi-threaded Geant4: first implementation

No shared instances; each instance has a unique copyno


Multi-threaded Geant4: current implementation

Shared G4PVReplica instances; each thread sees private copyno
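One way to get per-instance, per-thread state (a sketch under assumptions, not the Geant4 code) is to give each shared object a unique ID at construction and keep the copy numbers in a thread-local table indexed by that ID. SharedReplica, Slot and the lazily grown std::vector are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Replica objects are shared by all threads, while each thread keeps its own
// copy number for every instance in a thread-local table indexed by instance ID.
class SharedReplica {
 public:
  SharedReplica() : id_(nextId_++) {}  // assumed: constructed on the master thread only
  void SetCopyNo(int c) { Slot() = c; }
  int GetCopyNo() { return Slot(); }

 private:
  // This thread's copy number for this instance; grows the table on first use.
  int& Slot() {
    if (copyNos_.size() <= static_cast<std::size_t>(id_))
      copyNos_.resize(static_cast<std::size_t>(id_) + 1, 0);
    return copyNos_[id_];
  }

  const int id_;
  static int nextId_;
  // C++11 thread_local: a std::vector needs a constructor and destructor,
  // which __thread does not support.
  static thread_local std::vector<int> copyNos_;
};

int SharedReplica::nextId_ = 0;
thread_local std::vector<int> SharedReplica::copyNos_;
```

With worker 1 setting copy number 1 and worker 2 setting 4 on the same shared object, each reads back its own value, and the master thread still sees 0.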


3. What is the correct way to initialize the worker thread?

• For the main (master) thread, initialize data in the standard way.
• The worker thread begins initialization only after the main thread has finished its own initialization.
• For a worker thread:
   – The run manager skips some initialization routines. For example, it skips the Construct() method of the detector construction class.
   – The worker thread initializes thread-private data only (for example, copyno in the case of G4PVReplica).
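The ordering constraint (workers wait for the master) can be sketched with a std::promise/std::shared_future pair. This mechanism is an assumption chosen for illustration; the slides do not say how the wait is implemented. Geometry and RunWorkers are hypothetical:

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for the detector geometry built by the master thread.
struct Geometry {
  std::vector<int> cellsPerVolume;  // read-only once the master has built it
};

// Starts nWorkers threads; each blocks until the master publishes the finished
// geometry, then initializes only its own private state and reports a result.
int RunWorkers(int nWorkers) {
  std::promise<const Geometry*> ready;
  std::shared_future<const Geometry*> published = ready.get_future().share();
  std::vector<int> results(nWorkers, 0);  // one slot per worker: no sharing
  std::vector<std::thread> workers;

  for (int w = 0; w < nWorkers; ++w) {
    workers.emplace_back([published, &results, w] {
      const Geometry* g = published.get();  // wait for the master's initialization
      int privateCount = 0;                 // thread-private initialization only
      for (int n : g->cellsPerVolume) privateCount += n;
      results[w] = privateCount;
    });
  }

  // Master-only initialization (in Geant4, e.g. the detector construction).
  Geometry geometry{{1, 2, 3}};
  ready.set_value(&geometry);  // from here on, workers may read the shared geometry

  for (std::thread& t : workers) t.join();
  int total = 0;
  for (int r : results) total += r;
  return total;
}
```

RunWorkers(3) returns 18: each worker sums the three shared entries (1 + 2 + 3) into its own private result. A simpler equivalent is to create the worker threads only after the master's initialization has returned.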


Outline

• Concept

• Methodology

• Implementation


testG4Navigator1.cc with multiple threads

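#include <pthread.h>
#include <unistd.h>  // sleep()
#include "G4VPhysicalVolume.hh"
#include "G4GeometryManager.hh"

G4VPhysicalVolume *myTopNode;
int sleepTime = 10;

void *my_worker_thread1(void *waitTime_ptr)
{
  // wait until the first thread has finished (the delay is passed in by main)
  sleep(*(int *)waitTime_ptr);
  testG4Navigator1(myTopNode);
  testG4Navigator2(myTopNode);
  // keep the thread alive for sleepTime seconds so valgrind can analyze it
  sleep(sleepTime);
  return NULL;
}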


testG4Navigator1.cc (continued)

int main()
{
  myTopNode = BuildGeometry();  // build the geometry
  G4GeometryManager::GetInstance()->CloseGeometry(false);

  pthread_t tid1, tid2;
  // waitTime1 and waitTime2 (int, seconds) are defined elsewhere in the test
  pthread_create(&tid1, NULL, my_worker_thread1, &waitTime1);
  pthread_create(&tid2, NULL, my_worker_thread1, &waitTime2);
  pthread_join(tid1, NULL);
  pthread_join(tid2, NULL);
  return 0;
}


Start and analyze output

• Start
   valgrind --tool=helgrind --log-file=testG4Navigator1output testG4Navigator1

• Analyze output, example 1
   ==538== Possible data race during write of size 4 at 0x56360A0
   ==538== at 0x42B944: G4PVReplica::SetCopyNo(int) (G4PVReplica.cc:180)
   ==538== by 0x4191E7: G4ParameterisedNavigation::LevelLocate(G4NavigationHistory&, G4VPhysicalVolume const*, int, CLHEP::Hep3Vector const&, CLHEP::Hep3Vector const*, bool, CLHEP::Hep3Vector&) (G4ParameterisedNavigation.cc:636)
   ==538== Old state: owned exclusively by thread #2
   ==538== New state: shared-modified by threads #2, #3
   ==538== Reason: this thread, #3, holds no locks at all


Start and analyze output (continued)

• Analyze output, example 2
   ==538== Possible data race during write of size 8 at 0x5635F68
   ==538== at 0x415218: G4LogicalVolume::SetSolid(G4VSolid*) (G4LogicalVolume.icc:217)
   ==538== by 0x419201: G4ParameterisedNavigation::LevelLocate(G4NavigationHistory&, G4VPhysicalVolume const*, int, CLHEP::Hep3Vector const&, CLHEP::Hep3Vector const*, bool, CLHEP::Hep3Vector&) (G4ParameterisedNavigation.cc:641)
   ==538== Old state: shared-readonly by threads #2, #3
   ==538== New state: shared-modified by threads #2, #3
   ==538== Reason: this thread, #3, holds no consistent locks
   ==538== Location 0x5635F68 has never been protected by any lock


Start and analyze output (continued)

• Analyze output, example 3
   ==538== Possible data race during write of size 8 at 0x5634E18
   ==538== at 0x40B1FD: G4Box::SetXHalfLength(double) (G4Box.cc:118)
   ==538== by 0x407E6D: G4LinScale::ComputeDimensions(G4Box&, int, G4VPhysicalVolume const*) const (testG4Navigator1.cc:67)
   ==538== Old state: owned exclusively by thread #2
   ==538== New state: shared-modified by threads #2, #3
   ==538== Reason: this thread, #3, holds no locks at all


Shared instances in geometry

Only these three kinds of geometry objects are currently shared:

• Physical volumes
   – G4VPhysicalVolume
      • Thread-private data members: G4RotationMatrix *frot; G4ThreeVector ftrans;
   – G4PVReplica
      • Thread-private data members: G4int fcopyNo;
• Logical volumes
   – Thread-private data members: G4Material* fMaterial; G4VSolid* fSolid; G4MaterialCutsCouple* fCutsCouple; G4VSensitiveDetector* fSensitiveDetector; G4Region* fRegion;
• Solids
   – We may need more copies of each solid used by a G4PVParameterised volume.
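For solids, helgrind example 3 shows a parameterisation writing the half-length of a shared box. A minimal sketch of the "more copies" idea, assuming one private copy of the solid per thread (Box, ComputeDimensions, ThreadBox and LocateInThread are hypothetical stand-ins, not Geant4 API):

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for a box solid.
struct Box {
  double xHalf;  // half-length along x
};

// A parameterisation that resizes the solid for each copy number, in the
// spirit of G4LinScale::ComputeDimensions in testG4Navigator1.cc.
void ComputeDimensions(Box& box, int copyNo) { box.xHalf = 10.0 * (copyNo + 1); }

// Template solid built once by the master thread and never written afterwards.
const Box kMasterBox{5.0};

// Each thread resizes its own copy of the solid instead of the shared one.
Box& ThreadBox() {
  thread_local Box box = kMasterBox;  // copied from the template once per thread
  return box;
}

// Resize this thread's solid for copy number copyNo and return its half-length.
double LocateInThread(int copyNo) {
  Box& box = ThreadBox();
  ComputeDimensions(box, copyNo);
  return box.xHalf;
}
```

Two threads resizing for copy numbers 0 and 3 obtain 10.0 and 40.0, while the master template and the main thread's copy keep the original 5.0.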


Share logical volumes: step 1

ADD A NEW CLASS

ADDED:
class G4LogicalVolumePrivateData
{
public:
  G4Material* fMaterial;
  G4VSolid* fSolid;
  G4MaterialCutsCouple* fCutsCouple;
  G4VSensitiveDetector* fSensitiveDetector;
  G4Region* fRegion;
};

class G4LogicalVolume{…}

In class G4LogicalVolume, delete all thread-private data members.


Share logical volumes: step 2

CREATE NEW CLASS

class G4LogicalVolumeObjectCounter
{
public:
  PrivateObjectManager* shadowOffset;  // shadow pointer for offset
  static __thread PrivateObjectManager* offset;
  int AddNew() {...}
  void WorkerCopy() {...}
  void FreeWorker() {...}
};
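A minimal sketch of what the counter might do. Only the member names shadowOffset, offset, AddNew, WorkerCopy and FreeWorker come from the slide; PrivateData, WorkerConstruct and every method body are assumptions. Here offset is a thread-local array of pointers, one per instance, which matches how step 4 indexes it:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for G4LogicalVolumePrivateData (the read-write members).
struct PrivateData {
  int material = 0;  // stands in for G4Material* fMaterial
};

// Sketch of G4LogicalVolumeObjectCounter; bodies are assumptions.
class ObjectCounter {
 public:
  PrivateData** shadowOffset = nullptr;  // master's table, readable by workers
  static __thread PrivateData** offset;  // the calling thread's table

  ObjectCounter() = default;
  ObjectCounter(const ObjectCounter&) = delete;
  ObjectCounter& operator=(const ObjectCounter&) = delete;
  ~ObjectCounter() {
    for (PrivateData* p : master_) delete p;
  }

  // Master thread: allocate private data for a new instance, return its ID.
  int AddNew() {
    master_.push_back(new PrivateData());
    shadowOffset = master_.data();  // push_back may have moved the storage
    offset = shadowOffset;          // the master works on the master table
    return static_cast<int>(master_.size()) - 1;
  }

  // Worker thread, after the master is done: start from the master's pointers,
  // so every entry is initially shared.
  void WorkerCopy() {
    offset = new PrivateData*[master_.size()];
    std::copy(shadowOffset, shadowOffset + master_.size(), offset);
  }

  // Hypothetical worker constructor: give instance id its own private copy.
  void WorkerConstruct(int id) { offset[id] = new PrivateData(*shadowOffset[id]); }

  // Worker thread: free private copies and the table, but never shared entries.
  void FreeWorker() {
    for (std::size_t i = 0; i < master_.size(); ++i)
      if (offset[i] != shadowOffset[i]) delete offset[i];
    delete[] offset;
    offset = nullptr;
  }

 private:
  std::vector<PrivateData*> master_;  // owned by the master thread
};

__thread PrivateData** ObjectCounter::offset = nullptr;
```

After WorkerCopy a worker shares every entry with the master; WorkerConstruct replaces one entry with a private copy, and FreeWorker deletes only such private copies.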


Share logical volumes: step 3

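ADD TWO DATA MEMBERS TO G4LogicalVolume

class G4LogicalVolume
{
  …
  static G4LogicalVolumeObjectCounter objectCounter;  // shared by all instances
  int objectOrder;                                     // unique ID of this instance
};
G4LogicalVolumeObjectCounter G4LogicalVolume::objectCounter;  // definition in G4LogicalVolume.cc

MODIFY ALL CONSTRUCTORS OF G4LogicalVolume

G4LogicalVolume::G4LogicalVolume(…)
{
  objectOrder = objectCounter.AddNew();  // allocate private data
  …
  // initialize in a similar way to the original constructor
  …
}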


Share logical volumes: step 4

Redefine the read-write data members to make them thread-private

#define fMaterial (objectCounter.offset[objectOrder]->fMaterial)

We create a new static, thread-local array: objectCounter.offset.

objectOrder is the unique instance ID described in the concept slides.


Worker logical volumes: step 5

Worker starts after master has initialized all data.

1. When a worker starts, it copies the offset content from the main thread using the method WorkerCopy() of G4LogicalVolumeObjectCounter.
2. For each logical volume, call the worker constructor to allocate memory space for the thread-private data and initialize it.
3. In some cases, thread-private data is constant and can be shared by all threads. Then, one just skips step 2 for it.
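Steps 1 to 5 can be combined in one small sketch. Volume, VolumeCounter, VolumePrivateData, WorkerInit and RunWorker are hypothetical, simplified stand-ins; the #define mirrors step 4, and every volume gets a private copy, so the sketch does not model item 3 (sharing constant thread-private data):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct VolumePrivateData {  // step 1: the read-write members
  int fMaterial = 0;
};

struct VolumeCounter {  // step 2: table of per-thread private-data pointers
  std::vector<VolumePrivateData*> master;      // filled by the master thread
  static __thread VolumePrivateData** offset;  // the calling thread's table

  int AddNew() {
    master.push_back(new VolumePrivateData());
    offset = master.data();  // the master uses the master table
    return static_cast<int>(master.size()) - 1;
  }
  void WorkerCopy() {  // item 1: start from the master's pointers
    offset = new VolumePrivateData*[master.size()];
    std::copy(master.begin(), master.end(), offset);
  }
  void FreeWorker() {  // assumes every entry was given a private copy
    for (std::size_t i = 0; i < master.size(); ++i) delete offset[i];
    delete[] offset;
    offset = nullptr;
  }
};
__thread VolumePrivateData** VolumeCounter::offset = nullptr;

// Step 4: the old member name now expands to this thread's slot.
#define fMaterial (objectCounter.offset[objectOrder]->fMaterial)

class Volume {
 public:
  static VolumeCounter objectCounter;  // step 3
  explicit Volume(int material) : objectOrder(objectCounter.AddNew()) { fMaterial = material; }
  // Worker constructor (item 2): allocate and initialize this thread's copy.
  void WorkerInit() {
    objectCounter.offset[objectOrder] = new VolumePrivateData(*objectCounter.master[objectOrder]);
  }
  int GetMaterial() const { return fMaterial; }
  void SetMaterial(int m) { fMaterial = m; }

 private:
  const int objectOrder;  // step 3: unique instance ID
};
VolumeCounter Volume::objectCounter;

#undef fMaterial

// Worker start-up, run only after the master has built every Volume.
int RunWorker(const std::vector<Volume*>& volumes, int newMaterial) {
  Volume::objectCounter.WorkerCopy();
  for (Volume* v : volumes) v->WorkerInit();
  for (Volume* v : volumes) v->SetMaterial(newMaterial);  // writes private copies only
  int sum = 0;
  for (Volume* v : volumes) sum += v->GetMaterial();
  Volume::objectCounter.FreeWorker();
  return sum;
}
```

With two volumes built by the master (materials 1 and 2), a worker that sets every material to 10 sees a sum of 20, another worker setting 20 sees 40, and the master's values remain 1 and 2.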


Share logical volumes: final results


Share physical volumes


The solid for a G4PVParameterised instance


Physics tables: the other large consumer of memory in Geant4


Questions?


Thank you!