High-Performance Quantum Simulation: A challenge to Schrödinger equation on 256^4 grids
*Toshiyuki Imamura (今村俊幸) 1,3; thanks to Susumu Yamada 2,3, Takuma Kano 2, and Masahiko Machida 2,3
1. UEC (University of Electro-Communications, 電気通信大学)
2. CCSE, JAEA (Japan Atomic Energy Agency)
Lanczos (Traditional method), Krylov+GS: Simple, but shift+invert version is needed
Costly! Since the block is updated at every iteration, MV operation is also required!!
1*MV / every iteration
3*MV / every iteration
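The "Krylov+GS" entry above can be illustrated with a minimal Python sketch of a Lanczos iteration that uses full Gram-Schmidt reorthogonalization and performs exactly one matrix-vector product (MV) per iteration. Everything in it is an assumption made for illustration: the test operator `apply_a` (a small 1D tridiagonal matrix, not the 256^4 Hamiltonian of the slides), the start vector, and the bisection routine used to extract the smallest Ritz value. It is not the authors' implementation and it does not include the shift+invert variant.

```python
import math

def apply_a(v):
    """Matrix-free test operator (hypothetical): 2 on the diagonal, -1 off-diagonal."""
    n = len(v)
    w = [2.0 * x for x in v]
    for i in range(n - 1):
        w[i] -= v[i + 1]
        w[i + 1] -= v[i]
    return w

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def lanczos(apply_op, start, steps):
    """Build the Lanczos tridiagonal matrix; returns (alphas, betas, mv_count)."""
    norm = math.sqrt(dot(start, start))
    basis = [[x / norm for x in start]]
    alphas, betas = [], []
    mv_count = 0
    for j in range(steps):
        w = apply_op(basis[j])          # the single MV of this iteration
        mv_count += 1
        alphas.append(dot(w, basis[j]))
        # Gram-Schmidt against every stored basis vector, done twice for stability.
        for _ in range(2):
            for prev in basis:
                c = dot(w, prev)
                w = [wi - c * pi for wi, pi in zip(w, prev)]
        beta = math.sqrt(dot(w, w))
        if j == steps - 1 or beta < 1e-12:   # finished, or Krylov space exhausted
            break
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    return alphas, betas, mv_count

def count_below(alphas, betas, x):
    """Number of eigenvalues of the tridiagonal matrix below x (Sturm count)."""
    count, d = 0, 1.0
    for k, alpha in enumerate(alphas):
        d = alpha - x - (betas[k - 1] ** 2 / d if k > 0 else 0.0)
        if d == 0.0:
            d = -1e-30
        if d < 0.0:
            count += 1
    return count

def smallest_ritz_value(alphas, betas):
    """Smallest eigenvalue of the tridiagonal matrix, found by bisection."""
    m = len(alphas)
    radius = [(betas[k - 1] if k > 0 else 0.0) + (betas[k] if k < m - 1 else 0.0)
              for k in range(m)]
    lo = min(alphas[k] - radius[k] for k in range(m))   # Gershgorin bounds
    hi = max(alphas[k] + radius[k] for k in range(m))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

n = 40
alphas, betas, mv_count = lanczos(apply_a, [1.0 / (i + 1) for i in range(n)], steps=10)
print("MV count:", mv_count, "smallest Ritz value:", smallest_ritz_value(alphas, betas))
```

The Ritz value obtained this way is an upper bound on the smallest eigenvalue and tightens as more iterations are taken; storing the whole basis for the Gram-Schmidt step is what makes this simple variant memory-hungry on large grids.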
Other Difficulties in implementation
• Breakdown of linear independence: make our own DSYGV using LDL and deflation (not Cholesky)
• Growth of numerical error in {W,X,P}: detect numerical error and recalculate them automatically
• Choice of the shift
• Portability
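For orientation, the first bullet above can be read against the textbook form of the LOBPCG Rayleigh-Ritz step. The notation is an assumption: H stands for the Hamiltonian matrix, X for the current eigenvector approximations, W for the (preconditioned) residuals, and P for the previous search directions; the slides may use a different variant.

```latex
\[
S = [\,W,\; X,\; P\,], \qquad
\bigl(S^{T} H S\bigr)\, y \;=\; \theta\, \bigl(S^{T} S\bigr)\, y, \qquad
X_{\mathrm{new}} = S\, y .
\]
```

LAPACK's DSYGV reduces this generalized problem with a Cholesky factorization of the right-hand matrix, which requires it to be positive definite. When the columns of W, X, and P become nearly linearly dependent, S^T S becomes numerically singular and that factorization fails, which is consistent with the slide's choice of an LDL-based solver with deflation.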
Core implementation is MATRIX-VECTOR mult.
3-level parallelism is carefully done in our implementation.
In inter-node parallelization, communication pipelining is used.
In the Rayleigh-Ritz part, ScaLAPACK is used.
LOBPCG
do l=1,256          ! inter-node parallelism
 do k=1,256         ! inter-node parallelism
  do j=1,256        ! intra-node (thread) parallelism
   do i=1,256       ! vectorization
    w(i,j,k,l)=a(i,j,k,l)*v(i,j,k,l) &
              +b*(v(i+1,j,k,l)+ ... ) &
              +c*(v(i+1,j+1,k,l)+ ... )
   enddo
  enddo
 enddo
enddo
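The loop nest above can be tried out at toy scale with the following Python sketch. It is a serial illustration only, under stated assumptions: the grid has n points per dimension instead of 256, values outside the grid are taken as zero, the `b` term is assumed to couple each point to its nearest neighbours along the four axes, and the `c` terms are omitted because the slide elides their offsets. The function name `stencil_matvec` and the flat storage layout (index i fastest, as in Fortran) are hypothetical choices.

```python
def stencil_matvec(a, v, b, n):
    """Return w = A v on an n**4 grid stored as a flat list, index i fastest."""
    strides = (1, n, n * n, n * n * n)
    w = [0.0] * (n ** 4)
    for l in range(n):              # inter-node level in the slide
        for k in range(n):          # inter-node level in the slide
            for j in range(n):      # thread level in the slide
                for i in range(n):  # vectorized level in the slide
                    idx = (i, j, k, l)
                    p = i + n * j + n * n * k + n * n * n * l
                    acc = a[p] * v[p]               # diagonal term a*v
                    for axis in range(4):           # assumed nearest-neighbour b terms
                        if idx[axis] + 1 < n:
                            acc += b * v[p + strides[axis]]
                        if idx[axis] - 1 >= 0:
                            acc += b * v[p - strides[axis]]
                    w[p] = acc
    return w

n = 3
size = n ** 4
a = [1.0 + (p % 5) for p in range(size)]   # hypothetical diagonal coefficients
v = [0.0] * size
center = 1 + n + n * n + n * n * n         # grid point (1, 1, 1, 1)
v[center] = 1.0
w = stencil_matvec(a, v, -0.5, n)
print("diagonal entry:", w[center], "neighbour entry:", w[center + 1])
```

At the scale of the slides, a single 256^4 vector holds 2^32 entries, which is 32 GiB if stored in double precision, so the operator is applied matrix-free and the outer loops are distributed across nodes.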
Collective MQT in Intrinsic Josephson Junctions via parallel computing on ES
Direct Quantum Simulation (4-Junctions)
Quantum (Synchronous) vs Classical (Localized)
Quantum Assisted Synchronization