IAP09 CUDA@MIT / 6.963 Supercomputing on your desktop: Programming the next generation of cheap and massively parallel hardware using CUDA Lecture 03 CUDA Basics #2 - Nicolas Pinto (MIT) Tuesday, January 13, 2009

Nov 29, 2014

More at http://sites.google.com/site/cudaiap2009 and http://pinto.scripts.mit.edu/Classes/CUDAIAP2009

Note that some slides were borrowed from Matthew Bolitho (Johns Hopkins) and NVIDIA.
Transcript
Page 1: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

IAP09 CUDA@MIT / 6.963

Supercomputing on your desktop: Programming the next generation of cheap and massively parallel hardware using CUDA

Lecture 03

CUDA Basics #2

Nicolas Pinto (MIT)

Tuesday, January 13, 2009

Page 2: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

During this course,

we’ll try to

and use existing material ;-)

“ ”

adapted for 6.963

Page 3: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Today

yey!!

Page 4: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Language

Compilation

API

Threading Model

Memory Model

Page 5: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA Language

Page 6: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Spring 2008, Johns Hopkins University. Copyright © Matthew Bolitho 2008.

CUDA defines a language that is similar to C/C++

• Allows programmers to easily move existing code to CUDA
• Lessens learning curve

How is CUDA C/C++ different from standard C/C++?

• Syntactic extensions:
    • Declaration Qualifiers
    • Built-in Variables
    • Built-in Types
    • Execution Configuration

Declspec = declaration specifier / declaration qualifier

• A modifier applied to declarations of:
    • Variables
    • Functions
• Examples: const, extern, static

CUDA uses the following declaration qualifiers for variables:

• __device__
• __shared__
• __constant__
• Only apply to global variables

ATI/AMD are trying to make a competing product: Close To the Metal

• First release (02/07) bombed
• Second release in 12/07 re-written
• Open Source
• Not yet a viable solution

[Figure: a Device containing several Multi-Processors, each with Local Memory and a set of SPs, plus Device Memory]

• The fundamental unit is the stream processor
• A scalar, single-precision floating point ALU
• One Multiply/Add per clock cycle
• NOT IEEE-754 compliant
• Stream processors are grouped into multi-processors
• Multi-processors run in SIMD lockstep
• A number of multi-processors form a device

[Figure: a Multi-Processor with Registers, Local Memory, Stream Processor, Shared Memory, Global Memory, Constant Memory, Texture Memory]

Stream Processors have access to:

Memory Type       Access       Sharing
Registers         Read/Write   Private
Local Memory      Read/Write   Private
Shared Memory     Read/Write   Multi-Processor
Global Memory     Read/Write   Device
Constant Memory   Read         Device
Texture Memory    Read         Device

Language

Page 11: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

__device__ (variables)

• Declares that a global variable is stored on the device
• The data resides in global memory
• Has lifetime of the entire application
• Accessible to all GPU threads
• Accessible to the CPU via API

__shared__ (variables)

• Declares that a global variable is stored on the device
• The data resides in shared memory
• Has lifetime of the thread block
• Accessible to all threads, one copy per thread block
• If not declared as volatile, writes by other threads are not guaranteed to be visible unless a synchronization barrier is used
• Not accessible from CPU

__constant__ (variables)

• Declares that a global variable is stored on the device
• The data resides in constant memory
• Has lifetime of the entire application
• Accessible to all GPU threads (read only)
• Accessible to CPU via API (read-write)
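The three variable qualifiers above can be sketched together in one small program. This is only an illustration under stated assumptions: the names `N`, `d_result`, `c_scale`, `reverse_and_scale` and `run_qualifier_demo` are hypothetical and not from the slides, and the kernel is launched with a single block of `N` threads.

```cuda
#include <cuda_runtime.h>

#define N 256  // hypothetical problem size: one block of N threads

__device__   float d_result[N];  // global memory, lives as long as the application
__constant__ float c_scale;      // constant memory, read-only for GPU threads

// Reverses the input within one thread block, then scales it.
__global__ void reverse_and_scale(const float *in)
{
    __shared__ float tile[N];    // shared memory: one copy per thread block

    int t = threadIdx.x;
    tile[t] = in[t];
    __syncthreads();             // barrier: other threads' writes to tile are now visible

    d_result[t] = c_scale * tile[N - 1 - t];
}

// Host-side driver (hypothetical helper). Returns 0 on success.
int run_qualifier_demo(float *out /* room for N floats */)
{
    float h_in[N];
    float scale = 2.0f;
    float *d_in = NULL;
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return 1;
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    // The CPU writes constant memory through the runtime API.
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(scale));

    reverse_and_scale<<<1, N>>>(d_in);

    // The CPU reads the __device__ variable back through the runtime API.
    cudaError_t err = cudaMemcpyFromSymbol(out, d_result, N * sizeof(float));
    cudaFree(d_in);
    return err == cudaSuccess ? 0 : 1;
}
```

Calling `run_qualifier_demo(out)` from host code should leave the reversed, doubled input in `out`. Without the `__syncthreads()` call, a thread could read a `tile` element before its neighbour has written it.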

CUDA uses the following declspecs for functions:

• __device__
• __host__
• __global__

__device__ (functions)

• Declares that a function is compiled to, and executes on, the device
• Callable only from another function on the device

Page 17: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

__host__ (functions)

• Declares that a function is compiled to, and executes on, the host
• Callable only from the host
• Functions without any CUDA declspec are host by default
• Can use __host__ and __device__ together

__global__ (functions)

• Declares that a function is compiled to, and executes on, the device
• Callable from the host
• Used as the entry point from host to device
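A minimal sketch of how the function qualifiers fit together, assuming hypothetical names (`square`, `square_plus_one`, `apply`, `run_function_demo`) that do not appear in the slides:

```cuda
#include <cuda_runtime.h>

// Compiled for both CPU and GPU: callable from host code and from device code.
__host__ __device__ float square(float x) { return x * x; }

// Device-only helper: callable only from other device code.
__device__ float square_plus_one(float x) { return square(x) + 1.0f; }

// Kernel: the entry point from host to device. A kernel returns void.
__global__ void apply(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = square_plus_one(data[i]);
}

// No qualifier, so this is a host function by default. Returns 0 on success.
int run_function_demo(float *h_data, int n)
{
    float *d_data = NULL;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return 1;
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n elements
    apply<<<blocks, threads>>>(d_data, n);

    cudaError_t err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Here `square` is usable on both sides because it carries both qualifiers, while `square_plus_one` can only be reached through the kernel.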

CUDA provides a set of built-in vector types:

• char1, uchar1, char2, uchar2, char3, uchar3, char4, uchar4
• short1, ushort1, short2, ushort2, short3, ushort3, short4, ushort4
• int1, uint1, int2, uint2, int3, uint3, int4, uint4
• long1, ulong1, long2, ulong2, long3, ulong3, long4, ulong4
• float1, float2, float3, float4
• Can construct a vector type with a special function: make_{typename}(v0, v1, ...)
• Can access elements of a vector type with .x, .y, .z, .w: vecvar.x

dim3 is a special vector type

• Same as uint3, except it can be constructed from a scalar to form a vector: (scalar, 1, 1)
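The vector types and `dim3` can be exercised from ordinary host code. The sketch below uses the real `make_float4` and `make_int2` constructors; the helper names `sum4`, `volume` and `vector_demo` are hypothetical.

```cuda
#include <cuda_runtime.h>  // brings in the vector types and the make_* constructors

// Sums the components of a float4 through the .x/.y/.z/.w accessors.
__host__ __device__ float sum4(float4 v) { return v.x + v.y + v.z + v.w; }

// Total number of elements described by a dim3.
__host__ __device__ unsigned int volume(dim3 d) { return d.x * d.y * d.z; }

float vector_demo(void)
{
    float4 v = make_float4(1.0f, 2.0f, 3.0f, 4.0f);
    int2   p = make_int2(3, 4);

    dim3 grid(16);     // built from a scalar: same as dim3(16, 1, 1)
    dim3 block(8, 8);  // unspecified z defaults to 1

    return sum4(v) + (float)(p.x * p.y)
         + (float)volume(grid) + (float)volume(block);
}
```

The scalar form of `dim3` is what makes launches such as `kernel<<<16, 64>>>(...)` work: both launch parameters are converted to `dim3` with the remaining components set to 1.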

CUDA provides four global, built-in variables

• threadIdx, blockIdx, blockDim, gridDim
• Each is of type dim3 or uint3
• Accessible only from device code
• Cannot take address
• Cannot assign value
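One common use of the four built-in variables is computing a unique global index per thread. The following is a sketch rather than code from the lecture; `fill_global_index` and `run_index_demo` are hypothetical names, and the launch shape (2 blocks of 64 threads) is an arbitrary choice.

```cuda
#include <cuda_runtime.h>

// Each thread writes its own global index, striding over the array so that
// any n is covered regardless of how many threads were launched.
__global__ void fill_global_index(int *out, int n)
{
    int first  = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's position in the grid
    int stride = gridDim.x * blockDim.x;                 // total threads launched along x

    for (int i = first; i < n; i += stride)
        out[i] = i;

    // The built-ins are read-only here: per the slides, they cannot be
    // assigned to and their address cannot be taken.
}

// Host-side driver (hypothetical helper). Returns 0 on success.
int run_index_demo(int *h_out, int n)
{
    int *d_out = NULL;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return 1;

    dim3 grid(2), block(64);
    fill_global_index<<<grid, block>>>(d_out, n);

    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Inside the kernel, `gridDim` and `blockDim` reflect the `grid` and `block` values passed in the launch, while `blockIdx` and `threadIdx` differ per block and per thread.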

Page 19: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• __host__ declares that a function is compiled to and executes on the host
  – Callable only from the host
  – Functions without any CUDA declspec are host by default
  – Can use __host__ and __device__ together

• __global__ declares that a function is compiled to and executes on the device
  – Callable from the host
  – Used as the entry point from host to device

• CUDA provides a set of built-in vector types:
  – char1, uchar1, char2, uchar2, char3, uchar3, char4, uchar4
  – short1, ushort1, short2, ushort2, short3, ushort3, short4, ushort4
  – int1, uint1, int2, uint2, int3, uint3, int4, uint4
  – long1, ulong1, long2, ulong2, long3, ulong3, long4, ulong4
  – float1, float2, float3, float4
• Can construct a vector type with a special function: make_<type name>
• Can access elements of a vector type with .x, .y, .z, .w
• dim3 is a special vector type
  – Same as uint3, except it can be constructed from a scalar to form a vector: (scalar, 1, 1)
• CUDA provides four global, built-in variables
  – threadIdx, blockIdx, blockDim, gridDim
  – Typed as a dim3 or uint3
  – Accessible only from device code
  – Cannot take address
  – Cannot assign value
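The function qualifiers described on this slide can be compared side by side in a few lines. The sketch below is illustrative only: `square`, `square_on_host`, `square_kernel` and `square_on_device` are hypothetical names, and the kernel is launched with a single thread purely to show the host-to-device entry point.

```cuda
#include <cuda_runtime.h>

// __host__ __device__ together: compiled for both sides, callable from either.
__host__ __device__ float square(float x) { return x * x; }

// No qualifier: an ordinary host function (equivalent to writing __host__).
float square_on_host(float x) { return square(x); }

// __global__: executes on the device and is launched from the host.
// Kernels return void, so the result is written through a pointer.
__global__ void square_kernel(float *out, float x) { *out = square(x); }

// Hypothetical helper: runs the kernel with one thread and returns the result,
// or -1.0f if device memory could not be allocated.
float square_on_device(float x)
{
    float *dev_out = NULL;
    float result = -1.0f;
    if (cudaMalloc((void **)&dev_out, sizeof(float)) != cudaSuccess) return result;
    square_kernel<<<1, 1>>>(dev_out, x);
    cudaMemcpy(&result, dev_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_out);
    return result;
}
```

Both `square_on_host(3.0f)` and `square_on_device(3.0f)` go through the same `square` source, once compiled for the CPU and once for the GPU.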


Language

Tuesday, January 13, 2009


Page 23: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• CUDA provides syntactic sugar to launch the execution of kernels:
  Func<<<GridDim, BlockDim>>>(Arguments, ...)
  – Func is a __global__ function
  – GridDim is a dim3 type that specifies the size of the grid (i.e. problem domain)
  – BlockDim is a dim3 type that specifies the size of a thread block
• The compiler turns this type of statement into a block of code that configures and launches the kernel

• CUDA defines a language that is similar to C/C++
• Important differences:
  – Runtime library
  – Functions
  – Classes, structs, unions
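To make the launch syntax concrete, here is a minimal sketch of a two-dimensional launch. The kernel `fill_index`, the helper `launch_fill` and the 16 x 16 block size are assumptions chosen for illustration; the grid size is rounded up so that every element of a `width` x `height` array is covered.

```cuda
#include <cuda_runtime.h>

// Each thread writes its own linear index into a width x height array.
__global__ void fill_index(int *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)          // guard threads that fall outside the array
        out[y * width + x] = y * width + x;
}

// Hypothetical host helper. Returns 0 on success, 1 if a CUDA call failed.
int launch_fill(int *host_out, int width, int height)
{
    int *dev_out = NULL;
    size_t bytes = (size_t)width * height * sizeof(int);
    if (cudaMalloc((void **)&dev_out, bytes) != cudaSuccess) return 1;

    dim3 block(16, 16);                               // size of one thread block
    dim3 grid((width  + block.x - 1) / block.x,       // size of the grid, rounded up
              (height + block.y - 1) / block.y);
    fill_index<<<grid, block>>>(dev_out, width, height);

    cudaError_t err = cudaGetLastError();             // reports launch configuration errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host_out, dev_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_out);
    return err == cudaSuccess ? 0 : 1;
}
```

For a 20 x 10 array this launches a 2 x 1 grid of 16 x 16 blocks, so some threads are idle; the bounds check in the kernel is what makes the rounding-up safe.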


Language


Tuesday, January 13, 2009

Page 25: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• Code that runs on the device can't use normal C/C++ runtime library functions
  – No printf, fread, malloc, etc.
  – Most math functions have a device equivalent
• There are a number of device-specific functions/intrinsics available:
  – __syncthreads
  – __mul24
  – atomicAdd, atomicCAS, atomicMin
• There is no malloc or free function that can be called from device code
  – How can we allocate memory?
  – From the host
  – Using CUDA 1.1 atomics, write a custom allocator
• On a CUDA device, there is no stack
  – By default, all function calls are inlined
  – Can use __noinline__ to prevent (CUDA 1.1)
  – All local variables, function arguments are stored in registers
  – No function recursion
  – No function pointers
• CUDA supports some C++ features for device code, e.g.:
  – Template functions
• Classes are supported inside .cu source, but must be host only
• Structs/unions work on device code as per C
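Several of the points above (memory allocated from the host, `__syncthreads`, `atomicAdd`, template functions in device code) fit into one short example. This is a sketch under stated assumptions: `clamp_to`, `clamped_sum`, `clamped_sum_on_device` and the `THREADS` constant are hypothetical names, and integer `atomicAdd` on global memory requires a device that supports global atomics.

```cuda
#include <cuda_runtime.h>

#define THREADS 128   // threads per block; must be a power of two for the reduction

// Template functions are allowed in device code.
template <typename T>
__device__ T clamp_to(T v, T lo, T hi) { return v < lo ? lo : (v > hi ? hi : v); }

// Sums the inputs after clamping each one to [lo, hi].
__global__ void clamped_sum(const int *in, int n, int lo, int hi, int *total)
{
    __shared__ int partial[THREADS];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? clamp_to(in[i], lo, hi) : 0;
    __syncthreads();                         // wait until every thread has written

    // Tree reduction inside the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        atomicAdd(total, partial[0]);        // one atomic update per block
}

// Hypothetical host helper: all device memory is allocated from the host.
int clamped_sum_on_device(const int *host_in, int n, int lo, int hi)
{
    int *dev_in = NULL, *dev_total = NULL;
    int total = 0;
    cudaMalloc((void **)&dev_in, n * sizeof(int));
    cudaMalloc((void **)&dev_total, sizeof(int));
    cudaMemcpy(dev_in, host_in, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dev_total, 0, sizeof(int));
    clamped_sum<<<(n + THREADS - 1) / THREADS, THREADS>>>(dev_in, n, lo, hi, dev_total);
    cudaMemcpy(&total, dev_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_in);
    cudaFree(dev_total);
    return total;
}
```

With the input {-5, 1, 2, 3, 100} and bounds [0, 10], the clamped values are 0, 1, 2, 3 and 10, so the helper should return 16.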


Language

Tuesday, January 13, 2009


Page 28: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© David Kirk/NVIDIA and Wen-mei W. Hwu
Urbana, Illinois, August 18-22, 2008

Common Runtime Component: Mathematical Functions

• pow, sqrt, cbrt, hypot

• exp, exp2, expm1

• log, log2, log10, log1p

• sin, cos, tan, asin, acos, atan, atan2

• sinh, cosh, tanh, asinh, acosh, atanh

• ceil, floor, trunc, round

• Etc.

– When executed on the host, a given function uses

the C runtime implementation if available

– These functions are only supported for scalar types,

not vector types

Language

Tuesday, January 13, 2009

Page 29: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Device Runtime Component: Mathematical Functions

• Some mathematical functions (e.g. sinf(x)) have a less accurate, but faster device-only version (e.g. __sinf(x))

– __powf

– __logf, __log2f, __log10f

– __expf

– __sinf, __cosf, __tanf
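The accuracy trade-off can be checked directly by evaluating both versions on the device. In the CUDA toolkit the single-precision intrinsics carry an `f` suffix, so the sketch below uses `sinf` and `__sinf`; the kernel `sin_both` and the helper `sin_intrinsic_error` are hypothetical names introduced for illustration.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Evaluates the standard and the fast intrinsic sine for the same argument.
__global__ void sin_both(float x, float *accurate, float *fast)
{
    *accurate = sinf(x);     // standard single-precision version
    *fast     = __sinf(x);   // device-only intrinsic: faster, less accurate
}

// Returns |sinf(x) - __sinf(x)| as computed on the device, or -1.0f on failure.
float sin_intrinsic_error(float x)
{
    float *dev = NULL;
    float host[2] = { 0.0f, 0.0f };
    if (cudaMalloc((void **)&dev, 2 * sizeof(float)) != cudaSuccess) return -1.0f;
    sin_both<<<1, 1>>>(x, dev, dev + 1);
    cudaError_t err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1.0f;
    return fabsf(host[0] - host[1]);
}
```

For small arguments such as `x = 1.0f` the difference is tiny; it grows for arguments of large magnitude, which is where the choice between the two versions starts to matter.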

Language

Tuesday, January 13, 2009

Page 30: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA Compilation

IAP09 CUDA@MIT / 6.963

Tuesday, January 13, 2009

Page 31: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• CUDA source files end in ".cu"
  – Contain a mix of device and host code/data
• Compiled by nvcc
  – nvcc is really a wrapper around a more complex compilation process
• Input
  – Normal .c, .cpp source files
  – CUDA .cu source code files
• Output
  – Object/executable code for host
  – .cubin executable code for the device
• For .c and .cpp files, nvcc invokes the native C/C++ compiler for the system (e.g. gcc/cl)
• For .cu files, it is a little more complicated
  [Figure: nvcc compilation flow for a .cu file. Tools shown: cudafe, cpp, nvopencc, ptxas, linker. File types shown: .cu, .gpu.c, .c, .ptx, .cubin]
• To see the steps performed by nvcc, use the --dryrun and --keep command line options
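As a concrete case, a single .cu file mixing host and device code might look like the sketch below, with the nvcc invocations shown as comments. The file name `hello.cu` and the function names are hypothetical; `--dryrun` and `--keep` are the nvcc options named on the slide.

```cuda
// hello.cu (hypothetical file name)
//
// Build normally:                   nvcc hello.cu -o hello
// List the sub-commands only:       nvcc --dryrun hello.cu -o hello
// Keep the intermediate files:      nvcc --keep hello.cu -o hello
#include <cuda_runtime.h>

// Device code: nvcc routes this part through the device compilation path.
__global__ void add_one(int *v) { *v += 1; }

// Host code: handed to the native C/C++ compiler of the system.
// Returns value + 1 as computed on the device, or value unchanged on failure.
int add_one_on_device(int value)
{
    int *dev_v = NULL;
    if (cudaMalloc((void **)&dev_v, sizeof(int)) != cudaSuccess) return value;
    cudaMemcpy(dev_v, &value, sizeof(int), cudaMemcpyHostToDevice);
    add_one<<<1, 1>>>(dev_v);
    cudaMemcpy(&value, dev_v, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_v);
    return value;
}
```

Running the build with `--keep` leaves the intermediate files of both compilation paths in the working directory, which is a convenient way to inspect what the wrapper actually does.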


Compilation

Tuesday, January 13, 2009

Page 32: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

!"#$$%&$'()*+,-.(/$$01(234-5(63*7,-5(8-,9:+5,;<(

=3*<+,.4;(>(?@;;4:A(B3C,;43(/$$0

/DEFD/$$0

0

! !"#$%&'()*+%,-.+&%+/0%-/%12*(3

! !"#$%&#'%'(&)'"*'+,-&.,'%#+'/"0$'."+,1+%$%

! !"(2&3,+'45'!"##

! !"## &0'6,%335'%'76%22,6'%6"8#+'%'("6,'

."(23,)'."(2&3%$&"#'26".,00

!"#$%

! 9"6(%3':.;':.22 0"86.,'*&3,0

! !<=>':.8'0"86.,'."+,'*&3,0

&$%#$%

! ?4@,.$1,),.8$%43,'."+,'*"6'/"0$! :.84&# ,),.8$%43,'."+,'*"6'$/,'+,-&.,

! A"6':.'%#+':.22 *&3,0;'#-.. &#-"B,0'$/,'#%$&-,'

!1!CC'."(2&3,6'*"6'$/,'050$,('D,EF'E..1.3G

! 4')%2*(%,-.+&5%-6%-&%7%.-66.+%8')+%*'89.-*76+0:

'($

'($

')#$'(

'( '*

'#%+ '($,-"

'*

.22

.8+%*, .22 3&#B,6

#-"2,#.. 2$)%0 .84&#

.22 3&#B,6'.%$,'(

! H"'0,,'$/,'0$,20'2,6*"6(,+'45'#-..;'80,'$/,'

//0121$" %#+'//344#5."((%#+'3&#,'"2$&"#0

!"#$$%&$'()*+,-.(/$$01(234-5(63*7,-5(8-,9:+5,;<(

=3*<+,.4;(>(?@;;4:A(B3C,;43(/$$0

/DE/D/$$0

#

! !"#$!%&'()*'+),-./'+0'1(2*'('3014*+-./'

4)0563+7''890:*'"0'+;*'%*+(9

! <-):+')*9*(:*'=>?$>@A'B01B*5

! C*30.5')*9*(:*'-.'D?$>@')*EF)-++*.

! G4*.'C06)3*

! H0+',*+'('I-(B9*':096+-0.

&*I-3*

%69+-EJ)03*::0) %69+-EJ)03*::0)

!

!"#$%&'()"*+ !"#$%&'()"*+

,(-.#(&'()"*+

/0 /0

/0 /0

/0 /0

! !

/0 /0

/0 /0

/0 /0

! !

%69+-EJ)03*::0)

!"#$%&'()"*+

/0 /0

/0 /0

/0 /0

! !

! ";*'K6.5(1*.+(9'6.-+'-:'+;*'!"#$%&'(#)*$!!)#

! !':3(9()L'1.23%(45*(#.1."2 K90(+-./'40-.+'!MN

! G.*'%69+-49,$!55'4*)'39032'3,39*

! 678 #OOOE@PQ'30149-(.+

! C+)*(1'4)03*::0):'()*'/)064*5'-.+0'&+,"-.

(#)*$!!)#!

! %69+-E4)03*::0):')6.'-.'/9',&%"#:1;(5

! !'.61B*)'0K'169+-E4)03*::0):'K0)1'('/$0-*$

%69+-EJ)03*::0)

<(3.1;(*1!"#$%&

'()"*+

!

/;*($)&0*"#(11"*

/=$*(>&'()"*+

?%"@$%&'()"*+

A"21;$2;&'()"*+

8(B;C*(&'()"*+

/;*($)&0*"#(11"*1&=$-(&$##(11&;"D

'()"*+&8+5( E##(11 /=$*.23

R*/-:+*): R*(5$S)-+* J)-I(+*

M03(9'%*10), R*(5$S)-+* J)-I(+*

C;()*5'%*10), R*(5$S)-+* %69+-EJ)03*::0)

T90B(9'%*10), R*(5$S)-+* &*I-3*

80.:+(.+'%*10), R*(5 &*I-3*

"*U+6)*'%*10), R*(5 &*I-3*

Compilation

Tuesday, January 13, 2009

Page 33: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

!"#$$%&$'()*+,-.(/$$01(234-5(63*7,-5(8-,9:+5,;<(

=3*<+,.4;(>(?@;;4:A(B3C,;43(/$$0

/DEFD/$$0

0

! !"#$%&'()*+%,-.+&%+/0%-/%12*(3

! !"#$%&#'%'(&)'"*'+,-&.,'%#+'/"0$'."+,1+%$%

! !"(2&3,+'45'!"##

! !"## &0'6,%335'%'76%22,6'%6"8#+'%'("6,'

."(23,)'."(2&3%$&"#'26".,00

!"#$%

! 9"6(%3':.;':.22 0"86.,'*&3,0

! !<=>':.8'0"86.,'."+,'*&3,0

&$%#$%

! ?4@,.$1,),.8$%43,'."+,'*"6'/"0$! :.84&# ,),.8$%43,'."+,'*"6'$/,'+,-&.,

! A"6':.'%#+':.22 *&3,0;'#-.. &#-"B,0'$/,'#%$&-,'

!1!CC'."(2&3,6'*"6'$/,'050$,('D,EF'E..1.3G

! 4')%2*(%,-.+&5%-6%-&%7%.-66.+%8')+%*'89.-*76+0:

'($

'($

')#$'(

'( '*

'#%+ '($,-"

'*

.22

.8+%*, .22 3&#B,6

#-"2,#.. 2$)%0 .84&#

.22 3&#B,6'.%$,'(

! H"'0,,'$/,'0$,20'2,6*"6(,+'45'#-..;'80,'$/,'

//0121$" %#+'//344#5."((%#+'3&#,'"2$&"#0

!"#$$%&$'()*+,-.(/$$01(234-5(63*7,-5(8-,9:+5,;<(

=3*<+,.4;(>(?@;;4:A(B3C,;43(/$$0

/DE/D/$$0

#

! !"#$!%&'()*'+),-./'+0'1(2*'('3014*+-./'

4)0563+7''890:*'"0'+;*'%*+(9

! <-):+')*9*(:*'=>?$>@A'B01B*5

! C*30.5')*9*(:*'-.'D?$>@')*EF)-++*.

! G4*.'C06)3*

! H0+',*+'('I-(B9*':096+-0.

&*I-3*

%69+-EJ)03*::0) %69+-EJ)03*::0)

!

!"#$%&'()"*+ !"#$%&'()"*+

,(-.#(&'()"*+

/0 /0

/0 /0

/0 /0

! !

/0 /0

/0 /0

/0 /0

! !

%69+-EJ)03*::0)

!"#$%&'()"*+

/0 /0

/0 /0

/0 /0

! !

! ";*'K6.5(1*.+(9'6.-+'-:'+;*'!"#$%&'(#)*$!!)#

! !':3(9()L'1.23%(45*(#.1."2 K90(+-./'40-.+'!MN

! G.*'%69+-49,$!55'4*)'39032'3,39*

! 678 #OOOE@PQ'30149-(.+

! C+)*(1'4)03*::0):'()*'/)064*5'-.+0'&+,"-.

(#)*$!!)#!

! %69+-E4)03*::0):')6.'-.'/9',&%"#:1;(5

! !'.61B*)'0K'169+-E4)03*::0):'K0)1'('/$0-*$

%69+-EJ)03*::0)

<(3.1;(*1!"#$%&

'()"*+

!

/;*($)&0*"#(11"*

/=$*(>&'()"*+

?%"@$%&'()"*+

A"21;$2;&'()"*+

8(B;C*(&'()"*+

/;*($)&0*"#(11"*1&=$-(&$##(11&;"D

'()"*+&8+5( E##(11 /=$*.23

R*/-:+*): R*(5$S)-+* J)-I(+*

M03(9'%*10), R*(5$S)-+* J)-I(+*

C;()*5'%*10), R*(5$S)-+* %69+-EJ)03*::0)

T90B(9'%*10), R*(5$S)-+* &*I-3*

80.:+(.+'%*10), R*(5 &*I-3*

"*U+6)*'%*10), R*(5 &*I-3*

Compilation

Tuesday, January 13, 2009

Page 34: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• CUDA source files end in ".cu"

• Contain a mix of device and host code/data

• Compiled by nvcc

• nvcc is really a wrapper around a more complex compilation process

Input

• Normal .c, .cpp source files

• CUDA .cu source code files

Output

• Object/executable code for host

• .cubin executable code for the device

• For .c and .cpp files, nvcc invokes the native C/C++ compiler for the system (eg: gcc/cl)

• For .cu files, it is a little more complicated...

[Diagram: nvcc compilation flow for a .cu file. Tools: cudafe, cpp, linker, nvopencc, ptxas. File types: .cu, .gpu.c, .c, .ptx, .cubin, .o]

• To see the steps performed by nvcc, use the --dryrun and --keep command line options

Compilation

Page 35: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Compilation

Page 36: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• How is a .cubin linked with the rest of the program?

• Can be:

• Loaded as a file at runtime

• Embedded in data segment

• Embedded as a resource

Programming is fun, until:

• The program crashes

• It produces the wrong result

• ...

• But, there are many debugging techniques

• Debugging software (eg: gdb, Visual Studio)

• printf

• CUDA programming is even less fun

• There is no debugger

• There is no printf

• Debugging code on the device is very hard

• Can try to write intermediate results to memory and copy back to host to examine

• Emulation mode

• By using a compiler flag, you can emulate by running all code on the host

• Compiler Flag: --device-emulation

• Good for most debugging: can use gdb/printf

Compilation

Page 37: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Compilation

Page 38: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Debug

Page 39: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• By using a compiler flag, you can emulate by running all code on the host

• Compiler Flag: --device-emulation

• Good for most debugging: can use gdb/printf

• Not a true emulation:

• Race conditions, memory model differences, etc

Emu

Page 40: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© David Kirk/NVIDIA and Wen-mei W. Hwu

Urbana, Illinois, August 18-22, 2008

Device Emulation Mode Pitfalls

• Emulated device threads execute sequentially, so simultaneous accesses of the same memory location by multiple threads could produce different results than on the device.

• Dereferencing device pointers on the host, or host pointers on the device, can produce correct results in device emulation mode but will generate an error in device execution mode.
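
The first pitfall can be illustrated with a small host-only model. This is a sketch in plain C++, not CUDA code: the "kernel" (each logical thread copies its left neighbour's value) and both function names are made up for illustration. One function runs the logical threads one after another, as emulation does; the other models one possible device outcome in which every thread reads before any thread writes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "kernel": logical thread i performs data[i] = data[i - 1]
// for i >= 1. Threads read locations that other threads write, so the
// result depends on the execution order.

// Emulation-style execution: threads run one after another, so thread i
// already sees the value written by thread i - 1.
std::vector<int> run_sequential(std::vector<int> data) {
    for (std::size_t i = 1; i < data.size(); ++i) {
        data[i] = data[i - 1];
    }
    return data;
}

// One possible device-style outcome: every thread reads its input before
// any thread writes (modelled here with a snapshot of the input).
std::vector<int> run_read_then_write(std::vector<int> data) {
    const std::vector<int> snapshot = data;
    for (std::size_t i = 1; i < data.size(); ++i) {
        data[i] = snapshot[i - 1];
    }
    return data;
}
```

For the input {1, 2, 3, 4} the sequential version yields {1, 1, 1, 1} while the read-then-write version yields {1, 1, 2, 3}. Code that only ever ran in emulation would see the first result and could hide the race.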

Emu


Page 41: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Floating Point

• Results of floating-point computations will slightly differ because of:

– Different compiler outputs, instruction sets

– Use of extended precision for intermediate results

• There are various options to force strict single precision on the host
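
The effect of extended-precision intermediates can be seen with a single addition. The sketch below is plain host C++ with hypothetical function names; it assumes IEEE-754 single and double precision with round-to-nearest. The first function forces the intermediate sum to be rounded to float, the second keeps it in double, standing in for a host that evaluates float expressions at higher precision.

```cpp
#include <cassert>

// Strict single precision: the intermediate sum is rounded to float
// (the volatile store forces the rounding even if the compiler would
// otherwise keep a wider value in a register).
float add_one_then_subtract_single(float a) {
    volatile float sum = a + 1.0f;
    return sum - a;
}

// Wider intermediate: the sum is kept in double, modelling a host that
// uses extended precision for intermediate results.
float add_one_then_subtract_wide(float a) {
    double sum = static_cast<double>(a) + 1.0;
    return static_cast<float>(sum - static_cast<double>(a));
}
```

With a = 16777216.0f (2^24), a + 1 is not representable in single precision, so the strict version returns 0 while the wide version returns 1. The same source expression therefore gives different answers depending on how intermediates are stored.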

Emu


Page 42: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

M02: High Performance Computing with CUDA

CUDA Toolkit

Application Software (Industry Standard C Language)

Libraries: CUFFT, CUBLAS, CUDPP

CUDA Compiler: C, Fortran

CUDA Tools: Debugger, Profiler

GPU: card, system

Multicore CPU: 4 cores

Toolkit

Page 43: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA Many-core + Multi-core support

C CUDA Application

Many-core: NVCC -> PTX code -> PTX to Target Compiler -> Many-core

Multi-core: NVCC --multicore -> CPU C code -> gcc and MSVC -> Multi-core

Toolkit

Page 44: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA Compiler: nvcc

Any source file containing CUDA language extensions (.cu) must be compiled with nvcc

NVCC is a compiler driver

Works by invoking all the necessary tools and compilers like cudacc, g++, cl, ...

NVCC can output:

Either C code (CPU Code), which must then be compiled with the rest of the application using another tool

Or PTX or object code directly

An executable with CUDA code requires:

The CUDA core library (cuda)

The CUDA runtime library (cudart)

Toolkit

Page 45: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA Compiler: nvcc

Important flags:

-arch sm_13           Enable double precision (on compatible hardware)
-G                    Enable debug for device code
--ptxas-options=-v    Show register and memory usage
--maxrregcount <N>    Limit the number of registers
-use_fast_math        Use fast math library

Toolkit

Page 46: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Compiling CUDA for Multi-Core

Using the "--multicore" compile switch with the NVCC compiler generates C code for multi-core CPU

Performance scales linearly with more cores

Control the number of cores with the environment variable CUDA_NROF_CORES=n

C/C++ CUDA Application -> NVCC --multicore -> Multicore CPU C Code -> gcc / MSVC -> Multicore Optimized Application

Toolkit

Page 47: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

GPU Tools

Profiler

Available now for all supported OSs

Command-line or GUI

Sampling signals on GPU for:

Memory access parameters

Execution (serialization, divergence)

Debugger

Runs on the GPU

Emulation mode

Compile and execute in emulation on CPU

Allows CPU-style debugging in GPU source

Toolkit

Page 48: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA API

IAP09 CUDA@MIT / 6.963


Page 49: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• Textures may be addressed in two modes: Normalized and Non-Normalized

• Non-normalized indices range from [0,0] to [N,M]

• Normalized indices range from [0,0] to [1,1]

• What happens when a texture coordinate is outside the domain of the texture?

• Can select an addressing mode:

• Clamping: choose nearest from boundary

• Wrapping: toroidal domain

• Texture references are declared as a special type of global variable:

__device__ texture<...> ...;

• CUDA: Compute Unified Device Architecture

• Created by NVIDIA

• A way to perform computation on the GPU

• Specification for:

• A computer architecture

• A language

• An application interface (API)

• The CUDA API consists of three parts:

• The host API

• The device API

• The common API

API
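
The two texture addressing modes and the two coordinate conventions from this slide can be modelled on the host. The sketch below is plain C++ and does not use the CUDA texture API; `AddressMode`, `resolve_index` and `fetch_1d` are hypothetical names, the texture is one-dimensional, and no filtering is modelled.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

enum class AddressMode { Clamp, Wrap };

// Map a possibly out-of-range texel index into [0, n). Requires n > 0.
int resolve_index(int i, int n, AddressMode mode) {
    if (mode == AddressMode::Clamp) {
        if (i < 0) return 0;       // choose the nearest texel on the boundary
        if (i >= n) return n - 1;
        return i;
    }
    int r = i % n;                 // toroidal domain: indices repeat
    return r < 0 ? r + n : r;
}

// Fetch from a non-empty 1D "texture". With normalized == true the
// coordinate spans the whole texture over [0, 1); otherwise it is given
// in texel units over [0, n).
float fetch_1d(const std::vector<float>& tex, float coord,
               bool normalized, AddressMode mode) {
    const int n = static_cast<int>(tex.size());
    const float texel = normalized ? coord * n : coord;
    const int i = static_cast<int>(std::floor(texel));
    return tex[resolve_index(i, n, mode)];
}
```

For a four-texel texture {10, 20, 30, 40}, the out-of-range coordinate 5 returns 40 under clamping (nearest boundary texel) and 20 under wrapping (index 5 wraps to 1).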


Page 50: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• The CUDA host API provides functions for:

• Device management

• Memory management

• Stream management

• Event management

• Texture management

• OpenGL/DirectX interoperability

• The host API is exposed as two different layers:

• The low level Device API (prefix: cu)

• The high level Runtime API (prefix: cuda)

• Some things can be done through both APIs, others are specialized

• Can be mixed together (with care)

• All GPU computing is performed on a device

• To allocate memory, run a program, etc on the hardware, we need a device context

• Device contexts are bound 1:1 with host threads (just like OpenGL)

• So, each host thread may have at most one device context

• And, each device context is accessible from only one host thread

• All device API calls return an error/success code of type CUresult

• All runtime API calls return an error/success code of type cudaError_t

• An integer value with zero = no error

• cudaGetLastError, cudaGetErrorString

• Runtime API calls automatically initialize

• Device API programs must call cuInit first

• The first (optional?) step is to enumerate the available devices

• cuDeviceGetCount

• cuDeviceGet

• cuDeviceGetName

• cuDeviceTotalMem

• cuDeviceGetAttribute

• ...

API
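
Because every call returns a status code with zero meaning success, a common pattern is to wrap each call in a check. The sketch below shows the pattern in plain C++ with made-up stand-ins: `FakeStatus`, `fake_error_string`, `fake_alloc` and `check` are hypothetical and are not part of CUDA. They only play the roles that cudaError_t, cudaGetErrorString and a real API call would play.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical status type: zero means success, as with cudaError_t
// and CUresult.
enum FakeStatus { FAKE_SUCCESS = 0, FAKE_INVALID_VALUE = 1 };

// Hypothetical counterpart of an error-string lookup.
const char* fake_error_string(FakeStatus s) {
    return s == FAKE_SUCCESS ? "no error" : "invalid value";
}

// Hypothetical API call: fails when asked for zero bytes.
FakeStatus fake_alloc(std::size_t bytes) {
    return bytes == 0 ? FAKE_INVALID_VALUE : FAKE_SUCCESS;
}

// Turn a non-zero status into an exception that names the failing call.
void check(FakeStatus s, const char* what) {
    if (s != FAKE_SUCCESS) {
        throw std::runtime_error(std::string(what) + ": " +
                                 fake_error_string(s));
    }
}
```

A call such as `check(fake_alloc(16), "fake_alloc")` passes silently, while `check(fake_alloc(0), "fake_alloc")` throws. With the real runtime API the same wrapper would take a cudaError_t and build its message from cudaGetErrorString.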


Page 51: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

API

Page 52: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

API

Page 53: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

API

Page 54: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

API

Page 55: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

API

Page 56: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• Once we choose a device with cuDeviceGet, we get a device handle of type CUdevice
• Can now create a context with cuCtxCreate
• Runtime API provides a simplified interface for creating a context:
  – cudaGetDeviceCount
  – cudaSetDevice
• And the useful:
  – cudaChooseDevice

• Once we have a context (CUcontext) we can allocate memory, call a GPU function, etc.
• Context is implicitly associated with the creating thread
• To synchronize all threads (CPU host with GPU threads) call cuCtxSynchronize
  – Waits for all GPU tasks to finish

• Allocate/Free memory:
  – cuMemAlloc, cuMemFree
• Initialize memory:
  – cuMemset (the cuMemsetD8 / cuMemsetD16 / cuMemsetD32 family)
• Copy memory:
  – cuMemcpyHtoD, cuMemcpyDtoH, cuMemcpyDtoD
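A round trip through these Device API memory calls could look like the sketch below. Two things here are assumptions of this example rather than the slides' approach: the context is obtained with `cuDevicePrimaryCtxRetain` and `cuCtxSetCurrent` instead of `cuCtxCreate` (whose signature has changed between toolkit versions), and `check` is a hypothetical helper.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda.h>   // link with -lcuda

// Hypothetical helper: stop on the first failing Device API call.
static void check(CUresult r, const char* what) {
    if (r != CUDA_SUCCESS) {
        std::fprintf(stderr, "%s failed (CUresult %d)\n", what, (int)r);
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    check(cuInit(0), "cuInit");
    CUdevice dev;
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");

    // Assumption: use the device's primary context instead of cuCtxCreate.
    CUcontext ctx;
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");

    const size_t n = 1024;
    const size_t bytes = n * sizeof(float);
    float host_in[n], host_out[n];
    for (size_t i = 0; i < n; ++i) host_in[i] = (float)i;

    CUdeviceptr d_a, d_b;
    check(cuMemAlloc(&d_a, bytes), "cuMemAlloc");
    check(cuMemAlloc(&d_b, bytes), "cuMemAlloc");
    check(cuMemsetD32(d_b, 0, n), "cuMemsetD32");           // n 32-bit words set to 0
    check(cuMemcpyHtoD(d_a, host_in, bytes), "cuMemcpyHtoD");
    check(cuMemcpyDtoD(d_b, d_a, bytes), "cuMemcpyDtoD");
    check(cuMemcpyDtoH(host_out, d_b, bytes), "cuMemcpyDtoH");
    check(cuCtxSynchronize(), "cuCtxSynchronize");          // wait for all GPU work

    // The data went host -> device -> device -> host unchanged.
    assert(std::memcmp(host_in, host_out, bytes) == 0);

    check(cuMemFree(d_a), "cuMemFree");
    check(cuMemFree(d_b), "cuMemFree");
    check(cuDevicePrimaryCtxRelease(dev), "cuDevicePrimaryCtxRelease");
    std::printf("round trip ok\n");
    return 0;
}
```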

• When allocating memory for the host, can use malloc / new / mmap
• Or use cuMemAllocHost, cuMemFreeHost
• These functions allocate host memory that is page-locked
• Performance is improved for copies to/from page-locked host memory

• Allocate/Free memory:
  – cudaMalloc, cudaFree
• Initialize memory:
  – cudaMemset
• Copy memory:
  – cudaMemcpy
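The same allocate / initialize / copy / free sequence with the Runtime API is shorter, since no explicit context handling is needed. The sketch below is illustrative only; the buffer size and the zero fill are arbitrary choices.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int n = 256;
    const size_t bytes = n * sizeof(int);
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = -1;   // sentinel, overwritten below

    int* dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return 1;

    // cudaMemset fills bytes, so a value of 0 yields zeroed ints.
    if (cudaMemset(dev, 0, bytes) != cudaSuccess) return 1;

    // The last argument of cudaMemcpy gives the copy direction.
    if (cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i) assert(host[i] == 0);

    if (cudaFree(dev) != cudaSuccess) return 1;
    std::printf("device buffer zeroed and copied back\n");
    return 0;
}
```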


Page 58: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

28M02: High Performance Computing with CUDA

Device Management

CPU can query and select GPU devices

cudaGetDeviceCount( int* count )

cudaSetDevice( int device )

cudaGetDevice( int *current_device )

cudaGetDeviceProperties( cudaDeviceProp* prop, int device )

cudaChooseDevice( int *device, cudaDeviceProp* prop )

Multi-GPU setup:

device 0 is used by default

one CPU thread can control one GPU

multiple CPU threads can control the same GPU

– calls are serialized by the driver
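A short sketch of how these query and selection calls might fit together is given below. The selection criterion passed to `cudaChooseDevice` (compute capability major version 1 or better) is an assumption made only for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::fprintf(stderr, "no CUDA device available\n");
        return 1;
    }

    // Query every device the CPU can see.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) return 1;
        std::printf("device %d: %s, compute %d.%d, %zu bytes, %d multiprocessors\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem, prop.multiProcessorCount);
    }

    // Ask the runtime for the device that best matches a partial description.
    cudaDeviceProp wanted;
    std::memset(&wanted, 0, sizeof(wanted));
    wanted.major = 1;                      // illustrative criterion only
    int best = 0;
    if (cudaChooseDevice(&best, &wanted) != cudaSuccess) return 1;

    // Select it for this host thread and confirm the selection.
    if (cudaSetDevice(best) != cudaSuccess) return 1;
    int current = -1;
    if (cudaGetDevice(&current) != cudaSuccess) return 1;
    assert(current == best);
    return 0;
}
```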


Page 63: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• cuMemAlloc, etc. allocate linear memory
• Can also allocate array memory (2D)
• Arrays are created with a specific width and height and element type
• Memory layout is optimized (e.g. packing) by runtime
• cuArrayCreate, cuArrayDestroy, cuMemcpyDtoA, cuMemcpyHtoA, etc.

• A module is a blob of GPU code/data along with some type information
  – .cubin files
• A module is created by loading a cubin with cuModuleLoad or cuModuleLoadData
• A module can be unloaded with cuModuleUnload
• Loading a module also copies it to the device
• Can then get the address of functions and global variables:
  – cuModuleGetFunction
  – cuModuleGetGlobal
  – cuModuleGetTexRef

• Once a module is loaded, and we have a function pointer, we can call a function
• We must set up the execution environment first
• Execution environment includes:
  – Thread Block Size
  – Shared Memory Size
  – Function Parameters
  – Grid Size
• Thread Block Size:
  – cuFuncSetBlockShape
• Shared Memory Size:
  – cuFuncSetSharedSize
• Function Parameters:
  – cuParamSetSize, cuParamSeti, cuParamSetf, cuParamSetv
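The module and launch steps can be sketched end to end as below. Several parts are assumptions of this example and not the slides' code: the module is loaded from an embedded PTX string (`kPtx`, a hypothetical empty kernel named `noop`) rather than a .cubin file, the primary context is used, and the launch uses `cuLaunchKernel`, a later Device API call that takes the grid size, block shape, shared memory size and parameters in one place instead of the separate cuFuncSet* / cuParamSet* / cuLaunchGrid calls.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda.h>   // link with -lcuda

// Hypothetical PTX module containing one empty kernel called "noop".
static const char* kPtx =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry noop()\n"
    "{\n"
    "    ret;\n"
    "}\n";

// Hypothetical helper: stop on the first failing Device API call.
static void check(CUresult r, const char* what) {
    if (r != CUDA_SUCCESS) {
        std::fprintf(stderr, "%s failed (CUresult %d)\n", what, (int)r);
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    assert(kPtx[0] == '.');
    check(cuInit(0), "cuInit");
    CUdevice dev;
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");
    CUcontext ctx;
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");

    // Loading the module also copies its code to the device.
    CUmodule module;
    check(cuModuleLoadData(&module, kPtx), "cuModuleLoadData");

    // Look up the kernel by name to get a function handle.
    CUfunction noop;
    check(cuModuleGetFunction(&noop, module, "noop"), "cuModuleGetFunction");

    // Execution environment: grid of 4 blocks, 64 threads per block,
    // 0 bytes of dynamic shared memory, default stream, no parameters.
    check(cuLaunchKernel(noop, 4, 1, 1, 64, 1, 1, 0, nullptr, nullptr, nullptr),
          "cuLaunchKernel");
    check(cuCtxSynchronize(), "cuCtxSynchronize");

    check(cuModuleUnload(module), "cuModuleUnload");
    check(cuDevicePrimaryCtxRelease(dev), "cuDevicePrimaryCtxRelease");
    std::printf("kernel launched and completed\n");
    return 0;
}
```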


Page 69: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• Grid Size is set at the same time as the function invocation:
  – cuLaunchGrid
• Recall: in .cu files, we can use the <<<...>>> function invocation operator
• The compiler generates all the Device API calls needed to set up the execution environment
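For comparison, a Runtime API launch with the `<<<...>>>` syntax might look like the sketch below. The kernel `scale` and the helper `blocks_for` are hypothetical names invented for this example; the grid size and block size appear directly in the launch expression.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply each element by a factor.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // guard the partially filled last block
}

// Hypothetical helper: number of blocks needed to cover n elements.
static int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

int main() {
    const int n = 1000;
    const int threads = 256;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice) != cudaSuccess) return 1;

    // Grid size and block size are given inside <<< >>>; the compiler
    // generates the calls that set up the execution environment.
    scale<<<blocks_for(n, threads), threads>>>(dev, 2.0f, n);
    if (cudaGetLastError() != cudaSuccess) return 1;

    if (cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    for (int i = 0; i < n; ++i) assert(host[i] == 2.0f * i);

    cudaFree(dev);
    std::printf("scaled %d elements\n", n);
    return 0;
}
```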

• A stream is a sequence of operations that occur in order, e.g.
  1. Copy data from host to device
  2. Execute device function
  3. Copy data from device to host
• Different streams can be used to manage concurrency, e.g. overlapping a memory copy from one stream with the function execution from another
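The copy / execute / copy sequence in two streams could be sketched with the Runtime API as follows. The kernel `add_one`, the split into two halves and the buffer size are choices made for this illustration. Whether the copies and kernels actually overlap depends on the hardware; page-locked host memory from `cudaMallocHost` is used because asynchronous copies generally need it to overlap.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: add one to each element.
__global__ void add_one(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;
    const int half = n / 2;
    const size_t half_bytes = half * sizeof(int);

    int* host = nullptr;
    int* dev = nullptr;
    if (cudaMallocHost(&host, n * sizeof(int)) != cudaSuccess) return 1;  // page-locked
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return 1;
    for (int i = 0; i < n; ++i) host[i] = i;

    cudaStream_t streams[2];
    for (int k = 0; k < 2; ++k)
        if (cudaStreamCreate(&streams[k]) != cudaSuccess) return 1;

    // Within one stream the three steps run in order; work queued in
    // the other stream is free to overlap with them.
    for (int k = 0; k < 2; ++k) {
        int offset = k * half;
        cudaMemcpyAsync(dev + offset, host + offset, half_bytes,
                        cudaMemcpyHostToDevice, streams[k]);
        add_one<<<(half + 255) / 256, 256, 0, streams[k]>>>(dev + offset, half);
        cudaMemcpyAsync(host + offset, dev + offset, half_bytes,
                        cudaMemcpyDeviceToHost, streams[k]);
    }

    for (int k = 0; k < 2; ++k)
        if (cudaStreamSynchronize(streams[k]) != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i) assert(host[i] == i + 1);

    for (int k = 0; k < 2; ++k) cudaStreamDestroy(streams[k]);
    cudaFree(dev);
    cudaFreeHost(host);
    std::printf("both streams finished\n");
    return 0;
}
```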


Page 73: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA API Part Two

Streams

Events

Homework One Discussion

Vector-Vector Addition

Vector-Vector Inner Product

Reduction

Events are a way of determining the progress of a stream

Events provide a "marker" in a stream at a specific position

A holder of an event handle can:

Wait for an event to occur

Measure the time that elapsed between two events

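Both uses of an event handle, waiting for an event and timing the interval between two events, can be sketched with the runtime's event calls. The kernel `add_one` and the helper `time_kernel_ms` are hypothetical names chosen for this example; it is a sketch, not the lecture's own code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

static void check(cudaError_t e) {
    if (e != cudaSuccess) { fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); }
}

// Hypothetical kernel used only to have something to time.
__global__ void add_one(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

// Records one event before and one after the kernel in the default stream.
// Returns the elapsed milliseconds, or -1 if the kernel result is wrong.
float time_kernel_ms(int n) {
    float *d = 0;
    check(cudaMalloc((void **)&d, n * sizeof(float)));
    check(cudaMemset(d, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    check(cudaEventCreate(&start));
    check(cudaEventCreate(&stop));

    check(cudaEventRecord(start, 0));        // marker before the kernel
    add_one<<<(n + 255) / 256, 256>>>(d, n);
    check(cudaEventRecord(stop, 0));         // marker after the kernel
    check(cudaEventSynchronize(stop));       // wait for the second event to occur

    float ms = 0.0f;
    check(cudaEventElapsedTime(&ms, start, stop));

    float first = -1.0f;
    check(cudaMemcpy(&first, d, sizeof(float), cudaMemcpyDeviceToHost));
    check(cudaEventDestroy(start));
    check(cudaEventDestroy(stop));
    check(cudaFree(d));
    return (first == 1.0f) ? ms : -1.0f;
}
```

The time reported is measured on the device between the two markers, so it excludes the host-side launch overhead that a CPU timer would include.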

Page 74: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA Execution and Threading Model


Page 75: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Execution Model

Software | Hardware

Thread | Thread Processor: Threads are executed by thread processors

Thread Block | Multiprocessor: Thread blocks are executed on multiprocessors. Thread blocks do not migrate. Several concurrent thread blocks can reside on one multiprocessor, limited by multiprocessor resources (shared memory and register file)

Grid | Device: A kernel is launched as a grid of thread blocks. Only one kernel can execute on a device at one time

Page 76: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

M02: High Performance Computing with CUDA

CUDA Uses Extensive Multithreading

• CUDA threads express fine-grained data parallelism

– Map threads to GPU threads or CPU vector elements

– Virtualize the processors

– You must rethink your algorithms to be aggressively parallel

• CUDA thread blocks express coarse-grained parallelism

– Map blocks to GPU thread arrays or CPU threads

– Scale transparently to any number of processors

• GPUs execute thousands of lightweight threads

– One DX10 graphics thread computes one pixel fragment

– One CUDA thread computes one result (or several results)

– Provide hardware multithreading & zero-overhead scheduling

Page 77: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA Programming Model

Parallel code (kernel) is launched and executed on a device by many threads

Threads are grouped into thread blocks

Parallel code is written for a thread

Each thread is free to execute a unique code path

Built-in thread and block ID variables

Page 78: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Thread Hierarchy

Threads launched for a parallel section are partitioned into thread blocks

Grid = all blocks for a given launch

Thread block is a group of threads that can:

Synchronize their execution

Communicate via shared memory

Page 79: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

IDs and Dimensions

Threads:

3D IDs, unique within a block

Blocks:

2D IDs, unique within a grid

Dimensions set at launch time

Can be unique for each section

Built-in variables:

threadIdx, blockIdx

blockDim, gridDim

[Figure: Grid 1 on the device holds Blocks (0, 0) through (2, 1); Block (1, 1) is expanded to show Threads (0, 0) through (4, 2)]
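The built-in variables combine into a global position in the usual way. The sketch below uses a 5 x 3 thread block, and for a 15 x 6 problem that yields a 3 x 2 grid of blocks. The kernel `fill_index` and the helper `run_fill_index` are illustrative names, not part of the slides.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

static void check(cudaError_t e) {
    if (e != cudaSuccess) { fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); }
}

// Each thread derives its global (x, y) position from the block ID, the block
// dimensions, and its thread ID, then stores the flattened index y * width + x.
__global__ void fill_index(int *out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) out[y * width + x] = y * width + x;
}

// Launches enough 5 x 3 blocks to cover a width x height array and checks
// that every element holds its own flattened index.
bool run_fill_index(int width, int height) {
    size_t bytes = (size_t)width * height * sizeof(int);
    int *h = (int *)malloc(bytes);
    int *d = 0;
    check(cudaMalloc((void **)&d, bytes));

    dim3 block(5, 3);                                  // dimensions are set at launch time
    dim3 grid((width + block.x - 1) / block.x,         // round up so partial blocks
              (height + block.y - 1) / block.y);       // cover the array edges
    fill_index<<<grid, block>>>(d, width, height);
    check(cudaGetLastError());
    check(cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost));

    bool ok = true;
    for (int i = 0; i < width * height; ++i) ok = ok && (h[i] == i);
    check(cudaFree(d));
    free(h);
    return ok;
}
```

The bounds test inside the kernel matters whenever the array size is not a multiple of the block size, as with `run_fill_index(17, 7)`.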

Page 80: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006

Programming Model

A kernel is executed as a grid of thread blocks

A thread block is a batch of threads that can cooperate with each other by:

Sharing data through shared memory

Synchronizing their execution

Threads from different blocks cannot cooperate

[Figure: the host launches Kernel 1 and Kernel 2; each runs on the device as its own grid (Grid 1, Grid 2) of blocks, and Block (1, 1) is expanded to show its threads]
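Cooperation inside one block, through shared memory and synchronization, is what a block-level sum relies on. The following is a minimal sketch under assumed names (`block_sum`, `sum_on_gpu`, `BLOCK`). The per-block results are added on the host because the blocks cannot cooperate with each other inside the kernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK 256   // threads per block; a power of two is assumed by the loop below

static void check(cudaError_t e) {
    if (e != cudaSuccess) { fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); }
}

// Each block sums up to BLOCK consecutive inputs and writes one partial sum.
__global__ void block_sum(const int *in, int *partial, int n) {
    __shared__ int buf[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = (i < n) ? in[i] : 0;
    __syncthreads();                          // all loads are visible to the block
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();                      // finish this level before the next one
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = buf[0];
}

// Sums n ones on the GPU; the expected result is n.
long sum_on_gpu(int n) {
    int blocks = (n + BLOCK - 1) / BLOCK;
    int *h_in = (int *)malloc(n * sizeof(int));
    int *h_part = (int *)malloc(blocks * sizeof(int));
    for (int i = 0; i < n; ++i) h_in[i] = 1;

    int *d_in = 0, *d_part = 0;
    check(cudaMalloc((void **)&d_in, n * sizeof(int)));
    check(cudaMalloc((void **)&d_part, blocks * sizeof(int)));
    check(cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice));
    block_sum<<<blocks, BLOCK>>>(d_in, d_part, n);
    check(cudaGetLastError());
    check(cudaMemcpy(h_part, d_part, blocks * sizeof(int), cudaMemcpyDeviceToHost));

    long total = 0;
    for (int b = 0; b < blocks; ++b) total += h_part[b];   // combine blocks on the host
    check(cudaFree(d_in));
    check(cudaFree(d_part));
    free(h_in);
    free(h_part);
    return total;
}
```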

Page 81: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Blocks must be independent

Any possible interleaving of blocks should be valid

presumed to run to completion without pre-emption

can run in any order

can run concurrently OR sequentially

Blocks may coordinate but not synchronize

shared queue pointer: OK

shared lock: BAD … can easily deadlock

Independence requirement gives scalability
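The "shared queue pointer" case can be illustrated with an atomic counter from which each block claims its next work item. This is a sketch under assumptions: `next_item`, `drain_queue`, and `run_queue` are invented names, and `atomicAdd` on global memory needs a device of compute capability 1.1 or later. No block ever waits for another, so any execution order works.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

static void check(cudaError_t e) {
    if (e != cudaSuccess) { fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); }
}

__device__ int next_item;   // the shared queue pointer, in global memory

// Each block repeatedly claims the next unclaimed item until the queue is empty.
__global__ void drain_queue(int *data, int n_items) {
    __shared__ int item;
    while (true) {
        if (threadIdx.x == 0) item = atomicAdd(&next_item, 1);   // claim one item
        __syncthreads();
        int claimed = item;
        __syncthreads();            // all threads have read item before it is overwritten
        if (claimed >= n_items) break;                           // same decision in every thread
        if (threadIdx.x == 0) data[claimed] += 1;                // "process" the item
    }
}

// Returns true when every item was processed exactly once.
bool run_queue(int n_items, int blocks) {
    int *h = (int *)calloc(n_items, sizeof(int));
    int *d = 0;
    int zero = 0;
    check(cudaMalloc((void **)&d, n_items * sizeof(int)));
    check(cudaMemset(d, 0, n_items * sizeof(int)));
    check(cudaMemcpyToSymbol(next_item, &zero, sizeof(int)));    // reset the queue pointer
    drain_queue<<<blocks, 64>>>(d, n_items);
    check(cudaGetLastError());
    check(cudaMemcpy(h, d, n_items * sizeof(int), cudaMemcpyDeviceToHost));
    bool ok = true;
    for (int i = 0; i < n_items; ++i) ok = ok && (h[i] == 1);
    check(cudaFree(d));
    free(h);
    return ok;
}
```

The result is the same whether the device runs the blocks one at a time or all at once, which is the independence property the slide asks for.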

Page 82: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Hardware Multithreading

Hardware allocates resources to blocks

blocks need: thread slots, registers, shared memory

blocks don’t run until resources are available

Hardware schedules threads

threads have their own registers

any thread not waiting for something can run

context switching is free – every cycle

Hardware relies on threads to hide latency

i.e., parallelism is necessary for performance
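How thread slots, registers, and shared memory bound the number of resident blocks can be worked through with simple host-side arithmetic. The struct `SmLimits` and the function `resident_blocks` are hypothetical, and the model ignores allocation granularity, so treat it as a rough upper bound rather than the hardware's exact rule. The example limits (768 thread slots, 8192 registers, 16KB shared memory, 8 blocks) are G80-class figures assumed for illustration.

```cuda
#include <cstddef>

// Per-multiprocessor resource limits (assumed values are supplied by the caller).
struct SmLimits {
    int thread_slots;     // maximum resident threads
    int registers;        // size of the register file, in 32-bit registers
    size_t shared_mem;    // bytes of shared memory
    int max_blocks;       // hard cap on resident blocks
};

// Number of blocks that fit on one multiprocessor at the same time.
// threads_per_block and regs_per_thread are assumed to be positive.
int resident_blocks(const SmLimits &sm, int threads_per_block,
                    int regs_per_thread, size_t smem_per_block) {
    int n = sm.max_blocks;
    int by_threads = sm.thread_slots / threads_per_block;
    int by_regs = sm.registers / (regs_per_thread * threads_per_block);
    if (by_threads < n) n = by_threads;
    if (by_regs < n) n = by_regs;
    if (smem_per_block > 0) {
        int by_smem = (int)(sm.shared_mem / smem_per_block);
        if (by_smem < n) n = by_smem;
    }
    return n;   // the tightest resource decides
}
```

With those limits, a kernel using 256 threads per block, 10 registers per thread, and 4KB of shared memory per block fits 3 blocks. Raising the register count to 16 per thread drops that to 2, leaving fewer threads available to hide latency.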

Page 83: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

SIMT Thread Execution

Groups of 32 threads formed into warps

always executing same instruction

shared instruction fetch/dispatch

some become inactive when code path diverges

hardware automatically handles divergence

Warps are the primitive unit of scheduling

SIMT execution is an implementation choice

sharing control logic leaves more space for ALUs

largely invisible to programmer

must understand for performance, not correctness
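Divergence affects speed, not results, and a pair of kernels that differ only in their branch condition makes this concrete. Both kernels below are illustrative (`branch_per_thread`, `branch_per_warp`, and `run_branches` are assumed names): the first splits every warp between two paths, the second keeps each warp on a single path.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

static void check(cudaError_t e) {
    if (e != cudaSuccess) { fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); }
}

// Even and odd threads of the same warp take different paths, so the warp
// runs both paths in turn with part of its threads inactive each time.
__global__ void branch_per_thread(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (threadIdx.x % 2 == 0) out[i] = 2 * i; else out[i] = 3 * i;
}

// The condition depends only on the warp index, so all threads of a warp agree.
__global__ void branch_per_warp(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((threadIdx.x / warpSize) % 2 == 0) out[i] = 2 * i; else out[i] = 3 * i;
}

// Runs both kernels and checks each against the same rule evaluated on the CPU.
bool run_branches(int n) {
    const int threads = 128;
    int *h = (int *)malloc(n * sizeof(int));
    int *d = 0;
    check(cudaMalloc((void **)&d, n * sizeof(int)));
    bool ok = true;

    branch_per_thread<<<(n + threads - 1) / threads, threads>>>(d, n);
    check(cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost));
    for (int i = 0; i < n; ++i) {
        int t = i % threads;                       // threadIdx.x of element i
        ok = ok && (h[i] == ((t % 2 == 0) ? 2 * i : 3 * i));
    }

    branch_per_warp<<<(n + threads - 1) / threads, threads>>>(d, n);
    check(cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost));
    for (int i = 0; i < n; ++i) {
        int t = i % threads;
        ok = ok && (h[i] == (((t / 32) % 2 == 0) ? 2 * i : 3 * i));   // warp size of 32
    }
    check(cudaFree(d));
    free(h);
    return ok;
}
```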

Page 84: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Transparent Scalability

[Figure: one kernel grid of Blocks 0 to 7 running on two devices; the smaller device runs two blocks at a time, the larger runs four at a time]

Hardware is free to schedule thread blocks on any processor

A kernel scales across parallel multiprocessors

Page 85: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

GPU Sizes Require CUDA Scalability

[Figure: block diagrams of three GPUs built from the same SM design (SP cores, SFUs, shared memory, instruction and constant caches, texture units), with 32, 128, and 240 SP cores]

Page 86: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA Memory Model


Page 87: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Kernel Memory Access

Per-thread: Registers (on-chip); Local Memory (off-chip, uncached)

Per-block: Shared Memory

• On-chip, small

• Fast

Per-device: Global Memory, used by Kernel 0, Kernel 1, ... over time

• Off-chip, large

• Uncached

• Persistent across kernel launches

• Kernel I/O

Page 88: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Memory model

Thread: per-thread local memory

Block: per-block shared memory

Page 89: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Memory model

Sequential kernels (Kernel 0, Kernel 1, ...) share the per-device global memory

Page 90: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Memory model

Host memory

Device 0 memory

Device 1 memory

cudaMemcpy() copies data between host memory and the memory of each device
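A round trip through each device's memory shows the host/device split in code. The helper name `round_trip_all_devices` is an assumption; the sketch allocates on every device the runtime reports, copies a host buffer in and back out with `cudaMemcpy`, and compares.

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

static void check(cudaError_t e) {
    if (e != cudaSuccess) { fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); }
}

// Copies the same host buffer to every CUDA device and back again.
// Each device has its own memory, so each gets its own allocation.
bool round_trip_all_devices(int n) {
    int count = 0;
    check(cudaGetDeviceCount(&count));

    std::vector<float> src(n), dst(n);
    for (int i = 0; i < n; ++i) src[i] = 0.5f * i;

    bool ok = count > 0;
    for (int dev = 0; dev < count; ++dev) {
        check(cudaSetDevice(dev));                     // later calls target this device
        float *d = 0;
        check(cudaMalloc((void **)&d, n * sizeof(float)));
        check(cudaMemcpy(d, src.data(), n * sizeof(float), cudaMemcpyHostToDevice));
        std::fill(dst.begin(), dst.end(), -1.0f);      // make a stale result detectable
        check(cudaMemcpy(dst.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost));
        check(cudaFree(d));
        ok = ok && (src == dst);
    }
    return ok;
}
```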

Page 91: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006

Programming Model: Memory Spaces

Each thread can:

Read/write per-thread registers

Read/write per-thread local memory

Read/write per-block shared memory

Read/write per-grid global memory

Read only per-grid constant memory

Read only per-grid texture memory

The host can read/write global, constant, and texture memory (stored in DRAM)

[Figure: a grid with Block (0, 0) and Block (1, 0), each with its own shared memory and two threads that have private registers and local memory; global, constant, and texture memory are shared by the whole grid and accessible from the host]

Page 92: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Variable Qualifiers (GPU code)

__device__

stored in global memory (not cached, high latency)

accessible by all threads

lifetime: application

__constant__

stored in global memory (cached)

read-only for threads, written by host

lifetime: application

__shared__

stored in shared memory (latency comparable to registers)

accessible by all threads in the same thread block

lifetime: block lifetime

Unqualified variables:

private to each thread

scalars and built-in vector types are stored in registers

arrays are stored in device memory (local memory)
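The three qualifiers can be seen side by side in one small kernel. Everything named here (`coeff`, `launches`, `poly`, `run_poly`) is invented for the illustration: the host writes polynomial coefficients into `__constant__` memory, each block stages them in `__shared__` memory, and a `__device__` counter persists across launches.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

static void check(cudaError_t e) {
    if (e != cudaSuccess) { fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); }
}

__constant__ float coeff[4];   // read-only in kernels, written by the host
__device__ int launches;       // global memory, lives as long as the application

// Evaluates c0 + c1*x + c2*x^2 + c3*x^3 per element (blockDim.x >= 4 assumed).
__global__ void poly(const float *x, float *y, int n) {
    __shared__ float c[4];                          // one copy per block
    if (threadIdx.x < 4) c[threadIdx.x] = coeff[threadIdx.x];
    __syncthreads();
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unqualified scalar: a register
    if (i < n) {
        float v = x[i];
        y[i] = ((c[3] * v + c[2]) * v + c[1]) * v + c[0];
    }
    if (i == 0) launches += 1;                      // only one thread in the grid writes
}

// Launches the kernel twice and checks both the results and the counter.
bool run_poly() {
    const int n = 8;
    float hc[4] = {1.0f, 2.0f, 3.0f, 4.0f}, hx[n], hy[n];
    for (int i = 0; i < n; ++i) hx[i] = (float)i;
    int zero = 0, seen = -1;
    float *dx = 0, *dy = 0;
    check(cudaMalloc((void **)&dx, sizeof(hx)));
    check(cudaMalloc((void **)&dy, sizeof(hy)));
    check(cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice));
    check(cudaMemcpyToSymbol(coeff, hc, sizeof(hc)));
    check(cudaMemcpyToSymbol(launches, &zero, sizeof(int)));
    poly<<<1, 32>>>(dx, dy, n);
    poly<<<1, 32>>>(dx, dy, n);
    check(cudaMemcpy(hy, dy, sizeof(hy), cudaMemcpyDeviceToHost));
    check(cudaMemcpyFromSymbol(&seen, launches, sizeof(int)));
    bool ok = (seen == 2);
    for (int i = 0; i < n; ++i)
        ok = ok && (hy[i] == 1.0f + 2.0f * i + 3.0f * i * i + 4.0f * i * i * i);
    check(cudaFree(dx));
    check(cudaFree(dy));
    return ok;
}
```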

Page 93: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Registers

Large number of registers per stream processor (1024 in some hardware)

Zero-clock cycle access

Store either 32bit integer or 32bit float

Local Memory

A small portion of global memory that is private to a stream processor

Often used as overflow from registers

Slow to access (same as global memory)

Shared Memory

A block of memory that is shared by all stream processors in a multi-processor

16KB per block, stored in 16x1KB banks

Very fast to access (i.e. as fast as registers!) without bank conflicts

Global Memory

The large block of memory shared by all multi-processors on the compute device

Size depends on device: 256MB to 1.5GB

High bandwidth: 100GB/s

Slow to access: several hundred clock cycle latency

NOT cached

Constant Memory

A block of read-only memory shared by all multi-processors (64KB)

Cached via 8KB cache per multi-processor

Slow to access: several hundred clock cycle latency on cache miss

Texture Memory

A large block of read-only memory shared by all multi-processors

Reads from texture memory can be interpolated (nearest or linear interpolation for free!)

Cached via 8KB cache per multi-processor

Slow to access: several hundred clock cycle latency on cache miss
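The "without bank conflicts" caveat on shared memory is often illustrated with a tiled transpose. The sketch below assumes the 16-bank layout described for this hardware and uses invented names (`transpose`, `run_transpose`, `TILE`). Reading a 16-wide tile by column would hit one bank 16 times, so the tile is padded by one column to spread the reads over different banks.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define TILE 16

static void check(cudaError_t e) {
    if (e != cudaSuccess) { fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); }
}

// Transposes an n x n matrix one TILE x TILE block at a time.
__global__ void transpose(const float *in, float *out, int n) {
    __shared__ float tile[TILE][TILE + 1];     // +1 column of padding against bank conflicts
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) tile[threadIdx.y][threadIdx.x] = in[y * n + x];   // row-wise write
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;       // swap the block indices for the output
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) out[y * n + x] = tile[threadIdx.x][threadIdx.y];  // column-wise read
}

// Returns true when out[j][i] == in[i][j] for every element.
bool run_transpose(int n) {
    size_t bytes = (size_t)n * n * sizeof(float);
    float *h_in = (float *)malloc(bytes), *h_out = (float *)malloc(bytes);
    for (int i = 0; i < n * n; ++i) h_in[i] = (float)i;
    float *d_in = 0, *d_out = 0;
    check(cudaMalloc((void **)&d_in, bytes));
    check(cudaMalloc((void **)&d_out, bytes));
    check(cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice));
    dim3 block(TILE, TILE), grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    transpose<<<grid, block>>>(d_in, d_out, n);
    check(cudaGetLastError());
    check(cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost));
    bool ok = true;
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c) ok = ok && (h_out[c * n + r] == h_in[r * n + c]);
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    free(h_in);
    free(h_out);
    return ok;
}
```

Removing the padding leaves the result unchanged and only makes the column reads slower, which is why bank conflicts are a performance concern rather than a correctness one.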

The fundamental unit is the stream processor

A scalar, single precision floating point ALU

One Multiply-Add per clock cycle

NOT IEEE-754 compliant

Stream processors are grouped into multi-processors

Multi-processors run in SIMD lockstep

A number of multi-processors form a device

[Figure: a device made of multi-processors; each multi-processor holds stream processors (SP) with their registers and local memory plus a shared memory, and all multi-processors attach to device memory (global, constant, and texture memory)]

Stream processors have access to:

Memory Type | Access | Sharing
Registers | Read/Write | Private
Local Memory | Read/Write | Private
Shared Memory | Read/Write | Multi-Processor
Global Memory | Read/Write | Device
Constant Memory | Read | Device
Texture Memory | Read | Device


Page 96: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Page 97: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Page 98: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Page 99: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• Note that local, global, constant and texture memory all come from the same physical memory pool
• Just differ in access patterns, caching, etc.

• A CUDA device is a highly parallel processor
• We assume it can execute many hundreds of threads in parallel
• Threads ÷ Stream Processors > 1
• When writing CUDA software, think in terms of threads, not processors

• A kernel is executed as a grid
• A grid is a collection of thread blocks
• A thread block is a collection of threads

• Thread blocks and threads are given unique identifiers
• Identifiers can be 1D, 2D or 3D
• Used to help identify which part of a problem a thread/block should operate on

[Figure: a device running a grid of thread blocks indexed (0,0) to (2,1); thread block (1,1) is shown expanded into threads indexed (0,0) to (2,1).]
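The identifiers described above are exposed in CUDA C through the built-in variables `blockIdx`, `blockDim` and `threadIdx`. The sketch below (not from the original slides) shows the usual way a thread combines them to decide which part of a 2D problem it owns. The names `fill_index` and `run_index_demo` and the 16 x 16 block shape are assumptions made for this example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread writes its own flattened (row-major) position into the output.
__global__ void fill_index(int* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column owned by this thread
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row owned by this thread
    if (x < width && y < height)                    // the grid may overshoot the problem size
        out[y * width + x] = y * width + x;
}

// Host-side driver; returns true when every element holds its own index.
bool run_index_demo(int width, int height)
{
    size_t count = size_t(width) * height;
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, count * sizeof(int)) != cudaSuccess) return false;

    dim3 block(16, 16);                             // 256 threads per block (arbitrary)
    dim3 grid((width  + block.x - 1) / block.x,     // round up so the grid covers
              (height + block.y - 1) / block.y);    // the whole width x height domain
    fill_index<<<grid, block>>>(d_out, width, height);

    std::vector<int> h_out(count, -1);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, count * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    for (size_t i = 0; i < count; ++i)
        if (h_out[i] != int(i)) return false;
    return true;
}
```

With `run_index_demo(100, 37)` the grid is 7 x 3 blocks, so some threads fall outside the 100 x 37 domain; the bounds check in the kernel is what keeps them from writing out of range.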


Page 100: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Lecture Outline

• Texture Memory
• CUDA API Part One

[Figure: multi-processor memory diagram with registers, local memory, stream processor, shared memory, global memory, constant memory and texture memory.]

• A large block of read-only memory shared by all multi-processors
• Reads from texture memory can be "sampled"
• Cached via 8KB cache per multi-processor
• Slow to access: several hundred clock cycle latency on cache miss
• Texture memory is read through a texture reference
• Unlike normal memory, textures are read through an explicit texture fetch
• A texture fetch may correspond to reading and interpolating a number of memory addresses


Page 101: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Page 102: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• Textures form a 1D, 2D (and eventually 3D) domain, e.g. 4x4, 2D texture:

[Figure: a 4x4 2D texture with corners labelled (0.0, 0.0) and (4.0, 4.0) and a sample point at (1.5, 1.5).]

• Elements addressed by a floating-point "index" or "coordinate" (think array index)
• Texture fetches perform a filtering operation

Nearest filtering

(1.0, 1.0) = (1,1)
(2.5, 2.5) = (2,2)

Linear filtering

(1.0, 1.0) = ¼ (0,0) + ¼ (0,1) + ¼ (1,0) + ¼ (1,1)
(2.5, 2.5) = (2,2)

• A texture reference is bound to a block of memory through the API
• Different texture references may be bound to the same or overlapping blocks of memory
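The filtering arithmetic above can be reproduced in software. The sketch below (not from the original slides) emulates nearest and linear filtering on a plain row-major array; it does not use the hardware texture units or the CUDA texture API. All names (`HD`, `clamp_index`, `texel`, `fetch_nearest`, `fetch_linear`) are hypothetical, and the code assumes texel (i, j) has its centre at (i + 0.5, j + 0.5), which is what the worked values above imply.

```cuda
#include <math.h>

// Lets the same source build with nvcc or with a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Clamp an integer texel index into [0, n-1].
HD inline int clamp_index(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

// Read texel (i, j) from a row-major w x h array, clamping at the edges.
HD inline float texel(const float* t, int w, int h, int i, int j)
{
    return t[clamp_index(j, h) * w + clamp_index(i, w)];
}

// Nearest filtering: return the texel whose cell contains (x, y).
HD inline float fetch_nearest(const float* t, int w, int h, float x, float y)
{
    return texel(t, w, h, (int)floorf(x), (int)floorf(y));
}

// Linear filtering: blend the four texels whose centres surround (x, y).
HD inline float fetch_linear(const float* t, int w, int h, float x, float y)
{
    float xb = x - 0.5f, yb = y - 0.5f;            // shift so texel centres sit on integers
    int   i  = (int)floorf(xb), j = (int)floorf(yb);
    float a  = xb - i, b = yb - j;                 // interpolation weights in [0, 1)
    return (1 - a) * (1 - b) * texel(t, w, h, i,     j)
         +      a  * (1 - b) * texel(t, w, h, i + 1, j)
         + (1 - a) *      b  * texel(t, w, h, i,     j + 1)
         +      a  *      b  * texel(t, w, h, i + 1, j + 1);
}
```

For a 4x4 array holding 0..15 in row-major order, `fetch_linear` at (1.0, 1.0) averages texels (0,0), (1,0), (0,1) and (1,1), i.e. the values 0, 1, 4 and 5, and at (2.5, 2.5) it returns texel (2,2) unchanged. Real texture hardware computes the weights in reduced-precision fixed point, so its results can differ slightly from this floating-point emulation.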


Page 103: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Page 104: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Page 105: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Page 106: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Page 107: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Page 108: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

• Textures may be addressed in two modes: normalized and non-normalized
• Non-normalized indices range from (0,0) to (N,M)
• Normalized indices range from (0,0) to (1,1)
• What happens when a texture coordinate is outside the domain of the texture?
• Can select an addressing mode:
  • Clamping: choose nearest from boundary
  • Wrapping: toroidal domain
• Texture references are declared as a special type of global variable:

__device__ texture<type, dimensions, mode> TextureRef;
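The two addressing modes and the normalized coordinate range can be sketched as plain index arithmetic. This is an illustration written for this transcript, not code from the slides and not the CUDA texture API; `HD`, `address_clamp`, `address_wrap` and `to_texel_coord` are hypothetical names.

```cuda
// Lets the same source build with nvcc or with a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Clamping: an out-of-range index snaps to the nearest boundary texel.
HD inline int address_clamp(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

// Wrapping: indices repeat with period n, giving a toroidal domain.
HD inline int address_wrap(int i, int n)
{
    int r = i % n;             // may be negative for negative i
    return r < 0 ? r + n : r;  // fold back into [0, n-1]
}

// Convert a normalized coordinate in [0, 1] to a non-normalized one in [0, n].
HD inline float to_texel_coord(float u, int n)
{
    return u * n;
}
```

For a texture that is 4 texels wide, index 7 clamps to 3 and wraps to 3, while index 5 clamps to 3 but wraps to 1. In the CUDA runtime, wrap addressing is only available together with normalized coordinates.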

• CUDA: Compute Unified Device Architecture
• Created by NVIDIA
• A way to perform computation on the GPU
• Specification for:
  • A computer architecture
  • A language
  • An application interface (API)
• The CUDA API consists of three parts:
  • The host API
  • The device API
  • The common API
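As a small taste of the host API, the sketch below (not from the original slides) uses the runtime calls `cudaGetDeviceCount` and `cudaGetDeviceProperties` to report the memory sizes discussed earlier for each installed device. The function name `print_device_memory` and the output format are assumptions made for this example.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Prints the memory sizes of every CUDA device.
// Returns the number of devices, or -1 if the runtime reports an error
// (for example when no CUDA-capable device is present).
int print_device_memory()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        if (cudaGetDeviceProperties(&p, d) != cudaSuccess) return -1;
        std::printf("%s: %zu MB global, %zu KB shared per block, "
                    "%zu KB constant, %d registers per block, %d multi-processors\n",
                    p.name,
                    p.totalGlobalMem >> 20,      // bytes -> MB
                    p.sharedMemPerBlock >> 10,   // bytes -> KB
                    p.totalConstMem >> 10,       // bytes -> KB
                    p.regsPerBlock,
                    p.multiProcessorCount);
    }
    return count;
}
```

Calling `print_device_memory()` from `main` is a quick way to compare a particular card against the figures quoted on the memory slides, which describe 2008-era hardware.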


Page 109: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)


Page 110: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Memory

Page 111: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2008

Data Movement in a CUDA Program

Host Memory

Device Memory

[Shared Memory]

COMPUTATION

[Shared Memory]

Device Memory

Host Memory
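The round trip in this diagram can be sketched with the CUDA runtime API: copy from host memory to device memory, compute, and copy the result back. The optional shared-memory stage is left out here to keep the example short. The kernel `square` and the helper `round_trip` are hypothetical names, not part of the course material.

```cuda
#include <cuda_runtime.h>
#include <vector>

// COMPUTATION stage: each thread squares one element in device (global) memory.
__global__ void square(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= d[i];
}

// Returns true if the host -> device -> host round trip produced i*i for every i.
bool round_trip(int n)
{
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = 0;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) return false;

    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);   // Host Memory -> Device Memory
    square<<<(n + 255) / 256, 256>>>(d, n);                   // computation on the device
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);   // Device Memory -> Host Memory
    cudaFree(d);

    for (int i = 0; i < n; ++i)
        if (h[i] != (float)i * (float)i) return false;
    return true;
}
```

The second `cudaMemcpy` waits for the kernel to finish, so no explicit synchronization call is needed before reading the result on the host.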

Memory

Page 112: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, ECE 498AL, University of Illinois, Urbana-Champaign

A Common Programming Pattern

• Local and global memory reside in device memory (DRAM) - much slower access than shared memory
• So, a profitable way of performing computation on the device is to block data to take advantage of fast shared memory:
– Partition data into data subsets that fit into shared memory
– Handle each data subset with one thread block by:
• Loading the subset from global memory to shared memory, using multiple threads to exploit memory-level parallelism
• Performing the computation on the subset from shared memory; each thread can efficiently multi-pass over any data element
• Copying results from shared memory to global memory
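A block-wise sum is one small instance of this load, compute, store pattern. The sketch below assumes a tile of 256 floats per block; `TILE`, `block_sum` and `sum_of_ones` are hypothetical names, and the tree-shaped reduction is just one convenient example of a computation that passes over the tile several times.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 256   // threads per block and elements per tile (hypothetical choice)

__global__ void block_sum(const float *in, float *partial, int n)
{
    __shared__ float tile[TILE];
    int t = threadIdx.x;
    int i = blockIdx.x * TILE + t;

    // 1. Load: each thread copies one element of the subset from global to shared memory.
    tile[t] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // 2. Compute: several passes over the tile, all served from shared memory.
    for (int stride = TILE / 2; stride > 0; stride /= 2) {
        if (t < stride) tile[t] += tile[t + stride];
        __syncthreads();
    }

    // 3. Store: one thread copies the block's result back to global memory.
    if (t == 0) partial[blockIdx.x] = tile[0];
}

// Sums n ones on the GPU; returns the total, or -1.0f if device allocation fails.
float sum_of_ones(int n)
{
    int blocks = (n + TILE - 1) / TILE;
    std::vector<float> h_in(n, 1.0f), h_partial(blocks);

    float *d_in = 0, *d_partial = 0;
    if (cudaMalloc((void **)&d_in, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void **)&d_partial, blocks * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1.0f;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, TILE>>>(d_in, d_partial, n);
    cudaMemcpy(h_partial.data(), d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);

    float total = 0.0f;                  // the few per-block results are combined on the host
    for (int b = 0; b < blocks; ++b) total += h_partial[b];
    return total;
}
```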

Memory

Page 113: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, ECE 498AL, University of Illinois, Urbana-Champaign

A Common Programming Pattern (cont.)

• Texture and Constant memory also reside in device memory (DRAM) - much slower access than shared memory
– But… cached!
– Highly efficient access for read-only data
• Carefully divide data according to access patterns
– R/O, no structure → constant memory
– R/O, array structured → texture memory
– R/W, shared within Block → shared memory
– R/W registers spill to local memory
– R/W inputs/results → global memory
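Small read-only data with no particular structure, such as a handful of coefficients that every thread reads, is a natural fit for constant memory. The sketch below evaluates a cubic polynomial whose coefficients live in a `__constant__` array; `coeff`, `poly` and `eval_poly_on_gpu` are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>

__constant__ float coeff[4];   // read-only in kernels, cached, same values for all threads

// y[i] = coeff[0] + coeff[1]*x + coeff[2]*x^2 + coeff[3]*x^3, evaluated with Horner's rule.
__global__ void poly(const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float xi = x[i];
        y[i] = ((coeff[3] * xi + coeff[2]) * xi + coeff[1]) * xi + coeff[0];
    }
}

// Evaluates 1 + 2x + 3x^2 + 4x^3 at a single point on the GPU.
// Returns -1.0f if device memory cannot be allocated.
float eval_poly_on_gpu(float x)
{
    const float h_coeff[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float *d_x = 0, *d_y = 0, y = 0.0f;

    if (cudaMalloc((void **)&d_x, sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void **)&d_y, sizeof(float)) != cudaSuccess) { cudaFree(d_x); return -1.0f; }

    cudaMemcpyToSymbol(coeff, h_coeff, sizeof(h_coeff));   // host -> constant memory
    cudaMemcpy(d_x, &x, sizeof(float), cudaMemcpyHostToDevice);
    poly<<<1, 32>>>(d_x, d_y, 1);
    cudaMemcpy(&y, d_y, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}
```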

Memory

Page 114: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

GPU Atomic Operations

Associative operations

add, sub, increment, decrement, min, max, ...

and, or, xor

exchange, compare, swap

Atomic operations on 32-bit words in global memory

Requires compute capability 1.1 or higher (G84/G86/G92)

Atomic operations on 32-bit words in shared memory and 64-bit words in global memory

Requires compute capability 1.2 or higher
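A histogram is a common use of atomic operations on 32-bit words in global memory: many threads may increment the same bin, and `atomicAdd` keeps those read-modify-write updates from overwriting each other. The kernel and the host helper `count_in_bin` below are hypothetical names in a minimal sketch.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread classifies one byte and atomically increments the matching bin.
__global__ void histogram(const unsigned char *data, int n, unsigned int *bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&bins[data[i]], 1u);   // 32-bit atomic add in global memory
}

// Fills data[i] = i % 4, builds a 256-bin histogram on the GPU and returns one bin.
// Returns 0 if device memory cannot be allocated.
unsigned int count_in_bin(int n, int bin)
{
    std::vector<unsigned char> h_data(n);
    for (int i = 0; i < n; ++i) h_data[i] = (unsigned char)(i % 4);

    unsigned char *d_data = 0;
    unsigned int *d_bins = 0;
    unsigned int h_bins[256];

    if (cudaMalloc((void **)&d_data, n) != cudaSuccess) return 0;
    if (cudaMalloc((void **)&d_bins, sizeof(h_bins)) != cudaSuccess) { cudaFree(d_data); return 0; }
    cudaMemcpy(d_data, h_data.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, sizeof(h_bins));      // all bins start at zero

    histogram<<<(n + 255) / 256, 256>>>(d_data, n, d_bins);
    cudaMemcpy(h_bins, d_bins, sizeof(h_bins), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_bins);
    return h_bins[bin];
}
```

Without the atomic, two threads could read the same old count and both write back old + 1, losing an increment.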

Memory

Page 115: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

CUDA Advanced (Preview)

IAP09 CUDA@MIT / 6.963

Page 116: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

M02: High Performance Computing with CUDA

CUDA libraries

CUDA includes 2 widely used libraries

CUBLAS: BLAS implementation

CUFFT: FFT implementation

CUDPP (Data Parallel Primitives), available from http://www.gpgpu.org/developer/cudpp/ :

Reduction

Scan

Sort
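As a taste of CUBLAS, the sketch below computes y = alpha * x + y with `cublasSaxpy`. It assumes the handle-based interface from `cublas_v2.h`, which is newer than the CUBLAS release available when this lecture was given, and it has to be linked with `-lcublas`. The helper name `saxpy_with_cublas` is hypothetical.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Computes y = alpha * x + y on the GPU with x[i] = 1 and y[i] = 2, and returns y[0].
// Returns -1.0f if the library or the device memory cannot be set up.
float saxpy_with_cublas(int n, float alpha)
{
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    size_t bytes = n * sizeof(float);
    float *d_x = 0, *d_y = 0;

    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return -1.0f;
    if (cudaMalloc((void **)&d_x, bytes) != cudaSuccess) {
        cublasDestroy(handle);
        return -1.0f;
    }
    if (cudaMalloc((void **)&d_y, bytes) != cudaSuccess) {
        cudaFree(d_x);
        cublasDestroy(handle);
        return -1.0f;
    }

    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice);

    // BLAS level-1 call: unit strides for both vectors, alpha passed by pointer.
    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);

    cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    cublasDestroy(handle);
    return y[0];
}
```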

Preview

Page 117: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

GPU name              GeForce GTX280   GeForce 9800GTX   GeForce 8800GTX   GeForce 8600GTS
# of vector cores     30               16                16                4
core clock, GHz       1.30             1.67              1.35              1.45
registers/core        64KB             32KB              32KB              32KB
smem/core             16KB             16KB              16KB              16KB
memory bus, GHz       1.1              1.1               0.9               1.0
memory bus, pins      512              256               384               128
bandwidth, GB/s       141              70                86                32
memory amount         1GB              512MB             768MB             256MB
SP, peak Gflop/s      624              429               346               93
SP, peak per core     21               27                22                23
SP, flops:word        18               25                16                12
DP, peak Gflop/s      78               –                 –                 –
DP, flops:word        4.4              –                 –                 –

Table 1: The list of the GPUs used in this study. SP is single precision and DP is double precision. Smem is shared memory. Peak flop rates are shown for multiply and add operations. Flops:word is the ratio of peak Gflop/s rate to pin-memory bandwidth in words.
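The flops:word rows of Table 1 can be reproduced from its peak-rate and bandwidth rows. The worked example below is a sketch that assumes a word is 4 bytes in single precision and 8 bytes in double precision, which the caption does not state explicitly; the inputs are the GeForce GTX280 figures from the table (624 and 78 Gflop/s peak, 141 GB/s bandwidth).

```latex
\[
\text{flops:word} \;=\; \frac{\text{peak rate (Gflop/s)}}{\text{bandwidth (GB/s)} \,/\, \text{bytes per word}}
\]
\[
\text{SP: } \frac{624}{141/4} \approx 17.7 \approx 18,
\qquad
\text{DP: } \frac{78}{141/8} \approx 4.4
\]
```

Read this way, the single-precision kernel needs roughly 18 floating-point operations per word fetched from device memory to keep the arithmetic units busy.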

Benchmarking GPUs to Tune Dense Linear Algebra

Vasily Volkov
Computer Science Division, University of California at Berkeley

James W. Demmel
Computer Science Division and Department of Mathematics, University of California at Berkeley

Abstract

We present performance results for dense linear algebra using recent NVIDIA GPUs. Our matrix-matrix multiply routine (GEMM) runs up to 60% faster than the vendor's implementation and approaches the peak of hardware capabilities. Our LU, QR and Cholesky factorizations achieve up to 80–90% of the peak GEMM rate. Our parallel LU running on two GPUs achieves up to ~540 Gflop/s. These results are accomplished by challenging the accepted view of the GPU architecture and programming guidelines. We argue that modern GPUs should be viewed as multithreaded multicore vector units. We exploit blocking similarly to vector computers and heterogeneity of the system by computing both on GPU and CPU. This study includes detailed benchmarking of the GPU memory system that reveals sizes and latencies of caches and TLB. We present a couple of algorithmic optimizations aimed at increasing parallelism and regularity in the problem that provide us with slightly higher performance.

1 Introduction

We make the following contributions. For the first time, we show an LU, QR and Cholesky factorization that achieve computational rates over 300 Gflop/s on a GPU. These are three of the most widely used factorizations in dense linear algebra and pave the way for the implementation of the entire LAPACK library [Anderson et al. 1990] for the GPUs.

Our results also include performance on the 8-series of NVIDIA GPUs that was not previously attained in the 1.5 years since these GPUs were available. We provide new insights into programming these and newer GPUs that help us achieve performance in such basic kernels as matrix-matrix multiply that is 60% faster than those in the optimized vendor's library CUBLAS 1.1. Some of our codes have been licensed by NVIDIA and included in CUBLAS 2.0. In our approach we think of the GPU as a multithreaded vector unit and our best algorithms were found to closely resemble earlier solutions found for vector processors.

We perform detailed benchmarks of the GPU and reveal some of the bottlenecks, such as access to the on-chip memory that bounds the performance of our best codes, and kernel launch overhead that prohibits efficient fine-grain computations. The benchmarks reveal the structure of the GPU memory system, including sizes and latencies of the L1 and L2 caches and TLB. For the first time we implement and measure the performance of a global barrier that runs entirely on the GPU. We believe this is an important step towards operating GPUs with lower CPU intervention.

To achieve the best performance in matrix factorizations we use state of art techniques such as look-ahead, overlapping CPU and GPU computation, autotuning, smarter variants of 2-level blocking, and choosing the right memory layout; we also use a novel algorithm with modified numerics. We analyze the performance of our implementations in detail to show that all components of the final system run at the nearly optimal rates.

Our best speedups vs. one quad core CPU are over 4× in all three factorizations.

The rest of this paper is organized as follows. Section 2 describes the architecture of the GPUs we used, highlighting the features common to vector architectures. Section 3 benchmarks operations including memory transfer, kernel start-up, and barriers, and uses these to analyze the performance of the panel factorization of LU. Section 4 discusses the design and performance evaluation of matrix multiplication. Section 5 discusses the design of LU, QR and Cholesky, and Section 6 evaluates their performance. Section 7 summarizes and describes future work.

2 GPU Architecture

In this work we are concerned with programming 8 series, 9 series, and 200 series of NVIDIA GPUs, as listed in Table 1. For the description of their architecture see the CUDA programming guide [NVIDIA 2008a], technical briefs [NVIDIA 2006; NVIDIA 2008b] and lecture slides in the course on programming GPUs at the University of Illinois, Urbana-Champaign [Hwu and Kirk 2007]. Additional insights can be found in decuda¹, which is a third-party disassembler of GPU binaries based on reverse-engineering of the native instruction set. The instruction set called PTX that was released by vendor is an abstraction that requires further compilation and so provides fewer insights.

2.1 Notation

The GPU programming model used in the CUDA programming environment [NVIDIA 2008a] borrows much from abstractions used in graphics, e.g. such as used in the DirectX and OpenGL standards. GPU programs are run as collections of scalar threads that run faster if they remain convergent in an SIMD fashion. Similarly, individual arithmetic pipelines that execute scalar instructions are exposed as individual processing cores. For example, the technical brief on the latest GPU [NVIDIA 2008b]

¹ http://www.cs.rug.nl/~wladimir/decuda/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

SC2008 November 2008, Austin, Texas, USA 978-1-4244-2835-9/08 $25.00 ©2008 IEEE

Preview

Page 118: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

M02: High Performance Computing with CUDA

CUDA Example: Fourier-spectral Poisson Solver

Solve a Poisson equation on a rectangular domain with periodic boundary conditions using a Fourier-spectral method.

This example will show how to use the FFT library, transfer the data to/from GPU and perform simple computations on the GPU.
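The three ingredients named on this slide (FFT library, data transfer, a simple computation on the GPU) can be combined in a few dozen lines. The sketch below is not the example code from the tutorial; it assumes a square N×N grid on [0, 2π)², a zero-mean solution, and complex-to-complex transforms from cuFFT (link with `-lcufft`). The kernel `solve_in_fourier_space` and the helper `poisson_max_error` are hypothetical names.

```cuda
#include <cufft.h>
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Simple computation on the GPU: divide each Fourier coefficient by -(kx^2 + ky^2).
// The extra 1/(N*N) compensates for cuFFT's unnormalised inverse transform.
__global__ void solve_in_fourier_space(cufftComplex *f_hat, int N)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= N || iy >= N) return;

    float kx = (float)(ix <= N / 2 ? ix : ix - N);   // integer wavenumbers on a 2*pi-periodic box
    float ky = (float)(iy <= N / 2 ? iy : iy - N);
    float k2 = kx * kx + ky * ky;
    float s = (k2 == 0.0f) ? 0.0f : -1.0f / (k2 * (float)N * (float)N);   // k = 0 mode set to zero

    int idx = ix * N + iy;
    f_hat[idx].x *= s;
    f_hat[idx].y *= s;
}

// Solves laplacian(u) = f for f = -2 sin(x) sin(y), whose exact solution is
// u = sin(x) sin(y). Returns the maximum error, or -1.0f if setup fails.
float poisson_max_error(int N)
{
    const float h = 2.0f * 3.14159265358979f / N;
    std::vector<cufftComplex> data(N * N);
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            data[i * N + j].x = -2.0f * sinf(i * h) * sinf(j * h);
            data[i * N + j].y = 0.0f;
        }

    cufftComplex *d = 0;
    size_t bytes = sizeof(cufftComplex) * N * N;
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) return -1.0f;
    cudaMemcpy(d, data.data(), bytes, cudaMemcpyHostToDevice);     // transfer to the GPU

    cufftHandle plan;
    if (cufftPlan2d(&plan, N, N, CUFFT_C2C) != CUFFT_SUCCESS) { cudaFree(d); return -1.0f; }
    cufftExecC2C(plan, d, d, CUFFT_FORWARD);                       // f -> f_hat (in place)
    dim3 block(16, 16), grid((N + 15) / 16, (N + 15) / 16);
    solve_in_fourier_space<<<grid, block>>>(d, N);                 // f_hat -> u_hat
    cufftExecC2C(plan, d, d, CUFFT_INVERSE);                       // u_hat -> u
    cudaMemcpy(data.data(), d, bytes, cudaMemcpyDeviceToHost);     // transfer back
    cufftDestroy(plan);
    cudaFree(d);

    float max_err = 0.0f;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            float err = fabsf(data[i * N + j].x - sinf(i * h) * sinf(j * h));
            if (err > max_err) max_err = err;
        }
    return max_err;
}
```

The division works because differentiation becomes multiplication by the wavenumber in Fourier space, so the Laplacian turns into a factor of -(kx² + ky²) on each coefficient.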

Preview

Page 119: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

M02: High Performance Computing with CUDA

Interfacing CUDA with other languages

CUDA kernels from FORTRAN, allocate pinned memory from FORTRAN

Calling CUDA from MATLAB with MEX files

Several packages (open source and commercial) to interface CUDA with Python, IDL, .NET, FORTRAN (Flagon). Browse CUDA Zone to find all the packages.
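Most of these bindings rest on the same idea: wrap the kernel launch in a host function with C linkage, so that any language able to call C can reach it. The sketch below shows such a wrapper plus a pair of helpers for pinned (page-locked) host memory. The names `gpu_scale`, `gpu_alloc_pinned` and `gpu_free_pinned` are hypothetical, and each target language still needs its own declaration of these functions.

```cuda
#include <cuda_runtime.h>

__global__ void scale_kernel(float *d, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= a;
}

// extern "C" turns off C++ name mangling, so the symbol is exported as plain "gpu_scale".
// Scales host_data in place on the GPU; returns 0 on success, 1 on failure.
extern "C" int gpu_scale(float *host_data, float a, int n)
{
    float *d = 0;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) return 1;

    cudaMemcpy(d, host_data, bytes, cudaMemcpyHostToDevice);
    scale_kernel<<<(n + 255) / 256, 256>>>(d, a, n);
    cudaError_t status = cudaMemcpy(host_data, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return status == cudaSuccess ? 0 : 1;
}

// Pinned host memory, allocated by the CUDA runtime on behalf of the caller.
// Returns a null pointer if the allocation fails.
extern "C" float *gpu_alloc_pinned(int n)
{
    float *p = 0;
    return cudaMallocHost((void **)&p, n * sizeof(float)) == cudaSuccess ? p : 0;
}

extern "C" void gpu_free_pinned(float *p)
{
    cudaFreeHost(p);
}
```

Compiled into a shared library, these three functions could be declared and called from a foreign-function interface; the exact declaration syntax depends on the calling language and is not shown here.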

Rocks!

Preview

Page 120: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Preview

Page 121: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Preview

Page 122: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Preview

Page 123: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Preview

Page 124: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Preview

Page 125: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

and...

Optimizing CUDA!

Preview

Page 126: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

Preview

Page 127: IAP09 CUDA@MIT 6.963 - Lecture 03: CUDA Basics #2 (Nicolas Pinto, MIT)

COME
