
De novo discovery of microRNA from small RNA sequencing data

Francisco D. Morón-Duran

15 September 2015

Final Degree Project for the Technical Engineering in Computer Systems degree (Enginyeria Tècnica en Informàtica de Sistemes)

Director: Xavier Messeger
Department of Computer Science (Departament en Ciències de la Computació)

External co-director: Victor Moreno
Institut Català d'Oncologia


Table of Contents

Introduction
About the structure of this document
A primer on Molecular Biology
…
Discovery of miRNA from computational approaches
Discovery by forward genetics: de novo and by homology
Computational prediction by machine learning
Identification from small RNA sequencing based on a reference genome
Project Overview
Objective
Planning
Chronological plan
Economic budget
Proposed pipeline outline
Project Implementation
Preprocessing of input FASTQ files
Read collapsing and representativity filtering
Alignment strategy
De Bruijn graph construction and contig assembly
Seed step: Indexing contigs and sequence k-mers
Voting step: Sequences voting contig candidates
Breaking ties: distance between sequences
A formal definition of distance
Errors detection
Reaching a consensus sequence
Identifying already annotated miRNA
Results
Conclusions and final remarks
References and Bibliography
Image credits
Annex A: Bash script for FASTQ preprocessing
Annex B: Python code for the project
Main source: main.py
Module seqs.py
Module debg.py
Module poll.py
Annex C: Script for B…


Introduction

Computing science is a great tool for understanding the biological questions that have arisen since the molecular biology revolution that took place in the life sciences in the 1950s. The fact that life processes are encoded in genomes containing programs that can be read, modified and suppressed by cells is fascinating.

For a biologist with some computational background, it is not possible to think of DNA and DNA-binding proteins such as polymerases or mismatch repair proteins without making an analogy with Turing machines. Hence, the convergence between information theory and biology is nowadays so natural that stereotypes about computational scientists and biologists are rapidly changing, as the need for these people to understand each other becomes more evident.

This work aims to be another example of how the synergies between computing and the life sciences can enhance biological discoveries and boost our understanding of life: not only through predictive algorithms parsing a genomic code that we do not fully understand yet, but also through computational methods that serve as a magnifying glass or a compass guiding scientists through their biological questions.

About the structure of this document

This document is divided into five main sections. The first one is the present introduction, which gives the context in which the work is developed (in A primer on Molecular Biology), explains the biological problem (in Introducing microRNA), reviews the current state of the art (in Brief history of nucleic acid sequencing) and gives the rationale for this project (in Discovery of miRNA by computational approaches).

The next section (Project overview) covers the project's main objective and the planning needed to achieve its goals from the chronological and economic points of view. It also outlines a basic schematic of the structure of the designed software, to give a panoramic view of the different steps to follow in order to reach these goals.

In the Project Implementation section, the main steps outlined in the previous section are detailed and the strategies used to reach the miRNA candidates are explained.

Finally, in Results, the main results of the software output are explained for a set of biological samples, thanks to the Biomarkers and Susceptibility Unit from the Catalan Institute of Oncology.

In an additional Conclusions and final remarks section you will find the main difficulties that arose during the development of this project, possible improvements, and thoughts about the decisions taken at the beginning of the project, seen from the perspective given by the results obtained.


A Primer on Molecular Biology

As in every computing project, before going into the abstraction of problem solving, there is a need to understand the nature of the questions that can arise in the process. It is therefore necessary to collect some information about the field to which the question belongs. The aim of this section is to provide some relevant information and a quick panoramic view of the biological background that supports the project problem.

Life is coded in deo…


… G), forming a double-helix structure with turns of 34 Å (1 ångström equals 10^-8 centimeters). This capacity of each pair of bases to form a different number of hydrogen bonds gives DNA molecules a special property: redundancy. Redundancy allows DNA replication when both chains are separated from each other, as the same information is present in both strands, although encoded in a complementary way. Redundancy also allows errors introduced in DNA to be repaired, when possible, by mismatch repair mechanisms triggered by the cell.

Another key aspect of DNA molecules is their ability to change. DNA can be altered by random mutations introduced by replication errors, faulty mismatch repair, or external agents such as radiation that alter the chemical compounds making up the double helix. This makes possible an important aspect of living organisms: evolution.

Figure 1 Scheme of the double…

Ribonucleic acid as the working copy of the genome

The genome is a precious possession for a cell, so it must be protected and stored carefully. It is known that not all genomic locations are compacted in the same way. Genes that are actively expressed are located in less compact regions to allow the transcription machinery to access their code, while inactive genes are kept in denser regions, compacted by proteins called histones into a structure known as chromatin. In this condensed chromatin, the gene code is protected from unnecessary hazards.

Furthermore, genes encoded in the genome are generic. They must be useful in every cell in all tissues of an organism, but genes are known to have different roles depending on the cell type in which they are expressed. Thus, the functionality of a gene is not obtained directly from its code. Genes are expressed into ribonucleic acid (RNA) molecules called messenger RNA (mRNA).

RNA is a more labile version of DNA, and thus more error-prone. This is due to the hydroxyl group at the 2' position of the ribose, which can act as a nucleophile against the rest of the molecule (Fig. 2). Some viruses use RNA instead of DNA as the vehicle for their genome, which gives them an advantage by making their code less stable and more difficult for their hosts' immune systems to detect. Typically, RNA molecules are single-stranded and contain the nucleobase uracil (U) instead of thymine (T).

For a gene to be expressed, an RNA polymerase must access its code in the genome, open the DNA double helix and start transcribing it into RNA. The gene's content is then copied into multiple mRNA molecules, and the more mRNA molecules of a gene are produced, the more expressed that gene is said to be.

mRNA molecules can then be processed by alternative splicing, resulting in different versions of a gene. Some of these versions are known to be tissue-specific and to have different defined functions in the cell. Non-spliced mRNA molecules are known as pre-mRNA, while spliced ones are called mature mRNA.

However, for most genes to be functional, they must first be translated into proteins. Proteins are amino acid chains (also called polypeptides). They are the final functional product of a gene and the actor that plays the most important part: the catalytic process that lets its intended biological function take place. Furthermore, protein synthesis from a mature mRNA can have distinct outcomes.

Ribosomes (the molecular complexes required for mRNA translation into protein) read mRNA transcripts in triplets. Every three consecutive bases correspond to a single amino acid in the protein. So, for a single mRNA, three possible reading frames exist depending on the translation starting point. Each starting point where a ribosome starts its protein synthesis is called an open reading frame (ORF), and there can be several of them depending on the specific position at which they are placed along the mRNA.
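The idea of three reading frames can be illustrated with a minimal Python sketch. The function name `reading_frames` and the short example transcript are assumptions made for illustration only; they are not part of the project code.

```python
def reading_frames(mrna):
    """Split an mRNA string into codons for each of the three forward frames."""
    frames = []
    for offset in range(3):
        # Step through the sequence in triplets, dropping any incomplete tail.
        codons = [mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)]
        frames.append(codons)
    return frames


# Hypothetical 9-nucleotide transcript used only as an example.
frames = reading_frames("AUGGCCUAA")
for offset, codons in enumerate(frames):
    print(offset, codons)
```

Shifting the starting point by one or two bases changes every triplet, which is why the same mRNA can be read in three different ways.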

But it is important to remember that proteins are not the only catalytic players in the cell. Some RNA molecules also have a defined function by themselves. This is the case for ribosomal RNA (rRNA), which forms the ribosomes together with some ribosomal proteins and carries out the translation of mRNA into amino acid chains. Other RNA motifs can also have a function without being translated into protein and, nowadays, a large research area has grown around the so-called non-coding RNA (ncRNA), a subgroup of which are the microRNA (miRNA) molecules that are the main focus of this text.

Figure 2 Structure of an RNA strand


Messenger RNA and protein levels

The typical structure of a mature mRNA (one already processed by alternative splicing) is represented in Fig. 3. This is the RNA molecule that the ribosome reads and uses as a template to assemble amino acids in the order dictated by the protein-coding gene to synthesize its corresponding polypeptide.

Figure 3 Structure of a typical human protein coding mRNA including the untranslated regions (UTRs)

As we can see, not all of the mRNA sequence is translated into protein. Only the coding sequence (CDS) fragment is converted to amino acids. Flanking the CDS are the 5' and 3' untranslated regions (UTR) which, despite not being translated, can participate in the regulation of gene translation, as we will see later on.

At first, it was thought that protein levels should be roughly proportional to the mRNA expression level of a gene. However, proteomic studies soon revealed that this is not really the case. The amount of a protein in a cell depends on a variety of factors, such as the stability of the protein or its synthesis and degradation rates. Recently, translational regulation has emerged as a key factor governing protein levels. Not all of the mRNA molecules available in a cell are translated with the same efficiency. Most importantly, the translation machinery can be directed to the mRNAs that the cell requires at a certain moment, in a process known as translational control. Not surprisingly, mRNA expression is currently seen more as a buffer of potential proteins for the cell than as traditional gene expression.

A brief comment on cell structure

In general terms, all eukaryotic cells can be divided into two main compartments: the nucleus and the cytosol. The cytosol is isolated from the environment by a lipid bilayer that maintains the conditions needed for the cell to function correctly, which is known as cell homeostasis. The nucleus is, in turn, inside the cytosol and isolated from it by an additional lipid bilayer.

While the genetic code is located and transcribed in the nucleus, it is in the cytosol that messenger RNA is translated into protein. This implies that both rRNA and mRNA must be exported from the nucleus. Furthermore, microRNA (described in the next section) must also be exported to the cytosol, by proteins known as exportins, to perform its function.


Introducing microRNA

MicroRNA are small non-coding RNA molecules, 19 to 24 nucleotides long, capable of modulating gene activity by blocking the translation of a gene's mRNA into its protein product (Esteller M, 2011). A miRNA binds to its target mRNA by sequence complementarity, hampering the progression of polypeptide elongation by the ribosome and/or promoting cleavage and degradation of the targeted transcripts. These regulatory capabilities make miRNA interesting molecules, both as targets for molecular therapies that require directed silencing or activation of genes and as potentially good clinical disease biomarkers.

Molecular structure

A mature miRNA molecule is a single-stranded RNA of 19 to 24 base pairs (bp) in length with a defined seed region spanning nucleotides 2 to 7 or 2 to 8. This seed region plays a key role in the specificity of the molecule for its target mRNA, as it binds to the 3' UTR of the messenger by sequence complementarity. The rest of the miRNA sequence may also interact with the messenger RNA by complementarity in an additive way. The stronger the complementarity of a miRNA to its target, the higher the probability that the target is cleaved by nucleases recruited through these miRNA-mRNA interactions.
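As a rough illustration of seed complementarity, the sketch below extracts the seed (nucleotides 2 to 8 by default) from a mature miRNA and looks for perfectly complementary sites in a 3' UTR. The helper names, the example sequences and the restriction to perfect Watson-Crick pairing are all assumptions made for this example; real target prediction is considerably more involved.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed(mirna, end=8):
    """Seed region: nucleotides 2..end (1-based, inclusive) of the mature miRNA."""
    return mirna[1:end]


def reverse_complement(rna):
    """Sequence that pairs with `rna`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, end=8):
    """0-based positions in the 3' UTR that pair perfectly with the seed."""
    site = reverse_complement(seed(mirna, end))
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Illustrative 22-nt miRNA and a made-up UTR fragment containing one site.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAAA"
print(seed(mirna), seed_sites(mirna, utr))
```

Passing `end=7` restricts the seed to nucleotides 2 to 7, the shorter definition mentioned above.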

Figure 4 Different forms of microRNA along its biogenesis process

Before becoming a single-stranded molecule, a miRNA undergoes a series of modifications starting from an original RNA hairpin or loop structure (pri-miRNA) synthesized by RNA polymerase II (one of the polymerases in charge of transcribing DNA genes into their RNA form) (Fig. 4).

These loops can either be formed from specifically encoded miRNA genes, in the so-called canonical pathway, or originate from intronic regions of a gene (regions inside a gene that do not code for protein and are spliced out by the spliceosome during mRNA maturation), following an alternative or non-canonical pathway (Fig. 5).

The existence of non-canonical microRNA makes miRNA prediction by computational algorithms that parse the genome difficult, as these miRNA are not found as unique, standalone DNA features but are hidden inside other known ones. This is one of the fundamental motivations that make miRNA discovery from small RNA sequencing an attractive approach.

Biogenesis

First, in the nucleus of the cell, RNA polymerase II synthesizes an RNA molecule based on the genome sequence. This can come either from a miRNA gene (pri-miRNA) or from a protein-coding gene whose introns contain short hairpins (loops) that can be processed by Drosha. Drosha is a nuclease in charge of excising the loop from the RNA structure formed by the polymerase, resulting in a smaller loop called pre-miRNA.

It is important to note that each pre-miRNA structure can produce a total of two mature miRNA, one from each end of the loop, giving rise to the so-called 5p-miRNA and 3p-miRNA.

The pre-miRNA can be exported from the nucleus to the cytoplasm by a transporter protein called Exportin 5 (XPO5). In the cytoplasm, a protein complex is recruited that includes Dicer and one of the AGO1-4 proteins. AGO recognizes the double-stranded part of the pre-miRNA while Dicer does the same with the loop. With its nuclease activity, Dicer cleaves the loop, leaving the double-stranded part of the molecule with AGO, which then decides which mature miRNA sequence it keeps and which one it releases to be degraded. Once a single-stranded RNA corresponding to a mature miRNA is bound to AGO, the RISC complex (which will interact with the target mRNA) is recruited.

Remarkably, the two mature miRNA that can originate from a single pre-miRNA loop do not always both have a functional role. In some cases, one of the two molecules is rapidly degraded and only the other becomes a functional miRNA. In other cases, both molecules can be functional, and the AGO protein decides which one it keeps, with different probabilities based on the sequence of the structure.

Figure 5 Canonical and non…

… translation.

Therefore, the presence of the miRNA impairs the expression of its target genes.

The classical mechanism is to hinder the progression of the ribosome along the mRNA that is bound to the miRNA by sequence complementarity, stalling protein synthesis until the ribosome releases the messenger molecule. In cases where sequence complementarity between the miRNA and its target is high, the protein complex recruited by the miRNA (RISC) can force the cleavage of the messenger RNA, effectively eliminating the expression of that gene.

Impairment of the formation of the pre-initiation complex, which is required for the translation of mRNA before the ribosome itself is recruited, has also been described. Other, more exotic mechanisms under investigation include the recruitment of proteases that may degrade the protein synthesized by the ribosome as soon as it leaves the ribosomal complex, or directly blocking the binding of the 80S ribosome to the nucleotide chain.

It is important to note that the miRNA by itself has no function without the proteins and protein complexes that it recruits to affect its targets. These protein complexes carrying the microRNA expose the seed region of the sequence to allow complementary hybridization to the target, and direct themselves to the set of mRNA to be regulated, which is dictated by different miRNA signatures programmed by the cell in a context-specific manner (dependent on tissue, developmental stage…).

Figure 6 Schematic of the different described mechanisms of action for microRNA


Brief history of nucleic acid sequencing

Sanger sequencing

In 1977, Frederick Sanger developed a method for DNA sequencing that is still considered the gold standard in the industry (Sanger F, et al. 1977). It is not scalable, but it is convenient for small-scale projects. The reads it generates are relatively long, and it is used as a trusted technique to validate findings made by other approaches, thanks to its simplicity and reliability.

The original Sanger sequencing consists of four separate reactions in which a DNA template, a DNA primer and a polymerase are mixed together with a pool of dNTP (deoxynucleoside triphosphates or DNA bases: dATP, dCTP, dGTP and dTTP) and a suitable amount of ddNTP (dideoxynucleotides) for each different reaction. ddNTPs lack the 3'-OH group, so once one is incorporated into the polymerization reaction the polymerase can no longer extend the DNA sequence. The amount of ddNTP used is low enough to still allow the eventual synthesis of the full-length original transcript, together with sequences of all the possible shorter lengths, in a probabilistic way.
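The logic behind the four reactions can be sketched in a few lines of Python. This is a deterministic idealization written for illustration: it assumes that every possible terminated fragment is observed exactly once, whereas the real reaction is probabilistic, and the function names are hypothetical.

```python
def terminated_lengths(strand, dd_base):
    """Lengths of fragments that end where the given ddNTP was incorporated."""
    return [i + 1 for i, base in enumerate(strand) if base == dd_base]


def read_gel(lanes):
    """Rebuild the sequence by ordering the fragments of all four lanes by length."""
    by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


# Hypothetical synthesized strand; one "lane" per ddNTP reaction.
strand = "GATTACA"
lanes = {base: terminated_lengths(strand, base) for base in "ACGT"}
print(lanes)
print(read_gel(lanes))
```

Each lane only tells us at which lengths one particular base occurs; sorting the fragments from all four lanes by length recovers the order of the bases.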


… popularized in the nineties with the Human Genome Project and similar projects.

A refined version of this technique is hierarchical shotgun sequencing, which consists of selecting the minimum number of fragments that cover the entire genome, achieving more throughput with less infrastructure, though requiring more complex algorithms.

… reaction (PCR) (Saiki RK, et al. 1988).

Although PCR has been a great tool in molecular biology, a known problem with the technique is the formation of chimeric sequences by hybridization of unspecific fragments during the iterative cycles of hybridization and denaturation of DNA strands. Obviously, that is a nightmare in the sequencing field.

Solid-phase amplification fixes DNA fragments on a surface, so they cannot come into physical contact with other fragments during the amplification process, and each elongation of a DNA strand can occur independently of all the others, at its own pace. This situation is ideal, since it lets us parallelize the process and makes it a high-throughput technique.

The main drawback of NGS technology is the length of the sequences obtained. With Sanger, the gold standard, you can get long fragments at the expense of cost in terms of time and money. In the NGS world, we get reads of 250-500 bp with 454 (Roche), based on emulsion PCR and pyrosequencing, or 30-150 bp with Illumina (Solexa), based on bridge PCR and sequencing by synthesis, so computational algorithms are necessary to extract information from the samples analyzed. This is where short read aligners play an important role.

Given a reference genome, short read aligners try to map NGS reads to different regions, allowing deviations from the reference. The short length of the sequences, the probability of errors in the reads and the resulting errors in the mappings are overcome by working with a large amount of data, in order to minimize error in a probabilistic way. So we must be sure that our region of interest is sequenced enough times to reach a consensus among different reads in order to call a base, which is known as sample coverage.
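The relation between coverage and base calling can be made concrete with a small sketch. It assumes that reads have already been placed on the reference as (start, sequence) pairs, and the minimum coverage of 3 is an arbitrary threshold chosen for the example; neither the names nor the values come from the project itself.

```python
from collections import Counter


def pileup(reads, length):
    """Count, for every reference position, the bases observed in the reads."""
    columns = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if 0 <= start + offset < length:
                columns[start + offset][base] += 1
    return columns


def call_consensus(columns, min_coverage=3):
    """Call the majority base where coverage is sufficient, 'N' elsewhere."""
    called = []
    for column in columns:
        depth = sum(column.values())
        called.append(column.most_common(1)[0][0] if depth >= min_coverage else "N")
    return "".join(called)


# Made-up reads over a 5-base region; the second read carries one error.
reads = [(0, "ACGT"), (0, "ACGA"), (1, "CGT"), (2, "GTT"), (0, "ACG")]
columns = pileup(reads, 5)
print([sum(c.values()) for c in columns])
print(call_consensus(columns))
```

The erroneous base in the second read is outvoted at its position, while the last position is left uncalled because only one read covers it.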

There are many algorithms for aligning sequences to references, from slow but reliable ones like B…


Discovery of miRNA from computational approaches

The highly conserved primary sequences of miRNA and their precursors, together with their characteristic secondary structure, are used to find novel microRNA genes. Multiple strategies are available for discovering novel miRNA: homology, found by aligning known miRNA from other species to the target genome; computational parsing of the genome, looking for recognizable patterns of miRNA-like regions; or digging into isolated sequences found in RNA sequencing libraries generated in the wet laboratory.

Discovery by forward genetics, de novo and by homology

One approach is to align known miRNA precursor sequences from other species to the genome of the organism of interest, in order to find homologies via local alignment techniques like B…


Computational prediction by machine learning

With the existing set of known miRNA, machine learning algorithms can be trained to detect specific features that make a genomic sequence a good candidate to eventually become a miRNA. Fold-back structure, sequence conservation and secondary structure can be evaluated to classify RNA structures as miRNA-like sequences or not. Genome sequences or reads from high-throughput sequencing experiments can be used for this kind of detection.

A subsequent comparison with already annotated sequences must be carried out to keep false positives under control and to avoid possible duplicate entries in annotation databases. Small RNA libraries for the organism, if available, can be used to assess whether the candidates are found at the RNA level in available samples, as evidence that they could be an unidentified miRNA.

Identification from small RNA sequencing based on a reference genome

Another approach to the identification of novel miRNA is based on generating small RNA sequencing libraries. These libraries consist of small RNA fragments isolated from a sample prior to sequencing by Next Generation Sequencing (NGS) methods. Once the sequencing reads are obtained, they are aligned against the reference genome of the organism they come from, and the regions covered by those reads are identified in that genome.

MicroRNA-like regions are easily detected when both the 5p and 3p miRNA are found, as they lie relatively close to each other in the genome, separated only by the distance needed to form the more or less complex RNA loop. The result is two closely located coverage peaks of approximately 22 nucleotides each.
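The two-peak signature described above can be searched for with a simple scan over a per-base coverage vector. The sketch below is only an illustration of the idea: the arm length range of 19 to 24 nucleotides follows the miRNA sizes given earlier, while the maximum loop length of 40 and the function names are assumptions made for this example.

```python
def covered_blocks(coverage, min_depth=1):
    """Return half-open (start, end) intervals where coverage >= min_depth."""
    blocks, start = [], None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos
        elif depth < min_depth and start is not None:
            blocks.append((start, pos))
            start = None
    if start is not None:
        blocks.append((start, len(coverage)))
    return blocks


def candidate_hairpins(blocks, arm=(19, 24), max_loop=40):
    """Pair consecutive blocks that look like 5p/3p arms split by a short loop."""
    pairs = []
    for (s1, e1), (s2, e2) in zip(blocks, blocks[1:]):
        arms_ok = arm[0] <= e1 - s1 <= arm[1] and arm[0] <= e2 - s2 <= arm[1]
        if arms_ok and 0 < s2 - e1 <= max_loop:
            pairs.append(((s1, e1), (s2, e2)))
    return pairs


# Synthetic coverage: two 22-nt peaks separated by a 15-nt uncovered loop.
coverage = [0] * 5 + [10] * 22 + [0] * 15 + [4] * 22 + [0] * 5
blocks = covered_blocks(coverage)
print(blocks, candidate_hairpins(blocks))
```

Real data would need a noise-tolerant depth threshold and a check of the secondary structure of each candidate region, both of which are left out here.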

Exploring small RNA libraries with Next Generation Sequencing techniques can be desirable for identifying miRNA with low expression and candidate regions that are difficult to validate. NGS techniques let us see at a glance whether or not these sequences exist in our library. This is the point on which this project focuses.


Project Overview

As we have seen, the sequence of miRNA molecules is mostly conserved in its central region (including the seed), with the exception of some polymorphisms, while the 5' and 3' ends of the nucleic acid chain are more variable, which results in a set of so-called isomiRs for each miRNA (Neilsen CT, et al. 2012). miRBase, the authoritative miRNA database, contains all the known sequences found in the literature or predicted by computational methods (Griffiths-Jones S, et al. 2006).

On another note, small RNA-seq is a high-throughput technique that allows the sequencing of short RNA molecules present in a prepared sample. It involves amplifying them so that the same molecule is sequenced multiple times simultaneously, minimizing the risk of obtaining erroneous reads.

To date, existing alignment tools take a subsequence of each sequence to be aligned as the core region, usually selected from the most reliable part of a read: its beginning, where base qualities are best. Mapping this core region to genomic coordinates (the so-called seed step) is straightforward, using either hashing techniques or Burrows-Wheeler transform algorithms (Burrows M, et al. 1994). Then, an extension step matches the remainder of the read to the selected genomic location in order to validate or discard the possible match. This extension step is computationally expensive, especially in large genomes such as the human genome, in which repetitive sequences are frequent and lead to multiple wrong genomic locations.

With small RNA-seq, reads are often not aligned directly to the entire genome, but to a curated database of already known miRNA, such as miRBase. In the case of miRNA, the extremes of the sequence might not be conserved, possibly interfering with the initial steps of traditional alignment software, and the fact that miRNA are represented by very short sequences can make the alignment difficult.

Recently, a multi-seed strategy has been published for mapping reads to a reference genome using the seed […] for genomic locations, and the genomic location that receives the most votes is the accepted one, with the final alignment computed directly by counting subread mappings, which largely eliminates the overhead of the extension step.

In this project, we will use the seed […]

• Software testing ................. 50 h
• Documentation ................... 100 h

Total: 450 h

Economic budget

With these 450 h in mind, an economic budget can be drawn up to estimate this project's cost in case it were outsourced to a company.

The biology overview comes for free, as education is normally part of the individual's training and is not charged to the customer. Software design, development and testing require an advanced bioinformatician or computational biologist, at an estimated cost of 60 €/h. The project documentation can be prepared by a documentalist or an administrative worker with computing knowledge, at an estimated cost of 30 €/h.

• Biology overview (0 €/h x 25 h) ................ 0 €
• Software design (60 €/h x 75 h) ............ 4,500 €
• Software development (60 €/h x 200 h) ..... 12,000 €
• Software testing (60 €/h x 50 h) ........... 3,000 €
• Documentation (30 €/h x 100 h) ............. 3,000 €
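As a quick check of the arithmetic, the budget items can be added up with a few lines of Python. This is only an illustrative sketch: the rates and hours are the ones listed above, while the names `BUDGET` and `budget_totals` are hypothetical and not part of the project code.

```python
# Budget items as (task, rate in EUR/h, hours), copied from the budget list.
BUDGET = [
    ("Biology overview", 0, 25),
    ("Software design", 60, 75),
    ("Software development", 60, 200),
    ("Software testing", 60, 50),
    ("Documentation", 30, 100),
]


def budget_totals(items):
    """Return (total hours, total cost in EUR) for a list of budget items."""
    hours = sum(h for _, _, h in items)
    cost = sum(rate * h for _, rate, h in items)
    return hours, cost


total_hours, total_cost = budget_totals(BUDGET)
print(total_hours, total_cost)  # 450 22500
```

The hours add up to the 450 h used as the starting point of the estimate.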

[…] Biomarkers and Susceptibility Unit from the Catalan Institute of Oncology.

Proposed pipeline outline

For the sake of simplicity, preprocessing steps are not shown in the diagram below, as they rely on existing software called from the Bash script provided in Annex A.

Figure 7. Pipeline to identify miRNA from small RNA

Project Implementation

The first approach considered for this project was to use pairwise alignments among input sequences to find miRNA-like sequence candidates. Because of its cost in computational time, this approach was quickly discarded in favor of the seed […] miRNA-like candidate groups are described, while the relevant formalities of these procedures are outlined.

Preprocessing of input FASTQ files

The first step is to perform a quality control check on the input reads, discarding those with low qualities or flagged as bad reads by the sequencer. All sequencing platforms provide tools to ensure the quality of their output. This is the case of CASAVA (Illumina) [4], although independent software such as FASTX-Toolkit [5] or FastQC [6] also exists.

With the remaining good-quality reads, adaptor trimming must be performed to remove from the sequences the linking adaptors introduced during library preparation, which are required for the sequencing amplification process. Sometimes barcodes are also included in these adaptors to identify sequences coming from multiple different samples sequenced at once, which is called multiplexed sequencing (this gives cheaper sequencer runs at the expense of lower per-sample coverage).

[4] https://support.illumina.com/sequencing/sequencing_software/casava.html
[5] http://hannonlab.cshl.edu/fastx_toolkit/
[6] http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

A third, optional preprocessing step is to filter reads by length. In this case, as we are looking for reads between 16 and 30 nucleotide bases long, we discard all reads longer than 30 or shorter than 16 bases. This should help to avoid noise from partially degraded RNA that may be captured during library preparation.
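In the pipeline this filter is delegated to existing tools (see Annex A), but the rule itself is simple enough to sketch in Python. The function name `filter_by_length` and the toy reads are hypothetical; only the 16 and 30 base limits come from the text.

```python
MIN_LEN = 16  # shortest read kept, in bases (value from the text)
MAX_LEN = 30  # longest read kept, in bases (value from the text)


def filter_by_length(reads, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep only the reads whose length lies in [min_len, max_len]."""
    return [r for r in reads if min_len <= len(r) <= max_len]


reads = ["A" * 15, "C" * 16, "G" * 22, "T" * 30, "A" * 31]
kept = filter_by_length(reads)
print([len(r) for r in kept])  # [16, 22, 30]
```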

Read collapsing and representativity filtering

In order to save computing time, the first step of our alignment strategy consists of collapsing all identical reads. For clarity, in this project a read is defined as a piece of input, and a sequence as a unique character string that represents a read. Therefore, multiple reads can be represented by one unique sequence, and a sequence that is present more than once in the input is still considered a single unique sequence.

Expression profiles of microRNA are usually of interest. For this reason, it is necessary to keep a record of the number of times each sequence appears in the input. Some reads, and sometimes a lot of them, may appear a very low number of times (e.g. once). Taking into account that this data comes from an amplification process, where the sample library is amplified before any sequencing is actually performed, this is a strange case that points to a possible error either from the sequencer or from the PCR step (Kebschull JM, et al. 2015). Consequently, sequences appearing fewer times than a threshold can be filtered out before going any further. By default, in the software developed in this project this representativity value is set to 5. Any sequence with 5 reads or fewer is not processed.

Figure 8. Graphical representation of read collapsing motivation
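The collapsing and filtering steps can be sketched as follows. This is a minimal illustration and not the project's `Collapse` class: the function name `collapse_reads` is hypothetical, and reads are assumed to be plain base-space strings.

```python
from collections import Counter


def collapse_reads(reads, min_count=5):
    """Collapse identical reads into unique sequences with their read counts.

    Sequences seen `min_count` times or fewer are dropped, following the
    rule stated in the text (default threshold of 5).
    """
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n > min_count}


reads = ["ACGT"] * 7 + ["ACGA"] * 5 + ["TTTT"]
print(collapse_reads(reads))  # {'ACGT': 7}
```

Keeping the count next to each sequence preserves the expression information while every later step works on unique sequences only.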

Alignment strategy

De Bruijn graph construction and contig assembly

Once reads have been collapsed into sequences and filtered by the representativity threshold, the sequences are used to build a de Bruijn graph from their k-mers. A default k value of 11 is taken. This value is small enough to capture variability in the sequences and large enough to limit the number of spurious contigs, given that our sequences are between 16 and 30 nucleotides long.

Once the de Bruijn graph is built, those nodes that come from fewer sequences than a threshold are removed from the graph. The default threshold value is 2. This further limits erroneous paths that would result in chimeric contigs.
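A minimal sketch of this construction is given below. It is not the `Dbg` class of Annex B: the names `build_dbg` and `successors` are hypothetical, nodes are taken to be k-mers, edges are left implicit, and support is assumed to be counted once per unique sequence.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_dbg(sequences, k=11, min_support=2):
    """Return the set of k-mer nodes supported by at least min_support sequences.

    Edges are implicit: node u is followed by node v when u[1:] == v[:-1].
    """
    support = defaultdict(int)
    for seq in sequences:
        for km in set(kmers(seq, k)):  # count each sequence once per k-mer
            support[km] += 1
    return {km for km, n in support.items() if n >= min_support}


def successors(node, nodes):
    """Nodes reachable from `node` by shifting one base to the right."""
    return [node[1:] + b for b in "ACGT" if node[1:] + b in nodes]


nodes = build_dbg(["ACGTAC", "CGTACG", "TTTTTT"], k=4, min_support=2)
print(sorted(nodes))              # ['CGTA', 'GTAC']
print(successors("CGTA", nodes))  # ['GTAC']
```

In the toy run, k-mers seen in a single sequence (such as `ACGT` or `TTTT`) are discarded, which is the effect the threshold is meant to have on error-derived paths.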

The graph is then traversed from each of its nodes until all the unambiguous contigs are found. Unambiguous contigs are defined as the longest paths without branches in the graph. These contigs are the ones around which all the other reads will be grouped when looking for miRNA-like sequences.
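The search for unambiguous contigs can be sketched as the extraction of maximal non-branching paths. The code below is an assumption-laden illustration rather than the project's implementation: the graph is a plain set of k-mer strings with implicit edges, and all function names are hypothetical.

```python
def successors(node, nodes):
    """k-mers in `nodes` that extend `node` by one base to the right."""
    return [node[1:] + b for b in "ACGT" if node[1:] + b in nodes]


def predecessors(node, nodes):
    """k-mers in `nodes` that extend `node` by one base to the left."""
    return [b + node[:-1] for b in "ACGT" if b + node[:-1] in nodes]


def unambiguous_contigs(nodes):
    """Return the maximal non-branching paths of the graph, spelled as strings."""
    contigs, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        path = [start]
        seen.add(start)
        # Extend to the left while there is exactly one way in and out
        cur = start
        while True:
            prev = predecessors(cur, nodes)
            if len(prev) != 1 or prev[0] in seen:
                break
            if len(successors(prev[0], nodes)) != 1:
                break
            cur = prev[0]
            path.insert(0, cur)
            seen.add(cur)
        # Extend to the right under the same condition
        cur = start
        while True:
            nxt = successors(cur, nodes)
            if len(nxt) != 1 or nxt[0] in seen:
                break
            if len(predecessors(nxt[0], nodes)) != 1:
                break
            cur = nxt[0]
            path.append(cur)
            seen.add(cur)
        contigs.append(path[0] + "".join(p[-1] for p in path[1:]))
    return sorted(contigs)


# ACG -> CGT, and CGT branches into GTA and GTT
print(unambiguous_contigs({"ACG", "CGT", "GTA", "GTT"}))  # ['ACGT', 'GTA', 'GTT']
```

The path stops at `CGT` because two different k-mers can follow it, so the branch produces three separate contigs instead of one ambiguous one.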

Seed step: Inde[…]

[…] different from other processes described (k=4 by default in our software).

Voting step: Sequences voting contig candidates

Once the contig index has been set up, it is time to let the sequences vote for their preferred contig. To do so, each k-mer (subsequence of length k) of a sequence is looked up in the index to find the contigs containing it, and a vote is generated for each position in the contig where the beginning of the k-mer is found.
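The index lookup and vote generation just described can be sketched as follows. This is a simplified illustration under stated assumptions: contigs are plain strings identified by their list position, a vote is stored as a pair (position in contig, position in sequence), and the names `build_index` and `vote` are hypothetical.

```python
from collections import defaultdict


def build_index(contigs, k=4):
    """Map each k-mer to the (contig id, position) pairs where it starts."""
    index = defaultdict(list)
    for cid, contig in enumerate(contigs):
        for pos in range(len(contig) - k + 1):
            index[contig[pos:pos + k]].append((cid, pos))
    return index


def vote(seq, index, k=4):
    """Return {contig id: [(position in contig, position in sequence), ...]}."""
    votes = defaultdict(list)
    for ps in range(len(seq) - k + 1):
        for cid, pc in index.get(seq[ps:ps + k], []):
            votes[cid].append((pc, ps))
    return dict(votes)


contigs = ["ACGTACGGA", "TTGACCA"]
index = build_index(contigs)
print(vote("GTACGG", index))  # {0: [(2, 0), (3, 1), (4, 2)]}
```

In the example, all three votes agree on the same offset (contig position minus sequence position equals 2), which is what a consistent placement of the sequence on the contig looks like.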

A vote is a tuple of integers $(p_c, p_s)$, where $p_c$ is the position of the k[…]

[…] subsequence of length k from s) occurs in the sequence s. So, for all the k-sequences that are not k-mers of s, their coordinate value will be zero.

Given the previous definition, we can define a distance such as

$$d^2(s_1, s_2) = \sum_{i=1}^{4^k} \left( p(s_1)_i - p(s_2)_i \right)^2 \tag{1}$$

The formula in (1) is called the squared Euclidean distance between $p(s_1)$ and $p(s_2)$. It should be noted that, for comparison purposes, this distance is sufficient without taking the square root, but it is not a true statistical distance, which means that it does not satisfy the triangle inequality (Wu TJ, et al. 1997). The triangle inequality states that the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side (Mohammed AK, et al. 2001). For a true statistical distance, when needed, the square root of the value should be taken.

Given the nature of our problem, where sequences may be short and variable in length, this way of measuring distances may be misleading. Therefore, we slightly modify the definition of the coordinate $p(s)_i$ to be the frequency of the k-mer $i$ in $s$, as a way to remove any sequence-length-related bias from the model.

$$p(s)_i = \frac{\text{occurrences of } k\text{-mer } i \text{ in } s}{\text{total } k\text{-mers in } s} \tag{2}$$
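Equation (2) translates almost directly into code. The sketch below is illustrative only (the function name `kmer_freqs` is hypothetical); it stores the non-zero coordinates in a dictionary instead of a full vector of $4^k$ entries.

```python
from collections import Counter


def kmer_freqs(seq, k=4):
    """Return {k-mer: frequency} for seq, as in equation (2).

    k-mers absent from seq are left out of the dictionary; their
    coordinate is implicitly zero.
    """
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {km: n / total for km, n in counts.items()}


print(kmer_freqs("ACGACG", k=3))  # {'ACG': 0.5, 'CGA': 0.25, 'GAC': 0.25}
```

Because the counts are divided by the number of k-mers in the sequence, a 16-base and a 30-base sequence with the same composition get the same coordinates.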

It is known that k-mer occurrence varies between genomes of different species and that, in general, k-mers do not all have the same chance of appearing in a biological sequence. For this reason, another definition of distance, one that takes into account the standard deviation of k-mer occurrence, is more suitable in our case:

$$d^2(s_1, s_2) = \sum_{i=1}^{4^k} \left( \frac{p(s_1)_i - p(s_2)_i}{\sigma_i} \right)^2 \tag{3}$$

with a diagonal covariance matrix

$$\sigma_i = \sqrt{\frac{1}{N-1} \sum_{n=1}^{N} \left( p(s_n)_i - \mu_i \right)^2} \tag{4}$$

taking N as the total number of sequences in our dataset, and

$$\mu_i = \frac{1}{N} \sum_{n=1}^{N} p(s_n)_i \tag{5}$$

The formula in (3) is known as the Mahalanobis distance (Mahalanobis PC, 1936), which with a diagonal covariance matrix is known as the normalized Euclidean distance (Wu TJ, 1997). This is the distance definition adopted by the software developed during this project. In our setting, the distance between sequences is only computed over their overlap, when the offset between them is known.
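A compact sketch of the normalized Euclidean distance is shown below. It assumes that each sequence is already described by a dictionary of k-mer frequencies, that the per-k-mer standard deviation is the sample standard deviation over all sequences, and that k-mers with zero spread are skipped to avoid a division by zero. The function names are hypothetical and the handling of zero spread is a choice made for this example, not something stated in the text.

```python
from math import sqrt


def kmer_sigma(freq_vectors):
    """Per-k-mer sample standard deviation over a list of {k-mer: freq} dicts.

    Needs at least two vectors, since the sum is divided by N - 1.
    """
    n = len(freq_vectors)
    keys = set().union(*freq_vectors)
    sigma = {}
    for km in keys:
        vals = [fv.get(km, 0.0) for fv in freq_vectors]
        mu = sum(vals) / n
        sigma[km] = sqrt(sum((v - mu) ** 2 for v in vals) / (n - 1))
    return sigma


def normalized_distance(f1, f2, sigma):
    """Square root of the sum in equation (3), over k-mers present in f1 or f2."""
    total = 0.0
    for km in set(f1) | set(f2):
        s = sigma.get(km, 0.0)
        if s == 0.0:
            continue  # assumption: ignore k-mers with no spread
        total += ((f1.get(km, 0.0) - f2.get(km, 0.0)) / s) ** 2
    return sqrt(total)


vectors = [{"AC": 1.0}, {"AC": 0.5, "CG": 0.5}, {"CG": 1.0}]
sigma = kmer_sigma(vectors)
print(normalized_distance(vectors[0], vectors[2], sigma))
```

Only k-mers present in at least one of the two sequences contribute, since coordinates that are zero in both add nothing to the sum.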

As a final remark on contig assignment, in the extreme case where the distances from two or more contigs to a sequence are identical, the final decision is taken by lexicographical order. This ensures that the algorithm behaves deterministically and produces reproducible outputs.
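The assignment rule can be illustrated with a three-part sort key: number of votes, then distance, then the contig sequence itself. The sketch below works on plain tuples and is only an approximation of the `decideContig` method listed in Annex B; the tuple layout and the function name are assumptions made for the example.

```python
def decide_contig(candidates):
    """Pick one contig among candidates given as (votes, distance, sequence).

    More votes win; ties go to the smaller distance; remaining ties are
    settled by lexicographical order of the contig sequence.
    """
    return max(candidates, key=lambda c: (c[0], -c[1], c[2]))


candidates = [
    (3, 0.8, "ACGTT"),
    (3, 0.5, "ACGTA"),
    (3, 0.5, "ACGTC"),
    (2, 0.1, "TTTTT"),
]
print(decide_contig(candidates))  # (3, 0.5, 'ACGTC')
```

Whatever direction the lexicographical tie-break takes, the important property is that the same input always yields the same assignment.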

Error detection

Once each sequence has been assigned to a contig, the contigs left with zero associated sequences are removed from memory. Contigs with fugitive votes can also be inspected to find out which sequences cast those votes and whether those contigs could be merged. The integrity of the contigs is then assessed to remove bad sequences: for each sequence in a contig, the generated votes should be in increasing order.

As microRNA sequences are relatively variable (recall isomiRs), some variability inside a contig must be allowed. To limit the extent of this variability, an outlier detection method is implemented using the definition of sequence distance given above.

For each sequence in a contig, a point with $4^k$ coordinates is defined (k = 4 by default), each coordinate representing a k-mer and holding the frequency of that k-mer in the sequence. Then the distance from the point representing the contig sequence to each of the sequence points is computed. Using the interquartile range (IQR) of these distances, outlier candidates are those lying above Q3 + 1.5(IQR). These candidates are then set apart from the main set, the centroid of the remaining points is found, and the standard deviation of the distances from the remaining points to the centroid is computed. If the distance from a candidate point to the centroid is greater than this standard deviation, the candidate is definitively removed from the contig.
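The first stage of this procedure, the selection of outlier candidates, can be sketched with the standard library. The example covers only the Q3 + 1.5(IQR) fence and not the centroid check that follows. The function name `outlier_candidates` is hypothetical, and the quartiles are computed with the default method of `statistics.quantiles`, which may differ slightly from the quartile convention used in the project.

```python
from statistics import quantiles


def outlier_candidates(distances):
    """Return the indices of the distances lying above Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(distances, n=4)
    fence = q3 + 1.5 * (q3 - q1)
    return [i for i, d in enumerate(distances) if d > fence]


# Distances from the contig point to each sequence point (toy values)
dists = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 5.0]
print(outlier_candidates(dists))  # [6]
```

Only the upper fence is used, because a sequence that is unusually close to the contig is not a problem.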

Reaching a consensus sequence

Within curated contigs, the sequences that voted for a contig can start before the contig sequence or end after it. We therefore need a representative consensus sequence for that group, which is probably longer than the contig sequence itself.

To this end, an offset is computed for each sequence, based on its votes for the contig, in order to align paired nucleotides. Then, for each overlapping position across the sequences in the group, the frequency of each base found is calculated, and the most frequent base is kept as the consensus for that position.
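The consensus step amounts to a column-wise majority vote once every sequence has been placed at its offset. The sketch below assumes that each sequence comes with its offset relative to the contig start and its read count, and that bases are weighted by read count. The weighting, the alphabetical tie-break and the name `consensus` are assumptions made for the example.

```python
from collections import Counter


def consensus(placed):
    """Build a consensus from (offset, sequence, read count) triples.

    offset is the position of the first base of the sequence relative to
    the contig start (it may be negative).
    """
    columns = {}
    for offset, seq, n in placed:
        for i, base in enumerate(seq):
            columns.setdefault(offset + i, Counter())[base] += n
    # Most frequent base per column; ties are broken alphabetically
    return "".join(
        min(col, key=lambda b: (-col[b], b))
        for _, col in sorted(columns.items())
    )


placed = [(0, "ACGTAC", 10), (-2, "TTACGTAC", 3), (1, "CGTTCGG", 4)]
print(consensus(placed))  # TTACGTACGG
```

In the example, the consensus is longer than the first sequence on both sides, and the mismatching base of the third sequence is outvoted at its position.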

Identifying already annotated miRNA

As a final and optional step, group consensus sequences are aligned to a database of annotated miRNA downloaded from miRBase. This alignment can be used to match contigs to already described miRNA, to reveal different contigs that correspond to the same miRNA (because of isomiRs or the unavailability of unambiguous contigs in the de Bruijn graph) and to detect sequences present in the dataset that could be miRNA not yet present in the curated database.

For this step, B[…]

Results

Using the designed software, five small RNA-seq libraries of normal colonic mucosa from patients with colorectal cancer, from the Biomarkers and Susceptibility Unit at the Catalan Institute of Oncology, were processed.

As summarized in Table 1, library depths varied between approximately 1 and 3.5 million reads. After sequence collapse, the sequence data was reduced to approximately 10% of the original input. Counting only sequences with more than 5 occurrences, the numbers decreased further to around 5% of the unique sequences, showing that the vast majority of sequences had very few reads.

These low-count sequences could be contaminants or degradation products of the total RNA present in the samples that were captured while filtering by sequence length, although some errors introduced by the PCR amplification could also be responsible for these reads.

Table 1. Description of sample processing output

Sample | Total reads | Unique sequences | Sequences with > 5 occurrences | Nodes in de Bruijn graph | Initial contigs | Final groups

     !"##$%&   !"!##"$%& !!$"&'' $(")%% $!"(%* !"(*# $"+&+

     !"#"'%&   %"&'#"* !''"*#) $("!'+ $!"$)! !"%$$ $")*'

     !"#(#%&   $"%(#"%(% $!'"'*$ )"*%# )"$*+ $"%+' +#)

     !"#')%&   $"!+*")%' $!!"+)) +"+*+ +"'!' $")+& &(%

     !"#*+%&   $"#!$"$%& $#!"*!) &"$') +"#') $"((+ &()

The number of reads per final contig shows an exponential distribution, as seen in Figure 9. A few contigs get most of the reads, while a long tail of contigs get only a few reads. This measure serves as an estimate of contig abundance and is consistent with existing gene expression analyses, where log-transformation of expression values is generally accepted as a normalization step before working with the data generated in gene expression experiments.

An exponential distribution is also seen in the number of sequences in each contig (Fig. 10). We should recall that a sequence can represent one or more reads, so this number is not necessarily related to contig abundance, but it gives a perspective on the diversity of molecules.

Figure 9. Histograms of reads per contig for each analyzed sample.

Figure 10. Box plot of sequences per contig

Table 2 describes the number of matches found when aligning consensus sequences from final contigs against miRBase with B[…]

Conclusions and final remarks

We can conclude that the software identifies microRNA. For contigs that do not match miRBase, we cannot ensure that they are miRNA before a biological validation in the lab. These sequences can be the result of partial degradation of longer RNA sequences present in the cell, or true small non-coding RNA. Contigs with high vote and read numbers could help to identify relevant abundant sequences that can later be tested in the wet lab.

In order to find isomiRs, this software design tries to separate groups of similar sequences that differ from the initial contig sequence in their overlapping regions. This can in fact result in multiple contigs corresponding to the same microRNA. A further step that tries to merge those contigs into bigger ones could be implemented as an improvement to the current program.

During the design phase of this project, the Python language was selected for its efficient use of dictionary indexes and for its object model, which reduces RAM usage by using pointers to objects instead of copying data. However, this behavior makes Python a thread[…]

References and Bibliography

Burrows, M, Wheeler, DJ (1994). "A block sorting lossless data compression algorithm". Technical Report 124, Digital Equipment Corporation.

Chambers, D (1995). DNA: the double helix: perspective and prospective at forty years. New York, N.Y: New York Academy of Sciences, p. 49.

Dahm, R (2008). "Discovering DNA: Friedrich Miescher and the early years of nucleic acid research". Human Genetics 122 (6): 565-81.

Esteller, M (2011). "Non-coding RNAs in human disease". Nature Reviews. Genetics, 12(12), 861-74.

Farinelli, C (2008). Complexity measures and similarity metrics: properties and applications to biological signals. PhD thesis (Alma Mater Studiorum - Università di Bologna, Bologna, Italy) pp. 119-114.

Gentleman, JF, Mullin, RC (1989). "The distribution of the frequency of occurrence of nucleotide subsequences, based on their overlap capability". Biometrics, 45(1), 35-52.

Griffiths-Jones, S, Grocock, RJ, van Dongen, S, Bateman, A, Enright, AJ (2006). "miRBase: microRNA sequences, targets and gene nomenclature". Nucleic Acids Research, 34(suppl 1), D140-D144.

Hershey, AD, Chase, M (1952). "Independent functions of viral protein and nucleic acid in growth of bacteriophage". J Gen Physiol. 36: 39-56.

Kebschull, JM, Zador, AM (2015). "Sources of PCR-induced distortions in high-throughput sequencing data sets". Nucleic Acids Research, gkv717.

[…]s-the overlooked repertoire in the dynamic microRNAome". Trends in Genetics: TIG, 28(11), 544-9.

Saiki, RK, Gelfand, DH, Stoffel, S, Scharf, SJ, Higuchi, R, Horn, GT, Mullis, KB, et al. (1988). "Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase". Science 239: 487-491.

Sanger, F, Nicklen, S, Coulson, AR (December 1977). "DNA sequencing with chain-terminating inhibitors". Proc. Natl. Acad. Sci. U.S.A. 74 (12): 5463-7.

Mäkinen, V, Belazzougui, D, Cunial, F, Tomescu, AI (2015). Genome[…]iological Sequence Analysis in the Era of High[…]

Image credits

Figure 1. DNA. Encyclopedia Britannica.
http://www.britannica.com/EBchecked/topic/167063/DNA

Figure 2. RNA. English Wikipedia.
http://en.wikipedia.org/wiki/RNA

Figure 3. Messenger RNA. English Wikipedia.
http://en.wikipedia.org/wiki/Messenger_RNA

Figure 4. Jevsinek Skok, D., Godnic, I., Zorc, M., Horvat, S., Dovc, P., Kovac, M. and Kunej, T. (2013), Genome-wide in silico screening for microRNA genetic variability in livestock species. Animal Genetics, 44: 669-677.
http://onlinelibrary.wiley.com/doi/10.1111/age.12072/abstract

Figure 5. O'Carroll D., Schaefer, A. (2012), General Principals of miRNA Biogenesis and Regulation in the Brain. Neuropsychopharmacology Reviews, 38: 39-54.
http://www.nature.com/npp/journal/v38/n1/full/npp201287a.html

Figure 6. MicroRNA. English Wikipedia.
https://en.wikipedia.org/wiki/MicroRNA

Figures 7, 8, 9 and 10 were created for this project.

Annex A: Bash script for FASTQ preprocessing

    #!/bin/bash

    ## Usage: ./$0 file.csfasta file.qual

    ## SOLiD process filter
    ## Discards bad quality reads
    solid_prepfilt2.pl -f $1 -g $2 \
      -o ${1%.csfasta}_nomis \
      -v -x no -p no -y no -n yes

    ## Cutadapt v1.0 downloaded from
    ## https://cutadapt.readthedocs.org/estable/
    ## Removes adaptor from reads
    ## '330201030313312312' is the adaptor for our libraries, this value is variable
    cutadapt -c -e 0.1 -z -m 16 -M 30 -a 330201030313312312 \
      -o ${1%a}q ${1%.csfasta}_nomis_T_F3.csfasta ${1%.csfasta}_nomis_T_F3.qual

Annex B: Python code for the project

Main source: main.py

    ################################################################################
    ### miRNAligner - microRNA alignment utility for smallRNAseq data |coding: utf8
    ################################################################################
    ### Main source
    ################################################################################
    ### Author: Francisco D. Morán-Duran <fdmoron@iconcologia.net>
    ### Description: [...]

    import sys
    import os.path
    from seqs import Collapse

    def help():
        # [...] The docstring, the definition of the output stream `f` and the
        # usage line are missing from the source.
        print >> f, ""
        print >> f, "\tkG:\t\tk-mer size for de Bruijn graph construction."
        print >> f, "\tkT:\tMinimum count for k-mer inclusion in de Bruijn graph."
        print >> f, "\tkD:\t\tk-mer size for contig index used in votation."
        print >> f, "\tnT:\tThreshold for sequence occurrence to be considered."
        return

    def main():
        """ Main body of the application. """

        global collection # For debugging purposes

        if not (len(sys.argv) == 6 or len(sys.argv) == 7):
            help()
            return

        kG = int(sys.argv[1]) # k for de Bruijn graph
        kT = int(sys.argv[2]) # Threshold for k-mer coverage in DBG
        kD = int(sys.argv[3]) # k for contig votation
        nT = int(sys.argv[4]) # Threshold for sequence occurrence
        fname = sys.argv[5] # File name to read input from
        outfile = sys.stdout

        if len(sys.argv) == 7:
            if os.path.isfile(sys.argv[6]):
                print >> sys.stderr, "ERROR: filename " + sys.argv[6] + " already exists."
                help()
                return
            outfile = open(sys.argv[6], "w+") # File to write the output
        if not os.path.isfile(fname):
            print >> sys.stderr, ""
            print >> sys.stderr, "ERROR: filename " + fname + " does not exist."
            help()
            return

        collection = Collapse(fname, kG, kT, kD, nT)
        print >> outfile, collection.contigs
        if outfile != sys.stdout:
            outfile.close()
        return

    if __name__ == "__main__":
        main()

    7'/0$% .&:.839 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    !!! K%PXY3%,&-= E K%4=?PXY '3%,&K-&5 75%3%5B 2?= (K'33PXY(-6 A'5' Z4?A%&,. 752[!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! V?A73- (-6(!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Y75)?=. @='&4%(4? ;/ V?=\&E;7='& ]2AK?=?&^%4?&4?3?,%'/&-5_!!! ;-(4=%>5%?&. N?33-45%?& ?2 8-67-&4-( O%5) A%22-=-&5 P-'A(!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    2=?K >?33 %K>?=5

  • 8/16/2019 De Novo Discovery MicroRNA From Small RNA Sequencing Data

    50/63

     8+2$*()*% !9 :%+;$

      A-2 C4?K>75-p7'3(a(-32b.  ccc +>A'5- 4?&(-&(7( 67'3%5%-( 2?= 8-67-&4-(/ ccc

      2?= % %& ='&,-aFS (-32/K'M:-& e Fb.  (-32/$'(-p7'3f%g d H Q6Q. 1S Q&Q. 1 J

      2?= % %& (-32/(-67-&4-/L'37-(ab.  4 d F  2?= j %& %/,-5p7'3ab.  (-32/$'(-p7'3f4gfQ6Qg ed j  (-32/$'(-p7'3f4gfQ&Qg ed F  4 ed F  2?= % %& ='&,-aFS (-32/K'M:-& e Fb.  (-32/'L,p7'3/'>>-&Aa(-32/$'(-p7'3f%gfQ6Qg # (-32/$'(-p7'3f%gfQ&Qgb  =-57=&

      A-2 C>=%&587KK'=Ba(-32b.  ccc r=%5-( 5)- (7KK'=B ?2 5)- =-'A(/ ccc

      >=%&5 __ (B(/(5A-==S cW)-=- '=- c e (5=a(-32/&7KP-'A(bS  >=%&5 __ (B(/(5A-==S c=-'A( 4?33'>(%&, %&5?cS  >=%&5 __ (B(/(5A-==S (5=a(-32/7&%P-'A(b e c 7&%67- =-'A(/c

      =-57=&

    43'(( 8-67-&4-.  ccc h=?7> ?2 7&%67- 4?3?=(>'4- (-67-&4-(/ ccc

      A-2 CC%&%5CCa(-32S %S 4S 6S 3b.  ccc N=-'5-( ' &-O (-67-&4- 2=?K ' =-'A/ ccc

      (-32/C6 d fg ! N7K73'5%L- >-=E$'(- 67'3%5B  2?= j %& 6.  (-32/C6/'>>-&Aa?=Aajbb  (-32/C6K d (-32/C6 ! YL-=',- >-=E$'(- 67'3%5B  (-32/C6KCL'3%A d W=7- ! s'3%A%5B 23', 2?= 5)- 6K 'L-=',-  (-32/C'6 d 1 ! YL-=',- (-67-&4- 67'3%5B

      (-32/C'6CL'3%A d @'3(- ! s'3%A%5B 23', 2?= 5)- '6 'L-=',-  (-32/3 d 3 ! :-&,)5 ?2 5)- $'(-(>'4- (-67-&4-  (-32/4 d 4 ! N?3?=(>'4- (-67-&4-  (-32/$ d (-32/C4(D$(ab! i'(-(>'4- (-67-&4-  (-32/% d f%g ! :%(5 ?2 =-'A %A-&5%2%-=( O%5) 5)%( (-67-&4-  (-32/& d F ! X7K$-= ?2 =-'A( O%5) 5)%( (-67-&4-  (-32/$=7%j& d HJ ! ;- i=7%j& ,='>)  (-32/4?&5%,( d (-5ab ! N?&5%,( 5? O)%4) (-6 )'( L?5-A  (-32/>?33 d X?&-  =-57=&

      A-2 CC35CCa(-32S (-6b.

      =-57=& (-32/3 ] (-6/3

      A-2 CC3-CCa(-32S (-6b.

      =-57=& (-32/3 ]d (-6/3

      A-2 CC,5CCa(-32S (-6b.

      =-57=& (-32/3 _ (-6/3

      A-2 CC,-CCa(-32S (-6b.

      =-57=& (-32/3 _d (-6/3

      A-2 C4(D$(a(-32b.

    RX

  • 8/16/2019 De Novo Discovery MicroRNA From Small RNA Sequencing Data

    51/63

     !" $%&% '()*%&"+, %- .(*+%/01 -+%. ).233 /01 )"45"$*($6 '272

      ccc h%L-( $'(-(>'4- (-67-&4- 2=?K ' 4?3?=(>'4- %&>75/ ccc

      A d H QWQ. cWhNYcS QYQ. cYNhWcS QNQ. cNYWhcS QhQ. chWYNc J ! W='&(%5%?& 5'$  =-5 d fg  4 d (-32/4f1g ! 47==-&5 4?3?=(>'4- 4)'='45-= 5? 5=-'5

      2?= % %& ='&,-aFS (-32/3 e Fb. ! 8`%> 5)- 2%=(5 $'(- a>=%K-=b  4 d Af4gf%&5a(-32/4f%gbg ! i'(- ?$5-&5%?& 2=?K 5='&(%5%?& 5'$3-  =-5/'>>-&Aa4b  =-57=& cc/j?%&a=-5b

      A-2 C'AAP-'Aa(-32S %S 4S 6b.  ccc YAA( =-'A O%5) %A-&5%2%-= %A 5? 5)- 8-67-&4-/ ccc

      %2 (-32/4 "d 4.  =-57=& @'3(- ! P-'A A?-( &?5 $-3?&, 5? 5)%( (-67-&4-  (-32/C6KCL'3%A d @'3(- ! 6K %( 4'3473'5-A ?&3B ?& A-K'&A  (-32/C'6CL'3%A d @'3(- ! '6 %( 4'3473'5-A ?&3B ?& A-K'&A  (-32/%/'>>-&Aa%b ! N?7&5 =-'A '( ' &-O (-67-&4- %5-K   (-32/& ed F ! +>A'5- 4?7&5( 2?= 5)- (-67-&4-  & d 1  2?= j %& 6. ! 8'L- 5)- 67'3%5B 2?= -'4) 5='&(%5%?&

      (-32/C6f&g ed ?=Aajb  & ed F  =-57=& W=7-

      A-2 ,-5p7'3a(-32b.  ccc h-5( 'L-=',- 67'3%5B 2?= -'4) (-67-&4-A 5='&(%5%?&/ ccc

      %2 &?5 (-32/C6KCL'3%A. ! +>A'5- K-'& 67'3%5B L'37-  (-32/C6K d fg  2?= % %& (-32/C6.  (-32/C6K/'>>-&Aa%#(-32/&b ! YL-=',- 67'3 2?= 5)%( $'(-  (-32/C6KCL'3%A d W=7- ! 8-5( L'3%A%5B 23',  (-32/C'6CL'3%A d @'3(-  %2 &?5 (-32/C'6CL'3%A.  (-32/,-5YL,p7'3ab

      =-57=& (-32/C6K 

      A-2 ,-5YL,p7'3a(-32b.  ccc h-5( 'L-=',- 67'3%5B 2?= 5)- (-67-&4-/ ccc

      %2 &?5 (-32/C6KCL'3%A.  (-32/,-5p7'3ab  %2 &?5 (-32/C'6CL'3%A.  (-32/C'6 d 1  2?= % %& (-32/C6K.  (-32/C'6 ed %  (-32/C'6 d (-32/C'6 # 3-&a(-32/C6Kb  (-32/C'6CL'3%A d W=7-  =-57=& (-32/C'6

      A-2 `K-=(a(-32S `b.  ccc i7%3A( 3%(5 ?2 `EK-=( 2?= 5)- =-'A(/ ccc

      = d fg  2?= % %& ='&,-a1S (-32/3 E ` e Fb.  =/'>>-&Aaf (-32/$f%.% e `gS % gb  =-57=& =

      A-2 ,-5tK@=-6(a(-32S `S % d 1S 2 d X?&-b.  ccc h-5( 2=-67-&4%-( ?2 `EK-=( %& (-67-&4-/ ccc

      %2 2 %( X?&-.  2 d (-32/3 E F  A d HJ  2?= `KS >( %& (-32/`K-=(a`b.

    RD

  • 8/16/2019 De Novo Discovery MicroRNA From Small RNA Sequencing Data

    52/63

     8+2$*()*% !9 :%+;$

                if ps < i or ps > f - k + 1:
                    continue
                if not km in d:
                    d[km] = 0
                d[km] += 1
            tk = f - i - k + 2
            for km in d:
                d[km] /= float(tk)
            return d

        def decideContig(self):
            """ Decide to which contig do we assign the sequence.
            Keeps nearest most voted contig. """
            sl = sorted(self.contigs, key = lambda c: (len(c.v[self]),
                                                       -self.distance(c),
                                                       c.s))
            for c in sl[:-1]:
                c.d[self] = sl[-1]
                c.n -= self.n*len(c.v[self])
                del c.v[self]
                self.contigs.remove(c)
            return

        def distance(self, other):
            """ Computes Mahalanobis distance between two basespace strings. """
            if self.poll == None:
                return False
            if not isinstance(other, dict):
                si, oi, sf, of = 0, 0, self.l - 1, other.l - 1
                if other in self.contigs:
                    o = other.v[self][0][0] - other.v[self][0][1]
                    if o > 0:
                        oi = o
                    if o < 0:
                        si = -o
                    of = min(of, o + self.l - 1)
                    sf = min(sf, other.l - 1 - o)
                f1 = self.getKmFreqs(self.poll.kv, si, sf)
                f2 = other.getKmFreqs(self.poll.kv, oi, of)
            else:
                f1 = self.getKmFreqs(self.poll.kv)
                f2 = other
            kms = set(f1.keys()) | set(f2.keys())
            for km in [x for x in f1.keys() if x not in f2]:
                f2[km] = 0.0
            for km in [x for x in f2.keys() if x not in f1]:
                f1[km] = 0.0
            return sqrt(sum([((f1[km] - f2[km])/self.poll._kc[km])**2
                             for km in kms]))
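    The distance method above compares two k-mer frequency profiles and divides each per-k-mer difference by a spread value before squaring and summing. The following is a minimal standalone sketch of that calculation in Python 3 (the listing itself is Python 2). The names kmer_freqs, scaled_distance and spread are hypothetical and stand in for getKmFreqs, distance and the poll's per-k-mer values; this is a simplified illustration, not the author's implementation.

```python
from math import sqrt

def kmer_freqs(seq, k):
    """Relative frequency of each k-mer in seq (hypothetical helper)."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = {}
    for i in range(total):
        km = seq[i:i + k]
        counts[km] = counts.get(km, 0) + 1
    return {km: n / total for km, n in counts.items()}

def scaled_distance(f1, f2, spread):
    """Square root of the summed squared differences, each scaled by spread[km].

    K-mers missing from one profile count as frequency 0.0, as in the listing.
    `spread` is an assumed dict with one positive value per k-mer.
    """
    kms = set(f1) | set(f2)
    return sqrt(sum(((f1.get(km, 0.0) - f2.get(km, 0.0)) / spread[km]) ** 2
                    for km in kms))
```

    With every spread value set to 1.0 this reduces to the plain Euclidean distance between the two frequency profiles.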

    Module dbg

    ################################################################################
    ### miRNAligner - microRNA alignment utility for smallRNAseq data |coding: utf8|
    ################################################################################
    ### Module dbg
    ################################################################################
    ### Author: Francisco D. Morón-Duran <fdmoron@iconcologia.net>
    ### Description: De Bruijn Graph implementation. Definition of Assembly and
    ### Contig classes
    ################################################################################

    from math import sqrt
    import sys

    def kmers(seq, k):
        """ Yields k-mers belonging to seq. """
        for i in xrange(len(seq) - k + 1):
            yield seq[i:i + k]

    def _fw(km):
        """ Yields next possible forward k-mers for km. """
        for x in 'ACGT':
            yield km[1:] + x

    def _bw(km):
        """ Yields next possible backward k-mers for km. """
        for x in 'ACGT':
            yield x + km[:-1]

    class Dbg:
        """ Class for De Bruijn Graph. """

        def __init__(self, seqs, k, threshold):
            """ Init graph from seqs with k-mers present more than threshold times. """
            print >> sys.stderr, "Building De Bruijn Graph from collected reads."
            self.G = {}
            for seq in seqs:
                for s in seq.b.split('N'):  # Split reads with unknown bases
                    for km in kmers(s, k):  # Get k-mers from each sequence
                        if not km in self.G:
                            self.G[km] = 1  # Initialize new k-mer
                        else:
                            self.G[km] += 1  # Add k-mer coverage
            lowcov = [x for x in self.G if self.G[x] <= threshold]  # List low covered
            for x in lowcov:
                del self.G[x]  # Remove low covered k-mers from graph
            print >> sys.stderr, str(len(self.G)) + " total k-mer nodes in the graph."
            return

        def get_contig_fw(self, km):
            """ Navigate graph forwards from km while reaching non-ambiguous paths. """
            c = [km]  # First k-mer
            while True:
                if sum(x in self.G for x in _fw(c[-1])) != 1:
                    break  # One possible path only!
                cand = [x for x in _fw(c[-1]) if x in self.G][0]  # Next candidate k-mer
                if cand == km:
                    break  # Break cycles! # or Möbius contigs
                if sum(x in self.G for x in _bw(cand)) != 1:

                    break  # Candidate should be reached by last k-mer only!
                c.append(cand)  # Append candidate unambiguous k-mer path
            return c

        def get_contig_bw(self, km):
            """ Navigate graph backwards from km while reaching non-ambiguous paths. """
            c = [km]
            while True:
                if sum(x in self.G for x in _bw(c[0])) != 1:
                    break
                cand = [x for x in _bw(c[0]) if x in self.G][0]
                if cand == km:
                    break
                if sum(x in self.G for x in _fw(cand)) != 1:
                    break
                c.insert(0, cand)  # Insert candidate at the beginning of path
            return c

        def get_contig(self, km):
            """ Get unambiguous path containing k-mer (if it exists). """
            fw = self.get_contig_fw(km)  # Forward path
            bw = self.get_contig_bw(km)  # Backward path
            bw = bw[:-1]  # Remove km from backward path (present in both)
            if km in _fw(fw[-1]):
                c = fw  # bw path is fw as well?
            else:
                c = bw + fw  # Merge fw and bw paths
            # Return contig, k-mer path and k-mer coverage
            return self.contig2string(c), c, [self.G[x] for x in c]

        def contig2string(self, c):
            """ Write sequence string from k-mer path. """
            return c[0] + ''.join(x[-1] for x in c[1:])

        def all_contigs(self):
            """ Get all unambiguous paths contained in the graph. """
            done = set()  # Set of visited k-mers
            r = []  # List of contigs to return
            for x in self.G:
                if x not in done:
                    s, c, cov = self.get_contig(x)  # Get seq, k-mers and k-mer coverage
                    for y in c:
                        done.add(y)  # Flag as visited all k-mers in the contig
                    r.append(s)
            return r
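    The Dbg class above counts k-mers, discards those at or below a coverage threshold, and walks the graph while each step has exactly one successor and that successor has exactly one predecessor. A compact Python 3 sketch of the forward walk is given below, as an illustration under simplifying assumptions: reads are plain strings rather than sequence objects, and the names build_graph, successors, predecessors, extend_forward and path_to_string are hypothetical.

```python
def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_graph(reads, k, threshold):
    """Count k-mers over all reads and keep those seen more than threshold times."""
    counts = {}
    for read in reads:
        for piece in read.split('N'):  # Unknown bases split a read
            for km in kmers(piece, k):
                counts[km] = counts.get(km, 0) + 1
    return {km: n for km, n in counts.items() if n > threshold}

def successors(graph, km):
    """K-mers in the graph that overlap km shifted one base forward."""
    return [km[1:] + x for x in 'ACGT' if km[1:] + x in graph]

def predecessors(graph, km):
    """K-mers in the graph that overlap km shifted one base backward."""
    return [x + km[:-1] for x in 'ACGT' if x + km[:-1] in graph]

def extend_forward(graph, km):
    """Follow the path from km while it stays unambiguous in both directions."""
    path = [km]
    while True:
        nxt = successors(graph, path[-1])
        if len(nxt) != 1:
            break  # Dead end or branch
        if nxt[0] == km:
            break  # Back at the start: stop on cycles
        if len(predecessors(graph, nxt[0])) != 1:
            break  # Candidate is also reachable from elsewhere
        path.append(nxt[0])
    return path

def path_to_string(path):
    """First k-mer plus the last base of each following k-mer."""
    return path[0] + ''.join(km[-1] for km in path[1:])
```

    For example, two copies of the read ATGGCA with k = 3 and threshold 1 keep the four k-mers ATG, TGG, GGC and GCA, and the forward walk from ATG spells the read back.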

    class Assembly:
        """ Collection of contigs. """

        def __init__(self, seqs, k, threshold):
            """ Initialize from a list of contig sequences. """
            self.d = Dbg(seqs, k, threshold)
            self.n = 0
            self.c = {}
            self.k = k
            print >> sys.stderr, "Extracting unique unambiguous contigs."
            for x in self.d.all_contigs():
                self.c[x] = Contig(x, k)
                self.n += 1
            print >> sys.stderr, str(self.n) + " total contigs were generated."
            return

        def __getitem__(self, x):
            """ Get contig x from assembly. """
            if x in self.c:
                return self.c[x]
            return False

        def __repr__(self):
            """ . """
            i = 1
            o = ""
            for x in sorted([c for c in self.c],
                            key = lambda p: (-self.c[p].n, -len(self.c[p].v))):
                o += "> contig_" + str(i) + " votes: " + str(self.c[x].n)
                o += " sequences: " + str(len(self.c[x].v)) + " counts: "
                o += str(sum([s.n for s in self.c[x].v])) + "\n"
                o += self.c[x].__repr__(False) + "\n"
                i += 1
            return o

        def updateCovs(self):
            """ Force update of contig coverage values. """
            for c in self.c.values():
                c.updateCov()
            return

        def checkIntegrity(self):
            """ Checks coverage of contigs by seqs is increasing. """
            self.sanitize()
            for c in self.c.values():
                c.checkIntegrity()
            self.sanitize()
            for c in self.c.values():
                c.removeOutliers()
            self.sanitize()
            return

        def sanitize(self):
            """ . """
            d = set([c for c in self.c if len(self.c[c].v) == 0])
            for c in d:
                del self.c[c]
            return

    class Contig:
        """ Contig from a unique unambiguous path in the graph. """

        def __init__(self, x, k):
            """ Initialize from a contig sequence. """
            self.s = x  # Contig sequence
            self.l = len(x)  # Contig length
            self.c = []  # Per-base coverage
            self.v = {}  # Votes per sequence
            self.n = 0  # Total votes
            self.o = 0  # Offset (to print contig)
            self.d = {}  # Discarded reads
            self.k = k
            self.cons = ""  # Consensus seq

            return

        def __repr__(self, verbose = True):
            """ Presentation of a contig. """
            o0 = abs(self.o)
            if verbose:
                for i in xrange(o0):
                    print '',
                print self.s
                ss = self.v.keys()
                ss.sort(key = lambda seq: (-seq.distance(self),
                                           len(self.v[seq]),
                                           seq.n),
                        reverse = True)
                for s in ss:
                    o = self.v[s][0][0] - self.v[s][0][1] + o0
                    for i in xrange(o):
                        print '',
                    print s.b,
                    print "(x" + str(s.n) + ",",
                    print "v" + str(len(self.v[s])) + ",",
                    print "d" + str(s.distance(self)) + ")"
            return self.cons

        def __len__(self):
            """ Return number of sequences in the Contig. """
            return len(self.v)

        def _iqrOutlier(self, l, side = "upper"):
            """ Finds interquartile range and returns a threshold for outliers. """
            n = len(l)
            if n < 3:
                return False
            if (n+1)%4 == 0:
                q1 = l[(n+1)/4-1]
                q3 = l[3*(n+1)/4-1]
            else:
                q1 = (l[(n+1)/4-1]+l[(n+1)/4])*.5
                q3 = (l[3*(n+1)/4-1]+l[3*(n+1)/4])*.5
            iqr = q3 - q1
            if side == "upper":
                return q3 + 1.5*iqr
            else:
                return q1 - 1.5*iqr
            return

        def updateCov(self):
            """ Update coverage values per contig base and get consensus sequence. """
            os = [self.v[s][0][0] - self.v[s][0][1] for s in self.v]
            ls = [s.l + self.v[s][0][0] - self.v[s][0][1] for s in self.v]
            self.o = abs(min(0,min(os)))
            length = max(ls) + self.o
            b = [{} for i in range(length)]
            self.c = [0 for i in range(length)]
            for c in self.v.keys():
                o = os.pop(0) + self.o
                o2 = 0
                for i in c.b:
                    if not i in b[o + o2]:
                        b[o + o2][i] = 0
                    b[o + o2][i] += c.n
                    o2 += 1
            for o in range(len(b)):

                total = float(sum(b[o].values()))
                self.c[o] += total
                if len(b[o]) > 0:
                    for i in b[o]:
                        b[o][i] /= total
            self.cons = ''.join([sorted(b[o].keys(),
                                        key = lambda n: -b[o][n])[0]
                                 for o in range(len(b)) if len(b[o]) > 0])
            return

        def checkIntegrity(self):
            """ Removes trivial non-matching sequences. """
            d = set()
            for s in self.v:
                v0 = self.v[s][0]
                for v in self.v[s][1:]:
                    if not v0 <= v:
                        # Both coords. in v0 must be <= their respective ones in v
                        d.add(s)
                        break
                    v0 = v
            for s in d:
                self.d[s] = self.v[s]
                self.n -= s.n
                del self.v[s]
            return

        def removeOutliers(self):
            """ Remove outliers flagged by interquartile range and confirmed
            with Mahalanobis distance to centroid. """
            s = sorted(self.v, key = lambda seq: seq.distance(self))
            l = [seq.distance(self) for seq in s]
            d = set()  # Set of sequences to remove from contig
            threshold = self._iqrOutlier(l)
            if not threshold:  # Too few sequences
                return
            o = sum([i > threshold for i in l])
            if o > 0:
                cands = s[-o:]  # Candidates to outlier
                group = s[:-o]  # Reliable group
                cgroup = self.centroid(4, set(group))  # Get centroid
                lgroup = [seq.distance(cgroup) for seq in group]  # Dists. to centroid
                m = sum(lgroup)/len(lgroup)  # Mean distance
                d2 = [(i-m)**2 for i in lgroup]  # Covs.
                v = sqrt(sum(d2)/len(d2))  # Std. error
                for x in cands:
                    if x.distance(self) > v:  # Remove seqs with distance > std. error
                        d.add(x)
            for s in d:  # Delete flagged sequences from contig
                self.d[s] = self.v[s]
                self.n -= s.n
                del self.v[s]
            return

        def updateVote(self, s, v, k):
            """ Updates the votes for the contig. """
            if not s in self.v:
                self.v[s] = []
            self.v[s].append(v)
            self.n += s.n
            return

        def kmers(self, k):
            """ Builds list of k-mers for the contig """

            r = []
            for i in range(0, self.l - k + 1):
                r.append([ self.s[i:i + k], i ])
            return r

        def centroid(self, k, seqs = set()):
            """ . """
            c = {}
            l = len(self.v)
            for seq in self.v:
                if len(seqs) > 0 and seq not in seqs:
                    continue
                p = seq.getKmFreqs(k)
                for km in p:
                    if not km in c:
                        c[km] = 0
                    c[km] += p[km]
            for km in c:
                c[km] /= l
            return c

        def getKmFreqs(self, k, i = 0, f = None):
            """ Gets frequencies of k-mers in sequence. """
            if f is None:
                f = self.l - 1
            d = {}
            for km, ps in self.kmers(k):
                if ps < i or ps > f - k + 1:
                    continue
                if not km in d:
                    d[km] = 0
                d[km] += 1
            tk = f - i - k + 2
            for km in d:
                d[km] /= float(tk)
            return d
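    The Contig class above flags outliers with a threshold placed 1.5 interquartile ranges beyond a quartile, computed by position on an already sorted list. A minimal Python 3 sketch of that rule follows; the function name iqr_threshold is hypothetical, the input is assumed to be sorted in ascending order, and integer division is written explicitly with // (the Python 2 listing relies on / between integers). It returns None for fewer than three values, where the listing returns False.

```python
def iqr_threshold(values, side="upper"):
    """Outlier threshold at 1.5 IQR beyond a quartile of a sorted list.

    Quartiles are picked by position, averaging two neighbours when
    (n + 1) is not a multiple of 4.
    """
    n = len(values)
    if n < 3:
        return None  # Too few values to estimate quartiles
    if (n + 1) % 4 == 0:
        q1 = values[(n + 1) // 4 - 1]
        q3 = values[3 * (n + 1) // 4 - 1]
    else:
        q1 = (values[(n + 1) // 4 - 1] + values[(n + 1) // 4]) * 0.5
        q3 = (values[3 * (n + 1) // 4 - 1] + values[3 * (n + 1) // 4]) * 0.5
    iqr = q3 - q1
    if side == "upper":
        return q3 + 1.5 * iqr
    return q1 - 1.5 * iqr
```

    For the sorted values 1 to 7 the quartiles are 2 and 6, so the upper threshold is 12 and the lower one is -4.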

    Module poll

    ################################################################################
    ### miRNAligner - microRNA alignment utility for smallRNAseq data |coding: utf8|
    ################################################################################
    ### Module poll
    ################################################################################
    ### Author: Francisco D. Morón-Duran <fdmoron@iconcologia.net>
    ### Description: Dictionaries of sequences indexed by their contained k-mers.
    ### Definition of Vote. Executor of the vote-and-seed process.
    ################################################################################

    import sys

    class [...]

            if not km in self._dc:
                self._dc[km] = set()  # Initialize empty k-mer entry
            self._dc[km].add(tuple((contig, ps)))  # Store sequence and k-mer position
            return

        def _vote(self, seq):
            """ Store votes generated from seq into contigs dictionary. """
            for km, ps in seq.kmers(self.k):  # Get k-mers and positions in seq
                if km in self._dc:
                    for cseq, cps in self._dc[km]:  # Get positions in indexed reads
                        cseq.updateVote(seq, Vote(cps, ps), self.k)  # Vote contig
                        seq.contigs.add(cseq)
            self._c += 1  # Sum number of voted sequences
            print >> sys.stderr, "\r" + str(self._c) + "\tprocessed pieces of data.",
            sys.stdout.flush()  # Force to empty buffer
            return

        def _initCovariances(self):
            """ . """
            if not self.voted:
                return False
            k = self.kv
            c = self._c
            for km in self._ds:
                # Compute expected frequency for km
                l = self._ds[km]
                self._kf[km] = sum([s.getKmFreqs(k)[km] for s in l])/float(c)
                # Compute covariance for km
                f = self._kf[km]
                r = ((c - len(l) + 1)*f**2)/(c - 1)
                self._kc[km] = sum([(s.getKmFreqs(k)[km] - f)**2 for s in l])/(c - 1) + r
            return

        def getKmFreqs(self, kms):
            """ Gets frequency of k-mer km in input reads. """
            if not self.voted or self._t == 0:
                return False
            d = {}
            for km in kms:
                if not km in d and km in self._kf:
                    d[km] = self._kf[km]/float(self._t)
            return d

    class Vote:
        """ Vote supporting a k-mer match. """

        def __init__(self, p1, p2):
            """ Vote initialization from sequence indexes. """
            self.p1 = p1
            self.p2 = p2

        def __repr__(self):
            """ Written representation of a vote '(p1, p2)'. """
            return str(tuple((self.p1, self.p2)))

        def __add__(self, other):
            """ Sum of two votes. """
            return Vote(self.p1 + other.p1, self.p2 + other.p2)

        def __sub__(self, other):
            """ Subtraction of a vote from another. """
            return Vote(self.p1 - other.p1, self.p2 - other.p2)

        def __eq__(self, other):
            """ Equality of votes. """
            return (self.p1 - other.p1 == 0) and (self.p2 - other.p2 == 0)

        def __ne__(self, other):
            """ Difference of votes. """
            return (self.p1 - other.p1 != 0) or (self.p2 - other.p2 != 0)

        def __le__(self, other):
            """ Lower equal. """
            return self.p1 <= other.p1 and self.p2 <= other.p2
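    In the listing, each vote pairs a k-mer position in a contig with the position of the same k-mer in a read (Vote(cps, ps)). The contig code uses the first vote to place the read and relies on the <= comparison to check that successive votes advance in both coordinates. The Python 3 sketch below illustrates both ideas with plain tuples instead of Vote objects; the function names read_offset and votes_are_ordered are hypothetical.

```python
def read_offset(votes):
    """Offset of the read relative to the contig, taken from the first vote.

    Each vote is a (contig_position, read_position) tuple.
    """
    contig_pos, read_pos = votes[0]
    return contig_pos - read_pos

def votes_are_ordered(votes):
    """True if neither coordinate ever decreases from one vote to the next."""
    return all(a[0] <= b[0] and a[1] <= b[1]
               for a, b in zip(votes, votes[1:]))
```

    A read whose k-mers at positions 0, 1 and 2 match contig positions 3, 4 and 5 has offset 3 and ordered votes; a vote list that steps backwards in the contig fails the check, which is the condition under which the listing discards the read from that contig.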
