Advanced NCBI. The Entrez API
https://github.com/lindenb/courses
Pierre Lindenbaum @yokofakun http://plindenbaum.blogspot.com
Institut du Thorax. Nantes. France
September 27, 2016
curl "http://en.wikipedia.org/wiki/Main_page"
wget -O - "http://en.wikipedia.org/wiki/Main_page"
LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992
DEFINITION  Blue Whale heavy satellite DNA.
ACCESSION   X53813 X17460
VERSION     X53813.1  GI:25
KEYWORDS    satellite DNA.
SOURCE      Balaenoptera musculus (Blue whale)
  ORGANISM  Balaenoptera musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
            Mysticeti; Balaenopteridae; Balaenoptera.
REFERENCE   1  (bases 1 to 422)
  AUTHORS   Arnason,U. and Widegren,B.
  TITLE     Composition and chromosomal localization of cetacean highly
            repetitive DNA with special reference to the blue whale,
            Balaenoptera musculus
COMMENT     See also <X52700-2> for 1,760 bp common cetacean component clones
            and <X52703-6>,<X53811-4> for the 422 bp heavy satellite clones.
FEATURES             Location/Qualifiers
     source          1..422
                     /organism="Balaenoptera musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9771"
                     /clone="7"
     misc_feature    1..422
                     /note="heavy satellite DNA"
ORIGIN
        1 tagttattca acctatccca ctctctagat accccttagc acgtaaagga atattatttg
       61 ggggtccagc catggagaat agtttagaca ctaggatgag ataaggaaca cacccattct
      121 aaagaaatca cattaggatt ctctttttaa gctgttcctt aaaacactag agtcttagaa
      181 atctattgga ggcagaagca gtcaagggta gcctagggtt agggttaggc ttagggttag
      241 ggttagggta cggcttaggg tactgtttcg gggaggggtt caggtacggc gtagggtatg
      301 ggttagggtt agggttaggg ttagtgttag ggttagggct cggtttaggg tacgggttag
      361 gattagggta cgtgttaggg ttagggtagg gcttagggtt agggtacgtg ttagggttag
      421 gg
//
      embl {
        accession "X53813" ,
        version 1 } ,
      gi 25 } ,
    descr {
      title "Blue Whale heavy satellite DNA" ,
      source {
        org {
          taxname "Balaenoptera musculus" ,
          common "Blue whale" ,
          db {
            {
              db "taxon" ,
              tag
                id 9771 } } ,
          orgname {
            name
              binomial {
                genus "Balaenoptera" ,
                species "musculus" } ,
            lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera" ,
            gcode 1 ,
            mgcode 2 ,
            div "MAM" } } ,
        subtype {
          {
INSDSeq ::= SEQUENCE {
    locus VisibleString ,
    length INTEGER ,
    strandedness VisibleString OPTIONAL ,
    moltype VisibleString ,
    topology VisibleString OPTIONAL ,
    division VisibleString ,
    update-date VisibleString ,
    create-date VisibleString OPTIONAL ,
    update-release VisibleString OPTIONAL ,
    create-release VisibleString OPTIONAL ,
    definition VisibleString ,
    primary-accession VisibleString OPTIONAL ,
    entry-version VisibleString OPTIONAL ,
    accession-version VisibleString OPTIONAL ,
    other-seqids SEQUENCE OF INSDSeqid OPTIONAL ,
    secondary-accessions SEQUENCE OF INSDSecondary-accn OPTIONAL ,
    project VisibleString OPTIONAL ,
    keywords SEQUENCE OF INSDKeyword OPTIONAL ,
    segment VisibleString OPTIONAL ,
    source VisibleString OPTIONAL ,
    organism VisibleString OPTIONAL ,
    taxonomy VisibleString OPTIONAL ,
    references SEQUENCE OF INSDReference OPTIONAL ,
    comment VisibleString OPTIONAL ,
    comment-set SEQUENCE OF INSDComment OPTIONAL ,
    struc-comments SEQUENCE OF INSDStrucComment OPTIONAL ,
    primary VisibleString OPTIONAL ,
    source-db VisibleString OPTIONAL ,
    database-reference VisibleString OPTIONAL ,
    feature-table SEQUENCE OF INSDFeature OPTIONAL ,
    feature-set SEQUENCE OF INSDFeatureSet OPTIONAL ,
    sequence VisibleString OPTIONAL ,  -- Optional for contig, wgs, etc.
    contig VisibleString OPTIONAL ,
    alt-seq SEQUENCE OF INSDAltSeqData OPTIONAL
}
<?xml version="1.0"?>
<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">
<GBSet>
  <GBSeq>
    <GBSeq_locus>X53813</GBSeq_locus>
    <GBSeq_length>422</GBSeq_length>
    <GBSeq_strandedness>double</GBSeq_strandedness>
    <GBSeq_moltype>DNA</GBSeq_moltype>
    <GBSeq_topology>linear</GBSeq_topology>
    <GBSeq_division>MAM</GBSeq_division>
    <GBSeq_update-date>22-JUN-1992</GBSeq_update-date>
    <GBSeq_create-date>13-JUL-1990</GBSeq_create-date>
    <GBSeq_definition>Blue Whale heavy satellite DNA</GBSeq_definition>
    <GBSeq_primary-accession>X53813</GBSeq_primary-accession>
    <GBSeq_accession-version>X53813.1</GBSeq_accession-version>
    <GBSeq_other-seqids>
      <GBSeqid>emb|X53813.1|</GBSeqid>
      <GBSeqid>gi|25</GBSeqid>
    </GBSeq_other-seqids>
    <GBSeq_secondary-accessions>
      <GBSecondary-accn>X17460</GBSecondary-accn>
    </GBSeq_secondary-accessions>
    <GBSeq_keywords>
      <GBKeyword>satellite DNA</GBKeyword>
    </GBSeq_keywords>
    <GBSeq_source>Balaenoptera musculus (Blue whale)</GBSeq_source>
    <GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
    <GBSeq_taxonomy>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera</GBSeq_taxonomy>
    <GBSeq_references>
<!ELEMENT GBSeq (
    GBSeq_locus ,
    GBSeq_length ,
    GBSeq_strandedness? ,
    GBSeq_moltype ,
    GBSeq_topology? ,
    GBSeq_division ,
    GBSeq_update-date ,
    GBSeq_create-date? ,
    GBSeq_update-release? ,
    GBSeq_create-release? ,
    GBSeq_definition ,
    GBSeq_primary-accession? ,
    GBSeq_entry-version? ,
    GBSeq_accession-version? ,
    GBSeq_other-seqids? ,
    GBSeq_secondary-accessions? ,
    GBSeq_project? ,
    GBSeq_keywords? ,
    GBSeq_segment? ,
    GBSeq_source? ,
    GBSeq_organism? ,
    GBSeq_taxonomy? ,
    GBSeq_references? ,
    GBSeq_comment? ,
    GBSeq_comment-set? ,
    GBSeq_struc-comments? ,
    (...)
http://www.ncbi.nlm.nih.gov/books/NBK179288/

"Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command-line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process."
<DbName>pubmed</DbName>
<DbName>protein</DbName>
<DbName>nuccore</DbName>
<DbName>nucleotide</DbName>
<DbName>nucgss</DbName>
<DbName>nucest</DbName>
<DbName>structure</DbName>
<DbName>genome</DbName>
<DbName>assembly</DbName>
<DbName>gcassembly</DbName>
<DbName>genomeprj</DbName>
<DbName>bioproject</DbName>
<DbName>biosample</DbName>
<DbName>biosystems</DbName>
<DbName>blastdbinfo</DbName>
<DbName>books</DbName>
<DbName>cdd</DbName>
<DbName>clinvar</DbName>
(...)
<?xml version="1.0"?>
<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <MenuName>PubMed</MenuName>
    <Description>PubMed bibliographic record</Description>
    <DbBuild>Build130805-2117m.4</DbBuild>
    <Count>22974581</Count>
    <LastUpdate>2013/08/06 08:33</LastUpdate>
    <FieldList>
      (...)
      <Field>
        <Name>UID</Name>
        <FullName>UID</FullName>
        <Description>Unique number assigned to publication</Description>
        <TermCount>0</TermCount>
        <IsDate>N</IsDate>
        <IsNumerical>Y</IsNumerical>
        <SingleToken>Y</SingleToken>
        <Hierarchy>N</Hierarchy>
        <IsHidden>Y</IsHidden>
      </Field>
      <Field>
      (...)
    "type": "einfo",
    "version": "0.3"
  },
  "einforesult": {
    "dbinfo": {
      "dbname": "pubmed",
      "menuname": "PubMed",
      "description": "PubMed bibliographic record",
      "dbbuild": "Build160921-2207m.6",
      "count": "26470199",
      "lastupdate": "2016/09/22 16:32",
      "fieldlist": [
        {
          "name": "ALL",
          "fullname": "All Fields",
          "description": "All terms from all searchable fields",
          "termcount": "179424126",
          "isdate": "N",
          "isnumerical": "N",
          "singletoken": "N",
          "hierarchy": "N",
          "ishidden": "N"
        },
        {
          "name": "UID",
          "fullname": "UID",
          "description": "Unique number assigned to publication",
The XSLT stylesheet.
https://raw.githubusercontent.com/lindenb/courses/master/about.ncbi/gquery2html.xsl

<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="html"/>

<xsl:template match="/"><html><body>
<xsl:apply-templates select="Result"/>
</body></html></xsl:template>

<xsl:template match="Result">
<table><caption><xsl:value-of select="Term"/></caption>
<tr><th>Database</th><th>Count</th><th>Status</th></tr>
<xsl:apply-templates select="eGQueryResult/ResultItem"/>
</table>
</xsl:template>

<xsl:template match="ResultItem">
<tr>
<td><a>
<xsl:attribute name="href">http://www.ncbi.nlm.nih.gov/<xsl:value-of select="DbName"/>?cmd=search&amp;term=<xsl:value-of select="translate(/Result/Term,' ','+')"/></xsl:attribute>
<xsl:value-of select="DbName"/></a></td>
<td><xsl:value-of select="Count"/></td>
<td><xsl:value-of select="Status"/></td>
</tr>
</xsl:template>

</xsl:stylesheet>
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D" |\
xmllint --format -

<eSearchResult>
  <Count>684</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
ESearch: searching for 'Mammuthus primigenius' (JSON)

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmode=json"

{
  "header": {
    "type": "esearch",
    "version": "0.3"
  },
  "esearchresult": {
    "count": "811",
    "retmax": "20",
    "retstart": "0",
    "idlist": [
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=2" |\
xmllint --format -

<eSearchResult>
  <Count>684</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>507866428</Id>
    <Id>124056416</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>"Mammuthus primigenius"[ORGN]</From>
      <To>"Mammuthus primigenius"[Organism]</To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Mammuthus primigenius"[Organism]</Term>
      <Field>Organism</Field>
      <Count>684</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>
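The ESearch XML above can also be read programmatically instead of by eye. The following is a minimal sketch in Python (standard library only), working on an abridged copy of the response so that it runs offline; the function name parse_esearch is an illustrative choice and not part of the E-utilities.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the ESearch response shown above (retmax=2).
SAMPLE = """<eSearchResult>
  <Count>684</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>507866428</Id>
    <Id>124056416</Id>
  </IdList>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Return the total hit count, the returned ids and the translated query.

    parse_esearch is a hypothetical helper name used for this sketch.
    """
    root = ET.fromstring(xml_text)
    return {
        # Only the direct <Count> child is read, not the per-term counts
        # nested inside <TranslationStack>.
        "count": int(root.findtext("Count")),
        "ids": [node.text for node in root.findall("IdList/Id")],
        "query": root.findtext("QueryTranslation"),
    }


result = parse_esearch(SAMPLE)
print(result["count"], result["ids"])
```

Note that Count is the total number of hits, while IdList only holds the slice selected by retmax and retstart.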
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=3&retstart=100" |\
xmllint --format -

<eSearchResult>
  <Count>684</Count>
  <RetMax>3</RetMax>
  <RetStart>100</RetStart>
  <IdList>

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&rettype=count" |\
xmllint --format -

<eSearchResult>
  <Count>684</Count>
</eSearchResult>
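The retmax, retstart and rettype parameters used above only change the query string, so the URLs can be generated rather than typed. Below is a small Python sketch, assuming the same esearch.fcgi endpoint as the curl commands; esearch_url and page_urls are hypothetical helper names, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, **params):
    """Build an ESearch URL; extra keyword arguments become query parameters."""
    query = {"db": db, "term": term}
    query.update(params)
    # quote_via=quote encodes spaces as %20, as in the curl commands above.
    return EUTILS + "/esearch.fcgi?" + urlencode(query, quote_via=quote)


def page_urls(db, term, total, page_size):
    """Yield one URL per page of page_size ids until total hits are covered."""
    for start in range(0, total, page_size):
        yield esearch_url(db, term, retmax=page_size, retstart=start)


term = '"Mammuthus primigenius"[ORGN]'
print(esearch_url("nucleotide", term, retmax=3, retstart=100))
print(esearch_url("nucleotide", term, rettype="count"))
```

A typical use would be to ask for the count first (rettype=count) and then walk through page_urls with that total.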
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&sort=Date+Released" |\
xmllint --format -

<eSearchResult>
  <Count>811</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>1033204644</Id>
    <Id>1033204658</Id>
    <Id>1033204672</Id>
    <Id>1033204686</Id>
    <Id>1033204729</Id>
    <Id>1033204771</Id>
    <Id>1033204785</Id>
    <Id>1033204799</Id>
    <Id>1033204813</Id>
    <Id>1033204827</Id>
    <Id>1033204871</Id>
    <Id>1033205124</Id>
    <Id>1033205194</Id>
    <Id>1033205208</Id>
    <Id>1033205222</Id>
    <Id>1033205236</Id>
    <Id>1033205264</Id>
    <Id>1033205390</Id>
    (...)
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428"

<eSummaryResult>
  <DocSum>
    <Id>507866428</Id>
    <Item Name="Caption" Type="String">KC524742</Item>
    <Item Name="Title" Type="String">Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds</Item>
    <Item Name="Extra" Type="String">gi|507866428|gb|KC524742.1|[507866428]</Item>
    <Item Name="Gi" Type="Integer">507866428</Item>
    <Item Name="CreateDate" Type="String">2013/06/15</Item>
    <Item Name="UpdateDate" Type="String">2013/06/21</Item>
    <Item Name="Flags" Type="Integer">0</Item>
    <Item Name="TaxId" Type="Integer">37349</Item>
    <Item Name="Length" Type="Integer">9042</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
    <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
  </DocSum>
</eSummaryResult>
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428&retmode=json"

{
  "header": {
    "type": "esummary",
    "version": "0.3"
  },
  "result": {
    "uids": [
      "507866428"
    ],
    "507866428": {
      "uid": "507866428",
      "caption": "KC524742",
      "title": "Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds",
      "extra": "gi|507866428|gb|KC524742.1|",
      "gi": 507866428,
      "createdate": "2013/06/15",
      "updatedate": "2013/06/21",
      "flags": "",
      "taxid": 37349,
      "slen": 9042,
      "biomol": "genomic",
      "moltype": "dna",
      "topology": "linear",
      "sourcedb": "insd",
      "segsetsize": "",
      "projectid": "0",
      (...)
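In the JSON form of ESummary, the "uids" array lists the record keys and each record sits under its own uid inside "result". A minimal Python sketch of walking that layout follows; the sample is an abridged copy of the response above (only a few fields kept) and summaries is a hypothetical helper name.

```python
import json

# Abridged copy of the ESummary JSON response shown above.
SAMPLE = """{
  "header": {"type": "esummary", "version": "0.3"},
  "result": {
    "uids": ["507866428"],
    "507866428": {
      "uid": "507866428",
      "caption": "KC524742",
      "title": "Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds",
      "taxid": 37349,
      "slen": 9042
    }
  }
}"""


def summaries(json_text):
    """Return the document summaries in the order given by "uids"."""
    result = json.loads(json_text)["result"]
    return [result[uid] for uid in result["uids"]]


docs = summaries(SAMPLE)
for doc in docs:
    print(doc["caption"], doc["slen"], doc["title"])
```

Going through "uids" rather than iterating over every key of "result" avoids treating the "uids" entry itself as a record.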
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&id=25"

<eSummaryResult>
  <DocSum>
    <Id>25</Id>
    <Item Name="SNP_ID" Type="Integer">25</Item>
    <Item Name="Organism" Type="String"></Item>
    <Item Name="ALLELE_ORIGIN" Type="String"></Item>
    <Item Name="GLOBAL_MAF" Type="String">0.4913</Item>
    <Item Name="GLOBAL_POPULATION" Type="String"></Item>
    <Item Name="GLOBAL_SAMPLESIZE" Type="Integer">0</Item>
    <Item Name="SUSPECTED" Type="String"></Item>
    <Item Name="CLINICAL_SIGNIFICANCE" Type="String"></Item>
    <Item Name="GENE" Type="String">THSD7A</Item>
    <Item Name="LOCUS_ID" Type="Integer">221981</Item>
    <Item Name="ACC" Type="String">NM_015204.2,NT_007819.17</Item>
    <Item Name="CHR" Type="String">7</Item>
    <Item Name="WEIGHT" Type="Integer">1</Item>
    <Item Name="HANDLE" Type="String">1000GENOMES,BGI,BL,BUSHMAN,COMPLETE_GENOMICS,CSHL-HAPMAP,GMI,ILLUMINA-UK,KWOK,PERLEGEN,SSMP,TISHKOFF</Item>
    <Item Name="FXN_CLASS" Type="String">intron-variant</Item>
    <Item Name="VALIDATED" Type="String">by-1000G,by-cluster,by-frequency,by-hapmap</Item>
    <Item Name="GTYPE" Type="String">true</Item>
    <Item Name="NONREF" Type="String">false</Item>
    <Item Name="DOCSUM" Type="String">HGVS=NC_000007.13:g.11584142T&gt;C,NG_027670.1:g.292683A&gt;G,NM_015204.2:c.1454-1398A&gt;G,NT_007819.17:g.11574142T&gt;C|SEQ=TCTGTGAGCTTCTGCATGCAATCCT[A/G]TGCAATTGGAATTTGATAGTCCTTT|GENE=THSD7A:221981</Item>
    <Item Name="HET" Type="Integer">50</Item>
    <Item Name="SRATE" Type="Integer">0</Item>
    <Item Name="TAX_ID" Type="Integer">9606</Item>
    <Item Name="CHRRPT" Type="String">25|2|0|1|1|1|7|NT_007819.17|11574141|11584142|THSD7A|0.499848|0.00872267||51|1|1|36|138|0|||T:2178:0.4913</Item>
    <Item Name="ORIG_BUILD" Type="Integer">36</Item>
    <Item Name="UPD_BUILD" Type="Integer">138</Item>
    <Item Name="CREATEDATE" Type="String">2000-09-19 17:02</Item>
    <Item Name="UPDATEDATE" Type="String">2013-06-21 14:17</Item>
    <Item Name="POP_CLASS" Type="String"></Item>
    <Item Name="METHOD_CLASS" Type="String">computed,hybridize,sequence,unknown</Item>
    <Item Name="SNP3D" Type="String"></Item>
    <Item Name="LINKOUT" Type="String">ILLUMINA-UK|http://www.illumina.com/HumanGenomeNA18507_000019106_NCBI36.1_chr7_11550667</Item>
    <Item Name="SS" Type="Integer">654151077</Item>
    <Item Name="LOCSNPID" Type="String">7_11584142</Item>
    <Item Name="ALLELE" Type="String">R</Item>
    <Item Name="SNP_CLASS" Type="String">snp</Item>
    <Item Name="CHRPOS" Type="String">7:11584142</Item>
    <Item Name="CONTIGPOS" Type="String">NT_007819.17:11574142</Item>
    <Item Name="TEXT" Type="String"></Item>
    <Item Name="LOOKUP" Type="String">325952</Item>
  </DocSum>
</eSummaryResult>
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=7939126"

<eSummaryResult>
  <DocSum>
    <Id>7939126</Id>
    <Item Name="PubDate" Type="Date">1994 Apr</Item>
    <Item Name="EPubDate" Type="Date"></Item>
    <Item Name="Source" Type="String">Sleep</Item>
    <Item Name="AuthorList" Type="List">
      <Item Name="Author" Type="String">Broughton R</Item>
      <Item Name="Author" Type="String">Billings R</Item>
      <Item Name="Author" Type="String">Cartwright R</Item>
      <Item Name="Author" Type="String">Doucette D</Item>
      <Item Name="Author" Type="String">Edmeads J</Item>
      <Item Name="Author" Type="String">Edwardh M</Item>
      <Item Name="Author" Type="String">Ervin F</Item>
      <Item Name="Author" Type="String">Orchard B</Item>
      <Item Name="Author" Type="String">Hill R</Item>
      <Item Name="Author" Type="String">Turrell G</Item>
    </Item>
    <Item Name="LastAuthor" Type="String">Turrell G</Item>
    <Item Name="Title" Type="String">Homicidal somnambulism: a case report.</Item>
    <Item Name="Volume" Type="String">17</Item>
    <Item Name="Issue" Type="String">3</Item>
    <Item Name="Pages" Type="String">253-64</Item>
    <Item Name="LangList" Type="List">
      <Item Name="Lang" Type="String">English</Item>
    </Item>
    <Item Name="NlmUniqueID" Type="String">7809084</Item>
    <Item Name="ISSN" Type="String">0161-8105</Item>
    <Item Name="ESSN" Type="String">1550-9109</Item>
    <Item Name="PubTypeList" Type="List">
      <Item Name="PubType" Type="String">Journal Article</Item>
    </Item>
    <Item Name="RecordStatus" Type="String">PubMed - indexed for MEDLINE</Item>
    <Item Name="PubStatus" Type="String">ppublish</Item>
    <Item Name="ArticleIds" Type="List">
      <Item Name="pubmed" Type="String">7939126</Item>
      <Item Name="eid" Type="String">7939126</Item>
      <Item Name="rid" Type="String">7939126</Item>
    </Item>
    <Item Name="History" Type="List">
      <Item Name="pubmed" Type="Date">1994/04/01 00:00</Item>
      <Item Name="medline" Type="Date">1994/04/01 00:01</Item>
      <Item Name="entrez" Type="Date">1994/04/01 00:00</Item>
    </Item>
    <Item Name="References" Type="List"></Item>
    <Item Name="HasAbstract" Type="Integer">1</Item>
    <Item Name="PmcRefCount" Type="Integer">4</Item>
    <Item Name="FullJournalName" Type="String">Sleep</Item>
    <Item Name="ELocationID" Type="String"></Item>
    <Item Name="SO" Type="String">1994 Apr;17(3):253-64</Item>
  </DocSum>
</eSummaryResult>
<?xml version="1.0"?>
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN"
<TSeqSet>
  <TSeq>
    <TSeq_seqtype value="nucleotide"/>
    <TSeq_gi>507866428</TSeq_gi>
    <TSeq_accver>KC524742.1</TSeq_accver>
    <TSeq_taxid>37349</TSeq_taxid>
    <TSeq_orgname>Mammuthus primigenius</TSeq_orgnam
    <TSeq_defline>Mammuthus primigenius isolate CME2
    <TSeq_length>9042</TSeq_length>
    <TSeq_sequence>GCACTTGCTTTTTTTGTCTTCTTCAGACCACGA
  </TSeq>
</TSeqSet>
<GBSeq>
  <GBSeq_locus>KC524742</GBSeq_locus>
  <GBSeq_length>9042</GBSeq_length>
  <GBSeq_strandedness>double</GBSeq_strandedness>
  <GBSeq_moltype>DNA</GBSeq_moltype>
  <GBSeq_topology>linear</GBSeq_topology>
  <GBSeq_division>MAM</GBSeq_division>
  <GBSeq_update-date>21-JUN-2013</GBSeq_update-date>
  <GBSeq_create-date>15-JUN-2013</GBSeq_create-date>
  <GBSeq_definition>Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds</GBSeq_definition>
  <GBSeq_primary-accession>KC524742</GBSeq_primary-accession>
  <GBSeq_accession-version>KC524742.1</GBSeq_accession-version>
  <GBSeq_other-seqids>
    <GBSeqid>gb|KC524742.1|</GBSeqid>
    <GBSeqid>gi|507866428</GBSeqid>
  </GBSeq_other-seqids>
  <GBSeq_source>Mammuthus primigenius (woolly mammoth)</GBSeq_source>
  <GBSeq_organism>Mammuthus primigenius</GBSeq_organism>
  (...)
LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
ACCESSION   KC524742
VERSION     KC524742.1  GI:507866428
KEYWORDS    .
SOURCE      Mammuthus primigenius (woolly mammoth)
  ORGANISM  Mammuthus primigenius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae;
            Mammuthus.
REFERENCE   1  (bases 1 to 9042)
  AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L.
            and Berenbrink,M.
  TITLE     Evolution of mammalian diving capacity traced by myoglobin net
            surface charge
  JOURNAL   Science 340 (6138), 1234192 (2013)
   PUBMED   23766330
REFERENCE   2  (bases 1 to 9042)
  AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of
            Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..9042
                     /organism="Mammuthus primigenius"
                     /mol_type="genomic DNA"
                     /isolate="CME2005/915"
                     /db_xref="taxon:37349"
                     /tissue_type="bone"
     gene            <35..>9042
                     /gene="Mb"
     mRNA            join(<35..129,5627..5849,8979..>9042)
                     /gene="Mb"
                     /product="myoglobin"
     CDS             join(35..129,5627..5849,8979..>9042)
Web environment string returned from a previous ESearch, EPost or ELink call. When provided, ESearch will post the results of the search operation to this pre-existing WebEnv.
Searching extinct species in the NCBI taxonomy ('extinct[PROP]')

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=taxonomy&term=extinct%5BPROP%5D"

<eSearchResult>
  <Count>145</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
  <IdList>

Fetch the extinct species in the NCBI taxonomy ('extinct[PROP]') using the WebEnv parameter.

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&query_key=1&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&retmode=xml"

<TaxaSet>
  <Taxon>
    <TaxId>1225531</TaxId>
    <ScientificName>Equus ovodovi</ScientificName>
    <OtherNames>
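The two steps above (ESearch with usehistory=y, then EFetch with query_key and WebEnv) can be chained without copying the WebEnv string by hand. The Python sketch below shows the idea offline: it reads QueryKey and WebEnv from an abridged sample response and builds the EFetch URL. The helper names history_of and efetch_url are hypothetical, and the WebEnv value is only a sample; a real one expires on the server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Abridged ESearch response obtained with usehistory=y (sample values).
SAMPLE = """<eSearchResult>
  <Count>145</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
  <IdList/>
</eSearchResult>"""


def history_of(esearch_xml):
    """Return the (QueryKey, WebEnv) pair stored in an ESearch response."""
    root = ET.fromstring(esearch_xml)
    return root.findtext("QueryKey"), root.findtext("WebEnv")


def efetch_url(db, query_key, webenv, retmode="xml"):
    """Build an EFetch URL that reads its ids from the history server."""
    params = {"db": db, "query_key": query_key,
              "WebEnv": webenv, "retmode": retmode}
    return EUTILS + "/efetch.fcgi?" + urlencode(params)


key, env = history_of(SAMPLE)
print(efetch_url("taxonomy", key, env))
```

The benefit of the history server is that the id list never has to travel back to the client between the two calls.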
wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=extinct[PROP]&retmax=1000' |\
xmllint --format - |\
grep '<Id>' |\
cut -d '<' -f 2 |\
cut -d '>' -f 2 |\
tr "\n" ","

wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=taxonomy&WebEnv=NCID_1_15435144_130.14.22.215_9001_1474637318_669113391_0MetA0_S_MegaStore_F_1&id=1860150,1860149,1849957,1825730,1825729,1636722,1607772...'

Output:

<?xml version="1.0"?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
<ePostResult>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_15467192_130.14.22.215_9001_1474637456_570452194_0MetA0_S_MegaStore_F_1</WebEnv>
</ePostResult>
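The grep/cut/tr pipeline above turns the ESearch XML into a comma-separated id list for EPost. An equivalent can be sketched in Python with an XML parser, which avoids depending on how xmllint lays out the lines. The sample below uses three of the ids from the EPost command; id_list and epost_query are hypothetical helper names, and no request is sent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Abridged ESearch response holding three of the ids used above.
SAMPLE = """<eSearchResult>
  <Count>3</Count>
  <IdList>
    <Id>1860150</Id>
    <Id>1860149</Id>
    <Id>1849957</Id>
  </IdList>
</eSearchResult>"""


def id_list(esearch_xml):
    """Return the ids of an ESearch response as one comma-separated string."""
    root = ET.fromstring(esearch_xml)
    return ",".join(node.text for node in root.findall("IdList/Id"))


def epost_query(db, ids, webenv=None):
    """Build the EPost query string; WebEnv is added only when given."""
    params = {"db": db, "id": ids}
    if webenv:
        params["WebEnv"] = webenv
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return urlencode(params, safe=",")


ids = id_list(SAMPLE)
print(ids)
print(epost_query("taxonomy", ids))
```

Unlike tr "\n" ",", the join leaves no trailing comma after the last id.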
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Homo%20Sapiens&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
<eSearchResult><Count>0</Count><RetMax>0</RetMax><RetStart>0</RetStart><QueryKey>8</QueryKey><WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv><IdList/><TranslationSet/><TranslationStack>
<OP>GROUP</OP><TermSet>
<Term>homo sapiens[All Names]</Term><Field>All Names</Field><Count>0</Count><Explode>N</Explode>
</TermSet><OP>AND</OP>
</TranslationStack><QueryTranslation>(#2) AND homo sapiens[All Names]</QueryTranslation>
</eSearchResult>
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Tyrannosaurus&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
<eSearchResult><Count>1</Count><RetMax>1</RetMax><RetStart>0</RetStart><QueryKey>9</QueryKey><WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv><IdList>
<Id>436494</Id></IdList><TranslationSet/><TranslationStack>
<OP>GROUP</OP><TermSet>
<Term>Tyrannosaurus[All Names]</Term><Field>All Names</Field><Count>1</Count><Explode>N</Explode>
</TermSet><OP>AND</OP>
</TranslationStack><QueryTranslation>(#2) AND Tyrannosaurus[All Names]</QueryTranslation>
</eSearchResult>
esearch -db pubmed -query "Tyrannosaurus" | \
efilter -mindate 2005 | \
efetch -format docsum | \
xtract -pattern DocumentSummary \
  -element MedlineCitation/PMID \
  -element Id SortFirstAuthor
ELink
Searching the PubMed records associated with sequence gi:507866428
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=pubmed&id=507866428&cmd=neighbor_score
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23766330&rettype=medline&retmode=text"
PMID- 23766330
TI  - Evolution of mammalian diving capacity traced by myoglobin net surface
      charge.
PG  - 1234192
LID - 10.1126/science.1234192 [doi]
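Chaining ELink and EFetch: the PMIDs returned by ELink can be pulled out of the XML and turned into EFetch URLs. A minimal sketch, assuming the linked ids appear as `<Link><Id>...</Id>` elements in the reply; `linked_ids` is a hypothetical helper name, and the sample reply is a hand-written illustration of that assumed layout, not a captured server answer.

```shell
#!/bin/bash
# Hypothetical helper: list the UIDs that ELink reports as linked records.
# Assumption: each linked record appears as <Link><Id>...</Id>...</Link>,
# while the submitted ids sit in a separate <IdList> element.

# linked_ids : read ELink XML on stdin, print one linked UID per line.
linked_ids() {
  tr -d ' \t\r\n' | grep -o '<Link><Id>[0-9]*</Id>' | sed -e 's/<[^>]*>//g'
}

# Hand-written sample reply (illustrative values only).
sample='<eLinkResult><LinkSet><DbFrom>nuccore</DbFrom>
<IdList><Id>507866428</Id></IdList>
<LinkSetDb><DbTo>pubmed</DbTo>
<Link><Id>23766330</Id><Score>0</Score></Link>
</LinkSetDb></LinkSet></eLinkResult>'

# Print one EFetch URL (MEDLINE text) per linked PubMed record.
for pmid in $(printf '%s\n' "$sample" | linked_ids); do
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=%s&rettype=medline&retmode=text\n' "$pmid"
done
```

Whitespace is stripped before matching so that a pretty-printed reply, with `<Link>` and `<Id>` on separate lines, is handled the same way as a compact one.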
Using the stylesheet https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/gb2svg.xsl
xsltproc <(curl "https://raw.github.com/lindenb/xslt-sandbox/master/stylesheets/bio/ncbi/gb2svg.xsl") \
  "https://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=14971102&retmode=xml&rettype=gbc"
<?xml version="1.0" encoding="UTF-8"?>
<svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="121" width="920" style="stroke-width:1px;">
  <svg:title>Human rotavirus segment 7 NSP3 gene, complete cds</svg:title>
  <svg:defs>
    <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="grad">
      <svg:stop offset="5%" stop-color="black"/>
      <svg:stop offset="50%" stop-color="whitesmoke"/>
      <svg:stop offset="95%" stop-color="black"/>
    </svg:linearGradient>
    <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="vertical_body_gradient">
      <svg:stop offset="5%" stop-color="white"/>
      <svg:stop offset="95%" stop-color="lightgray"/>
    </svg:linearGradient>
  </svg:defs>
  <svg:style type="text/css"/>
  <svg:g>
    <svg:g transform="translate(0,0)">
      <svg:rect x="0" y="0" width="920" height="120" fill="url(#vertical_body_gradient)" stroke="black"/>
      <svg:text style="color:red;font-size:35px;" x="10" y="35">Human rotavirus segment 7 NSP3 gene, complete cds</svg:text>
      <svg:g>
        <svg:rect x="10" y="40" width="900" height="18" style="fill:url(#grad);stroke:black;" title="1..1074"/>
        <svg:text y="54" x="460" text-anchor="middle"><svg:tspan style="font-weight:bold;">source</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">organism</svg:tspan>:Human rotavirus A <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">mol_type</svg:tspan>:genomic RNA <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">strain</svg:tspan>:M <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">segment</svg:tspan>:7 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">clone</svg:tspan>:M0</svg:text>
      </svg:g>
      <svg:g>
        <svg:rect x="10" y="60" width="27.6794035414725" height="18" style="fill:url(#grad);stroke:black;" title="1..34"/>
        <svg:text y="74" x="39.6794035414725" text-anchor="start">
          <svg:tspan style="font-weight:bold;">5'UTR</svg:tspan>
        </svg:text>
      </svg:g>
      <svg:g>
        <svg:rect x="38.5181733457595" y="80" width="781.733457595526" height="18" style="fill:url(#grad);stroke:black;" title="35..967"/>
        <svg:text y="94" x="429.384902143523" text-anchor="middle"><svg:tspan style="font-weight:bold;">CDS</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">codon_start</svg:tspan>:1 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">product</svg:tspan>:NSP3 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">protein_id</svg:tspan>:AAK74116.1</svg:text>
      </svg:g>
      <svg:g>
        <svg:rect x="821.090400745573" y="100" width="88.909599254427" height="18" style="fill:url(#grad);stroke:black;" title="968..1074"/>
        <svg:text y="114" x="819.090400745573" text-anchor="end">
          <svg:tspan style="font-weight:bold;">3'UTR</svg:tspan>
        </svg:text>
      </svg:g>
      <svg:rect x="0" y="0" width="920" height="120" fill="none" stroke="black"/>
    </svg:g>
  </svg:g>
</svg:svg>
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Tyrannosaurus&usehistory=true" | xmllint --format -
$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml"
<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text"/>

<xsl:template match="/">
date2count &lt;- list()
<xsl:apply-templates select="/PubmedArticleSet/PubmedArticle[MedlineCitation/DateCreated/Year]"/>
df &lt;- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()
</xsl:template>

<xsl:template match="PubmedArticle">
<xsl:variable name="year" select="MedlineCitation/DateCreated/Year"/>
date2count[["<xsl:value-of select="$year"/>"]] &lt;- ifelse(is.null(date2count[["<xsl:value-of select="$year"/>"]]),1,1+date2count[["<xsl:value-of select="$year"/>"]])
</xsl:template>

</xsl:stylesheet>
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" |\
  xsltproc pubmed2rstats.xsl -
date2count <- list()
date2count[["2013"]] <- ifelse(is.null(date2count[["2013"]]),1,1+date2count[["2013"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
(..)
df <- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" |\
  xsltproc pubmed2rstats.xsl - |\
  R --no-save
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum" targetNamespace="http://www.ncbi.nlm.nih.gov/SNP/docsum" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="ExchangeSet">
    <xsd:annotation>
      <xsd:documentation>Set of dbSNP refSNP docsums, version 3.4</xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="SourceDatabase" minOccurs="0">
          <xsd:complexType>
            <xsd:attribute name="taxId" type="xsd:int" use="required">
              <xsd:annotation>
                <xsd:documentation>NCBI taxonomy ID for variation</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="organism" type="xsd:string" use="required">
              <xsd:annotation>
                <xsd:documentation>common name for species used as part of database name.</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="dbSnpOrgAbbr" type="xsd:string">
              <xsd:annotation>
                <xsd:documentation>organism abbreviation used in dbSNP.</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="gpipeOrgAbbr" type="xsd:string">
              <xsd:annotation>
                <xsd:documentation>organism abbreviation used within NCBI genome pipeline data dumps.</xsd:documentation>
Using the XML schema
Compiling the XML Schema for dbSNP with XJC
$ xjc -d . "ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd"
parsing a schema...
compiling a schema...
https/www_ncbi_nlm_nih_gov/snp/docsum/Assay.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Assembly.java
https/www_ncbi_nlm_nih_gov/snp/docsum/BaseURL.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Component.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ExchangeSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/FxnSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/MapLoc.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ObjectFactory.java
https/www_ncbi_nlm_nih_gov/snp/docsum/PrimarySequence.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Rs.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsLinkout.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsStruct.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Ss.java
https/www_ncbi_nlm_nih_gov/snp/docsum/package-info.java
Using the XML schema
Compiling the XML Schema for dbSNP with XJC
Search the non-genomic rs# in dbSNP.
import https.www_ncbi_nlm_nih_gov.snp.docsum.*;
import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
class ParseDbSnp
{
    public static void main(String[] args) throws Exception
    {
        JAXBContext jaxbCtxt = JAXBContext.newInstance("https.www_ncbi_nlm_nih_gov.snp.docsum");
        Unmarshaller unmarshaller = jaxbCtxt.createUnmarshaller();
        XMLInputFactory ifactory = XMLInputFactory.newInstance();
        XMLEventReader r = ifactory.createXMLEventReader(System.in);
        while (r.hasNext())
        {
            XMLEvent evt = r.peek();
            if (!(evt.isStartElement() && evt.asStartElement().getName().getLocalPart().equals("Rs")))
            {
                evt = r.nextEvent();
                continue;
            }

            Rs rs = unmarshaller.unmarshal(r, Rs.class).getValue();
            if ("genomic".equals(rs.getMolType())) continue;
            System.out.println("rs" + rs.getRsId() + " " + rs.getMolType());
        }
        r.close();
    }
}
Using the XML schema
Compiling the XML Schema for dbSNP with XJC
compile...
$ javac ParseDbSnp.java https/www_ncbi_nlm_nih_gov/snp/docsum/*.java
and run...
$ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz" |\
  gunzip -c |\
  java ParseDbSnp
#!/usr/bin/perl
(...)
# PUBLIC DOMAIN NOTICE
# National Center for Biotechnology Information
use LWP::Simple;
use LWP::UserAgent;
use Net::FTP;

my $delay = 0;
my $maxdelay = 3;
my $base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

$params{email} = "nobody@nowhere.com";
$params{db} = "nuccore";
$params{tool} = "ebot";
$params{term} = "Mammuthus+primigenius[ORGN]";
%params = esearch(%params);

$params{retmode} = "xml";
$params{outfile} = "result.xml";
$params{rettype} = "native";
efetch_batch(%params);
curl -o protein.fa.gz \
  "ftp://ftp.ncbi.nih.gov/genomes/Apis_mellifera/protein/protein.fa.gz"
gunzip protein.fa.gz
Standalone Blast
Create a Blast database with makeblastdb
Getting help...
$ makeblastdb -help
(...)
 -dbtype <String, `nucl', `prot'>
   Molecule type of target db
 -in <File_In>
   Input file/database name
   Default = `-'
 -input_type <String, `asn1_bin', `asn1_txt', `blastdb', `fasta'>
   Type of the data specified in input_file
   Default = `fasta'
(..)
Standalone Blast
Create a Blast database with makeblastdb
Create the BLAST database:
$ makeblastdb -in protein.fa -dbtype prot
Building a new DB, current time: 09/02/2013 18:29:38
New DB name: protein.fa
New DB title: protein.fa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 10570 sequences in 1.84458 seconds.
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |\
  blastp -db protein.fa
Query= gi|187956781|gb|AAI40897.1| EIF4G1 protein [Homo sapiens]
(...)
                                                                    Score     E
Sequences producing significant alignments:                        (Bits)  Value

gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation...    189   4e-49
gi|328779480|ref|XP_003249661.1| PREDICTED: hypothetical protei...   38.1   0.017
gi|110762568|ref|XP_001121713.1| PREDICTED: hypothetical protei...   38.1   0.018
(...)
> gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation
initiation factor 4 gamma 2-like [Apis mellifera]
Length=899

 Score = 189 bits (479), Expect = 4e-49, Method: Compositional matrix adjust.
 Identities = 115/319 (36%), Positives = 175/319 (55%), Gaps = 39/319 (12%)

Query  717  KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSI  774
            ++P +    +++ +DI+    E+ W P S  +R A   +        S+   +FR+VR I
Sbjct  22   RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGI  73

Query  775  LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL-----  829
            LNKLTP+ F +L   +  + ++++  LKGVI LIFEKA+ EP +S  YA +C+ L
Sbjct  74   LNKLTPEKFAKLSNDLLNVELNSDVILKGVIFLIFEKALDEPKYSSMYAQLCKRLSDEAA  133

Query  830  -MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLK  888
                K    E       F  LLL++C+ EFE      E FE +    DE    EE
Sbjct  134  NFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE-----  184
Standalone Blast
Blast human EIF4G1 gi:187956781, output XML
$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |\
  blastp -db protein.fa -outfmt 5
(...)
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>189.119</Hsp_bit-score>
<Hsp_score>479</Hsp_score>
<Hsp_evalue>3.78314e-49</Hsp_evalue>
<Hsp_query-from>717</Hsp_query-from>
<Hsp_query-to>1017</Hsp_query-to>
<Hsp_hit-from>22</Hsp_hit-from>
<Hsp_hit-to>319</Hsp_hit-to>
<Hsp_query-frame>0</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>115</Hsp_identity>
<Hsp_positive>175</Hsp_positive>
<Hsp_gaps>39</Hsp_gaps>
<Hsp_align-len>319</Hsp_align-len>
<Hsp_qseq>KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL------MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARD
(...)
VAKRKMLGNIKFIGELGKLGIVSETILHRCILQLLEKKRRRRSRGDTAEDIECLCQIMRTCGRILDSDKGRGLMDQYFKRMNSLAESRDLPLRIKFMLRDVIELRRDGWVPRKATSTEGPMPINQIRNDNE</Hsp_hseq>
<Hsp_midline>++P + +++ +DI+ E+ W P S +R A + S+ +FR+VR ILNKLTP+ F +L
+ + ++++ LKGVI LIFEKA+ EP +S YA +C+ L K E F LLL++C+ EFEE FE + DE EE ER +A+R+ LGNIKFIGEL KL +++E I+H C+++LL + E +ECLC+++ T G+ LD +K + MDQYF +M
+ + + RI+FML+DV++LR WVPR+ +GP I+QI + E</Hsp_midline>
</Hsp>
$ curl "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=PAERLMERKADIE&DATABASE=nr&PROGRAM=blastp&FILTER=L&HITLIST_SIZE=500"
(...)
<!--QBlastInfoBegin
    RID = 1NRYGX9K014
    RTOE = 29
QBlastInfoEnd
-->
(...)
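The reply to CMD=Put carries a request id (RID) and an estimated wait in seconds (RTOE) inside the QBlastInfo comment; the results are then requested with CMD=Get. A minimal sketch, assuming each value sits on its own `NAME = value` line; `qblast_field` and `qblast_get_url` are hypothetical helper names, and FORMAT_TYPE=XML is one of several output choices.

```shell
#!/bin/bash
# Hypothetical helpers: read RID and RTOE from a Blast.cgi CMD=Put reply
# and build the URL that retrieves the finished search.

# qblast_field NAME : print the value of the first "NAME = value" line on stdin.
qblast_field() {
  sed -n "s/^[[:space:]]*$1 = \([^[:space:]]*\).*/\1/p" | head -n 1
}

# qblast_get_url RID : CMD=Get URL for the results of request RID, as XML.
qblast_get_url() {
  printf 'https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=%s&FORMAT_TYPE=XML\n' "$1"
}

# Offline demonstration on the comment block returned by CMD=Put.
reply='<!--QBlastInfoBegin
    RID = 1NRYGX9K014
    RTOE = 29
QBlastInfoEnd
-->'
rid=$(printf '%s\n' "$reply" | qblast_field RID)
rtoe=$(printf '%s\n' "$reply" | qblast_field RTOE)
echo "wait about ${rtoe}s, then fetch: $(qblast_get_url "$rid")"
```

A live script would `sleep "$rtoe"` before the first CMD=Get request and retry while the search is still running.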