NCBI Entrez Digital Tools and Utilities
Jonathan A. Kans, Ph.D.
Staff Scientist, NCBI
[email protected]
1
Dec 13, 2015
Topics
• Advanced Features of Entrez (to help separate the wheat from the chaff)
• Programmatic Access with EUtils (automate repeatable multi-step queries)
• EBot Generated Scripts (if you really don't want to write a program)
2
Comparative Analysis
• Anatomy
• Physiology
• Biochemistry
• Gene Sequences
3
Central Dogma of Molecular Biology
DNA(information)
RNA(expression)
Protein(function)
transcription(polymerase)
translation(ribosome)
mRNA
CDS
4
Genetic Diseases
• Specific molecular defects explain disease
• β-globin gene and protein sequences
  ...ATGGTGCATCTGACTCCTGAGGAGAAG...AAGTATCACTAA...
  (M) V H L T P E E K ... K Y H (*)
• Sickle-cell anemia variant
  ...ATGGTGCATCTGACTCCTGTGGAGAAG...AAGTATCACTAA...
  (M) V H L T P V E K ... K Y H (*)
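The single-base difference between the two sequences above can be located mechanically. The following is a minimal sketch, assuming a POSIX shell with `awk`; the function name `first_diff` is made up for illustration, and codons are counted from the ATG (so the initiator methionine is codon 1).

```shell
#!/bin/sh
# first_diff (hypothetical helper): report the first position at which two
# in-frame coding sequences differ, and the codon that contains it.
first_diff() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = (length(a) < length(b)) ? length(a) : length(b)
    for (i = 1; i <= n; i++) {
      if (substr(a, i, 1) != substr(b, i, 1)) {
        c = int((i - 1) / 3)            # zero-based codon index
        printf "base %d, codon %d: %s -> %s\n", i, c + 1, \
               substr(a, c * 3 + 1, 3), substr(b, c * 3 + 1, 3)
        exit
      }
    }
  }'
}

# Normal and sickle-cell beta-globin fragments from the slide
first_diff ATGGTGCATCTGACTCCTGAGGAGAAG ATGGTGCATCTGACTCCTGTGGAGAAG
```

The GAG to GTG change corresponds to the E to V substitution shown in the protein lines.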
5
Evolutionary Conservation
[Figure: evolutionary tree of Human, Mouse, Fly, Worm, Yeast, and Bacteria, with divergence times marked at 500 M yr, 1000 M yr, and 3000 M yr]
Human  638 RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC 697
Yeast  657 RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC 716
E.coli 584 RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA 642
Colon cancer gene sequence (DNA mismatch repair protein)
6
Design of Entrez
[Figure: the MEDLINE, Nucleotide, and Protein databases and the links between them. Labels: term frequency statistics, nucleotide sequence similarity, amino acid sequence similarity, coding region features, literature citations in sequence (shown twice).]
7
Entrez Databases
8
PubMed Search
9
PubMed Fields
10
Advanced Search
11
Field Abbreviations
Affiliation [AFFL]              Issue [ISS]
All Fields [ALL]                Journal [JOUR]
Author [AUTH]                   Language [LANG]
Author - Corporate [COLN]       Location ID [LID]
Author - First [FAUT]           MeSH Major Topic [MAJR]
Author - Full [FULL]            MeSH Subheading [SUBH]
Author - Last [LAUT]            MeSH Terms [MESH]
Book [BOOK]                     Pagination [PAGE]
Date - Completion [CDAT]        Pharmacological Action [PAPX]
Date - Create [CRDT]            Publication Type [PTYP]
Date - Entrez [EDAT]            Publisher [PUBN]
Date - MeSH [MHDA]              Publisher ID [PID]
Date - Modification [MDAT]      Secondary Source ID [SI]
Date - Publication [PDAT]       Supplementary Concept [SUBS]
EC/RN Number [ECNO]             Text Word [WORD]
Editor [ED]                     Title [TITL]
Filter [FILT]                   Title/Abstract [TIAB]
Grant Number [GRNT]             Transliterated Title [TT]
ISBN [ISBN]                     UID [UID]
Investigator [INVR]             Volume [VOL]
Investigator - Full [FINV]
12
MeSH Categories
Anatomy
Organisms
Diseases
Chemicals and Drugs
Analytical, Diagnostic and Therapeutic Techniques and Equipment
Psychiatry and Psychology
Phenomena and Processes
Disciplines and Occupations
Anthropology, Education, Sociology and Social Phenomena
Technology, Industry, Agriculture
Humanities
Information Science
Named Groups
Health Care
Publication Characteristics
Geographicals
13
Organism Hierarchy
Eukaryota
  Alveolata
  Amoebozoa
  Animals
    Animal Population Groups
    Chordata
    Invertebrates
  Choanoflagellata
  Cryptophyta
  Diplomonadida
  Euglenozoa
  Fungi
  Haptophyta
  Mesomycetozoea
  Oxymonadida
  Parabasalidea
  Plants
  Retortamonadidae
  Rhizaria
  Stramenopiles
Archaea
Bacteria
Viruses
Other Forms
14
Useful Queries
humans [MESH]
pharmacokinetics [MESH]
chemically induced [SUBH]
all child [FILT]
loprovflybase [FILT]
randomized controlled trial [FILT]
clinical trial, phase ii [PTYP]

mammalia [ORGN]
mammalia [ORGN:noexp]
cds [FKEY]
lacz [GENE]
beta galactosidase [PROT]
biomol genomic [PROP]
dbxref flybase [PROP]
gbdiv phg [PROP]
src cultivar [PROP]
srcdb refseq validated [PROP]
150:200 [SLEN]
15
Structured Query
transposition [TITL] AND (protease OR peptidase) NOT humans [MESH]
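Before a structured query like this one can be passed as the `term` argument of an ESearch URL, its spaces, brackets, and parentheses have to be percent-encoded. The following is a minimal sketch, assuming a POSIX shell with `sed`; the function name `encode_term` is made up for illustration, and it handles only the characters that the shell scripts later in this deck handle, plus `%` itself.

```shell
#!/bin/sh
# encode_term (hypothetical helper): percent-encode the characters that
# commonly appear in Entrez queries. '%' is encoded first so that the
# escapes added afterwards are not encoded a second time.
encode_term() {
  printf '%s\n' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/&/%26/g' \
    -e "s/'/%27/g" -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2C/g' \
    -e 's/\[/%5B/g' -e 's/]/%5D/g'
}

query='transposition [TITL] AND (protease OR peptidase) NOT humans [MESH]'
encode_term "$query"
```

This is not a general URL encoder; other reserved characters (for example `#` or `+`) would need their own substitutions.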
16
Using History
17
History Results
18
PubMed Record
19
Neighbor Hyperlink
20
Related Citations
21
Relevant Publication
22
Selecting Target
23
GenBank Record
24
Graphical View
25
LOCUS       HUMADH1CB    1400 bp    mRNA    PRI    15-JUN-1989
DEFINITION  Homo sapiens class I alcohol dehydrogenase (ADH1) alpha subunit
            mRNA, complete cds.
ACCESSION   M12271
KEYWORDS    alcohol dehydrogenase; dehydrogenase.
SOURCE      Human liver, cDNA to mRNA, clone pUCADH-alpha-15L.
  ORGANISM  Homo sapiens
            Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
            Theria; Eutheria; Primates; Haplorhini; Catarrhini; Hominidae;
            Homo; sapiens.
REFERENCE   1  (bases 1 to 1400)
  AUTHORS   Ikuta,T., Szeto,S. and Yoshida,A.
  TITLE     Three human alcohol dehydrogenase subunits: cDNA structure and
            molecular and evolutionary divergence
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 83 (3), 634-638 (1986)
  STANDARD  full staff_review
COMMENT     A draft entry and printed copy of the sequence in [1] were kindly
            provided by A.Yoshida, 30-MAY-1986. The other human class I ADH1
            alpha subunit sequence is found under accession M11307.
FEATURES             Location/Qualifiers
     mRNA            <1..1400
                     /note="ADH1 mRNA"
     CDS             16..1143
                     /note="alcohol dehydrogenase alpha subunit (EC 1.1.1.1)"
                     /map="'4q21' /hgml_locus_uid='LJ0082S'"
                     /gene="ADH1"
BASE COUNT      400 a    294 c    340 g    366 t
ORIGIN      52 bp upstream of PvuII site; chromosome 4q21.
        1 gaagacagaa tcaacatgag cacagcagga aaagtaatca aatgcaaagc agctgtgcta
       61 tgggagttaa agaaaccctt ttccattgag gaggtggagg ttgcacctcc taaggcccat
      121 gaagttcgta ttaagatggt ggctgtagga atctgtggca cagatgacca cgtggttagt
      181 ggtaccatgg tgaccccact tcctgtgatt ttaggccatg aggcagccgg catcgtggag
      241 agtgttggag aaggggtgac tacagtcaaa ccaggtgata aagtcatccc actcgctatt
      301 cctcagtgtg gaaaatgcag aatttgtaaa aacccggaga gcaactactg cttgaaaaac
      361 gatgtaagca atcctcaggg gaccctgcag gatggcacca gcaggttcac ctgcaggagg
      421 aagcccatcc accacttcct tggcatcagc accttctcac agtacacagt ggtggatgaa
      481 aatgcagtag ccaaaattga tgcagcctcg cctctagaga aagtctgtct cattggctgt
      541 ggattttcaa ctggttatgg gtctgcagtc aatgttgcca aggtcacccc aggctctacc
      601 tgtgctgtgt ttggcctggg aggggtcggc ctatctgcta ttatgggctg taaagcagct
      661 ggggcagcca gaatcattgc ggtggacatc aacaaggaca aatttgcaaa ggccaaagag
      721 ttgggggcca ctgaatgcat caaccctcaa gactacaaga aacccatcca ggaggtgcta
26
ENTRY           DEHUAA #Type Protein
TITLE           Alcohol dehydrogenase alpha chain - Human #EC - number 1.1.1.1
DATE            28-Dec-1987 #Sequence 28-Dec-1987 #Text 30-Sep-1989
PLACEMENT       27.0 1.0 1.0 1.0 1.0
SOURCE          Homo sapiens # Common-name man
ACCESSION       A25428
REFERENCE       (Sequence translated from the mRNA sequence)
   #Authors     Ikuta T., Szeto S., Yoshida A.
   #Journal     Proc. Nat. Acad. Sci. USA (1986) 83:634-638
   #Title       Three human alcohol dehydrogenase subunits: cDNA structure
                and molecular and evolutionary divergence.
GENETIC
   #Map-position 4q21-q25
   #Name        ADH1
SUPERFAMILY     #Name alcohol dehydrogenase
KEYWORDS        oxidoreductase
SUMMARY         #Molecular-weight 39858 #Length 375 #Checksum 7545
SEQUENCE
            5        10        15        20        25        30
  1 M S T A G K V I K C K A A V L W E L K K P F S I E E V E V A
 31 P P K A H E V R I K M V A V G I C G T D D H V V S G T M V T
 61 P L P V I L G H E A A G I V E S V G E G V T T V K P G D K V
 91 I P L A I P Q C G K C R I C K N P E S N Y C L K N D V S N P
121 Q G T L Q D G T S R F T C R R K P I H H F L G I S T F S Q Y
151 T V V D E N A V A K I D A A S P L E K V C L I G C G F S T G
181 Y G S A V N V A K V T P G S T C A V F G L G G V G L S A I M
211 G C K A A G A A R I I A V D I N K D K F A K A K E L G A T E
241 C I N P Q D Y K K P I Q E V L K E M T D G G V D F S F E V I
271 G R L D T M M A S L L C C H E A C G T S V I V G V P P D S Q
301 N L S M N P M L L L T G R T W K G A I L G G F K S K E C V P
331 K L V A D F M A K K F S L D A L I T H V L P F E K I N E G F
361 D L L H S G K S I R T I L M F
///
27
Same Publication?
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 83 (3), 634-638 (1986)
#Journal Proc. Nat. Acad. Sci. USA (1986) 83:634-638
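The two lines above cite the same article, but the journal abbreviation, field order, and punctuation all differ, so a plain string comparison fails. One way to compare them is to reduce each citation to a year:volume:pages key. This is a rough heuristic sketch, assuming a POSIX shell with `tr` and `awk`; the function name `cite_key` is made up for illustration. It treats the first 18xx/19xx/20xx number as the year and the first other bare number as the volume, which will misfire on some citations.

```shell
#!/bin/sh
# cite_key (hypothetical helper): reduce a citation line to year:volume:pages.
# Everything except digits and hyphens is blanked out, then the remaining
# tokens are classified by shape.
cite_key() {
  printf '%s\n' "$1" | tr -c '0-9\n-' ' ' | awk '{
    year = ""; vol = ""; pages = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^[0-9]+-[0-9]+$/) { if (pages == "") pages = $i }
      else if ($i ~ /^(18|19|20)[0-9][0-9]$/ && year == "") year = $i
      else if ($i ~ /^[0-9]+$/ && vol == "") vol = $i
    }
    print year ":" vol ":" pages
  }'
}

genbank='JOURNAL Proc. Natl. Acad. Sci. U.S.A. 83 (3), 634-638 (1986)'
pir='#Journal Proc. Nat. Acad. Sci. USA (1986) 83:634-638'

if [ "$(cite_key "$genbank")" = "$(cite_key "$pir")" ]; then
  echo "same publication: $(cite_key "$genbank")"
fi
```

Linking both records to a single PubMed UID removes the need for this kind of text matching.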
28
Exponential Growth
29
Sequence Identifiers
Accession: AH006997
GI Number: 6849043
Accn.Ver: AH006997.2
FASTA: >gi|6849043|gb|AH006997.2
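The FASTA definition line packs these identifiers into `|`-separated fields. The following is a minimal sketch of pulling them apart, assuming a POSIX shell and a defline of exactly the `>gi|NUMBER|gb|ACCESSION.VERSION` shape shown above; the function name `parse_defline` is made up for illustration.

```shell
#!/bin/sh
# parse_defline (hypothetical helper): split a ">gi|...|gb|ACC.VER" line.
# The body runs in a subshell so the IFS and globbing changes stay local.
parse_defline() (
  line=${1#'>'}          # drop the leading '>'
  set -f                 # no filename expansion while splitting
  IFS='|'
  set -- $line           # $1=gi  $2=GI number  $3=gb  $4=accession.version
  accver=$4
  printf 'gi=%s accession=%s version=%s\n' "$2" "${accver%%.*}" "${accver#*.}"
)

parse_defline '>gi|6849043|gb|AH006997.2'
```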
30
Sequence Assembly
NC_000022.9
NT_028395.3 NT_011519.10
AP000522.1
AP000523.1
GATCTGATAAGTCCCAGGAC …
… TGGTATCCACCTGGGGCCTG …
join(gap(14430000),gi|89058412:1..647850,gap(150000),gi|29806588:1..3661581 …)
join(gi|5931500:1..37693,gi|5931501:2273..41306 …)
… …
… … …
31
Features and Qualifiers
     gene            1..417
                     /gene="INS"
                     /db_xref="GeneID:449570"
     CDS             60..392
                     /gene="INS"
                     /codon_start=1
                     /product="proinsulin precursor"
                     /protein_id="NP_001008996.1"
                     /translation="MALWMRLLPLL ... YQLENYCN"
     sig_peptide     60..131
                     /gene="INS"
     mat_peptide     132..389
                     /gene="INS"
                     /product="Insulin"
32
Graphical Views
33
Translation Validation
DNA           ...cgaaaagGTGGTAGTGTAGGAGACGGTGAAGctaaga...
/translation             -  V  V  *  E  T  V  K
Protein                  M  V  V  L  E  T  E  K
SEQ_FEAT_StartCodon SEQ_FEAT_MismatchAA
SEQ_FEAT_InternalStop SEQ_FEAT_NotSpliceConsensusDonor
34
Alignments
• Describe relationships between sequences
• Can reflect evolutionary conservation, structural similarity, functional similarity
• Can be generated algorithmically (e.g., BLAST) or manually
MRLTLLC-------EGEEGSELPLCASCGQRIELKYKPECYPDVKNSLHV
MRLTLLCCTWREERMGEEGSELPVCASCGQRLELKYKPECFPDVKNSIHA
MRLTCLCRTWREERMGEEGSEIPVCASCGQRIELKYKPE-----------
35
Original Databases
[Figure: the MEDLINE, Nucleotide, and Protein databases and the links between them. Labels: term frequency statistics, nucleotide sequence similarity, amino acid sequence similarity, coding region features, literature citations in sequence (shown twice).]
36
Discovery Space
[Figure: the Entrez discovery space. Labels: PubMed abstracts (PubMed, Publishers), Nucleotide sequences, Protein sequences, 3-D Structure (MMDB), Complete Genomes (Entrez Genomes, Genome Centers), Taxon (Phylogeny).]
37
Data Integration
38
Leveraging Resources
GenBank
RefSeq
Human Genome
Bacterial Genome
Virus Genome
MMDB
PubMed
UniGene(s)
LocusLink
OMIM
Taxonomy
GEO
PopSet
BLAST
Entrez
ePCR
Sequin
39
Entrez Utilities
• EInfo
• ESearch
• ESummary
• EFetch
• ELink
• EPost
40
EUtils Base URL
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/program.fcgi?arguments
41
EUtils Arguments
db          pubmed | nucleotide | protein
term        transposition+AND+(protease+OR+peptidase)
id          172344,U54439.1
rettype     abstract | acc | seqid | gb | fasta | count
retmode     text | xml | asn.1
retstart
retmax
datetype    mdat | pdat | edat
reldate     60
dbfrom      pubmed | nucleotide | protein
cmd         neighbor
linkname    gene_snp_genegenotype
usehistory  y
WebEnv      NCID_1_216999436_130...086_61936294
query_key   1
version     2.0
tool
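These arguments are appended to the base URL as `name=value` pairs joined by `&`. The following is a minimal sketch of assembling such a URL, assuming a POSIX shell; the function name `eutils_url` and the variable `EUTILS_BASE` are made up for illustration, the values are expected to be URL-encoded already, and the base uses `https`, which NCBI now requires.

```shell
#!/bin/sh
# Base address of the E-utilities (https variant of the URL on the slide).
EUTILS_BASE='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# eutils_url (hypothetical helper): eutils_url PROGRAM [name=value ...]
# Joins the pairs with '&' and appends them after PROGRAM.fcgi.
eutils_url() {
  program=$1
  shift
  query=''
  for pair in "$@"; do
    query="${query:+$query&}$pair"
  done
  printf '%s/%s.fcgi%s\n' "$EUTILS_BASE" "$program" "${query:+?$query}"
}

eutils_url esearch db=pubmed 'term=transposition+immunity' retmax=20
eutils_url efetch db=pubmed rettype=abstract retmode=text id=22624153
```

The resulting string can be passed to `curl -s` as in the later slides.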
42
rettype=abstract
1. Mol Microbiol. 2012 Feb;83(4):805-20.
Separate structural and functional domains of Tn4430 transposase contribute to target immunity.
Lambin M, Nicolas E, Oger CA, Nguyen N, Prozzi D, Hallet B.
GSK Biologicals, Rue Flemming, 20, 1300 Wavre, [email protected]
Like other transposons of the Tn3 family, Tn4430 exhibits target immunity, a process that prevents multiple insertions of the transposon into the same DNA molecule. Immunity is conferred by the terminal inverted repeats of the transposon and is specific to each element of the family, indicating that the transposase...transposition. One class of mutations was found to stimulate transposition, whereas other mutations appeared to reduce TnpA activity. The data are discussed with respect to alternative models in which TnpA acts as a specific determinant to both establish and respond to immunity.
PMID: 22624153 [PubMed - indexed for MEDLINE]
43
rettype=medline
PMID- 22624153
OWN - NLM
STAT- MEDLINE
DA  - 20120523
DCOM- 20120529
IS  - 1365-2958 (Electronic)
IS  - 0950-382X (Linking)
VI  - 83
IP  - 4
DP  - 2012 Feb
TI  - Separate structural and functional domains of Tn4430 transposase contribute
      to target immunity.
PG  - 805-20
AB  - Like other transposons of the Tn3 family, Tn4430 exhibits target immunity, a
      process that prevents multiple insertions of the ...
AD  - GSK Biologicals, Rue Flemming, 20, 1300 Wavre, Belgium. [email protected]
AID - 10.1111/j.1365-2958.2012.07967.x [doi]
PST - ppublish
SO  - Mol Microbiol. 2012 Feb;83(4):805-20.
44
EInfo URLs
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed
45
curl Command in Terminal
https://itservices.stanford.edu/service/sharedcomputing/loggingin
curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"
46
Entrez Databases
<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
    <DbName>nucleotide</DbName>
    <DbName>nucgss</DbName>
    <DbName>nucest</DbName>
    <DbName>structure</DbName>
    <DbName>genome</DbName>
    ...
47
PubMed Fields
<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <MenuName>PubMed</MenuName>
    <Description>PubMed bibliographic record</Description>
    <Count>22006701</Count>
    <LastUpdate>2012/08/04 03:30</LastUpdate>
    <FieldList>
      ...
      <Field>
        <Name>TIAB</Name>
        <FullName>Title/Abstract</FullName>
        <Description>Free text associated with Abstract/Title</Description>
        <TermCount>38990504</TermCount>
        <IsDate>N</IsDate>
        <IsNumerical>N</IsNumerical>
        <SingleToken>N</SingleToken>
        <Hierarchy>N</Hierarchy>
        <IsHidden>N</IsHidden>
      </Field>
      ...
48
PubMed Links
<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD...>
<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <MenuName>PubMed</MenuName>
    ...
    <LinkList>
      ...
      <Link>
        <Name>pubmed_pubmed</Name>
        <Menu>Related Citations</Menu>
        <Description>Calculated set of PubMed ...</Description>
        <DbTo>pubmed</DbTo>
      </Link>
      ...
      <Link>
        <Name>pubmed_structure</Name>
        <Menu>Structure Links</Menu>
        <Description>Three-dimensional structure ...</Description>
        <DbTo>structure</DbTo>
      </Link>
      ...
49
ESearch URL
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity
50
ESummary URL
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&version=2.0&id=2539356
51
EFetch URL
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&rettype=abstract&id=2539356
52
ELink URL
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=pubmed&cmd=neighbor&linkname=pubmed_pubmed&id=2539356
53
curl GET and POST
curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity"
curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi" \
  -d "db=pubmed&id=22624153,22555593,22253773,21729108,..."
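The POST form is useful because the `id` argument takes a comma-separated list, so a long list of UIDs can be sent in a few requests rather than one per record. The following is a minimal sketch of grouping UIDs into such lists, assuming a POSIX shell with `awk`; the function name `batch_ids` and the batch size are made up for illustration.

```shell
#!/bin/sh
# batch_ids (hypothetical helper): batch_ids N
# Reads one UID per line on stdin and prints comma-separated groups of
# at most N UIDs, one group per line. Blank lines are skipped.
batch_ids() {
  awk -v n="$1" '
    NF {
      buf = (buf == "" ? $1 : buf "," $1)
      if (++c == n) { print buf; buf = ""; c = 0 }
    }
    END { if (buf != "") print buf }'
}

printf '%s\n' 22624153 22555593 22253773 21729108 21695252 | batch_ids 2
```

Each output line can then be used as the value of `id=` in a `curl -d` call.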
54
Cluttered Result<?xml version="1.0" ?><!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN""http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd"><eSearchResult><Count>94</Count><RetMax>20</RetMax><RetStart>0</RetStart><IdList> <Id>22624153</Id> <Id>22555593</Id> <Id>22253773</Id> <Id>21729108</Id> <Id>21695252</Id> <Id>21347312</Id> <Id>20603074</Id> <Id>20481492</Id> <Id>20004590</Id> <Id>19464182</Id> <Id>19431236</Id> <Id>19237527</Id> <Id>19188259</Id> <Id>19144000</Id> <Id>19120617</Id> <Id>18931389</Id> <Id>18838147</Id> <Id>18396069</Id> <Id>17966893</Id> <Id>17709741</Id> </IdList><TranslationSet><Translation> <From>immunity</From> <To>"immunity"[MeSH Terms] OR "immunity"[All Fields]</To> </Translation></TranslationSet><TranslationStack> <TermSet> <Term>transposition[All Fields]</Term> <Field>All Fields</Field> <Count>19362</Count> <Explode>Y</Explode> </TermSet> <TermSet> <Term>"immunity"[MeSH Terms]</Term> <Field>MeSH Terms</Field> <Count>252127</Count> <Explode>Y</Explode> </TermSet> <TermSet> <Term>"immunity"[All Fields]</Term> <Field>All Fields</Field> <Count>189033</Count> <Explode>Y</Explode> </TermSet> <OP>OR</OP> <OP>GROUP</OP> <OP>AND</OP> <OP>GROUP</OP> </TranslationStack><QueryTranslation>transposition[All Fields] AND ("immunity"[MeSH Terms] OR "immunity"[All Fields])</QueryTranslation></eSearchResult>
55
Cleaned for Parsing
<?xml version="1.0"?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD...>
<eSearchResult>
  <Count>94</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    <Id>22253773</Id>
    <Id>21729108</Id>
    <Id>21695252</Id>
    <Id>21347312</Id>
    <Id>20603074</Id>
    <Id>20481492</Id>
    <Id>20004590</Id>
    <Id>19464182</Id>
    ...
56
Reformat XML
xmllint --format -
...
<IdList> <Id>22624153</Id> <Id>22555593</Id> <Id>22253773</Id>
<Id>21729108</Id> <Id>21695252</Id> <Id>21347312</Id> <Id>20603074</Id>
...

...
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    <Id>22253773</Id>
    <Id>21729108</Id>
    <Id>21695252</Id>
    <Id>21347312</Id>
    <Id>20603074</Id>
...
57
Extract ID Numbers
perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g'
...
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    <Id>22253773</Id>
...

22624153
22555593
22253773
...
58
Remove Blank Lines
grep '[0-9]'

22624153
22555593
22253773
...

22624153
22555593
22253773
...
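The reformat, extract, and filter steps can also be collapsed into one filter that does not depend on `xmllint` or Perl. This is an alternative sketch, not the approach used in this deck, assuming a POSIX shell with `tr` and `sed`; the function name `extract_ids` is made up for illustration. It puts every tag on its own line by turning `<` into a newline, then keeps only lines of the form `Id>digits`.

```shell
#!/bin/sh
# extract_ids (hypothetical helper): read ESearch XML on stdin and print one
# UID per line. Works on cluttered and on reformatted XML alike, and emits no
# blank lines, so no trailing grep is needed.
extract_ids() {
  tr '<' '\n' | sed -n 's/^Id>\([0-9][0-9]*\)$/\1/p'
}

printf '%s\n' '<IdList> <Id>22624153</Id> <Id>22555593</Id> <Id>22253773</Id></IdList>' |
  extract_ids
```

Like the Perl one-liner, this is pattern matching rather than real XML parsing, so it relies on `<Id>` elements containing nothing but digits.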
59
UNIX Pipes
curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" \
  -d "db=pubmed&term=transposition+immunity" | \
xmllint --format - | \
perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | \
grep '[0-9]'
60
Resulting List of IDs
22624153
22555593
22253773
21729108
21695252
21347312
20603074
20481492
20004590
19464182
...
61
UNIX Shell Script
#!/bin/sh

# $1 = database, $2 = query, $3 = optional number of days to look back
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')

base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils'
suffix="&retmode=xml&retmax=200"
if [ -n "$3" ]; then
  suffix="&retmode=xml&retmax=200&reldate=$3"
fi

res=`curl -s "$base/esearch.fcgi?db=$1&term=$encoded$suffix"`

flt=`echo "$res" | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]'`

for uid in $flt
do
  echo "$uid"
done

./esrch.sh pubmed "transposition immunity Tn3" 365
62
ESearch -> ESummary
#!/bin/sh
# $1 = database, $2 = query
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')
base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils'
res=`curl -s "$base/esearch.fcgi?db=$1&term=$encoded&retmode=xml&retmax=200"`
flt=`echo "$res" | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]'`

# Fetch and print a document summary for each UID found
for uid in $flt
do
  res=`curl -s "$base/esummary.fcgi?db=$1&version=2.0&id=$uid"`
  sum=`echo "$res" | xmllint --format -`
  echo "$sum"
done
63
ESearch -> IDs
#!/bin/sh
# $1 = database, $2 = query; prints one UID per line
encoded=$(echo "$2" | sed -e 's/ /%20/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' \
  -e 's/(/%28/g' -e 's/)/%29/g' -e 's/,/%2c/g' -e 's/\[/%5b/g' -e 's/\]/%5d/g')
base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils'
res=`curl -s "$base/esearch.fcgi?db=$1&term=$encoded&retmode=xml&retmax=200"`
flt=`echo "$res" | xmllint --format - | \
  perl -nle 'print /(?<=<Id>).*?(?=<\/Id>)/g' | grep '[0-9]'`
for uid in $flt
do
  echo "$uid"
done
64
IDs -> ESummary
#!/bin/sh
# $1 = database; reads one UID per line on stdin
base='http://eutils.ncbi.nlm.nih.gov/entrez/eutils'
while read uid; do
  res=`curl -s "$base/esummary.fcgi?db=$1&version=2.0&id=$uid"`
  sum=`echo "$res" | xmllint --format -`
  echo "$sum"
done

./esrch.sh pubmed "transposition immunity" | ./esmry.sh pubmed
65
IDs -> E-Mail Notification
#!/bin/sh
# $1 = subject line, $2 = recipient; reads one UID per line on stdin
while read uid; do
  echo "$uid" | mail -s "$1" "$2"
done

./esrch.sh pubmed "Competitor JQ [AUTH]" 30 | \
  ./eping.sh "Read this new publication" "[email protected]"
66
Document Summaries
<eSummaryResult>
  <DocumentSummarySet status="OK">
    <DocumentSummary uid="22624153">
      <PubDate>2012 Feb</PubDate>
      <EPubDate/>
      <Source>Mol Microbiol</Source>
      <Authors>
        <Author>
          <Name>Lambin M</Name>
          <AuthType> Author </AuthType>
          <ClusterID>0</ClusterID>
        </Author>
        <Author>
          <Name>Nicolas E</Name>
          <AuthType> Author </AuthType>
67
Use History
curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=transposition+immunity&usehistory=y"
<eSearchResult>
  <Count>94</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511</WebEnv>
  <IdList>
    <Id>22624153</Id>
    <Id>22555593</Id>
    ...
68
WebEnv and query_key
curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&version=2.0&query_key=1&WebEnv=NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511"
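In a script, the `QueryKey` and `WebEnv` values have to be pulled out of the ESearch result before they can be passed to the next call. The following is a minimal sketch, assuming a POSIX shell with `tr` and `sed`; the helper names `xml_value` and `history_url` are made up for illustration, and the base URL uses `https`, which NCBI now requires.

```shell
#!/bin/sh
base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# xml_value (hypothetical helper): xml_value TAG
# Reads XML on stdin and prints the text of the first <TAG>...</TAG> element.
xml_value() {
  tr '<' '\n' | sed -n "s/^$1>\(.*\)\$/\1/p" | sed -n '1p'
}

# history_url (hypothetical helper): history_url PROGRAM DB
# Reads an ESearch result (run with usehistory=y) on stdin and prints the
# follow-up URL that refers to the stored result set.
history_url() {
  xml=$(cat)
  key=$(printf '%s\n' "$xml" | xml_value QueryKey)
  web=$(printf '%s\n' "$xml" | xml_value WebEnv)
  printf '%s/%s.fcgi?db=%s&query_key=%s&WebEnv=%s\n' "$base" "$1" "$2" "$key" "$web"
}

# Sample input taken from the ESearch result on the previous slide
sample='<eSearchResult><Count>94</Count><QueryKey>1</QueryKey><WebEnv>NCID_1_216310091_130.14.18.97_5555_1343867165_1026563511</WebEnv></eSearchResult>'
printf '%s\n' "$sample" | history_url esummary pubmed
```

With live data the sample line would be replaced by the output of `curl -s` on an ESearch URL that includes `usehistory=y`.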
69
PERL Script
#!/usr/bin/perl
use LWP::Simple;

$dbase = shift or die "Must supply database on command line\n";
$query = shift or die "Must supply query on command line\n";
$base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

$url = $base . "esearch.fcgi?db=$dbase&term=$query&retmax=0&usehistory=y";
$output = get($url);

$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\S+)<\/QueryKey>/);

$url = $base . "efetch.fcgi?db=$dbase&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";
$data = get($url);

print "$data";

close (STDOUT);

./efaftch.pl nucleotide M65061+OR+U54469
70
ESearch -> XML
#!/usr/bin/perl
use LWP::Simple;

$dbase = shift or die "Must supply database on command line\n";
$query = shift or die "Must supply query on command line\n";
# Optional third argument: restrict results to the last $days days
$days = shift || "";

$base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

$url = $base . "esearch.fcgi?db=$dbase&term=$query&retmax=0&usehistory=y";
if ( $days ne "" ) {
  $url .= "&reldate=$days";
}

$output = get($url);

print "$output";

close (STDOUT);
71
XML -> EFetch [1]
#!/usr/bin/perl
use LWP::Simple;

$dbase = shift or die "Must supply database on command line\n";
$type = shift or die "Must supply rettype on command line\n";
$base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

while ($thisline = <STDIN>) {
  $thisline =~ s/\r//;
  $thisline =~ s/\n//;
  $web = $1 if ($thisline =~ /<WebEnv>(\S+)<\/WebEnv>/);
  $key = $1 if ($thisline =~ /<QueryKey>(\S+)<\/QueryKey>/);
  $num = $1 if ($thisline =~ /<Count>(\S+)<\/Count>/);
}

...
72
XML -> EFetch [2]
...

$start = 0;
$chunk = 500;

while ( $num > 0 ) {
  $url = $base . "efetch.fcgi?db=$dbase&query_key=$key&WebEnv=$web";
  $url .= "&retstart=$start&retmax=$chunk&rettype=$type&retmode=text";

  $data = get($url);

  print "$data";

  $start += $chunk;
  $num -= $chunk;

  sleep 1;
}

close (STDIN);
close (STDOUT);

./esrch.pl nucleotide 1322283 | ./eftch.pl nucleotide fasta
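The loop in the Perl script walks through the stored result set in fixed-size chunks by advancing `retstart`. The same arithmetic can be sketched in the shell, assuming a POSIX shell; the function name `chunk_offsets` is made up for illustration and it only prints the paging arguments, it does not contact NCBI.

```shell
#!/bin/sh
# chunk_offsets (hypothetical helper): chunk_offsets TOTAL CHUNK
# Prints the retstart/retmax pair for each request needed to page through
# TOTAL records, CHUNK records at a time.
chunk_offsets() {
  total=$1
  chunk=$2
  start=0
  while [ "$start" -lt "$total" ]; do
    echo "retstart=$start&retmax=$chunk"
    start=$((start + chunk))
  done
}

# 1200 records in chunks of 500 need three requests
chunk_offsets 1200 500
```

Each printed pair would be appended to an EFetch URL that already carries `query_key` and `WebEnv`, with a pause between requests as in the `sleep 1` above.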
73
EBot
74
Text Query
75
Second Step
76
Output Format
77
Generate Script
78
EBot Result
DEFINITION  alcohol dehydrogenase [Cyberlindnera jadinii].
ACCESSION   BAM34535
VERSION     BAM34535.1  GI:398298384
DBSOURCE    accession AB649224.1
KEYWORDS    .
SOURCE      Cyberlindnera jadinii
  ORGANISM  Cyberlindnera jadinii
            Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
            Saccharomycetes; Saccharomycetales; Phaffomycetaceae;
            Cyberlindnera.
REFERENCE   1
  AUTHORS   Tamakawa,H., Tomita,Y., Yokoyama,A., Konoeda,Y. and Yoshida,S.
...
FEATURES             Location/Qualifiers
     source          1..348
                     /organism="Cyberlindnera jadinii"
                     /strain="NBRC0988"
                     /db_xref="taxon:4903"
                     /note="anamorph: Candida utilis"
     Protein         1..348
                     /product="alcohol dehydrogenase"
     CDS             1..348
                     /gene="ADH1"
                     /coded_by="AB649224.1:1..1047"
ORIGIN
        1 msipktqkgv ifyenggple ykdipvptpk pneilvnvky sgvchtdlha wkgdwplpvk
       61 lplvgghega gvvvakgsev knfeigdyag ikwlngscms cefceksfea ncpkadlsgy
      121 thdgsfqqya tadavqaaki skgtdlaeia pilcagvtvy kalktadlep gewvaisgag
      181 gglgslaiqf akamglrvla idggddkkql cqelgaevfi dftktkdivk siqdatnggp
      241 hgvinvsvse kaieqsteyv rncgtvvlvg lpagavaraq vfaavvksis vkgsyvgnra
      301 dtreaidffe rglvkapiki vglselpevy klmeegkilg ryvvdtsk
//
LOCUS       EJF61282    496 aa    linear    PLN    12-JUL-2012
DEFINITION  alcohol dehydrogenase [Dichomitus squalens LYAD-421 SS1].
ACCESSION   EJF61282
VERSION     EJF61282.1  GI:395328892
DBSOURCE    accession JH719411.1
...
79
References
• Entrez Programming Utilities Help
  http://www.ncbi.nlm.nih.gov/books/NBK25501/
• EBot
  http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
• MeSH Browser
  http://www.nlm.nih.gov/mesh/MBrowser.html
80