Page 1
Overview of the ISA format and software suite
Help researchers to
curate, store, analyse, share and publish their experiments
Susanna-Assunta Sansone, PhD (associate director, PI)
Philippe Rocca-Serra, PhD (technical coordinator)
Alejandra Gonzalez-Beltran, PhD (senior developer)
Eamonn Maguire, DPhil candidate (senior developer)
Pavlos Georgiou, MSc candidate (developer)
and new team member to be recruited
Page 2
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone
www.ebi.ac.uk/net-project
user community
Focus on the experimental context and compliance with standards
Page 3
Rationale for developing ISA
Researchers and bioinformaticians in both
academic and commercial arenas, along with
funding agencies and publishers, embrace
the concept that community-developed
standards are pivotal to structure and enrich
the annotation of
• entities of interest (e.g., genes,
metabolites, phenotypes) and
• experimental steps (e.g., provenance of
study materials, technology and
measurement types)
Page 4
Rationale for developing ISA
Capture all salient features of
the experimental workflow
Make annotation explicit and
discoverable
Support data provenance
tracking
Use community standards
Page 5
Rationale for developing ISA
transcriptomics proteomics genomics
Page 6
A wealth of communities, with different norms and standards, e.g.:
• report the same core, essential information
• use the same word and refer to the same ‘thing’
• allow data to flow from one system to another
To track provenance of the information and ensure richness of data and experimental
metadata descriptions, to maximize sharing and reusability
Key challenges:
lack of coordination, fragmentation and uneven coverage
Page 7
Technologically-delineated views of the world: transcriptomics, proteomics, metabolomics
(technologies: Arrays, Scanning, Columns, Gels, MS, FTIR, NMR)
Biologically-delineated views of the world: plant biology, epidemiology, microbiology
Generic features (‘common core’):
- description of source biomaterial
- experimental design components
To compare and integrate data we need interoperable standards
Page 8
See more at:
+ 130 (estimated)
+ 150 (source: MIBBI, EQUATOR)
+ 303 (source: BioPortal)
MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT
MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, CML, MITAB
AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO
Mapping the landscape of standards, work in progress
Databases, annotation,
curation tools
Page 10
Dealing with fragmented standards for the experimental context
Page 11
General-purpose, configurable format, designed to
support:
- several omics standards checklists, terminologies
- reference to CDISC SDTM file(s), and
- conversions to (a growing number of) other metadata
formats, used by public repositories
Page 13
Create template(s) to fit the type of
experiments to be described
Create templates detailing the steps to be
reported for different investigations, complying
with community standards, e.g. configuring the
value(s) allowed for each field to be
• text (with/without regular expression testing),
• ontology terms,
• numbers etc.
We now have configurations for submission
to EBI repositories, complying with several
community standards.
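The per-field configuration described above can be sketched roughly as follows. This is a minimal illustration only, not the ISA configurator's actual schema or API: the `FieldRule` class, the `is_valid` and `check_row` helpers, and the field names and patterns are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical rule for one template field (not the real ISA schema)."""
    name: str
    kind: str                                    # "text", "ontology" or "number"
    pattern: Optional[str] = None                # optional regex for text fields
    allowed_terms: FrozenSet[str] = frozenset()  # accepted terms for ontology fields


def is_valid(rule: FieldRule, value: str) -> bool:
    """Return True if value satisfies the rule for its field."""
    if rule.kind == "text":
        # Text passes when no pattern is set or the whole value matches it.
        return rule.pattern is None or re.fullmatch(rule.pattern, value) is not None
    if rule.kind == "ontology":
        return value in rule.allowed_terms
    if rule.kind == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown field kind: {rule.kind!r}")


# Illustrative template; field names and patterns are invented for this sketch.
TEMPLATE = [
    FieldRule("Sample Name", "text", pattern=r"[A-Za-z0-9_.-]+"),
    FieldRule("Organism", "ontology",
              allowed_terms=frozenset({"Homo sapiens", "Mus musculus"})),
    FieldRule("Dose", "number"),
]


def check_row(row: dict) -> List[str]:
    """Return the names of fields that are missing from row or fail their rule."""
    return [r.name for r in TEMPLATE
            if r.name not in row or not is_valid(r, row[r.name])]
```

Calling `check_row` on a record returns the fields that still need attention, so an empty list means the record fits the template.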
Page 14
Or describe and curate your experiment using a
desktop-based tool
Report and edit the description using this tool
(also customized using the templates), with a
spreadsheet-like look and feel, packed with
functionalities such as
• ontology search (access via )
• term-tagging features
• import from spreadsheets etc…
Page 16
Describe and curate your experiment with
geographically distributed collaborators
Report and edit the description of the
investigation using customized Google
Spreadsheets (importing the ‘template’ created
by the ISA configurator) enabled with ontology
search and term-tagging features.
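As a rough illustration of what term-tagging means here, the sketch below attaches an ontology source and accession to a free-text cell value. It assumes a small local lookup table instead of a live ontology search; the `VOCABULARY` table, the `tag_term` function and the output keys are hypothetical and are not part of the ISA tools or of any spreadsheet add-on.

```python
from typing import Dict, Optional, Tuple

# Hypothetical local vocabulary: normalized label -> (ontology source, accession).
# A real tool would query an ontology service instead of a hard-coded table.
VOCABULARY: Dict[str, Tuple[str, str]] = {
    "homo sapiens": ("NCBITaxon", "9606"),
    "mus musculus": ("NCBITaxon", "10090"),
}


def tag_term(value: str,
             vocabulary: Dict[str, Tuple[str, str]] = VOCABULARY
             ) -> Dict[str, Optional[str]]:
    """Return the cell value with its ontology source and accession, if known."""
    # Normalize case and whitespace so "Homo  Sapiens" still matches.
    key = " ".join(value.split()).lower()
    source, accession = vocabulary.get(key, (None, None))
    return {"value": value, "term_source": source, "term_accession": accession}
```

Values that are not found keep their original text and get empty source and accession entries, so untagged cells stay visible for later curation.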
Page 23
• New open-access, online-only publication for descriptions of scientifically valuable datasets
• Only content type: Data Descriptor, narrative + structured parts
• Initially focused on the life, environmental and biomedical sciences
• Data Descriptors will be complementary to traditional research journals and data repositories
• Designed to foster data sharing and reuse, and ultimately to accelerate scientific discovery
www.nature.com/scientificdata
Page 24
A grassroots collaborative that works to facilitate collection, curation and
sharing of experiments using a common, structured representation of the
experiments that
• transcends individual biological and technological domains and
• can be ‘configured’ to implement (several of) the community standards
proteomics
stem cell discovery
systems biology
transcriptomics
toxicogenomics
environmental health
genomics
metabolomics
metagenomics
nanotechnology
Page 26
Community involvement and uptake
Core developments
Timeline, 2007 to 2013:
• Straw man ISA-Tab spec
• 1st, 2nd and 3rd ISA-Tab workshops
• Final ISA-Tab spec
• Database instance at EBI
• ISA software v1
• 1st public instance: Harvard Stem Cell Discovery Engine
• RDF/OWL format starts
• Conversions to Pride-XML/SRA-XML/MAGE-Tab
• User workshops/visits - start
• Growing number of systems starts to adopt ISA framework
• Other tools implement ISA-Tab
• Links to analysis tools starts
Publications
• Bioinformatics: The ISA software suite: supporting standards-compliant curation at the community level
• Bioinformatics: OntoMaton: a Bioportal powered ontology widget for Google Spreadsheets.
• Woodhead Publishing: ISA chapter in: Open Source Software in Life Science Research
Page 27
Nanotechnology
Informatics Working Group