caNanoLab Data Curation Overview
NCI Nano WG
June 6, 2013
Data Curation Procedures
Publication Identification
Data Extraction
caNanoLab Submission
ISA-TAB-Nano Creation
Author Notification
Data Publication
Publication Identification
• NCI Nanotechnology Alliance representatives identify publications based on criteria for curation:
1. Publication is meaningful to the cancer nanotechnology field (cutting-edge science)
2. Associated meaningful data is available in the publication or from the investigator
3. Data is complete (e.g., contains material composition details and linkage information)
• NCI Nanotechnology Alliance representatives prioritize the list of identified publications
Data Extraction
• The curator reviews the prioritized publication and establishes the number of samples, characterizations, and available data and figures
• Sample names are created following the established sample naming convention:
– Abbreviation(s) of the institution names, a hyphen, the name of the first author (without middle name), journal title, and year of publication, a hyphen, and the sample sequence number (e.g., SNL_UNM-CAshleyACSNano2012-01)
• Information on the association of samples and characterizations is maintained in a text file
• Definitions are established for new terms and recorded, if applicable
• Questions and any issues (e.g. discrepancies) are identified for future correspondence with the publication author
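The naming convention above can be sketched as a small helper. This is a minimal illustration, not a caNanoLab tool: the function name and parameters are hypothetical, and the underscore between institution abbreviations and the two-digit sequence number are inferred from the single example SNL_UNM-CAshleyACSNano2012-01.

```python
def make_sample_name(institutions, author, journal, year, sequence):
    """Build a name shaped like INSTITUTIONS-AuthorJournalYear-NN.

    Assumptions (inferred from one example, not from a written rule):
    institution abbreviations are joined with "_", and the sequence
    number is zero-padded to two digits.
    """
    if sequence < 1:
        raise ValueError("sequence numbers start at 1")
    institution_part = "_".join(institutions)        # e.g. SNL_UNM
    publication_part = f"{author}{journal}{year}"    # e.g. CAshleyACSNano2012
    return f"{institution_part}-{publication_part}-{sequence:02d}"


name = make_sample_name(["SNL", "UNM"], "CAshley", "ACSNano", 2012, 1)
print(name)
```

Generating names from one function keeps the sequence numbers consistent across all samples taken from the same publication.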
Example Data Extraction
Sample # | Sample description | Characterizations
1 | Plain mesoporous silica nanoparticle | TEM, SEM, DLS, nitrogen sorption, zeta potential
2 | AEPTMS modified silica nanoparticle | zeta potential
3 | AEPTMS modified silica nanoparticle loaded with Silencer Select negative control siRNA | cytotoxicity
4 | DOPC protocells (AEPTMS modified) loaded with Silencer Select negative control siRNA | cytotoxicity
5 | DOTAP lipid nanoparticle loaded with Silencer Select negative control siRNA | cytotoxicity
6 | DOPC protocell (unmodified core no AEPTMS) | siRNA concentration
7 | DOTAP protocell (unmodified core no AEPTMS) | siRNA concentration
8 | DOPC protocell (AEPTMS modified core) | siRNA concentration, siRNA release, size, zeta potential
9 | DOPC lipid nanoparticle | siRNA concentration, siRNA release, size, zeta potential
10 | DOTAP lipid nanoparticle | siRNA concentration, zeta potential, siRNA release, size
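The text file that records which characterizations belong to which sample could look something like the sketch below. The tab-separated layout, the "; " separator, and the function names are assumptions made for illustration; the slides do not specify the file format. Only three rows of the example are included.

```python
import io

# Sample number -> (description, characterizations), copied from three
# rows of the example extraction above.
ASSOCIATIONS = {
    1: ("Plain mesoporous silica nanoparticle",
        ["TEM", "SEM", "DLS", "nitrogen sorption", "zeta potential"]),
    2: ("AEPTMS modified silica nanoparticle", ["zeta potential"]),
    6: ("DOPC protocell (unmodified core no AEPTMS)", ["siRNA concentration"]),
}


def write_associations(associations, handle):
    """Write one tab-separated line per sample (hypothetical layout)."""
    for number in sorted(associations):
        description, characterizations = associations[number]
        joined = "; ".join(characterizations)
        handle.write(f"{number}\t{description}\t{joined}\n")


def read_associations(handle):
    """Parse the layout produced by write_associations."""
    result = {}
    for line in handle:
        number, description, joined = line.rstrip("\n").split("\t")
        result[int(number)] = (description, joined.split("; "))
    return result


# Round-trip through an in-memory file to show the two functions agree.
buffer = io.StringIO()
write_associations(ASSOCIATIONS, buffer)
buffer.seek(0)
round_trip = read_associations(buffer)
```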
caNanoLab Submission
caNanoLab Submission Workflow
Sample Submission
General Sample Information
Sample Composition Submission
Sample Constituents
Chemical Associations
Functionalizing Entities
Characterization Submission
Characterization Information and Findings
Publication Submission
Publication Information with PubMed Interface
ISA-TAB-Nano Creation
• The curator creates the Investigation File and identifies applicable ontologies and the associated studies, protocols, and assays
• The curator creates a Material File for each sample in the investigation
– The Material File represents the composition of the sample
• The curator creates Study Files for each identified study
– The Study File associates samples with the study
– Details of biospecimens are included in the Study File
– References to nanomaterials are included in the Study File
– For studies involving physico-chemical characterizations, the sample is the nanoparticle
– For studies involving in vitro or in vivo characterizations, the sample is the biospecimen (e.g., cell line, animal) and the nanoparticle is the study factor (e.g., treatment)
• The curator creates Assay Files for each identified assay
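The rule for what counts as the sample in a Study File can be written out as a short function. This is an illustrative sketch only: the class, function, and constant names are hypothetical and are not part of the ISA-TAB-Nano specification or any ISA tool.

```python
from dataclasses import dataclass
from typing import Optional

# Characterization types named on the slide.
PHYSICOCHEMICAL = "physico-chemical"
IN_VITRO = "in vitro"
IN_VIVO = "in vivo"


@dataclass(frozen=True)
class StudyRoles:
    sample: str                   # what the Study File lists as the sample
    study_factor: Optional[str]   # the nanoparticle when it acts as a treatment


def assign_roles(characterization_type, nanoparticle, biospecimen=None):
    """Decide which entity is the sample and which is the study factor."""
    if characterization_type == PHYSICOCHEMICAL:
        # The nanoparticle itself is being measured.
        return StudyRoles(sample=nanoparticle, study_factor=None)
    if characterization_type in (IN_VITRO, IN_VIVO):
        if biospecimen is None:
            raise ValueError("in vitro and in vivo studies need a biospecimen")
        # The biospecimen is measured; the nanoparticle is the treatment.
        return StudyRoles(sample=biospecimen, study_factor=nanoparticle)
    raise ValueError(f"unknown characterization type: {characterization_type!r}")


roles = assign_roles(IN_VITRO, "SNL_UNM-CAshleyACSNano2012-03",
                     biospecimen="example cell line")
print(roles)
```

Here "example cell line" is a placeholder, not a biospecimen taken from the publication.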
Author Notification
• The publication author is contacted, when possible, to obtain additional data and/or clarification on questions or discrepancies
• The caNanoLab data is updated based on author feedback or additional information
• The ISA-TAB-Nano files are updated based on author feedback or additional information
Data Publication
• Once the sample submission into caNanoLab has been finalized, the curator generates the data availability matrix and makes the data available for public viewing in caNanoLab
• The curator posts the completed ISA-TAB-Nano Files to the ISA-TAB-Nano Wiki
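A data availability matrix can be pictured as a grid of samples against data categories, with each cell marking whether that kind of data exists for that sample. The sketch below is a simplified illustration under that assumption; it does not reproduce how caNanoLab generates its matrix, and the function name, sample labels, and "x"/"-" marks are hypothetical.

```python
def availability_matrix(associations):
    """Return (columns, rows).

    columns: sorted list of every characterization seen.
    rows: sample name -> list of booleans, one per column.
    """
    columns = sorted({c for chars in associations.values() for c in chars})
    rows = {
        sample: [column in chars for column in columns]
        for sample, chars in associations.items()
    }
    return columns, rows


# Placeholder sample labels with characterization names from the example.
associations = {
    "sample-01": ["TEM", "zeta potential"],
    "sample-02": ["zeta potential"],
    "sample-03": ["cytotoxicity"],
}

columns, rows = availability_matrix(associations)
print(" | ".join(columns))
for sample, flags in rows.items():
    print(sample, " ".join("x" if flag else "-" for flag in flags))
```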
Data Availability Matrix
Sample Access
Data Curation Statistics
Nanomaterial Type | # | Description
biopolymer | 43 | a polymer formed by a living organism
carbon black | 2 | a material produced by the incomplete combustion of carbon-rich organic fuels in low oxygen conditions
carbon nanotube | 50 | a nanotube comprised of one or more graphite sheets (graphene) of hexagonal arrays of carbon rolled into seamless cylinders with capped ends
carbon particle | 1 | an amorphous nanopowder formed by laser techniques
dendrimer | 74 | a polymeric molecule that has a highly branched, three-dimensional tree-like architecture, synthesized with monomers where shells of branched molecules are added in discrete steps to a central core
emulsion | 88 | a colloid in which both phases are liquids that are immiscible with each other
fullerene | 16 | any cagelike, hollow molecule composed of hexagonal and/or pentagonal groups of carbon atoms
liposome | 34 | a supramolecular structure, which is a closed vesicle that forms on hydration of dry phospholipids above their transition temperature
metal oxide | 186 | a nanomaterial composed of a metal oxide
metal | 132 | a nanomaterial composed of a metal
metalloid | 36 | a nanomaterial with properties between those of a metal and a non-metal
nanohorn | 7 | a single-walled carbon nanostructure with an irregular horn-like shape
nanorod | 33 | a nanoscale rod composed of either metallic or semiconductor material or a mixture of both
nanoshell | 1 | a three-dimensional nanostructure that is composed of a spherical core surrounded by a shell a few nanometers in thickness. If the shell is made of metal, then it is called a metallic nanoshell.
polymer | 188 | a nanomaterial composed of single or multiple monomers
quantum dot | 73 | a nanometer-size fragment of semiconductor material, whose excitons (electron-hole pairs) are confined in three spatial dimensions
silica | 43 | a nanomaterial composed of a silicon oxide
caNanoLab: data sharing to expedite the use of nanotechnology in biomedicine. Nanotechnology Informatics Special Edition, 2013 (Submitted)
Data Curation Challenges and Opportunities
• Challenges
– Making primary data supporting publications available to and re-usable by the research community
– Inefficiencies associated with manual data curation from publications
• Opportunities
– Emphasize policies and resources that promote and incentivize standards-based data capture directly by the data producers
– Participate in efforts that encourage primary data sharing in the scientific community (e.g., http://www.fged.org, http://www.force11.org/, http://biosharing.org/) and adopt and support the best practices of these communities
– Work together with the ISA community (http://isacommons.org/) to extend the ISA Tools software suite to support the nanotechnology data extensions to ISA-TAB (ISA-TAB-Nano) and make it easier to share nanotechnology data among different data resources in a standards-based manner
References
• caNanoLab References
– Application: https://cananolab.nci.nih.gov
– Wiki: https://wiki.nci.nih.gov/display/caNanoLab/caNanoLab+Wiki+Home+Page
• ISA-TAB-Nano References
– Wiki: https://wiki.nci.nih.gov/display/ICR/ISA-TAB-Nano
• Publication (Submitted)
– Gaheen S, Hinkal GW, Morris SA, Lijowski M, Heiskanen M, Klemm J. caNanoLab: data sharing to expedite the use of nanotechnology in biomedicine. Nanotechnology Informatics Special Edition, 2013.