International Human Microbiome
Standards
Grant Agreement: HEALTH-F4-2010-261376
DELIVERABLE REPORT
Work package WP3 – Improved standards for sequencing
Work package leader Partner 5 – CEA Genoscope
Deliverable D3.2 – Improved standards for sequencing
Delivery date* 01/08/2013
Dissemination level** PU (Public)
* Please refer to IHMS Calendar on IHMS intranet
* *Please highlight the dissemination level appropriate for the deliverable. You can find the
corresponding information in the IHMS Calendar
Summary report
From January 2012 until now, we have received a further 22 DNA extractions from faecal
samples from the INRA partner and 217 DNA extractions from the other partners. All the samples
have been processed according to our validated pipeline, which includes: i) sample quality control
on arrival; ii) Illumina sequencing library preparation, using our standardized protocol, from
samples that passed the QC; iii) 100 bp paired-end sequencing of each library; iv) sequence
quality control and validation; v) data delivery to partner 7.
To help establish standards for the faecal sample extraction protocol, particular attention has
been paid to checking the quality of the DNA samples. In this report we describe the analysis
applied to sample QC and the exclusion criteria used. All the INRA samples passed the QC and
were sequenced. Of the other 217 samples, 192 passed the QC and were sequenced. All
sequencing data have been transferred to partner 7 and analyses are in progress.
sd3.2.1 – Improved inventory of standards for genomic sequencing
sd3.2.2 – Improved standards and recommendation for metagenomic long contiguous reference
sequence
In the period January 2012 – January 2013, the INRA partner sent 239 DNA extractions to
Genoscope. The INRA partner extracted 22 of them using the same protocol as for the 20 samples
previously processed. The other 217 samples were extracted by the other IHMS partners, starting
from aliquots of the same two faecal samples (A and B) and using their own extraction protocols.
Upon arrival at Genoscope, all the samples were recorded in our LIMS for internal tracking at
every stage of processing. They were stored at –20 °C until processing according to our
well-established, standardized pipeline described below.
[Figure: pipeline diagram. Stopping points are defined: the experiment must meet well-defined criteria, otherwise it is stopped.]
i) Sample quality control
Our standardized protocol for genomic DNA quality control was first applied to all the samples.
An SOP for genomic DNA QC is described in Appendix 1. We recommend that laboratories
performing the extractions use this protocol to evaluate DNA quality.
Briefly, the protocol includes two steps:
- Quantity evaluation: the DNA is quantified by two independent measurements with the Qubit BR
assay kit and a mean concentration is calculated. The library preparation protocol established for
the IHMS project requires 250 ng of input DNA. In our standard procedure, if the total DNA quantity
is less than 500 ng (twice the minimal quantity), the sample is not valid and the QC ends at this
stage. In the context of this project, we decided to also check the quality of samples whose
quantity was insufficient (< 250 ng) to prepare the library.
- Quality evaluation: samples are loaded on a 0.4 % agarose gel and run at 100 V for one hour.
A photo is taken and the DNA quality is checked visually. If RNA contamination is present, an
RNase treatment is applied to the sample, after which the sample repeats the QC from the
beginning. DNA integrity is then checked visually. For standard paired-end library preparation,
DNA passes the QC if the majority of the DNA is located in a tight band at high molecular weight.
However, we wanted to check the quality of the IHMS samples much more carefully in order to
obtain as much information as possible about DNA quality. This should help to evaluate the
different extraction protocols used by the IHMS partners and to establish a standardized protocol
that produces good-quality DNA. To this end, we took advantage of the gel image analysis system
available in our laboratory (GeneTools, Syngene), which calculates the % of DNA present in size
ranges selected by the user. Using the sizes of the DNA ladder bands as reference, we chose four
size ranges: > 9 kb, 5 to 9 kb, 1.8 to 5 kb and < 1.8 kb. We manually delimited these size regions
on each gel image and the analysis system calculated the % of DNA in each region. We combined
the results of the software analysis with our visual interpretation of the images and finally
established five DNA quality categories:
Qualitative classification (colour code)
Group 1: Very good quality DNA. Optimal for sequencing.
Group 2: Majority of high molecular weight DNA. Good for sequencing.
Group 3: Presence of degraded DNA, mostly > 1.8 kb. Acceptable for standard PE sequencing (not for MatePair).
Group 4: Presence of degraded DNA with most fragments < 1.8 kb. Not suitable for standard sequencing.
Group 5: Totally degraded DNA. Not acceptable for sequencing.
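The quantity step of the QC can be sketched as below. This is a minimal illustration under stated assumptions, not the software used at Genoscope: the function name `quantity_decision` and the three labels it returns are hypothetical. Only the 250 ng library input and the 500 ng standard cut-off come from the text.

```python
LIBRARY_INPUT_NG = 250.0                    # input DNA required by the IHMS library protocol
STANDARD_MINIMUM_NG = 2 * LIBRARY_INPUT_NG  # standard procedure: twice the minimal quantity


def quantity_decision(conc_1, conc_2, volume_ul):
    """Return (mean concentration in ng/ul, total DNA in ng, label).

    conc_1 and conc_2 are the two independent Qubit readings in ng/ul.
    The labels are hypothetical names chosen for this sketch.
    """
    mean_conc = (conc_1 + conc_2) / 2
    total_ng = mean_conc * volume_ul
    if total_ng < LIBRARY_INPUT_NG:
        label = "insufficient for library"
    elif total_ng < STANDARD_MINIMUM_NG:
        label = "below standard minimum"
    else:
        label = "sufficient"
    return mean_conc, total_ng, label
```

For example, two hypothetical readings of 188 and 190 ng/µl on a 49.5 µl sample give a mean of 189 ng/µl and a total of 9355.5 ng, which is well above both thresholds.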
Here is an example of the QC on a subset of 10 IHMS samples. For each sample, an undiluted
1 µl aliquot and a 1:10 diluted aliquot were loaded on the agarose gel.
Qualitative analysis: % DNA per size range and RNA contamination. Quantitative analysis: reported and measured volume and quantity.
[The colour-coded qualitative classification column of the original table is not reproduced.]

Genoscope  Sample   % DNA   % DNA   % DNA     % DNA     RNA    Reported  Reported  Measured  Measured  Validation
ID         ID       > 9 kb  5-9 kb  1.8-5 kb  < 1.8 kb  cont.  volume    quantity  volume    quantity  decision
                                                               (µl)      (ng)      (µl)      (ng)
ES         A1-002   87.57    7.50    1.44      3.50     -      50        19177     49.50      9356     Valid
ET         A1-052   91.96    5.52    0.38      2.14     -      50        15318     50.40      9097     Valid
EV         A1-102   86.52    9.96    1.20      2.32     -      50        13065     55.40      8487     Valid
FA         A1-152   88.05    9.18    1.46      1.31     -      50        18372     50.20     14427     Valid
FB         B1-002   75.43   22.03    1.13      1.41     -      50        13983     48.00      8112     Valid
FC         B1-052   75.77   21.04    0.05      3.14     -      50        22250     47.00     15566     Valid
FD         B1-102   70.30   29.09    0.00      0.61     -      50        17150     51.80     13126     Valid
FE         B1-152    1.54   21.53   46.52     30.41     -      50        50272     60.50     19481     Valid
FF         C1-002    1.19    1.21    1.72     95.88     -      50        15649     50.00       988     Invalid
FG         C1-022    0.29    0.71    2.82     96.18     -      50        16398     51.50       883     Invalid
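Since the quality groups are described relative to the 1.8 kb boundary, it can help to sum the size ranges above it. The sketch below does this for three rows of the example table. It is only an illustration: the report gives no numeric cut-offs for the groups, so the code computes the share and does not assign a group. The names `SIZE_PROFILES` and `share_above_1_8_kb` are hypothetical.

```python
# Size-range percentages from the example table, in the order
# (> 9 kb, 5-9 kb, 1.8-5 kb, < 1.8 kb), keyed by Genoscope ID.
SIZE_PROFILES = {
    "ES": (87.57, 7.50, 1.44, 3.50),
    "FE": (1.54, 21.53, 46.52, 30.41),
    "FF": (1.19, 1.21, 1.72, 95.88),
}


def share_above_1_8_kb(profile):
    """Sum the three size ranges above 1.8 kb, rounded to two decimals."""
    return round(sum(profile[:3]), 2)


for genoscope_id, profile in SIZE_PROFILES.items():
    print(genoscope_id, share_above_1_8_kb(profile))
```

With these rows, about 96.5 % of the DNA in ES is above 1.8 kb, about 69.6 % in FE (declared valid) and about 4.1 % in FF (declared invalid).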
Based on this classification, all INRA samples were classified in the first group. The following table
summarises the classification results for the remaining 217 samples:

Qualitative classification                                                                                      Samples
Group 1: Very good quality DNA. Optimal for sequencing                                                          76
Group 2: Majority of high molecular weight DNA. Good for sequencing                                             56
Group 3: Presence of degraded DNA mostly > 1.8 kb. Acceptable for standard PE sequencing (not for MatePair)     45
Group 4: Presence of degraded DNA with most fragments < 1.8 kb. Not suitable for standard sequencing            19
Group 5: Totally degraded DNA. Not acceptable for sequencing                                                    21
Total sequenced libraries                                                                                       192
Total invalid samples (including 4 samples with good quality but insufficient quantity)                         25
Although, according to our QC exclusion criteria, samples classified in Group 4 should not have
been processed further, we decided, in agreement with the project coordinator, to process them
anyway in order to establish whether low DNA quality affects library preparation and sequence
data.
In the end, of the 217 samples analysed in this way at the QC stage, 192 were considered valid
and were used to prepare libraries.
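The totals above can be cross-checked with a few lines of arithmetic. The sketch assumes that the 4 samples with good quality but insufficient quantity are counted within Groups 1 to 4, which the report implies but does not state. The variable names are hypothetical.

```python
# Number of samples per quality group, from the classification table.
group_counts = {1: 76, 2: 56, 3: 45, 4: 19, 5: 21}
insufficient_quantity = 4  # good quality but too little DNA (assumed to be in Groups 1-4)

total_samples = sum(group_counts.values())
# Groups 1 to 4 were processed; Group 5 was rejected on quality.
acceptable_quality = total_samples - group_counts[5]
sequenced = acceptable_quality - insufficient_quantity
invalid = group_counts[5] + insufficient_quantity

print(total_samples, sequenced, invalid)
```

Under that assumption the figures are consistent: 217 samples in total, 192 sequenced and 25 invalid.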
ii) Illumina library preparation and QC
Library preparation was performed according to the protocol described in the D3.1 report. A SOP
for library preparation is included in Appendix 2.
All the samples were successfully processed.
iii) Sequencing
Each indexed library was sequenced on one eighth of an Illumina HiSeq2000 lane in order
to obtain at least 20 million reads per sample. Standard Illumina operating procedures were
followed for cluster generation and the sequencing run.
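As a rough guide, the per-sample target translates into a minimum lane yield as sketched below. The report does not say whether "reads" counts single reads or read pairs, so the base count simply assumes that each counted read is 100 bp long. The function names are hypothetical.

```python
LIBRARIES_PER_LANE = 8             # each library uses one eighth of a lane
MIN_READS_PER_SAMPLE = 20_000_000  # target stated in the report
READ_LENGTH_BP = 100               # read length stated in the report


def min_reads_per_lane(libraries=LIBRARIES_PER_LANE, reads_per_sample=MIN_READS_PER_SAMPLE):
    """Reads a lane must yield for every library to reach the target."""
    return libraries * reads_per_sample


def min_bases_per_sample(reads_per_sample=MIN_READS_PER_SAMPLE, read_length=READ_LENGTH_BP):
    """Bases per sample, assuming every counted read has the full read length."""
    return reads_per_sample * read_length


print(min_reads_per_lane())    # 160000000
print(min_bases_per_sample())  # 2000000000
```

This assumes the eight libraries share the lane evenly. Uneven pooling would require a higher lane yield to keep every sample above the target.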
iv) Data QC
Raw fastq files coming from the sequencer are processed by the Genoscope internal pipeline
shown schematically below.
First, a read quality check is performed on a subsample of the reads in order to detect
possible biases in library construction or sequencing problems. After manual validation of the
sequencing run, the whole read dataset is processed to remove adapters and low-quality
nucleotides from both ends (the low-quality threshold is fixed at 20). The cleaned reads (fastx_clean)
then go through the next steps, which include: i) removal of the sequence between the second
unknown nucleotide (N) and the end of the read; ii) discarding of reads shorter than 30 nucleotides
after trimming; iii) removal of reads, and their mates, that map onto run quality control sequences
(PhiX genome) with at most 2 mismatches. QC charts and contamination screening are then
performed on a subset of the clean reads.
[Figure: Genoscope read processing pipeline]
Raw Fastq
  -> checkReadsQuality (20000 reads): composition bias, N distribution, quality, primer search
  -> fastx_clean (adaptors): adaptors < 0.5, quality > 20, N < 2, length >= 30
Cleaned Fastq
  -> decontamFastq: PhiX, other …
  -> checkContamination and checkReadsQuality (20000 reads)
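The trimming rules described above can be illustrated with a short sketch. It is not the Genoscope `fastx_clean` implementation: the function names are hypothetical, and adapter removal and PhiX mapping are left out. It assumes that qualities are Phred scores given as a list of integers, that bases with a quality below 20 are trimmed from both ends, and that the second N itself is removed together with everything after it.

```python
MIN_QUALITY = 20  # low-quality threshold stated in the report
MIN_LENGTH = 30   # reads shorter than this after trimming are discarded


def trim_low_quality_ends(seq, quals, threshold=MIN_QUALITY):
    """Trim bases with quality below the threshold from both ends."""
    start, end = 0, len(seq)
    while start < end and quals[start] < threshold:
        start += 1
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return seq[start:end], quals[start:end]


def cut_at_second_n(seq, quals):
    """Drop everything from the second unknown nucleotide (N) onwards."""
    first = seq.find("N")
    if first == -1:
        return seq, quals
    second = seq.find("N", first + 1)
    if second == -1:
        return seq, quals
    return seq[:second], quals[:second]


def clean_read(seq, quals):
    """Apply both trimming steps; return None if the read becomes too short."""
    seq, quals = trim_low_quality_ends(seq, quals)
    seq, quals = cut_at_second_n(seq, quals)
    if len(seq) < MIN_LENGTH:
        return None
    return seq, quals
```

In a paired-end setting, a pair would normally be handled together, so that discarding one read does not leave its mate orphaned. The report does not describe how the pipeline handles this, and the sketch does not cover it.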
APPENDIX 1: Genomic DNA QC using standard electrophoresis
Summary
This protocol describes how to evaluate the quality and quantity of genomic DNA samples using
a standard agarose gel and a Qubit™ fluorometer.
Reagents and consumables
Reagent / consumable                                        Supplier
Seakem Agarose                                              Biorad
50x TBE buffer                                              Biorad
SYBR® Safe DNA gel stain (10,000X concentrate in DMSO)      Invitrogen
5x loading dye                                              General lab supplier
RNase A 100 mg/ml                                           Qiagen
0.1x TE buffer                                              General lab supplier
Resuspension buffer (10 mM Tris-HCl, pH 7.5)                General lab supplier
Agilent DNA HS kit                                          Agilent
Quant-iT™ dsDNA BR assay kit                                Life Technologies
DNA molecular weight marker II (0.1 – 23 kb)                Roche
Equipment
Equipment                                                       Supplier
Mini horizontal device 15-well combs                            Biorad
Mini horizontal gel electrophoresis device with 7x10 cm tray    Biorad
Gel imager system                                               Different lab suppliers
Qubit™ fluorometer 1.0 or 2.0                                   Life Technologies
Procedure
Upon arrival, store the sample at –20 °C until use.
STEP 1: gDNA quantification using the Qubit™ fluorometer
Use the Quant-iT™ dsDNA BR assay kit, following the manufacturer's instructions for the kit and
the Qubit fluorometer. Perform two independent measurements, using 1 µl of the DNA sample for
each. Calculate the mean concentration in ng/µl.
STEP 2: gDNA integrity check by agarose gel electrophoresis
All reagents and stock solutions should be prepared before starting the procedure.
Gel & Sample Preparation
a) Cast a ~40 ml 0.6% Seakem agarose gel with 1X TBE and 10 µl SYBR® Safe DNA gel stain
(10,000X concentrate in DMSO). Use a narrow-well comb.
b) For each sample to be tested, prepare two clean labeled tubes:
Tube 1: transfer 1 µl DNA and add 5 µl H2O and 2 µl 5x loading dye.
Tube 2: prepare a 1:10 dilution of the initial sample in TE buffer and use 1 µl of the dilution.
Add 5 µl H2O and 2 µl 5x loading dye.
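The quantities in this preparation step follow from simple arithmetic, sketched below. The helper names are hypothetical. The stain calculation ignores the small volume added by the stain itself, and the 189 ng/µl concentration in the example is an illustrative value, not a measurement from the report.

```python
def agarose_mass_g(gel_volume_ml, percent_w_v):
    """Agarose to weigh for a gel of the given volume and % (w/v)."""
    return gel_volume_ml * percent_w_v / 100


def final_stain_concentration_x(stock_x, stain_volume_ul, gel_volume_ml):
    """Approximate final stain concentration (in X) after dilution in the gel."""
    return stock_x * stain_volume_ul / (gel_volume_ml * 1000)


def loaded_dna_ng(concentration_ng_per_ul, loaded_volume_ul=1.0, dilution_factor=1):
    """DNA mass loaded in a well from 1 ul of sample or of its dilution."""
    return concentration_ng_per_ul * loaded_volume_ul / dilution_factor


print(agarose_mass_g(40, 0.6))                       # grams of agarose for a 40 ml 0.6% gel
print(final_stain_concentration_x(10_000, 10, 40))   # final stain concentration in X
print(loaded_dna_ng(189.0))                          # tube 1, undiluted sample
print(loaded_dna_ng(189.0, dilution_factor=10))      # tube 2, 1:10 dilution
```

For a 40 ml gel at 0.6 % this gives 0.24 g of agarose, and 10 µl of 10,000X stain corresponds to roughly 2.5X in the gel. Each tube holds 8 µl in total (1 µl DNA, 5 µl H2O, 2 µl loading dye).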
Gel Electrophoresis
a) Load the gel, leaving an empty well between two samples. Load 100-150 ng of the DNA
molecular weight marker II in the two wells located at the left and right edges of the gel.
b) Run the gel for 30 min at ~100 V in 1X TBE buffer.
c) Remove the gel from the gel box and image it.
This first image capture makes it easier to evaluate the presence of RNA contamination.
d) Return the gel to the gel box and run again for 30 min at 100 V.
e) Remove the gel from the gel box and image it.
DNA QC Gel Analysis
Evaluate genomic DNA integrity and RNA contamination.
a) RNA contamination
If RNA is massively present in the sample (visible as a cloud at < 1 kb and/or two bands at
~5 kb and 1.8 kb corresponding to rRNA), treat the initial sample with RNase A: use 1 µl
RNase A per 100 µl of sample, incubate for 90 min at 37 °C and reload 1 µl of the treated
sample on the gel. If the RNA has disappeared, perform a new quantification by Qubit assay as
previously described. If RNA is still present, treat the sample with RNase A again.
b) DNA integrity
The majority of the DNA should appear as a tight band > 23 kb. A smear means that the DNA
is partially degraded. If no tight high molecular weight band is visible and DNA is
present only in the smear, the degradation is massive and the DNA is not suitable for sequencing.
If a quantification software system is available, refer to the software instructions to analyze DNA
quality on gels.
If the DNA is to be used for long mate-pair library construction, it needs to be of high molecular
weight: the DNA band should be above the 23 kb band. It is highly recommended to check the
integrity of the DNA by pulsed-field electrophoresis to determine the molecular weight properly.
APPENDIX 2: Library Preparation Recommendations for Illumina sequencing
of metagenomic samples
Summary
The purpose of this procedure is to generate a DNA library with a 180-480 bp insert size that will
be sequenced on the Illumina HiSeq2000 with 100 bp paired-end reads. Starting material is
500 ng of genomic DNA extracted from fecal samples. Genomic DNA is sheared into smaller
fragments with a Covaris instrument, and barcoded adapters are added so that the DNA can be
hybridized to a flow cell before being loaded on the HiSeq instrument. During library preparation,
end repair, A-tailing, adaptor ligation and size selection are performed by a semi-automated
instrument, the SPRI TE instrument supplied by Beckman Coulter.
Reagents and consumables
Reagent / consumable                              Supplier
6-mm × 16-mm AFA microtubes and snap caps         Covaris
LoBind tubes, 1.5 mL                              Eppendorf
Agencourt AMPure XP beads                         Beckman Coulter
SPRI Works Fragment Library System I              Beckman Coulter
Platinum Pfx Taq Polymerase kit                   Life Technologies
0.1x TE buffer
Resuspension buffer (10 mM Tris-HCl, pH 7.5)      General lab supplier