The Norway spruce genome sequence and conifer genome evolution

Article link: Nature   http://www.nature.com/nature/journal/vaop/ncurrent/full/nature12211.html

Supplementary Material

Haploid whole genome shotgun sequencing and assembly

Seeds from the P. abies clone Z4006 were stored at -20° C at Skogforsk Sävar, then soaked in water overnight, and manually dissected under a microscope to free the haploid megagametophyte tissue. Total DNA was isolated from a single, dissected megagametophyte (seed identification: 466) using a DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocols. Shotgun Illumina paired-end (PE) libraries with insert sizes of 180 bp, 300 bp and 625 bp were made from a total of 600 ng haploid DNA preparation (Supplementary Table 1.2), and sequenced on an Illumina HiSeq 2000. The reads were trimmed based on their quality scores, as follows: each read was cut off at the first base (from the 5’ end) that had Q<10, and the remaining read was kept only if it had a length of >50 bp and >95% of the bases had Q>20. Overlapping reads from the 180 bp library were aligned to form longer single-end reads using a custom tool provided by CLCbio (“join-pairs”; CLCbio, Aarhus, Denmark). All quality filtered reads (760 Gbp; ~38X) were assembled using CLC Assembly Cell (Beta-4.0.6)(CLCbio, Aarhus, Denmark), on a 2TB RAM computer in ~5 days, with the scaffolding option turned off, thus utilizing paired read information, but disallowing scaffolding where the sequence between read pairs could not be fully resolved. Due to the low quantity of DNA used for library construction, the haploid libraries approached saturation, resulting in some read redundancy (in particular for the 625 bp library).
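The trimming rule quoted above can be sketched in a few lines of Python. This is only an illustration of the rule as worded in the supplement, not the authors' actual pipeline; the function name `trim_read` and its parameter names are made up for this example, and quality scores are assumed to be already decoded into a list of integers.

```python
def trim_read(seq, quals, cut_q=10, min_len=50, good_q=20, good_frac=0.95):
    """Trim one read following the rule quoted from the supplement.

    seq   -- read sequence (str)
    quals -- per-base Phred scores (list of int), same length as seq
    Returns (seq, quals) after trimming, or None if the read is discarded.
    """
    # Cut the read at the first base (scanning from the 5' end) with Q < cut_q.
    cut = next((i for i, q in enumerate(quals) if q < cut_q), len(quals))
    seq, quals = seq[:cut], quals[:cut]

    # Keep the remainder only if it is longer than min_len bases ...
    if len(seq) <= min_len:
        return None

    # ... and more than good_frac of its bases have Q > good_q.
    n_good = sum(q > good_q for q in quals)
    if n_good <= good_frac * len(quals):
        return None

    return seq, quals
```

For example, a 100 bp read whose only low-quality base (Q=5) sits at position 60 is cut down to its first 60 bases and kept, while a read whose first Q<10 base falls within the first 50 positions is dropped.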

Merging PE reads is a point worth discussing here:

“reads from the 180 bp library were aligned to form longer single-end reads using a custom tool provided by CLCbio (“join-pairs”; CLCbio, Aarhus, Denmark).”   http://www.clcsupport.com/clcgenomicsworkbench/650/index.php?manual=Merge_overlapping_pairs.html

The join-pairs implementation takes the following parameters into account (defaults in parentheses): Mismatch cost (2), Gap cost (3), Max unaligned (0), Minimum score (10).
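To make the role of these parameters concrete, here is a minimal Python sketch of score-based merging of an overlapping pair. It is not CLCbio's implementation: it assumes each match scores +1, it only considers ungapped overlaps (so the gap cost is ignored), and it only tries overlaps that run to the end of the forward read, which roughly corresponds to Max unaligned = 0. The function names `reverse_complement` and `merge_pair` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, rev, mismatch_cost=2, min_score=10):
    """Merge an overlapping read pair into one longer single-end read.

    fwd -- forward read (5'->3')
    rev -- reverse read as sequenced (5'->3' on the opposite strand)
    Returns the merged sequence, or None if no overlap reaches min_score.
    """
    rc = reverse_complement(rev)  # bring the mate onto the forward strand
    best = None                   # (score, overlap length)

    # Try every suffix of fwd against the equally long prefix of rc.
    for olen in range(1, min(len(fwd), len(rc)) + 1):
        a, b = fwd[-olen:], rc[:olen]
        # Assumed scoring: +1 per match, -mismatch_cost per mismatch.
        score = sum(1 if x == y else -mismatch_cost for x, y in zip(a, b))
        if score >= min_score and (best is None or score > best[0]):
            best = (score, olen)

    if best is None:
        return None

    # Overlapping bases are taken from the forward read (a simplification;
    # real tools usually pick the base with the higher quality score).
    return fwd + rc[best[1]:]
```

With a minimum score of 10 and a mismatch cost of 2, a perfect overlap needs at least 10 bases, and every mismatch has to be paid for by two more matching bases, plus one further match to replace the score the mismatched position would have contributed.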

Related Tools:

  1. ALLPATHS-LG, High quality genome assembly from low cost data. http://www.broadinstitute.org/software/allpaths-lg/blog/
  2. Stitch, Join overlapping paired-end Illumina reads. https://github.com/audy/stitch
  3. fastq-join, Merge overlapping paired-end reads. http://code.google.com/p/ea-utils/wiki/FastqJoin
  4. PANDAseq, Paired-end assembler for Illumina sequences. https://github.com/neufeld/pandaseq
  5. FLASH, A fast accurate software to increase the length of reads by overlapping and merging mate pairs. http://ccb.jhu.edu/software/FLASH/
  6. COPEread, Connecting Overlapped Pair-End Reads. http://sourceforge.net/projects/coperead/

Related Post:
Tools to merge overlapping paired-end reads. http://thegenomefactory.blogspot.com/2012/11/tools-to-merge-overlapping-paired-end.html

More to be added!
