IDBA-UD: sequence assembly software for single-cell and metagenomic data

Title:

IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth

Abstract:

Motivation: Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. However, both technologies suffer from the problem that sequencing depth of different regions of a genome or genomes from different species are highly uneven. Most existing genome assemblers usually have an assumption that sequencing depths are even. These assemblers fail to construct correct long contigs.

Results: We introduce the IDBA-UD algorithm that is based on the de Bruijn graph approach for assembling reads from single-cell sequencing or metagenomic sequencing technologies with uneven sequencing depths. Several non-trivial techniques have been employed to tackle the problems. Instead of using a simple threshold, we use multiple depth-relative thresholds to remove erroneous k-mers in both low-depth and high-depth regions. The technique of local assembly with paired-end information is used to solve the branch problem of low-depth short repeat regions. To speed up the process, an error correction step is conducted to correct reads of high-depth regions that can be aligned to high-confident contigs. Comparison of the performances of IDBA-UD and existing assemblers (Velvet, Velvet-SC, SOAPdenovo and Meta-IDBA) for different datasets shows that IDBA-UD can reconstruct longer contigs with higher accuracy.

Availability: The IDBA-UD toolkit is available at our website http://www.cs.hku.hk/~alse/idba_ud

Contact: chin@cs.hku.hk

© The Author 2012. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Link:

http://bioinformatics.oxfordjournals.org/content/28/11/1420.abstract

Source code:

https://github.com/loneknightpy/idba

Installation:

axel -o idba-1.1.3.tar.gz https://github.com/loneknightpy/idba/archive/1.1.3.tar.gz
tar xzvf idba-1.1.3.tar.gz
cd idba-1.1.3
./build.sh

Overview:

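IDBA-UD is a de novo assembler for single-cell genomes and metagenomes. With the default settings, short reads are limited to 128 bases, but the current mainstream high-throughput sequencing platforms (HiSeq X Ten / HiSeq 3000/4000) produce 150 bp reads, so the source code has to be modified before reads longer than 128 bp can be assembled. The modification is described at https://groups.google.com/forum/#!topic/hku-idba/GL-1VZnhLI0; if that page is not accessible, the forum post is reproduced below: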

Hi,

I’ve started a new thread for this in case anyone wants to do the same thing. I wanted idba_ud to run with larger kmers. It seems to work pretty well.

I changed kMaxShortSequence in /idba-1.1.2/src/sequence/short_sequence.h to a larger value:

static const uint32_t kMaxShortSequence = 500;

and changed the max kmer size in idba-1.1.2/src/basic/kmer.h by changing the number of bits to:

static const uint32_t kNumUint64 = 16;

Then recompiled:

./configure
make

Now IDBA is working with my 300 bp paired-end Illumina data with kmers of 100, 200 and 300. Assembly looks much better so far than it was with kmers limited to 124 bp. I don’t really understand why assemblers limit kmer size, but I’m not a mathematician. The highest kmer size always seems to give the ‘best’ assembly. If you run into a memory problem it might be worth limiting the number of threads used. I haven’t tested this properly, but when I use fewer than the maximum, it works.

T.
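
The two edits described in the post can also be scripted rather than made by hand. The following is a minimal sketch in Python, assuming each constant is defined exactly once in the form `static const uint32_t NAME = N;` as quoted above. The helper name `set_uint32_constant` is hypothetical, and the starting value in the example string is a placeholder, not necessarily what ships with IDBA.

```python
import re


def set_uint32_constant(source: str, name: str, value: int) -> str:
    """Return `source` with `static const uint32_t <name> = <n>;` set to `value`.

    Hypothetical helper for illustration; it only handles the one-line
    definition style quoted in the forum post.
    """
    pattern = re.compile(
        r"(static\s+const\s+uint32_t\s+" + re.escape(name) + r"\s*=\s*)\d+(\s*;)"
    )
    # A function replacement avoids ambiguity between group references
    # and the digits of the new value.
    patched, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(2)}", source
    )
    if count != 1:
        raise ValueError(f"expected exactly one definition of {name}, found {count}")
    return patched


# Illustrative input only: 128 is a placeholder starting value.
example_header = "static const uint32_t kMaxShortSequence = 128;\n"
print(set_uint32_constant(example_header, "kMaxShortSequence", 500), end="")
```

To apply this to a real source tree, read src/sequence/short_sequence.h and src/basic/kmer.h as text, pass each through the helper with the values from the post (kMaxShortSequence = 500, kNumUint64 = 16), write the results back, and rebuild. Raising an error when the definition is missing or duplicated is a deliberate choice so that a changed header layout does not go unnoticed.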

Several applications of the IDBA family are included in this package, such as idba/idba-trans/idba_hybrid. For metatranscriptome assembly, idba-mt and idba-mtp can be used; these will be covered in a separate post.

Besides IDBA-UD, metagenome assemblers include MetavelvetOmega and others. Another competitor, MEGAHIT, is widely used for metagenome assembly.

Version:

2016-11-24.v1
