IDBA-MT: An Assembly Tool for Metatranscriptomic Data


IDBA-MT: De Novo Assembler for Metatranscriptomic Data Generated from Next-Generation Sequencing Technology


High-throughput next-generation sequencing technology provides a great opportunity for analyzing metatranscriptomic data. However, the reads produced by these technologies are short and an assembling step is required to combine the short reads into longer contigs. As there are many repeat patterns in mRNAs from different genomes and the abundance ratio of mRNAs in a sample varies a lot, existing assemblers for genomic data, transcriptomic data, and metagenomic data do not work on metatranscriptomic data and produce chimeric contigs, that is, incorrect contigs formed by merging multiple mRNA sequences. To our best knowledge, there is no assembler designed for metatranscriptomic data. In this article, we introduce an assembler called IDBA-MT, which is designed for assembling reads from metatranscriptomic data. IDBA-MT produces much fewer chimeric contigs (reduce by 50% or more) when compared with existing assemblers such as Oases, IDBA-UD, and Trinity.




git clone
# Edit libheader.h (idba_mt / idba_mtp): add  #include <stdint.h>
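
The header edit above can also be scripted. The following is a minimal sketch, not part of the IDBA-MT distribution: the helper names `ensure_stdint` and `patch_header` are made up for illustration, and the file path in the usage comment is an assumption about where libheader.h sits in your checkout.

```python
from pathlib import Path

INCLUDE_LINE = "#include <stdint.h>"


def ensure_stdint(source: str) -> str:
    """Return the header text with '#include <stdint.h>' present exactly once."""
    lines = source.splitlines(keepends=True)
    # Already patched: leave the text untouched so the edit is idempotent.
    if any(line.strip() == INCLUDE_LINE for line in lines):
        return source
    # Place the new include directly before the first existing #include.
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#include"):
            lines.insert(i, INCLUDE_LINE + "\n")
            break
    else:
        # No #include found at all: put it at the top of the file.
        lines.insert(0, INCLUDE_LINE + "\n")
    return "".join(lines)


def patch_header(path: Path) -> bool:
    """Patch the file in place; return True if it was changed."""
    original = path.read_text()
    patched = ensure_stdint(original)
    if patched != original:
        path.write_text(patched)
        return True
    return False


# Hypothetical usage; adjust the path to your own checkout:
# patch_header(Path("idba_mt/libheader.h"))
```

Running it twice is harmless, since the second pass finds the include and leaves the file alone.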


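There are not many assemblers built for metatranscriptomes. Most people fall back on established transcriptome assemblers such as Trinity and Oases; for a comparison, see evaluation articles such as "Comparison of assembly algorithms for improving rate of metatranscriptomic functional annotation". Others use DNA assemblers directly, such as IDBA-UD and MetaVelvet. The biggest problem in metatranscriptome assembly is chimeric contigs, so every assembler aimed at metatranscriptomes tries to address it. Some rely on paired-end reads (IDBA-MT), while others need protein sequences as additional guidance (IDBA-MTP).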

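The IDBA-MT package is hosted on Google Code and has been imported to a GitHub page to make it easier to download. IDBA-MT does not assemble reads from scratch: the assembly is first done with IDBA-UD, and IDBA-MT is then run to correct chimeric sequences in the result.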
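
The first stage of that workflow can be sketched as below. This is only a sketch under stated assumptions: it builds the commands for IDBA-UD's `fq2fa` converter and for `idba_ud` with its `-r` (reads) and `-o` (output directory) options, and the function name, file names and directory layout are invented for illustration. The IDBA-MT command line is not given in this post, so the second stage is left out rather than guessed; take its options from the README in the package.

```python
import shlex
from pathlib import Path


def first_stage_commands(fq1: str, fq2: str, work_dir: str) -> list:
    """Build the IDBA-UD commands that precede the IDBA-MT correction step.

    All file and directory names here are illustrative assumptions.
    """
    work = Path(work_dir)
    merged = work / "reads.fa"        # interleaved FASTA written by fq2fa
    out_dir = work / "idba_ud_out"    # directory where IDBA-UD writes its contigs
    return [
        # Merge the paired FASTQ files into the single FASTA that idba_ud reads.
        ["fq2fa", "--merge", "--filter", fq1, fq2, str(merged)],
        # Assemble the merged reads with IDBA-UD.
        ["idba_ud", "-r", str(merged), "-o", str(out_dir)],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in first_stage_commands("reads_1.fq", "reads_2.fq", "work"):
        print(shlex.join(cmd))
```

To actually run a command, pass each list to `subprocess.run(cmd, check=True)`. The contigs in the IDBA-UD output directory, together with the paired-end reads, are then the input for IDBA-MT.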



