RAPSearch2: a fast, efficient alignment tool for NGS reads that indexes protein sequence databases with a collision-free hash table

Title:

RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data

Abstract:

Summary: With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In a previous work, we developed RAPSearch, an algorithm that achieved a ~20–90-fold speedup relative to BLAST while still achieving similar levels of sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify alignment seeds, due to its use of a suffix array data structure. Here we present RAPSearch2, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index a similarity search database. The utilization of an optimized data structure further speeds up the similarity search—another 2–3 times. We also implemented multi-threading in RAPSearch2, and the multi-thread modes achieve significant acceleration (e.g. 3.5X for 4-thread mode). RAPSearch2 requires up to 2G memory when running in single thread mode, or up to 3.5G memory when running in 4-thread mode.

Availability and implementation: Implemented in C++, the source code is freely available for download at the RAPSearch2 website: http://omics.informatics.indiana.edu/mg/RAPSearch2/.

Contact: yye@indiana.edu

Supplementary information: Available at the RAPSearch2 website.

Link:

http://bioinformatics.oxfordjournals.org/content/28/1/125.abstract

Source code:

http://omics.informatics.indiana.edu/mg/RAPSearch2/
https://github.com/zhaoyanswill/RAPSearch2 (not the latest version)
https://sourceforge.net/projects/rapsearch2/files/ (latest version)

Installation:

axel -o RAPSearch2.24_64bits.tar.gz https://sourceforge.net/projects/rapsearch2/files/RAPSearch2.24_64bits.tar.gz/download
tar xzvf RAPSearch2.24_64bits.tar.gz
mv RAPSearch2.24_64bits RAPSearch-2.24

Notes:

Zhang, X. (2013). A New Module in RAPSearch2 for Fast Protein Similarity Search of Paired-end Sequences.

Overview:

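RAPSearch2 is the successor to RAPSearch. It changes how the RAPSearch algorithm is implemented: the database is now indexed with a collision-free hash table instead of the earlier suffix array, which further reduces memory usage. In practice, it is still slower than later tools such as Diamond. RAPSearch2 also includes a module that supports paired-end (PE) reads.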
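To make the idea of a collision-free index concrete, here is a minimal Python sketch. It is not RAPSearch2's actual code: the hash function and seed definition used by the tool are not described in this post, so the sketch swaps in plain direct addressing, where each k-mer over the 20 standard amino acids is read as a base-20 number and therefore gets a slot of its own. The seed length `K = 3`, the function names, and the toy sequences are all assumptions chosen for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
K = 3  # hypothetical seed length, for illustration only


def kmer_slot(kmer):
    """Map a k-mer to a unique slot by reading it as a base-20 number."""
    slot = 0
    for aa in kmer:
        slot = slot * len(AMINO_ACIDS) + CODE[aa]
    return slot


def is_indexable(kmer):
    """True if every residue is one of the 20 standard amino acids."""
    return all(aa in CODE for aa in kmer)


def build_index(sequences):
    """Return a table of 20**K slots; each slot lists (seq_id, offset) hits."""
    table = [[] for _ in range(len(AMINO_ACIDS) ** K)]
    for seq_id, seq in enumerate(sequences):
        for offset in range(len(seq) - K + 1):
            kmer = seq[offset:offset + K]
            if is_indexable(kmer):  # skip ambiguous residues such as X
                table[kmer_slot(kmer)].append((seq_id, offset))
    return table


def find_seeds(table, query):
    """Yield (query_offset, seq_id, db_offset) for each exact k-mer match."""
    for q_off in range(len(query) - K + 1):
        kmer = query[q_off:q_off + K]
        if is_indexable(kmer):
            for seq_id, db_off in table[kmer_slot(kmer)]:
                yield q_off, seq_id, db_off


# Toy database and query (made-up sequences).
database = ["MKVLAAGIV", "GIVKLAA"]
index = build_index(database)
seeds = sorted(find_seeds(index, "LAAGI"))
print(seeds)
```

Because distinct k-mers never share a slot, a lookup is a single array access with no collision chain to walk. The table size is fixed at 20**K regardless of database size, which is one reason a scheme like this can use less memory than a suffix array over the whole database.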

Version:

2016-12-11.v1
