RAPSearch2: a fast, efficient alignment tool for NGS reads that indexes the protein sequence database with a collision-free hash table


RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data


Summary: With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In a previous work, we developed RAPSearch, an algorithm that achieved a ~20–90-fold speedup relative to BLAST while still achieving similar levels of sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify alignment seeds, due to its use of a suffix array data structure. Here we present RAPSearch2, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index a similarity search database. The utilization of an optimized data structure further speeds up the similarity search—another 2–3 times. We also implemented multi-threading in RAPSearch2, and the multi-thread modes achieve significant acceleration (e.g. 3.5X for 4-thread mode). RAPSearch2 requires up to 2G memory when running in single thread mode, or up to 3.5G memory when running in 4-thread mode.

Availability and implementation: Implemented in C++, the source code is freely available for download at the RAPSearch2 website:


Supplementary information: Available at the RAPSearch2 website.


Source code: older version, latest version


tar xzvf RAPSearch2.24_64bits.tar.gz
mv RAPSearch2.24_64bits RAPSearch-2.24


Zhang, X. (2013). A New Module in RAPSearch2 for Fast Protein Similarity Search of Paired-end Sequences.


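RAPSearch2 is the upgraded version of RAPSearch. It changes how the RAPSearch algorithm is implemented: the suffix array data structure used previously is replaced by a collision-free hash table for indexing the database, which further reduces memory usage. In practice it is still not as fast as later newcomers such as Diamond. RAPSearch2 also adds a module that supports paired-end (PE) reads.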
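To make the idea of a collision-free hash table more concrete, here is a minimal Python sketch. It is not RAPSearch2's actual code (which is written in C++). It assumes a hypothetical 10-group reduced amino acid alphabet and a short seed length of 4, both chosen only for illustration. Because each reduced k-mer is read as a base-10 number, every distinct reduced k-mer gets its own table slot, so no collision handling is needed.

```python
# Hypothetical 10-group reduced alphabet (illustrative only; not necessarily
# the grouping RAPSearch2 uses). Together the groups cover the 20 standard residues.
GROUPS = ["A", "KR", "EDNQ", "C", "G", "H", "ILVM", "FYW", "P", "ST"]
CODE = {aa: i for i, group in enumerate(GROUPS) for aa in group}
BASE = len(GROUPS)  # 10 symbols in the reduced alphabet
K = 4               # seed length assumed for this demo


def kmer_key(kmer):
    """Read the reduced k-mer as a base-BASE number: one unique slot per reduced k-mer."""
    key = 0
    for aa in kmer:
        key = key * BASE + CODE[aa]
    return key


def build_index(sequences, k=K):
    """Direct-addressed table: slot -> list of (sequence id, position) hits."""
    table = [[] for _ in range(BASE ** k)]
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            # Skip windows containing non-standard residues such as X or B.
            if all(aa in CODE for aa in kmer):
                table[kmer_key(kmer)].append((seq_id, pos))
    return table


def lookup(table, kmer):
    """Return every database position whose reduced k-mer matches the query k-mer."""
    return table[kmer_key(kmer)]


db = ["MKVLAAGIV", "GGMRILAAG"]
table = build_index(db)
print(lookup(table, "MKVL"))  # [(0, 0), (1, 2)]
```

MKVL and MRIL land in the same slot because K/R and V/I fall into the same reduced groups. That is the intended effect of the reduced alphabet, not a hash collision. The price of direct addressing is a table with BASE ** k slots, so the seed length has to stay small.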


