
eggNOG-mapper: A New Approach to COG/GO Functional Annotation

Title:

Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper

Abstract:

Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines relatively inaccessible, less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from eggNOG. To validate our method, we benchmarked Gene Ontology predictions against two widely used homology-based approaches: BLAST and InterProScan. Compared to BLAST, eggNOG-mapper reduced by 7% the rate of false positive assignments, and increased by 19% the ratio of curated terms recovered over all terms assigned per protein. Compared to InterProScan, eggNOG-mapper achieved similar proteome coverage and precision, while predicting on average 32 more terms per protein and increasing by 26% the rate of curated terms recovered over total term assignments per protein. Through strict orthology assignments, eggNOG-mapper further renders more specific annotations than possible from domain similarity only (e.g. predicting gene family names). eggNOG-mapper runs ~15x faster than BLAST and at least 2.5x faster than InterProScan. The tool is available standalone or as an online service at http://eggnog-mapper.embl.de.

Paper:

http://biorxiv.org/content/early/2016/09/22/076331

Source code:

https://github.com/jhcepas/eggnog-mapper

Installation:

axel -o eggnog-mapper-0.12.7.tar.gz https://github.com/jhcepas/eggnog-mapper/archive/0.12.7.tar.gz
tar xzvf eggnog-mapper-0.12.7.tar.gz

Summary:

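eggNOG is widely used for gene functional annotation and provides an extensive extension of NCBI's COGs. Built on the latest release, eggNOG 4.5, eggNOG-mapper now offers an alternative route to GO annotation, which until now has relied mainly on Blast2GO and InterProScan. It can run either sequence-profile (HMM) similarity searches or direct sequence similarity searches, and can annotate genomes, transcriptomes and metagenomes. Online service: http://beta-eggnogdb.embl.de/#/app/emapper. A local installation is not very difficult, but, like InterProScan, it consumes a lot of disk space.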

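The eggNOG annotation files alone take nearly 20 GB, the eggNOG sequence files another ~20 GB, and the HMM profile files nearly 130 GB more. Memory use is also considerable: the eukaryotic database needs 90 GB of RAM, the bacterial database 32 GB, and archaeal searches 10 GB. A PCIe SSD may be one workaround on machines with limited memory. Installation and usage are fairly straightforward; this post will be updated with details.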
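As a rough planning aid, the sketch below checks a machine against the figures quoted above (about 90/32/10 GB of RAM for the eukaryotic/bacterial/archaeal databases, and roughly 20 + 20 + 130 = 170 GB of disk in total). It is a minimal sketch, assuming Linux (a `/proc/meminfo` with a `MemAvailable` field) and a POSIX `df -P`; the function name `dbs_that_fit` and the thresholds are illustrative, taken from this post rather than from official eggNOG-mapper documentation.

```shell
# Approximate RAM per database in GB, as quoted in this post (not official figures).
DB_RAM="euk:90 bact:32 arch:10"
# Approximate total disk in GB: annotations + sequences + HMM profiles.
NEED_DISK_GB=$(( 20 + 20 + 130 ))

# Hypothetical helper: print the databases whose quoted RAM need fits in $1 GB.
dbs_that_fit() {
  local avail_gb=$1 fits="" entry db need
  for entry in $DB_RAM; do
    db=${entry%%:*}     # text before the colon, e.g. "euk"
    need=${entry##*:}   # text after the colon, e.g. "90"
    if [ "$avail_gb" -ge "$need" ]; then
      fits="$fits $db"
    fi
  done
  echo "${fits# }"      # drop the leading space
}

# Assumption: Linux, where /proc/meminfo reports MemAvailable in kB.
if [ -r /proc/meminfo ]; then
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  if [ -n "$avail_kb" ]; then
    avail_gb=$(( avail_kb / 1024 / 1024 ))   # rounded down to whole GB
    echo "Available RAM: ${avail_gb} GB; databases that fit: $(dbs_that_fit "$avail_gb")"
  fi
fi

# df -P reports free space in 1K blocks in column 4 of the second line.
free_kb=$(df -Pk . 2>/dev/null | awk 'NR==2 {print $4}')
if [ -n "$free_kb" ]; then
  echo "Free disk here: $(( free_kb / 1024 / 1024 )) GB (about ${NEED_DISK_GB} GB needed in total)"
fi
```

For example, `dbs_that_fit 40` prints `bact arch`: with 40 GB of RAM only the bacterial and archaeal databases would fit by these figures.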

eggNOG-mapper

Version:

2016-11-12.v1
