Archives

A powerhouse for sequence functional annotation: eggNOG-mapper, covering KEGG/COG/KOG/GO/BiGG in one pass

1. Introduction

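Functional annotation of sequenced genomes with KEGG (Kyoto Encyclopedia of Genes and Genomes) and COG (Clusters of Orthologous Groups, which cluster orthologous genes) has become a standard part of genome annotation, especially for microbial genomes. The underlying logic is that orthologous genes share the same function. The classic strategy for identifying orthologs is BBH (bi-directional best hit), but the most direct ortholog is often hard to pin down, so clustering homologous genes and defining a cluster is usually the better strategy: each cluster contains orthologs (which arise from speciation events) and paralogs (which arise from duplication events), and all members of a cluster are taken to share one function. KO (KEGG Orthology), COG, and eggNOG all define clusters in this way and then annotate each cluster.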
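The BBH idea can be illustrated with a small sketch. It assumes two tab-separated hit tables with three columns (query, subject, bit score), for example as produced by BLAST+ with `-outfmt "6 qseqid sseqid bitscore"`; the function name `bbh_pairs`, the file names, and the toy scores are hypothetical and only serve the illustration. Ties are resolved by keeping the first hit seen, which is a simplification.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print bi-directional best hits (BBH) between genomes A and B.
# $1: hits of A proteins against B, $2: hits of B proteins against A.
# Both files are assumed to hold three tab-separated columns: query, subject, bit score.
bbh_pairs() {
  awk -F'\t' '
    # Keep the highest-scoring subject in B for each query in A.
    side == "ab" {
      if (!($1 in score_ab) || $3 + 0 > score_ab[$1]) { score_ab[$1] = $3 + 0; best_ab[$1] = $2 }
    }
    # Keep the highest-scoring subject in A for each query in B.
    side == "ba" {
      if (!($1 in score_ba) || $3 + 0 > score_ba[$1]) { score_ba[$1] = $3 + 0; best_ba[$1] = $2 }
    }
    # A pair is reported only when each gene is the best hit of the other.
    END {
      for (a in best_ab) {
        b = best_ab[a]
        if ((b in best_ba) && best_ba[b] == a) print a "\t" b
      }
    }
  ' side=ab "$1" side=ba "$2" | sort
}

# Toy data: a1 and b1 are each other's best hit; a2 picks b2, but b2 prefers a1.
demo_dir=$(mktemp -d)
printf 'a1\tb1\t200\na1\tb2\t150\na2\tb2\t90\n' > "$demo_dir/a_vs_b.tsv"
printf 'b1\ta1\t198\nb2\ta1\t140\nb2\ta2\t80\n' > "$demo_dir/b_vs_a.tsv"
bbh_pairs "$demo_dir/a_vs_b.tsv" "$demo_dir/b_vs_a.tsv"   # prints the pair a1, b1
```

In the toy data a2 ends up without a partner, which is the situation the paragraph above describes: a strict reciprocal criterion leaves genes unassigned, whereas a cluster-based scheme could still place a2 in the same group as b2.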

Today's topic is eggNOG. Its story starts with COG, so let us first look at the main update history of the NCBI COG database:

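The first release, in 1997, covered 7 complete genomes and 720 COGs, including prokaryotic genomes and a unicellular eukaryote (yeast). The database was upgraded in 2003 and 2014; the latest version keeps only bacteria and archaea and contains 711 genomes, 4,631 COGs, and 26 functional categories. In 2003 a eukaryotic branch was built (KOG, Eukaryotic Orthologous Groups). In 2007 an archaeal branch was built (arCOG, Archaeal Clusters of Orthologous Genes), with further upgrades in 2012 and 2014; arCOG is well suited to annotating archaeal genomes. In 2011 a phage branch was built (POG, Phage Orthologous Groups), which was upgraded in 2013.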

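Because of the computational resources required, NCBI COG built COG clusters for separate taxonomic branches, such as arCOG, KOG, and POG, and recommends annotating newly sequenced genomes with these branch-specific sets. In fact, eggNOG […]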

COGNIZER: a functional annotation framework for metagenomes

Title:

COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets

Abstract:

Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes that are being sequenced world-wide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools/techniques. In spite of having high accuracy, existing stand-alone functional annotation […]

Genix: an online automated annotation pipeline for bacterial genomes

Title:

Genix: A New Online Automated Pipeline for Bacterial Genome Annotation

Abstract:

Next-Generation Sequencing (NGS) has significantly reduced the cost of genome sequencing projects, resulting in an expressive increase in the availability of genomic data in public databases. The cheaper and easier is to sequence new genomes, the more accurate the annotation steps have to […]

Resfams: resistance gene annotation based on HMM profiles

Title:

Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology

Abstract:

Antibiotic resistance is a dire clinical problem with important ecological dimensions. While antibiotic resistance in human pathogens continues to rise at alarming rates, the impact of environmental resistance on human health is still unclear. To investigate the relationship between human-associated and […]

CheckM: assessing the completeness and contamination of microbial genome assemblies and genomes reconstructed from metagenomes

Title:

CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Abstract:

Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. While this increasing breadth of draft genomes is providing key information regarding the […]

eggNOG-mapper: a new approach to COG/GO functional annotation

Title:

Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper

Abstract:

Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines relatively inaccessible, less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation […]

BUSCO: assessing genome and annotation completeness with single-copy orthologs

Article:

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs

Abstract: BUSCO completeness assessment employs sets of Benchmarking Universal Single-Copy Orthologs from OrthoDB (www.orthodb.org) to provide quantitative measures of the completeness of genome assemblies, annotated gene sets, and transcriptomes in terms of expected gene content. Genes that make up the BUSCO sets for each […]

Installing InterProScan on Ubuntu

Related: local installation of the BLAST2GO database

Overview

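InterProScan combines several protein signature databases and methods, including Pfam, PIR, SMART, Gene3D, TigrFam, SuperFamily, TMHMM, SignalPHMM, FPrintScan, and PatternScan. Through InterPro we can use these databases to annotate proteins more thoroughly and to learn more about their function. The results can also be linked to GO easily, which greatly enriches the GO functional annotation of proteins.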

Basic steps

Installing perlbrew

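Installing InterProScan (IPRscan) locally requires quite a few Perl modules. On Ubuntu 12.04 the system Perl is 5.14.2, while the latest release is 5.18.0, and many modules, such as File::Basename and English, are no longer well supported there. perlbrew makes it easy to install several Perl versions and to switch between them freely; under Perl 5.18.0 all the modules IPRscan needs install without problems.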

Set the perlbrew installation path:

export PERLBREW_ROOT=/home/biostack/tools/perl5

Install perlbrew:

wget -O - http://install.perlbrew.pl | bash

Add the following to .bashrc:

# perlbrew
export PERLBREW_ROOT=/home/biostack/tools/perl5
source /home/biostack/tools/perl5/etc/bashrc

Initialize:

perlbrew init

Choose a mirror:

perlbrew mirror

List the Perl versions available for installation:

perlbrew available

Install perl-5.18.0:

perlbrew install […]

Local installation of BLAST2GO (Ubuntu)

Overview

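Blast2GO is a mature, integrated platform for sequence functional annotation and analysis. It can combine results from NR, Swiss-Prot, and InterProScan to assign Gene Ontology (GO) functional categories to sequences. Blast2GO is designed for bench scientists and has a friendly, intuitive interface. When processing large numbers of sequences, thousands of sequences can be submitted for online analysis, but this is slow and gives no real control over software versions, so a local installation run from the command line becomes particularly necessary.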

Basic steps

Installing MySQL and creating a user

Installing MySQL

sudo apt-get install mysql-server mysql-client mysql-workbench

During installation you will be prompted for a root password. You can also set the root password afterwards with the following command:

sudo mysqladmin -u root -hlocalhost password 'biostack'

Test:

$ mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 41
Server version: 5.5.31-0ubuntu0.12.04.2 (Ubuntu)

Installed MySQL version: MySQL […]