
BUSCO: assessing genome and annotation completeness with single-copy orthologs

Paper:

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs

Abstract:
   
BUSCO completeness assessment employs sets of Benchmarking Universal Single-Copy Orthologs from OrthoDB (www.orthodb.org) to provide quantitative measures of the completeness of genome assemblies, annotated gene sets, and transcriptomes in terms of expected gene content. Genes that make up the BUSCO sets for each major lineage are selected from orthologous groups with genes present as single-copy orthologs in at least 90% of the species. While allowing for rare gene duplications or losses, this establishes an evolutionarily informed expectation that these genes should be found as single-copy orthologs in the genome of any newly-sequenced species.
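The 90% selection rule described in the abstract can be illustrated with a small awk filter. This is a minimal sketch and does not reproduce OrthoDB's actual pipeline. The function name `select_single_copy_groups` and its tab-separated input are hypothetical: each row gives an orthologous group, a species, and a copy number. The total number of species is passed in as an argument.

```shell
# Hypothetical helper: print orthologous groups that are single-copy in at
# least a given fraction of species (the abstract uses 90%).
# stdin (hypothetical format): group<TAB>species<TAB>copy_count
# $1 = total number of species, $2 = minimum fraction, e.g. 0.9
select_single_copy_groups() {
    local n_species="$1" min_fraction="$2"
    awk -F'\t' -v n="$n_species" -v min="$min_fraction" '
        # count a species once per group, and only if it has exactly one copy
        $3 == 1 && !seen[$1 FS $2]++ { single[$1]++ }
        END {
            for (g in single)
                if (single[g] / n >= min) print g
        }' | LC_ALL=C sort
}
```

For example, `select_single_copy_groups 40 0.9 < copies.tsv` would list the groups that are single-copy in at least 36 of 40 species. The file `copies.tsv` is hypothetical.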

Official homepage:

http://busco.ezlab.org/

Overview:

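BUSCO stands for Benchmarking Universal Single-Copy Orthologs. It uses single-copy orthologous genes to assess the completeness of genomes, gene sets and transcriptomes. Single-copy genes are widely used to assess genome completeness; a related tool is CheckM.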
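BUSCO reports its result as percentages out of the n BUSCOs searched. The categories are complete BUSCOs (with duplicated ones counted as a subset of complete), fragmented BUSCOs and missing BUSCOs. The sketch below computes a summary in that style from a per-BUSCO status table.

The input format is an assumption for illustration and does not reproduce BUSCO's own output files. It has one line per BUSCO, with a status of `Complete` (single-copy complete), `Duplicated`, `Fragmented` or `Missing`. The function name is also hypothetical.

```shell
# Hypothetical helper: summarise per-BUSCO statuses as percentages.
# stdin (hypothetical format): busco_id<TAB>status, one line per BUSCO,
# status = Complete (single-copy) | Duplicated | Fragmented | Missing
busco_summary() {
    awk -F'\t' '
        { n++; count[$2]++ }
        END {
            if (n == 0) { print "no BUSCOs in input" > "/dev/stderr"; exit 1 }
            c = count["Complete"] + count["Duplicated"]   # C includes duplicates
            printf "C:%.1f%%[D:%.1f%%],F:%.1f%%,M:%.1f%%,n:%d\n",
                100 * c / n, 100 * count["Duplicated"] / n,
                100 * count["Fragmented"] / n, 100 * count["Missing"] / n, n
        }'
}
```

Feeding it four BUSCOs (two single-copy complete, one duplicated, one missing) prints `C:75.0%[D:25.0%],F:0.0%,M:25.0%,n:4`.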

Notes on the current version:

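  1. For eukaryotic genomes, BUSCO first uses tblastn to locate the reference gene set in the genome. It then trains Augustus and runs gene prediction, and finally uses HMMsearch to search against the HMM library to decide whether each gene is present. This approach has a weakness: the gene predictions may be incomplete, which can distort the assessment.
    Recommendation: combine the results of several gene prediction tools, such as Genewise2, Genemark-ES, GeneID and Augustus, then assess them in OGS mode (a set of protein sequences). Alternatively, search these protein sequences directly against the HMM library with HMMscan.
  2. Prokaryotes work in much the same way: predict CDSs with Prodigal/Glimmer/Genemark, then run BUSCO directly in OGS mode.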
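The recommendation in item 1 can be sketched as two shell helpers:

- one concatenates protein FASTA files from several predictors into a single OGS-style input;
- the other tallies hmmscan hits per BUSCO profile.

The file names, the function names and the `all_buscos.hmm` profile file are hypothetical. `hmmpress`, `hmmscan --tblout` and `-E` are standard HMMER 3 commands and options, but the E-value shown is an arbitrary example. The tally is deliberately crude: it applies no per-profile score or length cutoffs, so it cannot tell complete hits from fragmented ones.

```shell
# Hypothetical helper: concatenate protein FASTA files from several gene
# predictors, prefixing every header with the file's base name (minus its
# extension) so that identical IDs from different tools cannot collide.
merge_proteins() {
    local f tag
    for f in "$@"; do
        tag="$(basename "$f")"
        tag="${tag%.*}"                      # augustus.faa -> augustus
        awk -v tag="$tag" '/^>/ { sub(/^>/, ">" tag "|") } { print }' "$f"
    done
}

# Hypothetical helper: read an hmmscan --tblout table (column 1 = profile,
# column 3 = query protein), count distinct proteins per profile and label
# each profile "single" (one protein) or "duplicated" (more than one).
# Profiles without hits are simply absent from the output.
count_profile_hits() {
    awk '!/^#/ && !seen[$1 FS $3]++ { hits[$1]++ }
         END {
             for (p in hits)
                 printf "%s\t%d\t%s\n", p, hits[p],
                        (hits[p] == 1 ? "single" : "duplicated")
         }' "$1" | LC_ALL=C sort
}

# Possible usage (HMMER 3; all_buscos.hmm is a hypothetical single file
# holding the lineage's profiles, the E-value is an arbitrary example):
#   merge_proteins augustus.faa genemark.faa geneid.faa > merged.faa
#   hmmpress all_buscos.hmm
#   hmmscan --tblout hits.tbl -E 1e-10 all_buscos.hmm merged.faa > /dev/null
#   count_profile_hits hits.tbl
```

Several predictors often call the same gene, so the merged set is redundant. A gene that is truly single-copy may then show up as duplicated. Removing redundant models, for example by clustering the merged proteins, is advisable before interpreting the duplication rate.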

Installation:

   Database download:


   wget http://busco.ezlab.org/files/fungi_buscos.tar.gz
   wget http://busco.ezlab.org/files/bacteria_buscos.tar.gz
   wget http://busco.ezlab.org/files/eukaryota_buscos.tar.gz

   Program installation:

   wget http://busco.ezlab.org/files/BUSCO_v1.1b1.tar.gz
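After downloading, the archives still need to be unpacked. This is a minimal sketch that uses the archive names from the wget commands above. It makes no assumption about the directory layout inside them, and the function name is hypothetical.

```shell
# Hypothetical helper: verify and unpack the downloaded BUSCO archives in
# the current directory, skipping any that were not downloaded.
unpack_busco_downloads() {
    local f
    for f in fungi_buscos.tar.gz bacteria_buscos.tar.gz \
             eukaryota_buscos.tar.gz BUSCO_v1.1b1.tar.gz; do
        if [ ! -f "$f" ]; then
            echo "skipping $f: not downloaded" >&2
            continue
        fi
        # listing the archive first catches truncated or corrupt downloads
        if tar -tzf "$f" > /dev/null; then
            tar -xzf "$f"
        else
            echo "corrupt archive: $f" >&2
            return 1
        fi
    done
    return 0
}
```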

Related tools:

BUSCO: http://busco.ezlab.org/
OrthoDB: http://orthodb.org/
CEGMA: http://korflab.ucdavis.edu/datasets/cegma/
Goodbye CEGMA, hello BUSCO!: http://www.acgt.me/blog/2015/5/18/goodbye-cegma-hello-busco
CheckM: https://ecogenomics.github.io/CheckM/
Augustus: http://bioinf.uni-greifswald.de/augustus/

2015/6/4 7:51
