CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes


Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. While this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of ‘marker’ genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate, single cell and metagenome derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination, and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.




pip install checkm-genome
box install pplacer-1.1a18


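CheckM can be used to assess the quality of single-isolate genome assemblies and of genomes reconstructed by metagenome binning, reporting completeness, contamination, and related quality metrics. It can also assign a taxonomic classification to a genome based on its marker genes. CheckM is already used in the meta_seq data analysis pipeline to check the completeness of reconstructed genomes. For a more detailed description of its features, see the GitHub Wiki page.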
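
To make the completeness and contamination figures more concrete, here is a minimal Python sketch of how such estimates can be derived from marker gene counts. It is not CheckM's actual code. It assumes the marker genes are grouped into non-empty collocated sets, counts each set once, and treats every extra copy of a single-copy marker as contamination. The function name `estimate_quality` and the marker IDs `m1` to `m6` are hypothetical and only serve the illustration.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def estimate_quality(marker_sets: List[Set[str]], hits: Iterable[str]) -> Tuple[float, float]:
    """Return (completeness %, contamination %) for one genome.

    marker_sets: non-empty groups of collocated single-copy marker genes
                 (hypothetical IDs, an assumption of this sketch).
    hits:        marker gene IDs found in the genome, one entry per copy.
    """
    if not marker_sets:
        raise ValueError("at least one marker set is required")

    counts = Counter(hits)
    completeness = 0.0
    contamination = 0.0
    for markers in marker_sets:
        # Fraction of this set's markers seen at least once.
        present = sum(1 for m in markers if counts[m] >= 1)
        # Every copy beyond the first is counted as a contaminating copy.
        extra = sum(counts[m] - 1 for m in markers if counts[m] > 1)
        completeness += present / len(markers)
        contamination += extra / len(markers)

    # Average over sets so that each collocated set carries equal weight.
    n = len(marker_sets)
    return 100.0 * completeness / n, 100.0 * contamination / n


if __name__ == "__main__":
    sets = [{"m1", "m2"}, {"m3", "m4", "m5", "m6"}]
    found = ["m1", "m2", "m2", "m3", "m4", "m5"]  # m2 duplicated, m6 missing
    comp, cont = estimate_quality(sets, found)
    print(f"completeness={comp:.1f}% contamination={cont:.1f}%")
```

Averaging per set rather than per gene keeps a large block of neighbouring markers from dominating the estimate when a single contig is missing or duplicated, which is the intuition behind using collocation information.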


