The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes


Background: Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics has shifted from generating sequences to analyzing them. High-throughput, low-cost next-generation sequencing has made metagenomics accessible to a wide range of researchers.

Results: A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats.

Conclusion: The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis: the availability of high-performance computing for annotating the data.





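MG-RAST provides an online Metagenome/Metatranscriptome data platform that accepts both raw reads and assembled contigs. Leaving aside the quality of the analyses themselves, the project ID it assigns can easily be cited in a paper, which gives strong support to reproducible research. For online Metagenome/Metatranscriptome analysis, EBI Metagenome offers the same advantage and is also a good choice. MG-RAST has so far completed analyses for 268,325 samples, and it has just been upgraded to version 4.0. For a single-sample analysis, it provides the following information: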

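1. Sequence statistics and quality control (GC content plot, nucleotide composition, sequence length distribution);
2. Sequence prediction (functional categories: rRNA / protein coding);
3. Duplicate-read analysis (using DRISEE);
4. k-mer spectrum (rank-abundance visualization);
5. Statistics of sequence alignment results;
6. COG/NOG functional profiles;
7. KEGG classification (KEGG Ortholog categories);
8. SEED annotation;
9. Taxonomic composition;
10. Diversity analysis (rarefaction curves / diversity indices);
11. Metadata.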
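Several items in this list are standard sequence and community summaries. The Python sketch below shows, in simplified form, how a few of them (items 1, 4 and 10) can be computed; it is not MG-RAST's implementation. All function names are hypothetical, reads are assumed to be plain nucleotide strings, and the rarefaction input assumes each read has already been assigned a single taxon label.

```python
import math
import random
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in one read (case-insensitive; 0.0 for an empty read)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def length_distribution(reads: list[str]) -> Counter:
    """Map read length -> number of reads with that length."""
    return Counter(len(r) for r in reads)


def kmer_rank_abundance(reads: list[str], k: int) -> list[int]:
    """Count every k-mer across all reads; return the counts sorted from most to least abundant."""
    counts = Counter(
        r[i:i + k].upper() for r in reads for i in range(len(r) - k + 1)
    )
    return sorted(counts.values(), reverse=True)


def shannon_index(abundances: dict[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(abundances.values())
    return -sum((n / total) * math.log(n / total) for n in abundances.values() if n > 0)


def rarefaction_curve(labels: list[str], depths: list[int], seed: int = 0) -> list[int]:
    """Distinct taxa observed when subsampling `depth` labels without replacement (one draw per depth)."""
    rng = random.Random(seed)
    return [len(set(rng.sample(labels, d))) for d in depths]
```

For example, `gc_content("ATGC")` gives 0.5, and `kmer_rank_abundance(["ATAT"], 2)` gives `[2, 1]` because the 2-mer AT occurs twice and TA once. Published rarefaction curves usually average many random subsamples at each depth; a single seeded draw per depth is used here only to keep the sketch short.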

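All data, including the data behind every chart, can be downloaded, which is genuinely convenient and is one reason the service is so popular. In addition, MG-RAST aligns sequences against the M5NR non-redundant database, which saves the trouble of searching against many separate databases.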
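The benefit of a non-redundant database can be illustrated with a toy merge: identical protein sequences from several source databases are stored once, while every source's annotation is kept, so a single search returns annotations from all sources at once. The sketch below keys each unique sequence by an MD5 checksum; the input tuple format, the function name `build_nonredundant`, and the example database labels are hypothetical, and this is not the actual M5NR build procedure.

```python
import hashlib


def build_nonredundant(entries):
    """Collapse identical protein sequences from several (hypothetical) source databases.

    `entries` is an iterable of (source_db, accession, sequence, annotation) tuples.
    Returns {md5_hex: {"sequence": str, "annotations": [(source_db, accession, annotation), ...]}}.
    """
    nr = {}
    for source_db, accession, sequence, annotation in entries:
        # Normalise case and drop a trailing stop symbol so equivalent records collide.
        seq = sequence.upper().rstrip("*")
        key = hashlib.md5(seq.encode("ascii")).hexdigest()
        record = nr.setdefault(key, {"sequence": seq, "annotations": []})
        record["annotations"].append((source_db, accession, annotation))
    return nr


entries = [
    ("DB_A", "a1", "MKT", "kinase"),
    ("DB_B", "b7", "mkt*", "protein kinase"),
    ("DB_A", "a2", "MAL", "ligase"),
]
nr = build_nonredundant(entries)
```

Here the two records for `MKT` collapse into one entry carrying both annotations, so `nr` holds two unique sequences instead of three records.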



