Archives

UNOISE2: microbial diversity analysis through error correction of Illumina sequencing reads

Title:

UNOISE2: Improved error-correction for Illumina 16S and ITS amplicon reads

Abstract:

Amplicon sequencing of tags such as 16S and ITS ribosomal RNA is a popular method for investigating microbial populations. In such experiments, sequence errors caused by PCR and sequencing are difficult to distinguish from true biological variation. I describe UNOISE2, an updated version of the UNOISE algorithm for denoising (error-correcting) Illumina amplicon reads and show that it has comparable or better accuracy than DADA2.

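Link:

http://biorxiv.org/content/early/2016/10/15/081257

Software:

http://www.drive5.com/usearch/

Overview:

USEARCH has been introduced on Weibo many times before. It does three main things: 1. sequence similarity search; 2. microbial diversity data processing, which has gradually grown into a small ecosystem of its own; 3. a Swiss Army knife for sequence manipulation. It has plenty of competitors on all three. For the first, DIAMOND, RAPSearch and others; for the second, vsearch follows close behind, alongside established tools such as QIIME and Mothur; for the third there are too many to list, mainly seqtk and seqkit. This post, however, is about the newly released UNOISE2, which performs error correction (there are many such tools as well): it removes reads with sequencing errors, chimeras, PhiX contamination and low-complexity sequences, after which an OTU table can be built directly. The UNOISE2 workflow recommends starting from the raw reads: merge paired reads, filter, dereplicate, error-correct, align, and build the OTU table, all in one go. Optionally, a step that adjusts read orientation can be added; this requires a reference database such as the RDP or Silva database.

USEARCH licensing is worth mentioning: the 32-bit version is free for both industry and academia, while the 64-bit version is paid, with academic licenses considerably cheaper than commercial ones. It has just reached version 9.0, and the sales policy has changed as well, from an annual subscription to licensing per major version, which is more user-friendly.

Main description from the official website: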

-. UNOISE algorithm

The UNOISE algorithm performs error-correction (denoising) on amplicon reads. It is implemented in the unoise command. UNOISE is designed for Illumina reads, not earlier technologies such as 454 pyrosequencing.

Correct biological sequences are recovered from the reads, resolving distinct sequences down to a single difference (sometimes) or two or more differences (almost always). I consider this approach superior to traditional OTU clustering at 97% identity because OTUs may merge different species (or more generally, different phenotypes) with distinct sequences while denoising gives the best possible resolution.

Errors are corrected as follows:

– Reads with sequencing error are identified and removed.
– Abundances are corrected (when the OTU table is generated).
– Chimeras are removed.
– PhiX sequences are removed.
– Low-complexity sequences due to Illumina artifacts are removed.

Using denoised sequences as OTUs has two possible drawbacks: a single species may be split into two OTUs due to different strains or paralogs, and the sensitivity is slightly lower because UPARSE can make robust OTUs from unique sequences with abundance as low as 2 while the minimum abundance for UNOISE is around 4. I consider splitting of strains to be a good thing, because they may have different phenotypes and hence different ecological roles. Splitting due to paralogs is relatively benign (what does it matter?), and is not solved by clustering at 97% identity because paralogs have identities <97% in some cases. Splitting or lumping is unavoidable regardless of whether the clustering identity is 97% or 100% so I would argue that it is better to resolve as many distinct biological sequences as possible. Sensitivity to unique sequences with abundance <8 (summed over all samples) is rarely important in practice.

Denoised sequences are valid OTUs (the clustering identity is 100%, if you like) and can be used to generate an OTU table in just the same way as 97% OTUs.

-. UNOISE pipeline

A UNOISE pipeline recovers biological sequences from an amplicon sequencing experiment by performing error-correction (denoising) of Illumina reads. UNOISE is not designed for other sequencing technologies, e.g. 454 pyrosequencing reads. The UNOISE algorithm is implemented in the unoise command.

See Tutorials for example scripts & data.

Reads in FASTQ format: I strongly recommend starting from “raw” reads, i.e. the reads originally provided by the sequencing machine base-calling software. You should do quality filtering with USEARCH rather than using reads that have already been filtered by third-party software.

Reads in FASTA format: The unoise command supports reads in FASTA format. You may need to use FASTA input if your reads have already been quality filtered by some other method and you don’t have access to the original FASTQ reads.

Sample pooling: I recommend combining reads from as many samples as possible. See sample pooling for discussion.

Read quality filtering: Quality filtering of the reads should be done using USEARCH because the maximum expected error filtering method is much more effective at suppressing reads with high error rates than other filters, e.g. those based on average Q scores. A maximum expected errors threshold of 1.0 is a good default choice (the -fastq_maxee 1.0 option of fastq_filter or the -fastq_merge_maxee 1.0 option of fastq_mergepairs). You can use fastx_learn to estimate the error rate after filtering.
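To make the expected-error criterion concrete: the expected number of errors in a read is the sum of the per-base error probabilities implied by its Phred scores, E = sum(10^(-Q/10)). The Python sketch below illustrates only that calculation. It is not USEARCH code, the function names are hypothetical, and it assumes Phred+33 quality encoding.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for one read's quality string."""
    # Each character encodes a Phred score Q; the error probability is 10^(-Q/10).
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_maxee(quality: str, max_ee: float = 1.0) -> bool:
    """True if the read would be kept at the given expected-error threshold."""
    return expected_errors(quality) <= max_ee


if __name__ == "__main__":
    q40_read = "I" * 250  # 'I' is Q40 in Phred+33
    q20_read = "5" * 250  # '5' is Q20 in Phred+33
    print(expected_errors(q40_read), passes_maxee(q40_read))
    print(expected_errors(q20_read), passes_maxee(q20_read))
```

A 250-base read with Q20 at every position has an average Q score that many filters would accept, yet its expected error count is 2.5, so it fails a threshold of 1.0.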

Global trimming: You should trim reads to a fixed length unless the sequences are contigs generated by a paired read assembler, in which case it may not be necessary. You should also trim any primer-binding sequences at the ends of the reads. See global trimming for discussion.

Unique sequences: Get the set of unique sequences with abundances using the fastx_uniques command with the -sizeout option. This will be the input file for the unoise command.
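As an illustration of what this dereplication step produces, the following Python sketch collapses identical reads and writes each unique sequence with a `;size=N;` abundance annotation in its label. It is a simplified stand-in rather than the fastx_uniques implementation; the function names and the `Uniq` label prefix are assumptions made for the example.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (label, sequence) pairs from an iterable of FASTA lines."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def dereplicate(records):
    """Return (sequence, count) pairs, most abundant first."""
    counts = Counter(seq.upper() for _, seq in records)
    # Sort by decreasing abundance, then by sequence so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_uniques(uniques, prefix="Uniq"):
    """Format unique sequences as FASTA with ';size=N;' abundance annotations."""
    out = []
    for i, (seq, count) in enumerate(uniques, start=1):
        out.append(f">{prefix}{i};size={count};")
        out.append(seq)
    return "\n".join(out)


if __name__ == "__main__":
    reads = [">a", "ACGT", ">b", "acgt", ">c", "TTTT", ">d", "ACGT"]
    print(format_uniques(dereplicate(read_fasta(reads))))
```

Sorting by decreasing abundance matters for the denoising step that follows, because abundant sequences are the ones most likely to be correct.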

Creating an OTU table: Denoised sequences are valid OTUs (the clustering identity is 100%, if you like) and can be used to generate an OTU table in just the same way as 97% OTUs. Reads must have sample identifiers for this to work. The simplest way to do this is usually to use the -relabel @ option of fastq_filter or fastq_mergepairs.
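The reason sample identifiers are required becomes clear when the table is built by hand. The Python sketch below counts reads per OTU and sample from (read label, OTU) assignments. It assumes, for illustration only, that read labels have the form `sample.readnumber`; the function names are hypothetical and this is not how USEARCH writes its table internally.

```python
from collections import defaultdict


def sample_of(read_label: str) -> str:
    """Extract the sample name, assuming labels look like 'sample.readnumber'."""
    return read_label.rsplit(".", 1)[0]


def build_otu_table(hits):
    """Count reads per (OTU, sample) from (read_label, otu_id) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for read_label, otu_id in hits:
        table[otu_id][sample_of(read_label)] += 1
    return table


def format_otu_table(table):
    """Render the counts as tab-separated text, one row per OTU."""
    samples = sorted({s for row in table.values() for s in row})
    lines = ["\t".join(["#OTU ID"] + samples)]
    for otu_id in sorted(table):
        counts = [str(table[otu_id].get(s, 0)) for s in samples]
        lines.append("\t".join([otu_id] + counts))
    return "\n".join(lines)


if __name__ == "__main__":
    hits = [("gut.1", "Otu1"), ("gut.2", "Otu1"), ("soil.1", "Otu1"), ("soil.2", "Otu2")]
    print(format_otu_table(build_otu_table(hits)))
```

Without a sample name embedded in each read label, every read would fall into a single column and per-sample abundances could not be recovered.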

Example commands: for typical Illumina reads with one pair of FASTQ files (R1 and R2) per sample.

usearch -fastq_mergepairs *_R1.fastq -relabel @ -fastqout reads.fq

usearch -fastq_filter reads.fq -fastq_maxee 1.0 -fastaout filtered.fa

usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout

usearch -unoise uniques.fa -tabbedout out.txt -fastaout denoised.fa

usearch -usearch_global reads.fq -db denoised.fa -strand plus -id 0.97 -otutabout otu_table.txt

Version:

2016-11-17.v1

MG-RAST: the classic online metagenome analysis platform for taxonomic composition and functional profiling

Title:

The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes

Abstract:

Background Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers.

Results A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats.

Conclusion The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis – the availability of high-performance computing for annotating the data.

http://metagenomics.nmpdr.org

Link:

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-386

Source code:

https://github.com/MG-RAST/MG-RAST

Overview:


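MG-RAST provides an online platform for metagenome/metatranscriptome data and accepts raw reads directly as well as assembled contigs. Setting aside the quality of the analysis itself, the project ID it assigns can conveniently be cited in a paper, which gives strong support to reproducible research. For online metagenome/metatranscriptome analysis, EBI Metagenomics offers the same advantage and is also a good choice. MG-RAST has so far completed the analysis of 268,325 samples and has just been upgraded to version 4.0. For a single sample, the following information is available:

1. Sequence statistics and quality control (GC content plot, nucleotide composition, sequence length distribution)
2. Feature prediction (functional categories: rRNA / protein coding);
3. Duplicate read analysis (using DRISEE);
4. k-mer profile (rank abundance visualization);
5. Alignment result statistics
6. COG/NOG functional profile;
7. KEGG classification (KEGG Orthology categories);
8. SEED annotation;
9. Taxonomic composition;
10. Diversity analysis (rarefaction curve / diversity indices)
11. Metadata

All data, including the data behind the plots, can be downloaded, which is very convenient and is one reason the service is so popular. In addition, MG-RAST aligns sequences against the M5NR non-redundant database, which avoids the trouble of searching many separate databases.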

Version:

2016-11-16.v1

Miniasm+Racon: fast and accurate assembly of third-generation sequencing data

Title:

Fast and accurate de novo genome assembly from long uncorrected reads

Abstract:

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource intensive error correction and consensus generation steps to obtain high quality assemblies. We show that the error correction step can be omitted and high quality consensus sequences can be generated efficiently with a SIMD accelerated, partial order alignment based stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore datasets we show that Racon coupled with Miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.

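Link:

http://biorxiv.org/content/early/2016/08/05/068122

Source code:

https://github.com/isovic/racon

Installation:

git clone https://github.com/isovic/racon.git && cd racon && make modules && make tools && make -j

Overview:

Third-generation assembly software. Data from the third-generation platforms Nanopore and PacBio share the same traits: long reads and high error rates. Before analysis the data normally need special processing (consensus, error correction), and only then can assembly begin. Miniasm, developed by Heng Li, can assemble unprocessed long reads directly and quickly, but it does not polish the assembled contigs, so they contain many SNPs/indels. Racon was written to solve this problem. It accepts GFA, FASTA, FASTQ, SAM, MHAP and PAF input and is more general-purpose than Quiver or Nanopolish. The result is a new combination, Miniasm+Racon, that is efficient and fast.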
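Miniasm writes its assembly as GFA, while tools used to map reads back to the unpolished contigs before consensus calling generally expect FASTA. As a small illustration of that glue step, the Python sketch below extracts contig sequences from GFA segment (`S`) lines. The function name is hypothetical and the code is not part of Miniasm or Racon.

```python
def gfa_to_fasta(gfa_lines):
    """Convert GFA segment ('S') lines to FASTA text."""
    out = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Segment lines look like: S <name> <sequence> [optional tags...]
        # A sequence of '*' means "not stored", so there is nothing to emit.
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            out.append(f">{fields[1]}")
            out.append(fields[2])
    return "\n".join(out)


if __name__ == "__main__":
    gfa = [
        "H\tVN:Z:1.0",
        "S\tutg000001l\tACGTACGT\tLN:i:8",
        "L\tutg000001l\t+\tutg000002l\t-\t0M",
        "S\tutg000002l\tGGCC",
    ]
    print(gfa_to_fasta(gfa))
```

Header, link and other non-segment lines are skipped, so only the contig sequences reach the polishing step.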

Version:

2016-10-15.v1

Resfams: antibiotic resistance gene annotation based on profile HMMs

Title:

Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology

Abstract:

Antibiotic resistance is a dire clinical problem with important ecological dimensions. While antibiotic resistance in human pathogens continues to rise at alarming rates, the impact of environmental resistance on human health is still unclear. To investigate the relationship between human-associated and environmental resistomes, we analyzed functional metagenomic selections for resistance against 18 clinically relevant antibiotics from soil and human gut microbiota as well as a set of multidrug-resistant cultured soil isolates. These analyses were enabled by Resfams, a new curated database of protein families and associated highly precise and accurate profile hidden Markov models, confirmed for antibiotic resistance function and organized by ontology. We demonstrate that the antibiotic resistance functions that give rise to the resistance profiles observed in environmental and human-associated microbial communities significantly differ between ecologies. Antibiotic resistance functions that most discriminate between ecologies provide resistance to β-lactams and tetracyclines, two of the most widely used classes of antibiotics in the clinic and agriculture. We also analyzed the antibiotic resistance gene composition of over 6000 sequenced microbial genomes, revealing significant enrichment of resistance functions by both ecology and phylogeny. Together, our results indicate that environmental and human-associated microbial communities harbor distinct resistance genes, suggesting that antibiotic resistance functions are largely constrained by ecology.

Paper:

http://www.nature.com/ismej/journal/v9/n1/full/ismej2014106a.html

Source code:

https://github.com/dantaslab/resfams
http://www.dantaslab.org/resfams/

Installation:

wget http://dantaslab.wustl.edu/resfams/Resfams-proteins.tar.gz
tar xzvf Resfams-proteins.tar.gz
cat proteins/* > resfams.faa
diamond makedb --in resfams.faa -d Resfams

wget http://dantaslab.wustl.edu/resfams/Resfams.hmm.gz
gunzip Resfams.hmm.gz
hmmpress Resfams.hmm

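Overview:

Antibiotic resistance gene annotation attracts a lot of attention in pathogen genome sequencing and metagenome sequencing projects. Sequence databases such as the Antibiotic Resistance Database (ARDB) and the Comprehensive Antibiotic Resistance Database (CARD) have long been used for resistance gene annotation. Resfams instead offers a profile-based similarity search strategy for gene annotation (functional profile annotation). For assembled sequences, HMMER is still reasonably fast. For metagenome projects, however, annotating ORFs translated directly from unassembled reads remains computationally expensive. Sequence similarity search tools such as DIAMOND or USEARCH can quickly identify candidate resistance genes (with a threshold permissive enough that nothing is missed), and the profile HMMs can then be used to filter out false positives, which speeds up the whole process.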
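The two-stage idea can be sketched as a small pre-filter between the fast search and the HMM search. The Python example below assumes the fast search wrote BLAST-style 12-column tabular output (query ID in column 1, e-value in column 11). It collects the queries that have any permissive hit and keeps only those records from the ORF FASTA file, so that only the candidates are passed on to HMMER. The function names and the e-value cutoff are illustrative assumptions, not recommended settings.

```python
def candidate_ids(tabular_lines, max_evalue=1e-3):
    """Collect query IDs with at least one hit at or below max_evalue."""
    keep = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 11 (index 10) of BLAST-style tabular output is the e-value.
        if float(fields[10]) <= max_evalue:
            keep.add(fields[0])
    return keep


def subset_fasta(fasta_lines, keep):
    """Return FASTA lines for records whose ID (first word of the header) is in keep."""
    out, writing = [], False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header = line[1:].split()
            writing = bool(header) and header[0] in keep
        if writing and line:
            out.append(line)
    return out


if __name__ == "__main__":
    hits = [
        "orf1\tRF0001\t45.0\t100\t50\t2\t1\t100\t1\t100\t1e-20\t150",
        "orf2\tRF0002\t25.0\t40\t30\t0\t1\t40\t1\t40\t5.0\t20",
    ]
    orfs = [">orf1 sample read", "MKV", ">orf2", "MAA"]
    print("\n".join(subset_fasta(orfs, candidate_ids(hits))))
```

The cutoff in the first stage should stay loose, because anything it discards never reaches the more specific profile HMM stage.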

Version:

2016-11-14.v1

FASTQSim: a high-throughput sequencing data simulator supporting the Illumina/Ion/PacBio/Roche platforms

Title:

FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

Abstract:

BACKGROUND:
High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms. Background contaminating organisms are observed in sequenced mixtures of organisms. In silico samples provide exact truth. To create the best value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible.
RESULTS:
FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with specific error profiles obtained in the characterization step.
CONCLUSIONS:
FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQsim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge datasets for detecting genetic engineering toolmarks, etc.

Paper:

http://bmcresnotes.biomedcentral.com/articles/10.1186/1756-0500-7-533

Source code:

https://sourceforge.net/projects/fastqsim

Installation:

    axel  https://sourceforge.net/projects/fastqsim/files/FASTQsim_v2.0.tgz/download
    tar xzvf  FASTQsim_v2.0.tgz
    mv   FASTQsim_v2.0  FASTQsim-2.0

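Overview:

FASTQSim is a high-throughput sequencing data simulator supporting the Illumina/Ion/PacBio/Roche platforms. It is widely used for simulating metagenome data, for example in the paper:
Evaluating performance of metagenomic characterization algorithms using in silico datasets generated with FASTQSim

Version: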

2016-11-13.v1

CheckM: assessing completeness and contamination of microbial genome assemblies and metagenome-reconstructed genomes

Title:

CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Abstract:

Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. While this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of ‘marker’ genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate, single cell and metagenome derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination, and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.

Paper:

http://genome.cshlp.org/content/early/2015/05/14/gr.186072.114.abstract

Source code:

https://github.com/Ecogenomics/CheckM

Installation:

pip install checkm-genome
wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_v1.0.7.tar.gz
box install pplacer-1.1a18

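Overview:

CheckM can assess the completeness and contamination of isolate genome assemblies or of genomes reconstructed by metagenome binning, and can also place a genome taxonomically based on marker genes. CheckM is already used in the meta_seq analysis pipeline to evaluate the completeness of reconstructed genomes. For more detailed descriptions of its features, see the GitHub wiki: https://github.com/Ecogenomics/CheckM/wiki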
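The intuition behind marker-based quality estimates can be shown with a deliberately simplified calculation: markers expected exactly once per genome indicate completeness when present and contamination when duplicated. The Python sketch below implements only this naive single-copy counting. CheckM itself uses lineage-specific marker sets and the collocation of marker genes, as its abstract describes, so the function here is a hypothetical illustration and not CheckM's algorithm.

```python
def completeness_contamination(marker_counts, marker_set):
    """Estimate completeness and contamination (both in percent).

    marker_counts maps marker gene ID -> number of copies found in the genome.
    marker_set is the collection of markers expected exactly once.
    """
    markers = list(marker_set)
    if not markers:
        raise ValueError("marker_set must not be empty")
    # A marker seen at least once counts towards completeness.
    present = sum(1 for m in markers if marker_counts.get(m, 0) >= 1)
    # Every copy beyond the first counts towards contamination.
    extra = sum(max(marker_counts.get(m, 0) - 1, 0) for m in markers)
    completeness = 100.0 * present / len(markers)
    contamination = 100.0 * extra / len(markers)
    return completeness, contamination


if __name__ == "__main__":
    expected = ["m1", "m2", "m3", "m4"]
    found = {"m1": 1, "m2": 2, "m3": 1}
    print(completeness_contamination(found, expected))
```

In this toy case three of four markers are present and one is duplicated, giving 75% completeness and 25% contamination.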

Version:

2016-11-13.v1

eggNOG-mapper: a new approach to COG/GO functional annotation

Title:

Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper

Abstract:

Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines relatively inaccessible, less precise homology-based functional transfer is still the default for (meta-)genome annotation. We therefore developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from eggNOG. To validate our method, we benchmarked Gene Ontology predictions against two widely used homology-based approaches: BLAST and InterProScan. Compared to BLAST, eggNOG-mapper reduced by 7% the rate of false positive assignments, and increased by 19% the ratio of curated terms recovered over all terms assigned per protein. Compared to InterProScan, eggNOG-mapper achieved similar proteome coverage and precision, while predicting on average 32 more terms per protein and increasing by 26% the rate of curated terms recovered over total term assignments per protein. Through strict orthology assignments, eggNOG-mapper further renders more specific annotations than possible from domain similarity only (e.g. predicting gene family names). eggNOG-mapper runs ~15x faster than BLAST and at least 2.5x faster than InterProScan. The tool is available standalone or as an online service at http://eggnog-mapper.embl.de.

Paper:

http://biorxiv.org/content/early/2016/09/22/076331

Source code:

https://github.com/jhcepas/eggnog-mapper

Installation:

axel https://github.com/jhcepas/eggnog-mapper/archive/0.12.7.tar.gz
tar xzvf eggnog-mapper-0.12.7.tar.gz

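Overview:

eggNOG is widely used for gene function annotation and is a broad extension of the NCBI COG database. Based on the latest release, eggNOG 4.5, there is now an alternative option for GO functional annotation, which previously relied mainly on BLAST2GO and InterProScan. It can use either profile-based similarity search or direct sequence similarity search, and can annotate genomes, transcriptomes and metagenomes. Online service: http://beta-eggnogdb.embl.de/#/app/emapper. Local installation is not difficult either, but like InterProScan it consumes a lot of disk space.

The eggNOG annotation files are close to 20 GB, the eggNOG sequence files are also close to 20 GB, and there are a further 130 GB or so of HMM profile files. Memory consumption is also considerable: the eukaryote database needs 90 GB of RAM, the bacteria database 32 GB, and archaeal searches 10 GB. Using a PCIe SSD may be one way to work around limited memory. Installation and usage are fairly simple; details will follow when this post is updated.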


Version:

2016-11-12.v1

KneadData: quality control for microbiome experiments

Title:

KneadData: a tool designed to perform quality control on metagenomic and metatranscriptomic sequencing data

Abstract:

KneadData is a tool designed to perform quality control on metagenomic sequencing data, especially data from microbiome experiments. In these experiments, samples are typically taken from a host in hopes of learning something about the microbial community on the host. However, metagenomic sequencing data from such experiments will often contain a high ratio of host to bacterial reads. This tool aims to perform principled in silico separation of bacterial reads from these “contaminant” reads, be they from the host, from bacterial 16S sequences, or other user-defined sources.

Link:

http://huttenhower.sph.harvard.edu/kneaddata

Source code:

https://bitbucket.org/biobakery/kneaddata

Installation:

wget  https://bitbucket.org/biobakery/kneaddata/downloads/kneaddata_v0.5.1.tar.gz
tar xzvf  kneaddata_v0.5.1.tar.gz
sudo python setup.py install --bypass-dependencies-install

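Commentary:

KneadData comes from the Huttenhower lab and is designed for quality control of microbiome data, especially metagenome or metatranscriptome data. It can also serve as a general-purpose contaminant removal tool, with either bowtie2 or bmtagger as the alignment engine. In our own tests bmtagger removed contaminants more effectively than bowtie2; bmtagger is the SOP tool of the Human Microbiome Project. KneadData does not scale well with multiple threads, so it may make more sense to build a small wrapper around bmtagger along these lines: (1) split the FASTQ file; (2) submit the chunks with parallel/xargs/gargs; (3) merge the result files; (4) clean up.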
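The split, submit and merge steps can be sketched in Python as follows. The per-chunk filter here is a placeholder that drops reads by ID. In a real wrapper that function would write the chunk to a temporary file, run the external tool on it, and delete the file afterwards (step 4). All function names are hypothetical and nothing in the sketch calls bmtagger.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, plus, quality)."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_records(records, chunk_size):
    """Step 1: cut the record stream into lists of at most chunk_size records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def remove_contaminants(chunk, contaminant_ids):
    """Placeholder per-chunk filter (hypothetical; not a bmtagger call)."""
    return [rec for rec in chunk if rec[0][1:].split()[0] not in contaminant_ids]


def clean_in_parallel(records, contaminant_ids, chunk_size=2, workers=4):
    """Steps 2-3: filter chunks concurrently and merge results in input order."""
    chunks = list(split_records(records, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input chunks.
        results = pool.map(lambda c: remove_contaminants(c, contaminant_ids), chunks)
        return [rec for chunk in results for rec in chunk]


if __name__ == "__main__":
    fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "GGGG", "+", "IIII", "@r3", "TTTT", "+", "IIII"]
    for rec in clean_in_parallel(read_fastq(fastq), {"r2"}):
        print("\n".join(rec))
```

Threads are sufficient in this design because the heavy work would happen in external processes; the Python side only dispatches chunks and concatenates results.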

Biostack.ORG: community building begins

Follow us: Biostack.ORG, a professional bioinformatics community

Community under construction.

Biostack.ORG

biostack Weekly | 2015-07-19 (Issue 8)

13’th July, 2015
23:26 RT @DrJCThrash Automated and accurate estimation of gene family abundance from shotgun metagenomes w/ @phylogenomics @tjsharpton http://t.co/5djMe0WTpA

14’th July, 2015
18:14 RIG: Recalibration and Interrelation of Genomic Sequence Data with the GATK http://t.co/25OPkzHBrX
19:30 Bandage: interactive visualization of de novo genome assemblies http://t.co/86o6LbONwp
19:35 Good laboratory practice for clinical next-generation sequencing informatics pipelines http://t.co/z1kxP5pK9W
21:13 RT @moorejh RT @craigbrownphd MIT proves flash is as fast as RAM, and cheaper, for #bigdata: http://t.co/SInwJoHT5i #technology
21:22 RT @BioMickWatson Cpipe appears to be GATK best practices implemented in Bpipe: http://t.co/k7O2HvWE0j
21:34 RT @gilbertjacka @hollybik talking aout http://t.co/rMY4lIR6M5 for data visualization – which uses colors and shapes to make data more accessible. #EEGen15
22:20 RT @Carlybacter Stringtie – new #transcriptome assembly http://t.co/7P9RU90P9R from @StevenSalzberg1 lab #EEGen15
22:35 RT @genetics_blog Accelerating Scientific Publication in Biology http://t.co/yv4Gig0cYT
22:36 RT @genetics_blog The bacterial pangenome as a tool for analyzing pathogenic bacteria http://t.co/PuS5QdOqiQ (review) http://t.co/dRH3RaQ2IR
22:38 RT @DrSLJ38 Read these blogs http://t.co/ajDRW6eGbc https://t.co/wDb9odsjOF http://t.co/XwYI6mlQeH https://t.co/WkH1APJlRw #IDRNgenomics
22:42 RT @genetics_blog quantro: a data-driven approach to guide the choice of an appropriate normalization method http://t.co/ChVnsSEsji http://t.co/sgfASgHsPG

15’th July, 2015
22:04 Metagenomics of toilet waste from long distance flights http://t.co/YshekTGF0a
22:21 RT @moorejh #machinelearning #datascience RT @newsycbot A Step by Step Backpropagation Example http://t.co/FElQqECtmS http://t.co/pu5CJ0SewV
23:06 Cpipe: a shared variant detection pipeline designed for diagnostic settings http://t.co/6bPNJSVRVP
23:27 MetaPathways v2.5: quantitative functional, taxonomic and usability improvements http://t.co/pmucUeh5tV
23:27 Hyperscape: visualization for complex biological networks http://t.co/uD0SvFI4PL
23:29 Investigating microbial co-occurrence patterns based on metagenomic compositional data http://t.co/GgpGaRgaK5
23:29 Correcting Illumina data http://t.co/ooX85lPeKG

16’th July, 2015
07:53 RT @kc31958 RW: “First genome of any species should be of highest quality possible”. #UGMAsia @Pacbio http://t.co/rCUDOqV9S3
21:28 RT @pathogenomenick #mgen journal launched! Read the interview with Stanley Falkow in the ‘standing on the shoulders of giants’ section: http://t.co/TdrkiJnT1M
21:28 Big Data: Astronomical or Genomical? http://t.co/J0vXhJc9bV

21:33 RT @metagenomic_lit MBBC: an efficient approach for metagenomic binning based on clustering. http://t.co/uIOjls7DDC
22:08 RT @andrewjpage Gubbins is now available on #homebrew http://t.co/iAcxXiLUwj
22:45 RT @assemblathon IEEE Xplore Abstract – SWAP-Assembler 2: Scalable Genome Assembler towards Millions of Cores http://t.co/ljPELnIf99

17’th July, 2015
22:07 RT @JavaScriptDaily Fundamental Node.js Design Patterns: http://t.co/VDuh7D2bIL
22:16 RT @genetics_blog CSHL protocols: RNA Sequencing & Analysis http://t.co/259IOwVXNu
22:21 How to Succeed at Clinical Genome Sequencing http://t.co/rfW9j8lBFt
23:15 RT @HeathrTurnr New #Rpackage UpSetR provides alternative to Venn diagrams, with optional plots highlighting specific intersections. http://t.co/GbCFklLufD

18’th July, 2015
07:53 RT @KevinADavies Lee Hood, who knows a thing or two about DNA sequencing, predicts the $100 genome in 5-8 years using synthetic nanopores #evolseq
08:39 RT @bioinformer Roche’s 454 Sues Thermo Fisher’s @IonTorrent for Patent Infringement | GenomeWeb https://t.co/v4aDfCZpuZ @rochesequencing #genomics #biotech
10:06 RT @Amazing_Maps Visualizing city densities –http://t.co/SuNWZGqTCP
10:07 RT @KevinADavies Bill Efcavitch: according to @thermofisher still some 15,000 Sanger instruments running today (out of ~30,000 install base). #evolseq
10:23 RT @nanopore You can now join the queue for the PromethION Access Programme https://t.co/lMzBXvuYeB
10:36 RT @pathogenomenick ete2 – where have you been all my life, quite the best way to annotate trees! http://t.co/SGzCea5iPB
21:06 RT @EricTopol Nice that the human interactome is so simple (not!) http://t.co/tHjVYZZU7V @CellCellPress http://t.co/MsvQa5ZJ82
22:38 RT @DrJCThrash Investigating microbial co-occurrence patterns based on metagenomic compositional data http://t.co/O51kZRF0nW

19’th July, 2015
10:56 Maize pan-transcriptome provides novel insights into genome complexity and quantitative trait variation | bioRxiv http://t.co/6bmyYp4fCE
12:58 RT @pypi_updates metameta 0.0.0.33: Toolkit for analyzing meta-transcriptome/metagenome mapping data http://t.co/O7PuDtkWxC
13:02 RT @KantorKantor Impact of the Gut Metagenome on Autoimmunity http://t.co/vL8xORxvyJ
19:03 RT @koadman Alm describing Smillie’s Strainfinder2 software to predict strains from metagenome & a close reference genome #UrbanGenome