Metagenomics software tools

clustering

======================

cd-hit-est: fast DNA clustering

CD-HIT User Guide

cd-hit-est (http://weizhong-lab.ucsd.edu/cd-hit/) is a widely used program for clustering and comparing large sets of DNA sequences. It is very fast and can handle extremely large databases. cd-hit-est significantly reduces the computational and manual effort in many sequence analysis tasks, and it aids in understanding the data structure and correcting the bias within a dataset.

1. "Clustering of highly homologous sequences to reduce the size of large protein database", Weizhong Li, Lukasz Jaroszewski and Adam Godzik Bioinformatics (2001) 17:282-283.
2. "Tolerating some redundancy significantly speeds up clustering of large protein databases", Weizhong Li, Lukasz Jaroszewski and Adam Godzik Bioinformatics (2002) 18:77-82.
3. "Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences", Weizhong Li and Adam Godzik Bioinformatics (2006) 22:1658-1659.
4. "CD-HIT Suite: a web server for clustering and comparing biological sequences", Ying Huang, Beifang Niu, Ying Gao, Limin Fu and Weizhong Li Bioinformatics (2010) 26:680-682.

----------------------------------------------

cd-hit: fast protein clustering

cd-hit (http://weizhong-lab.ucsd.edu/cd-hit/) is a widely used program for clustering and comparing large sets of protein sequences. It is very fast and can handle extremely large databases. cd-hit significantly reduces the computational and manual effort in many sequence analysis tasks, and it aids in understanding the data structure and correcting the bias within a dataset.

----------------------------------------------------------------

h-cd-hit: fast hierarchical protein clustering

h-cd-hit builds on cd-hit (http://weizhong-lab.ucsd.edu/cd-hit/), a widely used program for clustering and comparing large sets of protein sequences. cd-hit is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks, and it aids in understanding the data structure and correcting the bias within a dataset.

In this program, you can create a non-redundant protein database hierarchically in two steps by using two sets of parameters. First, we cluster using a sequence identity cutoff of 0.9. Based on the clustering results of the first step, we cluster again using a sequence identity cutoff of 0.6. The final goal is to generate a set of non-redundant protein sequences (60% sequence identity) for downstream analysis.
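The two-step procedure can be illustrated with a minimal Python sketch. It is not the h-cd-hit implementation: the `identity` function is a toy ungapped comparison chosen for brevity, and the names `greedy_cluster` and `hierarchical_cluster` are hypothetical. It only shows how the representatives of the first step (cutoff 0.9) become the input of the second step (cutoff 0.6).

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Toy identity: matching positions / length of the shorter sequence.

    Assumption: ungapped, left-aligned comparison. Real tools use alignments.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


def greedy_cluster(seqs: list[str], cutoff: float) -> dict[str, list[str]]:
    """Greedy incremental clustering; longer sequences become representatives first."""
    clusters: dict[str, list[str]] = {}
    for seq in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(rep, seq) >= cutoff:
                clusters[rep].append(seq)  # close enough to an existing representative
                break
        else:
            clusters[seq] = [seq]  # no representative matched: start a new cluster
    return clusters


def hierarchical_cluster(seqs: list[str], first_cutoff: float = 0.9,
                         second_cutoff: float = 0.6):
    """Step 1 clusters all sequences; step 2 clusters only the step-1 representatives."""
    step1 = greedy_cluster(seqs, first_cutoff)
    step2 = greedy_cluster(list(step1), second_cutoff)
    return step1, step2


proteins = ["MKTAYIAKQR", "MKTAYIAKQK", "MKTAYIALLL", "GGGGGGGGGG"]
step1, step2 = hierarchical_cluster(proteins)
print(len(step1), "clusters at 0.9;", len(step2), "clusters at 0.6")
```

Clustering only the representatives in the second step keeps that step small, which is the practical reason for doing the work hierarchically rather than clustering everything at 0.6 directly.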

=========================

rRNA prediction

===========================

blastn_rRNA: rRNA prediction by blastn program

This program predicts rRNA by using BLASTN to identify DNA reads containing rRNA sequences.

1. "Basic Local Alignment Search Tool", S. F. Altschul, et al. Journal of Molecular Biology (1990) 215(3):403-410.
2. "5S Ribosomal RNA database", M. Szymanski et al. Nucleic Acids Res. (2002) 30: 176-178.
3. "The European ribosomal RNA database", J. Wuyts et al. Nucleic Acids Res. (2004) 32: D101-D103.

-----------------------------------------------------------------------------

hmm_rRNA: rRNA prediction by hmmer 3.0 program

This program predicts rRNA by using HMMER 3.0 to identify DNA reads containing rRNA sequences.

1. "Profile hidden Markov models", S. R. Eddy Bioinformatics (1998) 14(9):755-763.
2. "Identification of ribosomal RNA genes in metagenomic fragments", Y. Huang, P. Gilna and W. Li Bioinformatics (2009) 25: 1338-1340.
3. "5S Ribosomal RNA database", M. Szymanski et al. Nucleic Acids Res. (2002) 30: 176-178.
4. "The European ribosomal RNA database", J. Wuyts et al. Nucleic Acids Res. (2004) 32: D101-D103.

=======================

tRNA prediction

===========================

tRNA: tRNA prediction by tRNAscan-SE program

This program predicts tRNA by using the tRNAscan-SE program to identify DNA reads containing tRNA sequences.

1. "tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence", T. M. Lowe and S. R. Eddy Nucleic Acids Research (1997) 25(5):955-964.

========================

orf prediction

========================

orf_finder: orf prediction by six-reading-frame technique

This program predicts ORFs using the six-reading-frame technique: the sequence is scanned in three frames on each strand.
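A minimal Python sketch of a six-frame scan follows. The source does not say how orf_finder defines an ORF, so this sketch assumes an ORF runs from an ATG start codon to the first in-frame stop codon; the function names and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Scan three frames on each strand and return ATG..stop ORFs.

    Each hit is (strand, frame, start, end, orf_sequence); start and end are
    0-based, end-exclusive offsets on the strand that was scanned.
    min_codons counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # open an ORF at the first start codon
                elif start is not None and codon in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, pos + 3,
                                     strand_seq[start:pos + 3]))
                    start = None  # close the ORF and look for the next start
    return orfs


for orf in find_orfs("CCATGAAATTTGGGTAACC"):
    print(orf)
```

ORFs that are still open at the end of a read are dropped here for simplicity; a tool aimed at short metagenomic reads would likely keep such partial ORFs.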

metagene: ORF prediction by metagene program

This program predicts ORFs using the MetaGene program.

MetaGeneAnnotator

1. "MetaGene: prokaryotic gene finding from environmental genome shotgun sequence", H. Noguchi, J. Park and T. Takagi Nucleic Acids Research (2006) 34(19):5623-5630.

fraggene_scan: orf prediction by fraggene_scan program

This program predicts ORFs using the FragGeneScan program.

1. "FragGeneScan: predicting genes in short and error-prone reads", M. Rho, H. Tang and Y. Ye Nucleic Acids Research (2010) 38(20).

=======================

function annotation

==========================

cog: protein function annotation by COG database

This program performs function annotation by running the RPSBLAST program against the COG database (prokaryotic proteins).

1. "Basic Local Alignment Search Tool", S. F. Altschul, et al. Journal of Molecular Biology (1990) 215(3):403-410.

pfam: protein function annotation by pfam database

This program performs function annotation by running the HMMER 3.0 program against the PFAM database.

1. "Profile hidden Markov models", S. R. Eddy Bioinformatics (1998) 14(9):755-763.
2. "The Pfam protein families database", R. D. Finn, et al. Nucleic Acids Research (2010) 38: D211-D222.

tigrfam: protein function annotation by tigrfam database

This program performs function annotation by running the HMMER 3.0 program against the TIGRFAM database.

1. "Profile hidden Markov models", S. R. Eddy Bioinformatics (1998) 14(9):755-763.
2. "The TIGRFAMs database of protein families", D. H. Haft et al. Nucleic Acids Research (2010) 38: D211-D222.

===============================================

pathway annotation

==============================

kegg: pathway annotation by KEGG database

This program uses BLAST to search protein sequences against the KEGG protein database. It reports the KEGG number of each hit together with its pathways and functions.

1. "Basic Local Alignment Search Tool", S. F. Altschul, et al. Journal of Molecular Biology (1990) 215(3):403-410.
2. "Kyoto Encyclopedia of Genes and Genomes", H. Ogata, et al. Nucleic Acids Research (1999) 27(1):29-34.

==============================================

taxonomy binning

rdp_binning: taxonomic binning by rdp classifier program

1. "Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy", Q. Wang, G. M. Garrity, J. M. Tiedje, and J. R. Cole Appl Environ Microbiol (2007) 73(16):5261-5267.

frhit_binning: taxonomic binning by frhit program

1. "FR-HIT, a Very Fast Program to Recruit Metagenomic Reads to Homologous Reference Genomes", B. Niu, Z. Zhu, L. Fu, S. Wu, W. Li, Bioinformatics (2011).

CONCOCT

CONCOCT’s documentation

 

 

=============================

 

Four latest tools

One Codex

CLARK

Kraken

MGmapper | web version (reportedly much better than Kraken)

IMSA + A (taxonomic classification using RNA-seq data)

--------------------

Metagenomics

Genome annotation

→ Prokka
Annotation tool for bacterial, archaeal, and viral genomes

→ RAST server
for bacterial and archaeal genomes

→ Maker
eukaryotic and prokaryotic genomes

→ GeneMark
- eukaryotes; prokaryotes; viruses, phages and plasmids
- part of genome annotation pipelines at NCBI

→ MetaGeneMark
gene identification in metagenomic sequences

MULTI LOCUS SEQUENCE TYPING

Metagenome assembly

IDBA-UD (Citation=456), released in 2012

Ray-Meta

short reads assembler
http://bioinf.spbau.ru/spades

SPAdes (Citation=1216), released in 2012

QUAST (Citation=356): a tool for evaluating genome assembly quality

MetaQUAST: a tool for evaluating metagenome assembly quality

MEGAHIT (Citation=456), released in 2015

for assembling large and complex metagenomics data
https://github.com/voutcn/megahit
MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph


Velvet http://www.ebi.ac.uk/~zerbino/velvet/

Celera http://www.cbcb.umd.edu/research/assembly.shtml#software

Metasim (simulator, used to compare predictions) http://ab.inf.uni-tuebingen.de/software/metasim/welcome.html#Download

Euler http://nbcr.sdsc.edu/euler/

JAZZ

Gene calling

genemark.hmm (uses HMM models to identify genes) http://exon.gatech.edu/GeneMark/metagenome/Prediction/

MetaGeneMark 
FragGeneScan 
MetaGeneAnnotator
Orphelia 
Metagene

Microbial diversity Analysis

MLST http://www.mlst.net/

MOTHUR http://www.mothur.org/

Using Mothur

Mothur command manual: Chinese explanations of Mothur commands (Part 1)

Mothur command manual: Chinese explanations of Mothur commands (Part 2)

Welcome to the mothur wiki

EstimateS http://viceroy.eeb.uconn.edu/EstimateS/

QIIME http://qiime.org/install/virtual_box.html

PHACCS http://phaccs.sourceforge.net/

 

Binning

Composition based binning
TETRA http://www.megx.net/tetra/index.html

PhyloPythia http://cbcsrv.watson.ibm.com/phylopythia.html

DUDes: identifies low-abundance taxonomic groups with high precision

OTU finder

cd-hit-otu: OTU finder by cd-hit-otu program

This program finds Operational Taxonomic Units (OTUs) using a three-step procedure. The first step filters and trims the raw reads. The second step picks error-free reads. The last step clusters the reads into OTUs at a series of distance cutoffs (0.01, 0.02, 0.03... 0.12).
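The last step, clustering at a series of distance cutoffs, can be sketched in Python as below. This is an illustration only and not the CD-HIT-OTU algorithm: it assumes equal-length reads, measures distance as the fraction of mismatching positions, and clusters greedily in input order. The helpers `mutate`, `mismatches` and `count_otus` are hypothetical names, and the reads are synthetic.

```python
def mutate(seq, k):
    """Return seq with its first k bases substituted (deterministic toy mutation)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[base] for base in seq[:k]) + seq[k:]


def mismatches(a, b):
    """Count mismatching positions between two equal-length reads."""
    return sum(1 for x, y in zip(a, b) if x != y)


def count_otus(reads, cutoff_percent):
    """Greedy clustering: a read joins the first representative within the cutoff.

    The cutoff is an integer percentage (1 means distance 0.01), so the
    comparison stays in integer arithmetic and avoids float rounding.
    """
    representatives = []
    for read in reads:
        within_cutoff = any(
            mismatches(rep, read) * 100 <= cutoff_percent * len(read)
            for rep in representatives
        )
        if not within_cutoff:
            representatives.append(read)  # read founds a new OTU
    return len(representatives)


# Synthetic 100-base reads at 0, 1, 5 and 10 mismatches from the first read.
base_read = "ACGT" * 25
reads = [base_read, mutate(base_read, 1), mutate(base_read, 5), mutate(base_read, 10)]

# Cutoffs 0.01, 0.02, ... 0.12, as listed in the description above.
for percent in range(1, 13):
    print(f"distance cutoff {percent / 100:.2f}: {count_otus(reads, percent)} OTUs")
```

Looser cutoffs merge more reads, so the OTU count can only stay level or fall as the cutoff grows in this toy data set.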

Please consult CD-HIT-OTU web site for detailed description of CD-HIT-OTU.

The whole CD-HIT-OTU program was zipped into one file and can be downloaded here

------------------------

1. "Ultrafast Clustering Algorithms for Metagenomic Sequence Analysis", W. Li, L. Fu, B. Niu, S. Wu & J. Wooley Briefings in Bioinformatics, (2012) 13 (6):656-668. doi: 10.1093/bib/bbs035
2. "WebMGA: a Customizable Web Server for Fast Metagenomic Sequence Analysis", S. Wu, Z. Zhu, L. Fu, B. Niu & W. Li BMC Genomics (2011) 12:444.

Sequence similarity based binning

CARMA http://www.cebitec.uni-bielefeld.de/brf/carma/carma.html

Phymm http://www.cbcb.umd.edu/software/phymm/

Taxator-tk

 

 

Functional Annotation

MEX(Motif Extraction) http://adios.tau.ac.il/SPMatch/

MG-RAST http://metagenomics.anl.gov/

RAMMCAP(Rapid analysis of Multiple Metagenomes with Clustering and Annotation Pipeline)
http://weizhong-lab.ucsd.edu/rammcap/cgi-bin/rammcap.cgi

Prodigal usage tutorial

MetaGeneAnnotator

Using tRNAscan-SE

MEGAN5 - MEtaGenome ANalyzer

Comparative Metagenomics

MEGAN http://metagenomics.anl.gov/

MG-RAST http://metagenomics.anl.gov/

Camera http://camera.calit2.net/#

ShotgunFunctionalizeR http://shotgun.math.chalmers.se/

UniFrac http://bmf.colorado.edu/unifrac/

MetaStats http://metastats.cbcb.umd.edu/detection.html

Galaxy https://main.g2.bx.psu.edu/u/aun1/w/metagenomic-analysis

MetaMine http://www.megx.net/metamine/

MetaLook http://www.megx.net/metalook/index.php

IMG/M http://img.jgi.doe.gov/cgi-bin/m/main.cgi

 

Mapping to reference genome

Bowtie http://bowtie-bio.sourceforge.net/index.shtml

BWA http://bio-bwa.sourceforge.net/

SOAPZ

MCQ

 

visualization tool

ICoVeR

 

Metagenomic databases

MG-RAST

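An open data submission portal. The system currently hosts more than 200,000 datasets and is continuously updated. Over the past 24 months, the number of submissions has increased fourfold.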

EMP: Earth Microbiome Project

GOS: Global Ocean Sampling Expedition

CoML: Census of Marine Life

IMG: http://img.jgi.doe.gov/

EBI metagenomics

 

Pipelines

MetLab

The main function of MetLab is to run a metagenomic classification pipeline. The pipeline takes NGS sequencing data as input and can perform data cleaning and pre-processing, host-genome mapping to remove contamination, assembly, and taxonomic binning.

Megannotator (metagenome annotation platform)

Manual

 

 

Online tools for NGS data analysis

VSEARCH

Parallel-META 3

Commercial

CLC Genomics Workbench 10

ERA-7

Quality analysis

FastQC

Trimmomatic

PRINSEQ @ SourceForge.net

 

PRINSEQ user instructions

Usage of several data-cleaning tools

 

Discovering viruses with next-generation sequencing data

VIP

Quick Start Guide

Virus Identification Pipeline (VIP) was developed for metagenomic identification of viral pathogens. VIP performs the following steps: (i) mapping and filtering out of background-related reads, (ii) extensive classification of reads on the basis of nucleotide and remote amino acid homology, (iii) multiple k-mer based de novo assembly and phylogenetic analysis to provide evolutionary insight.

VIP results are displayed in HTML format. A demo result is available at http://yang.hukaa.com/1/

Please feel free to join the mailing list as well to ask any questions about VIP: https://groups.google.com/forum/#!forum/virus-identification-pipeline

 

http://baijiahao.baidu.com/s?id=1577425474036936057&wfr=spider&for=pc

http://www.bioon.com.cn/news/showarticle.asp?newid=66577

https://bioconda.github.io/recipes/maxbin2/README.html

Bioconda

Miniconda

How Anaconda, Miniconda, Conda, and pip relate to one another

Some important information about Conda:

CONCOCT

Bioconda

(conda config --add channels r)  
conda config --add channels defaults  
conda config --add channels conda-forge  
conda config --add channels bioconda

 

Main Conda commands

source activate snowflakes

conda create -h

conda info -e

source deactivate psh2

conda create --name snowflake biopython

conda create --name snowflake python=3

conda update conda

conda --version

conda list

conda install --name bunnies beautifulsoup4

conda remove -n bunnies iopro // remove a package

conda remove -n snakes --all // remove an environment

conda install --channel https://conda.anaconda.org/pandas bottleneck

rm -rf ~/miniconda // remove conda

python --version

Anaconda Tsinghua University mirror

30-minute Anaconda quick start (English version)

Metagenome annotation with Prokka

 

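Collection of sampling and DNA extraction methods

SR: A reliable improved method for extracting metagenomic DNA

Gut microbiota 2016: undiminished appeal, a brilliant year (review)

AJE: When collecting fecal samples, there is no need to agonize over the sampling method

FEMS ME: Different DNA extraction methods yield different microbiota information

Microbiome: Colonic lavage samples can represent the microbiota composition of colonic biopsy samples

Microbiome: A new method to ensure accurate sequencing of samples containing trace amounts of DNA

SR: A new method for collecting, storing, and transporting microbiota samples at room temperature

SR: When collecting fecal samples in the field, there is no need to agonize over the storage method

MMB: A fast, low-cost method for preparing fecal microbiota transplantation samples

Microbiome: Fecal samples must be stored at -80 °C or have their DNA extracted within 2 days

Microbiome: What is the most rigorous way to obtain samples from children with chronic lung disease?

CPMB: A more powerful method for enriching microbiome DNA

AEM: For profiling the oral microbiota, which sampling and storage methods are reliable?

Gut: How can fecal samples be kept at room temperature without changing the microbiota?

Guangzhou Medical University team: How does the microbiota change when feces are left at room temperature for a period of time?

APT: A new tool for obtaining intestinal mucosa samples free of cross-contamination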

 

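Reviews and research

Nature Reviews: Microbiota and immune cells in rheumatoid arthritis (review of the year)

Gut microbiota 2016: undiminished appeal, a brilliant year (review)

Gut-brain axis 2016: gut microbiota affects mood, metabolism, and behavior (review)

Nature sub-journal: How the microbiota promotes or fights cancer (long infographic + review)

In one figure: obesity and effective weight-loss methods (the most authoritative to date, bar none)

Nature sub-journal: Microbiota-immune-neural interactions in one figure (must-read review)

In one figure: what actually shapes and determines the infant gut microbiota

Circulation: How the gut microbiota promotes cardiovascular disease, in one figure

Nature Reviews: The respiratory tract microbiota in one figure (must-read review)

In one figure: key knowledge on fecal microbiota transplantation published in JAMA (impact factor 44)

One figure plus 67 pages of substance: a thorough understanding of the infant gut microbiota (must-read comprehensive review)

MNFR: Mechanisms by which the gut microbiota drives or suppresses colorectal cancer (review)

Nature: A new antibiotic is right under our noses

Nature: Remarkable bacteria that target tumors and deliver drugs in timed synchrony

Nature: Engineered Salmonella that targets cancer and delivers drugs in automatic cycles

Nature Medicine: How can the gut-brain axis be targeted to treat disease?

Nature Reviews: A thorough understanding of the mechanisms of fecal microbiota transplantation (review)

JAMA: Fecal microbiota transplantation, frozen or fresh, shows essentially no difference in efficacy

Cell: Therapies targeting the gut microbiota are challenging but hold great promise

AJG supplement: How do good and bad bacteria determine the outcome of intestinal infection? (review)

Nature: The history of fecal microbiota therapy in one figure

JI: The influence of the microbiota on type 1 diabetes (review)

JI: How do microbiota-regulated metabolites affect immunity? (review)

AJG supplement: How do intestinal infection and dysbiosis affect brain function? (review)

Nature sub-journal: Synthesizing new antibiotics using microbiome sequence analysis

Immunity: Two bacteria that help cyclophosphamide fight cancer identified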

 
