SEQUENCE ANALYSIS

Sequence analysis is normally carried out by examining various parameters of a nucleotide sequence; sometimes it starts from a protein sequence instead. The workflow generally has ten steps, although more may be added depending on the type of analysis. The ten steps are:

1. Sequence Retrieval

2. Gene Finding - ORF

3. Translation of Nucleotide Sequence

4. Similarity Analysis

5. Primary Structure Analysis

6. Secondary Structure Prediction

7. Tertiary & Quaternary Structure Prediction

8. Validate Predicted Structure

9. Pattern / Motif search

10. Functional Analysis

1. Sequence Retrieval: 

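    A nucleotide sequence for a particular gene can be obtained from any one of the following databases. The three exchange data daily through the International Nucleotide Sequence Database Collaboration (INSDC), so each holds the same records. Note that they are archival databases, so the same gene may appear in several redundant entries.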

- From Genbank - (www.ncbi.nlm.nih.gov)

- From EMBL - (www.ebi.ac.uk)

- From DDBJ - (www.ddbj.nig.ac.jp)
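Retrieval can also be scripted. The sketch below builds a request URL for the NCBI E-utilities `efetch` service and parses FASTA text into a dictionary. The function names `efetch_fasta_url` and `parse_fasta` are made up for this illustration, the accession in the usage note is only an example value, and no network request is made by the code itself.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build a URL that asks efetch for one record in FASTA format."""
    query = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(query)


def parse_fasta(text):
    """Return a dict mapping each header line (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are joined later; case is normalised here.
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}
```

For example, `efetch_fasta_url("NM_000518")` returns a URL that can be opened in a browser or passed to `urllib.request.urlopen`, and the downloaded text can then be handed to `parse_fasta`.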


    A protein sequence for a particular protein can be obtained from the following database.

- From UNIPROT - (http://www.uniprot.org/)

2. Gene Finding - ORF

    This step is generally carried out to find the number of genes present in the nucleotide sequence, the number of introns and exons, and the nature of each gene. ORF stands for Open Reading Frame: a single reading frame that begins with a start codon and ends with a stop codon.

1. GENSCAN - (http://genes.mit.edu/GENSCAN.html)

2. GRAIL - (http://compbio.ornl.gov/Grail-1.3/)

3. GeneId - (http://www1.imim.es/software/index.php#GENEID)

4. ORF Finder - (www.ncbi.nlm.nih.gov/gorf/gorf.html )

5. tRNA gene finder (tRNAscan-SE) - (http://lowelab.ucsc.edu/tRNAscan-SE/)
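The ORF definition given above (a start codon, then in-frame codons up to a stop codon) can be sketched in a few lines. This is only a naive scan of all six frames, assuming ATG as the sole start codon and the three standard stop codons; it ignores introns and is not how gene predictors such as GENSCAN work. The name `find_orfs` and the `min_codons` parameter are assumptions made for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (strand, start, end, orf) tuples for ATG...stop runs.

    start and end are 0-based offsets on the strand that was scanned,
    and min_codons counts the start and stop codons as well.
    """
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        found.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return found
```

Calling `find_orfs("CCATGAAATAGCC")` reports a single ORF, `ATGAAATAG`, on the forward strand.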

3. Translation of Nucleotide Sequence :

    Translation refers to the conversion of a nucleotide sequence into its possible amino acid sequences. Several tools are available for this purpose. They usually report the translation in all six reading frames (three on each strand); choose the appropriate frame and use it for further analysis.

1. EXPASY Translation Tool - (http://ca.expasy.ch/tools/dna.html)

2. Transeq - (http://www.ebi.ac.uk/Tools/st/emboss_transeq/)
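To show what a six-frame translation involves, here is a minimal sketch using the standard genetic code. It assumes an unambiguous DNA or RNA sequence, marks stop codons with `*` and unknown codons with `X`, and drops trailing bases that do not fill a codon. The function names are illustrative and are not taken from any of the tools listed above.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate one reading frame, starting at the first base of seq."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return translations for frames +1..+3 and -1..-3."""
    seq = seq.upper().replace("U", "T")
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse strand, read 5' to 3'
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = translate(seq[k:])
        frames[f"-{k + 1}"] = translate(rc[k:])
    return frames
```

For the input `ATGGCCTAA`, frame `+1` gives `MA*`, which is the only frame that starts with methionine and ends at a stop codon, so it would be the one carried forward.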

4. Similarity Analysis :

    Similarity analysis compares either the nucleotide sequence or the translated protein sequence with other sequences in the databases, locally or globally, against one or many sequences at a time. It is carried out under two headings, Local Alignment and Global Alignment. Global alignment is done in two ways: pairwise alignment and multiple alignment.

a. Local Alignment:

1. From NCBI - BLAST - (www.ncbi.nlm.nih.gov/BLAST/)

2. From EBI - FASTA - (http://www.ebi.ac.uk/Tools/sss/fasta/)

b. Global Alignment:

i) Pairwise Alignment :

1. From EBI - EMBOSS - (http://www.ebi.ac.uk/Tools/psa/)

2. Bayes Block aligner -  (http://www.wadsworth.org/resnres/bioinfo/) or (http://bayesweb.wadsworth.org/balsa/balsa.html)

3. Pairwise alignment tool - http://pir.georgetown.edu/pirwww/search/pairwise.shtml

4. Pairwise alignment tool SMS - http://www.bioinformatics.org/sms2/pairwise_align_dna.html

ii) Multiple Alignment:

1. From EBI - Clustal Omega, the successor to ClustalW - (http://www.ebi.ac.uk/Tools/msa/clustalo/)

2. BLOCKS - (http://blocks.fhcrc.org/blocks/)

3. SAM - (http://www.cse.ucsc.edu/research/compbio/sam.html)

4. Phylogeny - (http://www.ebi.ac.uk/Tools/phylogeny/clustalw2_phylogeny/)
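The difference between local and global pairwise alignment can be illustrated with the classic dynamic programming recurrences: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The sketch below returns only the best score, not the alignment itself, and uses a simple match/mismatch scheme with a linear gap penalty. The scoring values and the name `align_score` are assumptions chosen for the example; servers such as BLAST and FASTA use heuristics and substitution matrices instead.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score by dynamic programming.

    local=False: global alignment (Needleman-Wunsch).
    local=True:  local alignment (Smith-Waterman).
    """
    cols = len(b) + 1
    # First row: leading gaps cost nothing in a local alignment.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)   # a local alignment may restart anywhere
                best = max(best, score)
            cur[j] = score
        prev = cur
    # Global score sits in the last cell; local score is the best cell seen.
    return best if local else prev[-1]
```

With `a = "AAAGATTACATTT"` and `b = "CCCGATTACAGGG"`, the local score rewards only the shared `GATTACA` core, while the global score is pulled down by the mismatched flanks.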

5. Primary Structure Analysis :

    This refers to analysis of the protein sequence to determine molecular weight, isoelectric point (pI), hydrophobicity, and other physical and chemical properties.

1. Compute pI / Mw - (http://ca.expasy.org/tools/pi_tool.html)

2. Peptide Mass - (http://www.expasy.org/tools/peptide-mass.html)

3. Statistical Analysis of Protein Sequences - (http://helixweb.nih.gov/tools/saps.html) or (http://www.ebi.ac.uk/Tools/seqstats/saps/)

4. ProtParam - (http://ca.expasy.org/tools/protparam.html)

5. ProtScale - ( http://ca.expasy.org/cgi-bin/protscale.pl)

6. Hydrophobic Cluster Analysis - (http://bioserv.impmc.jussieu.fr/hca-form.html)

7. PESTfind - (http://emboss.bioinformatics.nl/cgi-bin/emboss/pestfind)
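As an illustration of one such property, the sketch below computes hydrophobicity on the Kyte-Doolittle hydropathy scale: the average over the whole sequence (often called GRAVY) and a sliding-window profile of the kind ProtScale plots. The function names and the default window of 9 residues are choices made for this example, and the code assumes the sequence contains only the 20 standard amino acids.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Count how many times each residue occurs."""
    return Counter(seq.upper())


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every window of the given length along the sequence."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

Positive values indicate hydrophobic stretches and negative values hydrophilic ones, so peaks in the profile are candidate buried or membrane-spanning regions.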

6. Secondary Structure Prediction :

    The secondary structure of a protein sequence can be predicted using any one of the following servers.

1. GOR Method - (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html)

2. Nearest-neighbor method - SIMPA96 - (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_simpa96.html)

3. Neural network method - HNN - (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_simpa96.html)

4. Self Optimized Prediction Method - (SOPMA) - (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html)

5. PsiPred - (http://bioinf.cs.ucl.ac.uk/psipred/)

6. Coil Prediction - (www.ch.embnet.org/software/COILS_form.html)

7. Superfamily : - (http://supfam.mrc-lmb.cam.ac.uk)

7. Tertiary & Quaternary Structure Prediction :

1. ESYPRED PREDICTION : - (http://www.fundp.ac.be/sciences/biologie/urbm/bioinfo/esypred/)

2. HHprediction method : - (http://toolkit.tuebingen.mpg.de/ )

3. SWISSMODEL prediction : - (www.expasy.ch/swissmod/SWISS-MODEL.html)

4. 3DJIGSAW : - (http://bmm.cancerresearchuk.org/~3djigsaw/)

5. Phyre : - (http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index)

6. ROSETTA SERVER :  (http://www.bioinfo.rpi.edu/bystrc/hmmstr/server.php)

7. GENO3D : - (http://geno3d-pbil.ibcp.fr/cgi-bin/geno3d_automat.pl?page=/GENO3D/geno3d_home.html)

8. Quaternary Structure Prediction : - (http://www.mericity.com/)

8. Validate Predicted Structure :

    The predicted protein structure can be validated using any one of the following servers:

1. WHATIF SERVER : - (http://swift.cmbi.ru.nl/servers/html/index.html)

2. VADAR : - (http://redpoll.pharmacy.ualberta.ca/vadar/)

3. JCSG structure Validation centre : - (http://www.jcsg.org/scripts/prod/validation1.cgi) or (http://www.jcsg.org/scripts/prod/validation/sv_final.cgi)

4. SAVES Server - (http://services.mbi.ucla.edu/SAVES/)

9. Pattern / Motif search :

    Patterns and motifs in the protein sequence can be identified using the following servers:

1. CDD - Domain Search - (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi)

2. PROSITE Search : - (www.expasy.ch/prosite)

3. PFAM Search : - (http://pfam.sanger.ac.uk/) or (http://pfam.xfam.org/)

4. Motif & Profile search : (http://motif.genome.jp) or (http://www.genome.jp/tools/motif/)

5. PatScan : (http://blog.theseed.org/servers/2010/07/scan-for-matches.html)

6. PRATT: (http://web.expasy.org/pratt/)
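A pattern search of this kind can be sketched by converting a PROSITE-style pattern into a regular expression. The converter below handles the common syntax elements only: `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` / `>` anchors at the ends of the pattern. Rarer forms, such as an anchor inside square brackets, are not handled, and the function names are invented for this example.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern, e.g. 'N-{P}-[ST]-{P}', to a regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                   # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        # '[...]' and single letters are already valid regex syntax.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_motif(pattern, protein):
    """Return (0-based position, matched text) pairs, including overlapping hits."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(protein.upper())]
```

For instance, `find_motif("N-{P}-[ST]-{P}", "AANGSAANPSA")` reports the hit `NGSA` at position 2 and skips `NPSA`, because proline is excluded at the second position of the pattern.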

10. Functional Analysis :

    This step determines the function of a protein from its sequence and structure.

1. HNB network : (http://dag.embl-heidelberg.de/hnb_cgi/show_overview_page.pl?MenuPath=%2Ftool_index)

2. Protein Function Prediction Server : - (http://dragon.bio.purdue.edu/pfp/) or (http://kiharalab.org/web/pfp.php)

3. TagIdent : (http://us.expasy.org/tools/tagident.html) or (http://web.expasy.org/tagident/)

4. Protein function 2.2 : - (http://www.cbs.dtu.dk/services/ProtFun/)

OTHER LINKS:

1. Prediction of protein localization sites in cells : - (http://psort.hgc.jp/)

2. Meta Predict Protein : - (http://www.cs.bgu.ac.il/~dfischer/predictprotein/submit_meta.html)

3. Predict Protein Server : - (http://www.predictprotein.org/)

4. EXPASY TOOLS : - (http://www.expasy.org/proteomics)

5. PROTEIN FUNCTION PREDICTION : - (http://www.ebi.ac.uk/services)

6. Format Convertor : - ( http://www.ebi.ac.uk/Tools/sfc/readseq/) or  (http://www.cs.ucdavis.edu/~gusfield/seqio.html)

7. Kyoto Encyclopedia of Genes and Genomes (KEGG) : -  (http://www.genome.ad.jp/kegg/)

8. Codon Usage Database for Different Species : - (http://www.kazusa.or.jp/codon/)

9. Structural comparison : - (VAST -   http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml) or (DALI - http://www.ebi.ac.uk/dali/Interactive.html)

10. Molecular tool Box : - (http://bip.weizmann.ac.il/toolbox/overview.html)

11. Membrane Dipping Loop Prediction : - (http://membraneproteins.swan.ac.uk/TMLOOP/)

12. Balanced Subcellular Localization Prediction (BaCelLo) : - (http://gpcr.biocomp.unibo.it/bacello/)

13. Microarray Analysis : - (http://brainarray.mbni.med.umich.edu/Brainarray/Database/Database.asp)

14. Munich Information Center for Protein Sequences (MIPS) : - (http://www.helmholtz-muenchen.de/en/ibis)

15. Evaluation of Structure Prediction (EVA) : - (http://pdg.cnb.uam.es/eva/)

16. Protein Structure Annotation Server (ProSAT2) : - (http://projects.villa-bosch.de/dbase/ps2/)

17. Protein refinement : - (http://sysbio.rnet.missouri.edu/3Drefine/) or (http://sysbio.rnet.missouri.edu/REFINEpro/)