Bioinformatics for Proteomics


BIOINFORMATICS ANALYSIS FOR PROTEOMICS

FUNCTIONAL ANNOTATION AND ENRICHMENT ANALYSIS

Functional annotation is the process of attaching biological information to a gene or protein sequence. It includes three main steps:
a. Identify the parts of the genome that do not encode proteins;
b. Identify the elements in the genome (gene prediction);
c. Attach biological information to these elements.

Functional enrichment analysis determines which classes of genes or proteins are over-represented in a large list of genes or proteins and may therefore be related to the disease phenotype. Statistical methods are often used to determine which gene or protein sets are significantly enriched. The general steps of enrichment analysis are:
a. Calculate a p value for each set (the p value measures how unlikely the observed over-representation of the set in the list would be by chance);
b. Evaluate the statistical significance of nodes or pathways according to the p value;
c. Adjust the p values across all sets tested and calculate the false discovery rate to correct for multiple hypothesis testing.
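The multiple-testing correction in the last enrichment step can be sketched as follows. This is a minimal illustration that assumes the Benjamini-Hochberg procedure is used to control the false discovery rate; the text does not name a specific method. The function name `benjamini_hochberg`, the term names, and the p values are hypothetical and serve only as an example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in input order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for five annotation terms (illustrative only).
term_p_values = {
    "RNA binding": 0.001,
    "GTP binding": 0.008,
    "cytoskeleton": 0.039,
    "nucleoplasm": 0.041,
    "nuclear lumen": 0.2,
}

q_values = dict(
    zip(term_p_values, benjamini_hochberg(list(term_p_values.values())))
)

# Terms that remain significant at an assumed FDR threshold of 0.05.
significant = [term for term, q in q_values.items() if q < 0.05]

for term, q in q_values.items():
    print(f"{term}: p={term_p_values[term]:.3f}, q={q:.5f}")
print("significant:", significant)
```

In this toy input, two terms with raw p values below 0.05 no longer pass the threshold after adjustment, which is the effect the correction step is meant to capture.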

[Figure: GO enrichment bar charts with three panels.]
Molecular Function: RNA binding; nucleic acid binding; guanyl ribonucleotide binding; guanyl nucleotide binding; GTP binding; nucleoside-triphosphatase activity.
Cellular Component: non-membrane-bounded organelle; intracellular non-membrane-bounded organelle; nuclear lumen; ribonucleoprotein complex; nucleoplasm; cytoskeleton.
Biological Process: gene expression; organelle organization; mRNA processing; RNA processing; RNA splicing; macromolecule localization; nucleic acid metabolic process; mRNA metabolic process; positive regulation of protein complex assembly; organic substance transport; small GTPase mediated signal transduction.

[Figure: GO classification bar chart grouped by Cellular Component, Molecular Function, and Biological Process.]

Gene Ontology (GO) unifies the representation of gene and gene product attributes across all species.
Applications:
A. Integration of proteomic data from different species
B. Classification of differentially expressed proteins
C. Prediction of the functions of specific protein domains
D. Identification of genes involved in certain diseases


Enrichment analysis of gene or protein sets is useful for exploring functional and biological significance in very large data sets (such as mass spectrometry data and microarray results). GO enrichment analysis also helps to organize data from novel (or fully annotated) genomes and to compare biological functions between clade members and across clades.


KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database for the systematic analysis of the metabolic pathways in which gene products take part in cells and of the functions of these gene products. In organisms, different gene products coordinate with each other to perform biological functions, so pathway annotation of differentially expressed genes helps to further interpret gene function.
- KEGG Pathway Annotation
- KEGG Pathway Classification


[Figure: KEGG pathway labels, Mus musculus (mouse).]
mmu04144 Endocytosis
mmu04612 Antigen processing and presentation
mmu05100 Bacterial invasion of epithelial cells
mmu04666 Fc gamma R mediated phagocytosis
mmu04145 Phagosome

KEGG enrichment analysis of differentially expressed genes identifies the pathways that are significantly over-represented among those genes, which helps to find the pathways that are biologically regulated under the experimental conditions.
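The pathway-level test can be sketched as follows. This is a minimal example, assuming a one-sided Fisher's exact test (equivalent to the upper tail of the hypergeometric distribution) and reporting -log10 of the p value. The gene identifiers, the background set, and the gene-to-pathway assignments are hypothetical toy data, and `fisher_enrichment_p` is an illustrative helper, not part of any KEGG tool.

```python
from math import comb, log10


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p value, P(X >= k).

    N: genes in the background, K: background genes in the pathway,
    n: differentially expressed genes, k: of those, genes in the pathway.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical toy data: 20 background genes, 5 differentially expressed.
background = {f"g{i}" for i in range(1, 21)}
de_genes = {"g1", "g2", "g3", "g7", "g15"}
pathways = {
    "mmu04144 Endocytosis": {"g1", "g2", "g3", "g4"},
    "mmu03040 Spliceosome": {"g7", "g8", "g9", "g10", "g11", "g12"},
}

results = []
for name, members in pathways.items():
    members = members & background          # keep only background genes
    k = len(members & de_genes)             # pathway genes in the DE list
    p = fisher_enrichment_p(k, len(de_genes), len(members), len(background))
    results.append((name, p, -log10(p)))

# Rank pathways from most to least enriched.
results.sort(key=lambda row: row[2], reverse=True)

for name, p, score in results:
    print(f"{name}: p={p:.4f}, -log10(p)={score:.2f}")
```

With real data the p values would normally also be corrected for multiple testing before pathways are reported as significant.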

[Figure: KEGG pathway enrichment bar chart, Mus musculus (mouse). x-axis: -log10(Fisher's exact test p value), 0 to 5.]
mmu04360 Axon guidance
mmu04810 Regulation of actin cytoskeleton
mmu05134 Legionellosis
mmu05168 Herpes simplex infection
mmu03040 Spliceosome
mmu05166 HTLV-I infection
mmu04670 Leukocyte transendothelial migration
mmu04062 Chemokine signaling pathway
mmu04650 Natural killer cell mediated
mmu05132 Salmonella infection
mmu04520 Adherens junction
mmu05140 Leishmaniasis
mmu05203 Viral carcinogenesis

Based on whole-genome analysis, the COG (Clusters of Orthologous Groups of proteins) approach reliably assigns orthologs and paralogs for most genes using a simple method that searches for triangles of bidirectional best hits. This method can identify distant homologs and separate closely related paralogs. Based on family analysis, the functions of the characterized members of a protein family (COG) can be used to assign functions to the entire family and to describe potential functions in more than one family.
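The search for triangles of bidirectional best hits can be illustrated with a small sketch. It assumes the best hit of every gene in every other genome has already been computed (for example from sequence similarity searches) and is supplied as a dictionary. The genome names, gene names, and the `best_hit` structure are hypothetical, and the sketch covers only triangle detection, not how triangles are later combined into full COGs.

```python
from itertools import combinations


def bidirectional_best_hits(best_hit):
    """Return the set of gene pairs that are each other's best hit.

    best_hit maps (query_genome, target_genome) to a dict of
    query gene -> best-scoring gene in the target genome.
    """
    pairs = set()
    for (genome_a, genome_b), hits in best_hit.items():
        reverse = best_hit.get((genome_b, genome_a), {})
        for gene_a, gene_b in hits.items():
            if reverse.get(gene_b) == gene_a:
                pairs.add(frozenset((gene_a, gene_b)))
    return pairs


def bbh_triangles(best_hit):
    """Return gene trios in which every pair is a bidirectional best hit."""
    pairs = bidirectional_best_hits(best_hit)
    genes = sorted({gene for pair in pairs for gene in pair})
    return {
        frozenset(trio)
        for trio in combinations(genes, 3)
        if all(frozenset(pair) in pairs for pair in combinations(trio, 2))
    }


# Hypothetical best hits among three genomes A, B, and C.
best_hit = {
    ("A", "B"): {"a1": "b1", "a2": "b2"},
    ("B", "A"): {"b1": "a1", "b2": "a1"},  # b2 prefers a1, so a2-b2 is one-way
    ("A", "C"): {"a1": "c1", "a2": "c2"},
    ("C", "A"): {"c1": "a1", "c2": "a2"},
    ("B", "C"): {"b1": "c1", "b2": "c2"},
    ("C", "B"): {"c1": "b1", "c2": "b2"},
}

triangles = bbh_triangles(best_hit)
for trio in triangles:
    print(sorted(trio))
```

Because best hits are only defined between different genomes, every trio found this way contains one gene from each of three genomes. The exhaustive loop over gene trios is fine for a toy input but would need a graph-based search for real genome-scale data.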

[Figure: COG functional classification bar chart. x-axis: Function Class (A to Z, without X); y-axis: Frequency.]

© Creative Proteomics. All Rights Reserved.

