BIOINFORMATICS REVIEW- SEPTEMBER 2019


SOFTWARE

How to cluster peptide/protein sequences using cd-hit software?


Cd-hit is one of the most widely used programs for clustering biological sequences [1]. It helps remove redundant sequences, which can improve the results of downstream sequence analyses. Cd-hit groups sequences into clusters according to a sequence identity cut-off supplied by the user. It uses a greedy incremental clustering algorithm and selects a representative sequence for each cluster. In this article, we will learn how to cluster a set of protein sequences using cd-hit. The cd-hit package contains several programs for clustering different kinds of sequences. For example, the cd-hit program clusters protein (peptide) sequences and cd-hit-est clusters nucleotide sequences. The package can even compare two different databases: cd-hit-2d does this for protein databases and cd-hit-est-2d for nucleotide databases [1]. In this tutorial, we use the cd-hit program to cluster a group of protein sequences. The complete cd-hit package can be downloaded from here.

Prepare input file

The input file contains all the peptide or protein sequences in FASTA format. The FASTA headers do not need any special formatting; the software handles them on its own.

Basic commands

To cluster the sequences at 100% identity:

$ cd-hit -i input.fasta -o db100 -c 1.00 -n 5 -M 2000

where,

-i = input
-o = output
-c = sequence identity cut-off (e.g. 0.97 = 97%)
-n = word size:
  n=5 for thresholds 0.7 ~ 1.0
  n=4 for thresholds 0.6 ~ 0.7
  n=3 for thresholds 0.5 ~ 0.6
  n=2 for thresholds 0.4 ~ 0.5
-M = maximum available memory (in MB)

To cluster the sequences at a 97% sequence identity cut-off:

$ cd-hit -i input.fasta -o db97 -c 0.97 -n 5 -M 2000
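The idea of one representative sequence per cluster follows from the greedy incremental scheme described in the cd-hit publications: sequences are sorted from longest to shortest, the longest becomes the first representative, and each remaining sequence joins the first cluster whose representative it matches at or above the cut-off, otherwise it starts a new cluster. The toy Python sketch below illustrates only that loop. Its identity measure is an assumption: it uses the standard-library `difflib.SequenceMatcher` as a stand-in for a real alignment, dividing matched residues by the length of the shorter sequence, whereas cd-hit uses its own alignment and short-word filtering. The names `identity`, `greedy_cluster` and the example sequences are hypothetical.

```python
from difflib import SequenceMatcher


def identity(shorter: str, longer: str) -> float:
    """Toy identity: matched residues / length of the shorter sequence.

    Assumption: difflib's matching blocks stand in for a real alignment.
    """
    matcher = SequenceMatcher(None, shorter, longer, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(shorter)


def greedy_cluster(seqs: dict[str, str], cutoff: float) -> dict[str, list[str]]:
    """Greedy incremental clustering (toy version, not cd-hit itself).

    Sequences are visited from longest to shortest; each joins the first
    cluster whose representative it matches at >= cutoff, otherwise it
    becomes the representative of a new cluster.
    """
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: dict[str, list[str]] = {}  # representative id -> member ids
    for name in order:
        for rep in clusters:
            # The representative is never shorter than the current sequence.
            if identity(seqs[name], seqs[rep]) >= cutoff:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


proteins = {
    "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
    "seqB": "MKTAYIAKQRQISFVKSHFSRE",  # one substitution relative to seqA
    "seqC": "GSHMLEDPVDAFQ",
}
print(greedy_cluster(proteins, cutoff=0.9))
# {'seqA': ['seqA', 'seqB'], 'seqC': ['seqC']}
```

At a cut-off of 0.9, seqB (21 of 22 residues identical to seqA) falls into seqA's cluster; at 1.0 it would become its own representative, which mirrors why a 100% run keeps more clusters than a 97% run.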

Output

The output of cd-hit provides two different files:

Bioinformatics Review | 8

