Paste your protein sequence into the text box above (or use an example) then click 'Search'.

  • The progress bar below will let you know when your results are available.
  • Click Help to find out more information on this search.
  • Click API to find out how to use this service in your programs.

About the FunFHMMer web server

The FunFHMMer web server predicts domain-based protein functions (as Gene Ontology annotations) for query sequences, using the functional classification of the CATH-Gene3D resource.

What is the CATH-Gene3D resource?

CATH (Class, Architecture, Topology, Homology) is a hierarchical protein domain classification database. Protein structures are taken from the Protein Data Bank (PDB), chopped into individual structural domains and then classified into superfamilies based on their evolutionary origin. Structural, sequence and functional data are used to assess the evolutionary origin. A CATH superfamily code consists of four numbers, one for each level of the CATH classification, separated by periods.

For example,

Level                   CATH Code    Description
Class                   3            Alpha Beta
Architecture            3.40         3-Layer(aba) Sandwich
Topology                3.40.710     Beta-lactamase
Homologous Superfamily  3.40.710.10  DD-peptidase/beta-lactamase superfamily
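
How a superfamily code expands into the four levels can be sketched in Python. This is only an illustrative helper and not part of CATH or the FunFHMMer server; the function name cath_levels is hypothetical.

```python
# Level names of the CATH hierarchy, from broadest to most specific.
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous Superfamily")


def cath_levels(code):
    """Return (level name, cumulative code) pairs for a CATH code.

    Illustrative helper (not part of CATH): accepts one to four
    period-separated numbers, e.g. "3.40.710.10".
    """
    parts = code.strip().split(".")
    if not 1 <= len(parts) <= len(CATH_LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a CATH code: {code!r}")
    # Each level is identified by the code prefix up to and including it.
    return [(CATH_LEVELS[i], ".".join(parts[: i + 1])) for i in range(len(parts))]


if __name__ == "__main__":
    for level, prefix in cath_levels("3.40.710.10"):
        print(f"{level}: {prefix}")
```

Called on the code from the table, the helper yields the same four code prefixes listed there, one per level.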

To browse the different levels of the CATH hierarchy, please visit here.

Gene3D is a sister database to CATH which assigns protein domain sequences to their homologous superfamilies.

The latest version (version 4.0) of CATH-Gene3D provides a comprehensive classification of structure and sequence domains into 2735 structure-based superfamilies. For more information on CATH, please visit the documentation pages.

Functional Classification of CATH-Gene3D

Protein domain superfamilies in CATH-Gene3D can be functionally and structurally diverse. Therefore, they have been further classified into functional families (FunFams) using a new method - FunFHMMer. The FunFams are associated with a set of Gene Ontology (GO) annotations derived from their annotated sequences.

Functional families are groups of protein sequences and structures with a high probability of sharing the same function(s). The functionally important residues in a family are therefore also expected to be highly conserved.

Functional family classification serves two purposes. It improves the functional annotation of uncharacterised protein domain sequences that are assigned to an annotated functional family within the superfamily, and it helps to explain the mechanisms of functional divergence within a superfamily during evolution.

How do we Predict Functions?

The FunFHMMer function prediction server takes a protein sequence in FASTA format or UniProt/GenBank sequence identifiers as input and identifies CATH domains by scanning it against a library of CATH FunFam HMMs. The output of the web server provides the CATH domain superfamily and FunFam assignments within the query sequence and also highlights the multi-domain architecture of the sequence. The Gene Ontology (GO) annotations for the matching FunFam(s) are displayed in a table along with their annotation frequency. Our function prediction workflow is shown below:

For a detailed example please refer to the Example section.

For more information on the webserver, also refer to the Frequently Asked Questions section.

Example

Example Query Sequence:
UniProt sequence of P0AD61 (470 amino acids)
Input:
The FunFHMMer function prediction server takes a protein sequence in FASTA format or UniProt/GenBank sequence identifiers as input in the text area on the webpage. The search for function predictions is typically very fast, but it may take up to several minutes for very long sequences. The progress of the search is shown by a green progress bar below, and the user is notified when the search has finished and the results are available.
Output:
The results for the query sequence provide the CATH domain superfamily and FunFam assignments within the query sequence. They also highlight the multi-domain architecture (MDA) of the sequence. For example, the FunFHMMer web server returns three structural domains for the query UniProt sequence P0AD61 along with their significant E-values.
Domains can be either continuous or discontinuous depending on whether the domain is made up of one or more stretches of amino acids in the protein. For example, the E. coli Pyruvate Kinase (UniProt:P0AD61, PDB: 1E0T) consists of three structural domains – one blue discontinuous domain which matches the FunFam 6921 in CATH superfamily 3.20.20.60, the second yellow continuous domain matches FunFam 2014 in the CATH superfamily 2.40.33.10 and the third green continuous domain matches FunFam 2481 in the CATH superfamily 3.40.1380.20.

For each CATH FunFam match, the 'Info' button provides a brief description of the FunFam. An example for the FunFam 2481 from CATH superfamily 3.40.1380.20 is shown below. To find out more about a FunFam, the 'FunFam' button opens the CATH FunFam webpage in a new window, which can provide useful functional and structural information.

For example, the Alignment tab of the FunFam webpage gives information on highly conserved positions in the FunFam alignment. These positions are identified using Scorecons and highlighted in green on a representative protein domain structure for the FunFam sequences.

The 'Alignment' button for each FunFam shows the alignment, produced by HMMER3, of the query sequence domain region to the matching CATH FunFam HMM. For example, the following figure shows the alignment of the third predicted structural domain in the query sequence (residues 323-468) to the FunFam 2481 in the CATH superfamily 3.40.1380.20. The Query sequence line is shown in capital letters. The line starting with Hit shows the consensus of the FunFam model, where capital letters indicate highly conserved residues (as predicted by HMMER3). The Consensus line indicates the matches between the query sequence and the FunFam Hit. Identical matches are represented by the amino acid letter itself, and similar amino acids are represented by a '+'.
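
The Consensus line convention described above can be summarised with a few lines of Python. This is a minimal sketch, assuming that every letter marks an identical match, every '+' marks a similar amino acid, and any other character (typically a space) marks no match; the function name consensus_summary is hypothetical and not part of the web server.

```python
def consensus_summary(consensus):
    """Count identical and similar positions in a Consensus line.

    Assumed convention: a letter marks an identical match, '+' marks a
    similar amino acid, anything else (usually a space) marks no match.
    """
    identical = sum(1 for ch in consensus if ch.isalpha())
    similar = consensus.count("+")
    return {
        "identical": identical,
        "similar": similar,
        "aligned_columns": len(consensus),
    }


if __name__ == "__main__":
    # Short made-up Consensus line, for illustration only.
    print(consensus_summary("G gF+ "))
```

Dividing the identical count by the number of aligned columns gives a rough identity fraction for the aligned region.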

The EC annotations and GO annotations corresponding to each domain are available via the 'EC Terms' and 'GO Terms' buttons, along with their annotation frequency.

The following figure shows the EC and GO annotations for different FunFams. The GO annotations for the query protein sequence are the non-redundant set of GO annotations, for each ontology (Molecular Function, Biological Process and Cellular Component), predicted by FunFHMMer across all the domain regions.
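
Combining the per-domain predictions into one non-redundant set per ontology can be sketched as follows. The input structure (one dictionary per domain region, mapping an ontology name to a list of GO term identifiers) is a hypothetical representation chosen for illustration, not the format used by the server, and the identifiers in the usage example are placeholders rather than real predictions.

```python
def merge_go_annotations(per_domain_terms):
    """Merge per-domain GO terms into one non-redundant list per ontology.

    `per_domain_terms` is a hypothetical structure: one dict per domain
    region, mapping an ontology name to the GO term IDs predicted for it.
    """
    merged = {}
    for domain_terms in per_domain_terms:
        for ontology, terms in domain_terms.items():
            # A set drops terms that several domains have in common.
            merged.setdefault(ontology, set()).update(terms)
    # Sort so that the output order is stable and reproducible.
    return {ontology: sorted(terms) for ontology, terms in merged.items()}


if __name__ == "__main__":
    # Placeholder identifiers, not real FunFHMMer predictions.
    domains = [
        {"Molecular Function": ["GO:0000001", "GO:0000002"]},
        {"Molecular Function": ["GO:0000002"], "Biological Process": ["GO:0000003"]},
    ]
    print(merge_go_annotations(domains))
```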

Frequently Asked Questions

What is a Domain?

Protein domains are distinct, compact units of protein structure that form the functional building blocks of proteins. They often combine with other domains in a mosaic manner giving rise to multi-domain proteins with new or modified functions ('Domain shuffling').

What is a Homologous Superfamily in CATH-Gene3D?

This level of the CATH hierarchy groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous. Similarities are identified either by high sequence identity or structure comparison using SSAP. Structures are clustered into the same homologous superfamily if they satisfy two or more of the following criteria:

  1. Sequence identity >= 35%, overlap >= 60% of larger structure equivalent to smaller.

  2. SSAP score >= 80.0, sequence identity >= 20%, overlap 60% of larger structure equivalent to smaller.

  3. SSAP score >= 70.0, overlap 60% of larger structure equivalent to smaller, and domains with related functions, as informed by the literature and the Pfam protein family database.

  4. Significant similarity from HMM-sequence searches and HMM-HMM comparisons using SAM, HMMER and PRC.

What is a FunFam in CATH-Gene3D?

FunFams, or Functional Families, in CATH-Gene3D are functionally coherent groupings of protein domain sequences within a CATH-Gene3D homologous superfamily. The FunFams have been generated using the new automated functional classification method, FunFHMMer. For details, read more about our functional classification protocol.

Why do I get two hits to the same CATH superfamily, belonging to different families, in my query sequence?

A query protein sequence can often have multiple hits to different, albeit related, functional families within a single CATH superfamily. For example, the yeast pyruvate decarboxylase (UniProt: P06169) is a TPP-dependent enzyme which consists of three domains: a pyrimidine (PYR) binding domain, a transhydrogenase dIII (TH3) domain and a pyrophosphate (PP) binding domain, where the PP and PYR domains are known to be evolutionarily related (Dalby et al., 2008). The yeast pyruvate decarboxylase shows matches to three superfamilies in CDD: TPP_enzyme_PYR superfamily (cl11410), TPP_enzyme_M superfamily (cl22435) and TPP_enzymes (cl01629). In contrast, the FunFHMMer web server reports two hits in the CATH superfamily 3.40.50.970 (with different FunFam matches) and one hit to the CATH superfamily 3.40.50.1220. This is because the PP and PYR domains are homologous domains that result from a gene duplication during evolution and have therefore been classified into the same CATH superfamily, a relationship confirmed by structural data.

Why don't I get any matches for my query sequence in the FunFHMMer web server?

The absence of annotations provided by our FunFHMMer server is most likely due to one of the following reasons:

  1. Annotations can only be provided for families which have one or more known structures classified in CATH.

  2. Hits are only reported if the sequence match is within the inclusion threshold for the FunFam matched. This is a much stricter criterion than used by many other resources but results in greater precision by preventing mis-annotations caused by 'over-prediction'. We have chosen to be conservative and focus on higher precision rather than greater coverage.

FASTA scan submit

URL:

POST /search/by_funfhmmer

Description:

Submits your query protein sequence. This will be scanned against a library of structural domains and Functional Families in CATH using HMMER3.

Input:

Name     Type     Description
fasta    String   Sequence of the query protein (in FASTA format)

Example:

>tr|G4VGF5|G4VGF5_SCHMA
MCSHYAQRNNFSCGGYGFIDFVSEDAANEALQQIKETHPSFTIKFAKENEKDKTNLYVTN
LPRTWTTKDSDQLKAVFERFGHIQSAFVMMERLTNKTTGVGFVRFVNEQDAVNALESLKL
HPLTLPDCSVPVEAKFADKHNPDTRRRRYPVTATTAAAAAAAAAAAAASATMIANVNYNN
LLNCPLYTAPNGLTLTSHDALASLLNTGLVSPSIVNSQLANFSALQQKSTTDFTSRFNSE							

Output:

Name      Type     Description
task_id   String   A unique Task ID that can be used to check the progress and retrieve results of this scan

Example:

58542dcb6fc895dfb7c8f76b4d63cb72							

Example return ('application/json'):

{ "task_id": "58542dcb6fc895dfb7c8f76b4d63cb72" }		

Example usage:
$ curl -w "\n" -s -X POST -H 'Accept: application/json' --data-binary '@/path/to/file.fasta' https://www.cathdb.info/search/by_funfhmmer
Important: data in the file /path/to/file.fasta needs to be in the form 'name=value'.
Example:
fasta=>tr|G4VGF5|G4VGF5_SCHMA
MCSHYAQRNNFSCGGYGFIDFVSEDAANEALQQIKETHPSFTIKFAKENEKDKTNLYVTN
LPRTWTTKDSDQLKAVFERFGHIQSAFVMMERLTNKTTGVGFVRFVNEQDAVNALESLKL
HPLTLPDCSVPVEAKFADKHNPDTRRRRYPVTATTAAAAAAAAAAAAASATMIANVNYNN
LLNCPLYTAPNGLTLTSHDALASLLNTGLVSPSIVNSQLANFSALQQKSTTDFTSRFNSE

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->default_header( 'Accept' => 'application/json' );

my $url = 'https://www.cathdb.info/search/by_funfhmmer';
my %data = ();
$data{fasta} = <<'_PARAM';
>tr|G4VGF5|G4VGF5_SCHMA
MCSHYAQRNNFSCGGYGFIDFVSEDAANEALQQIKETHPSFTIKFAKENEKDKTNLYVTN
LPRTWTTKDSDQLKAVFERFGHIQSAFVMMERLTNKTTGVGFVRFVNEQDAVNALESLKL
HPLTLPDCSVPVEAKFADKHNPDTRRRRYPVTATTAAAAAAAAAAAAASATMIANVNYNN
LLNCPLYTAPNGLTLTSHDALASLLNTGLVSPSIVNSQLANFSALQQKSTTDFTSRFNSE
_PARAM



my $response = $ua->post( $url , \%data );

if ( $response->is_success ) {
	print $response->decoded_content;
}
else {
	die $response->status_line;
}
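
The same submission can be sketched in Python using only the standard library. This is a minimal, unofficial sketch based on the endpoint and the 'fasta' parameter documented above; the function names build_submit_request and submit_scan are hypothetical, and the 10-second timeout simply mirrors the Perl example.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.cathdb.info"


def build_submit_request(fasta):
    """Build the POST request for /search/by_funfhmmer (nothing is sent)."""
    # Form-encode the sequence under the documented 'fasta' parameter name.
    body = urllib.parse.urlencode({"fasta": fasta}).encode("ascii")
    return urllib.request.Request(
        BASE_URL + "/search/by_funfhmmer",
        data=body,
        headers={"Accept": "application/json"},
        method="POST",
    )


def submit_scan(fasta, timeout=10):
    """Send the request and return the task ID from the JSON reply."""
    request = build_submit_request(fasta)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["task_id"]
```

Building the request separately from sending it makes it possible to inspect the encoded body without contacting the server.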

FASTA scan check progress

URL:

GET /search/by_funfhmmer/check/:task_id

Description:

Check the progress of your sequence scan.

Input:

Name      Type     Description
task_id   String   Task ID of the scan

Example:

58542dcb6fc895dfb7c8f76b4d63cb72							

Output:

Name      Type      Description
success   Boolean   Whether the scan has finished successfully
message   String    Information about the current progress of the scan
data      Object    Details about the scan

Example return ('application/json'):

{
   "data" : {
      "worker_hostname" : "bsmlx53",
      "status" : "done",
      "id" : "58542dcb6fc895dfb7c8f76b4d63cb72",
      "date_completed" : "2015-01-29T13:10:22",
      "date_started" : "2015-01-29T13:09:49"
   },
   "message" : "done",
   "success" : 1
}		

Example usage:
$ curl -w "\n" -s -X GET -H 'Accept: application/json' https://www.cathdb.info/search/by_funfhmmer/check/58542dcb6fc895dfb7c8f76b4d63cb72

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->default_header( 'Accept' => 'application/json' );

my $url = 'https://www.cathdb.info/search/by_funfhmmer/check/58542dcb6fc895dfb7c8f76b4d63cb72';


my $response = $ua->get( $url );

if ( $response->is_success ) {
	print $response->decoded_content;
}
else {
	die $response->status_line;
}
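
Because a scan may take some time, a client would normally call the check endpoint repeatedly until the scan has finished. The following Python sketch shows one way to do this, assuming that a true 'success' value means the results are ready. The function names, the polling interval and the attempt limit are all illustrative choices, not values recommended by the service.

```python
import json
import time
import urllib.request

CHECK_URL = "https://www.cathdb.info/search/by_funfhmmer/check/{task_id}"


def fetch_status(task_id, timeout=10):
    """GET the progress document for one task and decode the JSON."""
    request = urllib.request.Request(
        CHECK_URL.format(task_id=task_id),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_scan(task_id, fetch=fetch_status, interval=5.0, max_attempts=60):
    """Poll until 'success' is true, then return the final status document.

    The interval and attempt limit are arbitrary illustrative defaults.
    """
    for attempt in range(max_attempts):
        status = fetch(task_id)
        if status.get("success"):
            return status
        # Do not sleep after the last unsuccessful check.
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError(f"scan {task_id} not finished after {max_attempts} checks")
```

The fetch argument can be replaced with a stub, which allows the polling logic to be exercised without network access.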

FASTA scan retrieve results

URL:

GET /search/by_funfhmmer/results/:task_id

Description:

Retrieve the results of your sequence scan

Input:

Name      Type     Description
task_id   String   Task ID of the scan

Example:

58542dcb6fc895dfb7c8f76b4d63cb72							

Output:

Name               Type     Description
query_fasta        String   Original protein sequence that was used for the scan
cath_version       String   Version of CATH that was used for the scan
signatures_by_id   Object   The regions of the query sequence that match structural domains or Functional Families (FunFams) in CATH

Example return ('application/json'):

{
   "query_fasta" :  "$original_sequence",
   "cath_version" : "v4_1_0",
   "signatures_by_id" : {
      ${query_id} : {
         "id" :     "tr|G4VGF5|G4VGF5_SCHMA",
         "label" :  "tr|G4VGF5|G4VGF5_SCHMA",
         "length" : 587,
         "matches" : [
            {
               "id" :          "3.30.70.330/FF/43574",
               "evalue" :      7.1e-19,
               "description" : ${match_name},
               "data" : {
                  ...
               },
               "regions" : [
                  {
                     "start" : 15,
                     "end" :   54,
                     "data" : {
                        "evalue" : "0.026",
                        "length" : 45,
                        "hit_string" :      "GVgFirfekreeaeeaikalngktlegasepltvkfaeepskkkk",
                        "query_string" :    "GYGFIDFVSEDAANEALQQIKETHPS-----FTIKFAKENEKDKT",
                        "homology_string" : "G gFi f + + a+ea +++  ++++      t+kfa+e++k k+"
                     }
                  },
                  ...
               ]
            }
         ]
      }
   }
}		
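
The nested result structure above can be flattened into one row per matched region with a short Python function. This sketch assumes that the decoded JSON follows the layout of the example return, and that match identifiers have the form "<superfamily>/FF/<FunFam number>" as in that example; the function name list_funfam_regions is hypothetical.

```python
def list_funfam_regions(results):
    """Flatten a results document into one row per matched region.

    Assumes the layout of the example return above, and that match IDs
    look like "<superfamily>/FF/<funfam number>" as in that example.
    """
    rows = []
    for query in results.get("signatures_by_id", {}).values():
        for match in query.get("matches", []):
            # Split e.g. "3.30.70.330/FF/43574" into its two parts.
            superfamily, _, funfam = match["id"].partition("/FF/")
            for region in match.get("regions", []):
                rows.append({
                    "query": query.get("id"),
                    "superfamily": superfamily,
                    "funfam": funfam or None,
                    "evalue": match.get("evalue"),
                    "start": region["start"],
                    "end": region["end"],
                })
    return rows
```

A match that lists more than one region produces several rows with the same superfamily and FunFam, which is one way to spot domains made up of several sequence segments.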

Example usage:
$ curl -w "\n" -s -X GET -H 'Accept: application/json' https://www.cathdb.info/search/by_funfhmmer/results/58542dcb6fc895dfb7c8f76b4d63cb72

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->default_header( 'Accept' => 'application/json' );

my $url = 'https://www.cathdb.info/search/by_funfhmmer/results/58542dcb6fc895dfb7c8f76b4d63cb72';


my $response = $ua->get( $url );

if ( $response->is_success ) {
	print $response->decoded_content;
}
else {
	die $response->status_line;
}

CATH-Gene3D is a Global Biodata Core Resource.