Find the structure of your protein
Protein structure is nearly always more conserved than sequence. As a result, when two proteins share significant sequence similarity, they are extremely likely to share a similar 3D structure as well.
The following instructions demonstrate how to find significant matches between your own protein sequence and CATH structural domains.
- Go to the "search by sequence" page
- Paste your protein sequence into the text box
- Click "Search"
- Wait for the search to finish (this should take less than a minute)
- The results show where the most similar CATH domain structures match along your protein sequence
- Each result has links to find out more about:
  - the particular matching CATH domain
  - the superfamily to which the matching domain belongs
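One way to picture the results page is as a list of domain matches, each with a position on your sequence and a significance score. The sketch below is a simplified illustration only, assuming a hypothetical DomainHit record (its fields, the E-value cut-off of 1e-3 and the example IDs are all illustrative, not the web server's actual output format). It keeps significant matches and greedily picks non-overlapping ones, most significant first. Dedicated tools such as cath-resolve-hits (part of cath-tools) take a more thorough approach, so treat this as a picture of the idea rather than CATH's method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DomainHit:
    """A hypothetical record for one CATH domain matched to the query."""
    domain_id: str       # illustrative domain identifier
    superfamily_id: str  # CATH superfamily, e.g. "3.40.50.300"
    start: int           # 1-based inclusive start on the query sequence
    end: int             # 1-based inclusive end on the query sequence
    evalue: float        # smaller means more significant


def overlaps(a: DomainHit, b: DomainHit) -> bool:
    """True if the two hits cover at least one common residue."""
    return a.start <= b.end and b.start <= a.end


def best_non_overlapping(hits: List[DomainHit], max_evalue: float = 1e-3) -> List[DomainHit]:
    """Greedy sketch: take significant hits in order of E-value, skip any that
    overlap a hit already kept, and return the survivors in sequence order."""
    chosen: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            break  # hits are sorted, so every remaining hit is less significant
        if not any(overlaps(hit, kept) for kept in chosen):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h.start)


if __name__ == "__main__":
    hits = [
        DomainHit("domA", "3.40.50.300", 10, 200, 1e-40),
        DomainHit("domB", "3.40.50.720", 150, 300, 1e-12),  # overlaps domA
        DomainHit("domC", "1.10.10.10", 310, 380, 1e-8),
        DomainHit("domD", "2.40.50.140", 390, 450, 0.5),    # not significant
    ]
    for h in best_non_overlapping(hits):
        print(h.domain_id, h.superfamily_id, f"{h.start}-{h.end}")
```

Here domB is dropped because it overlaps the more significant domA, and domD is dropped because its E-value is above the cut-off, leaving domA and domC.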
Investigate how domains have evolved
CATH-Gene3D provides information on the evolutionary relationships of protein domains through sequence, structure and functional annotation data.
The homologous superfamily (H) level of the CATH hierarchical classification groups domains that are related by evolution (find out more about the classification process).
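CATH node IDs encode the hierarchy directly: a superfamily ID such as 3.40.50.300 is four dot-separated numbers for the Class, Architecture, Topology and Homologous superfamily levels. The helper functions below are hypothetical, written only to illustrate the levels: they split an ID into its levels, name the four main classes, and report the deepest level two nodes share. Two superfamilies can share a topology without being grouped as homologous, which is exactly the distinction the H level draws.

```python
from typing import Dict, Optional

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

# Names of the four main CATH classes
CLASS_NAMES = {
    "1": "Mainly Alpha",
    "2": "Mainly Beta",
    "3": "Alpha Beta",
    "4": "Few Secondary Structures",
}


def cath_levels(node_id: str) -> Dict[str, str]:
    """Map each level to the ID prefix that names it,
    e.g. "3.40.50.300" -> {"Class": "3", "Architecture": "3.40", ...}."""
    parts = node_id.split(".")
    if not 1 <= len(parts) <= len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a CATH node ID: {node_id!r}")
    return {LEVELS[i]: ".".join(parts[: i + 1]) for i in range(len(parts))}


def class_name(node_id: str) -> str:
    """Name of the class a node belongs to ("other" for unlisted classes)."""
    return CLASS_NAMES.get(cath_levels(node_id)["Class"], "other")


def deepest_shared_level(a: str, b: str) -> Optional[str]:
    """Deepest level at which two nodes agree, or None if even the class differs."""
    la, lb = cath_levels(a), cath_levels(b)
    shared = None
    for level in LEVELS:
        if level in la and level in lb and la[level] == lb[level]:
            shared = level
        else:
            break
    return shared


if __name__ == "__main__":
    print(class_name("3.40.50.300"))                            # Alpha Beta
    print(deepest_shared_level("3.40.50.300", "3.40.50.720"))  # Topology
```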
A relatively small number of CATH superfamilies have developed high levels of sequence, structural and functional diversity during evolution. Despite this structural diversity, the members of these superfamilies are still observed to share a conserved structural core.
The following are considered in the analysis of domain evolution:
- Sequence diversity
  - Superfamily relatives are clustered at different levels of domain sequence identity: 35%, 60%, 95% and 100%. Superfamilies are typically described as diverse in sequence if they have multiple clusters at 35% sequence identity. The Classification / Domains tab on the CATH superfamily web pages provides an interactive diagram for exploring all the different types of sequence cluster.
- Structural diversity
  - Structural diversity in a superfamily can be measured by the number of structurally similar groups (SSGs) it contains. The Superfamily Superpositions tab on the CATH-Gene3D superfamily web pages demonstrates that even structurally diverse superfamilies have a common structural core.
- Functional diversity
  - Superfamily relatives with similar sequence properties are clustered into functional families (FunFams). FunFams are useful for predicting function and for providing information on how function has evolved.
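The sequence-diversity criterion can be illustrated with a toy example. The sketch below assumes the family members are already aligned to equal length (with "-" for gaps) and uses a simple greedy, representative-based clustering; it is not CATH's actual clustering protocol, and the sequences are made up. It counts clusters at the 35, 60, 95 and 100% identity thresholds and flags a family as sequence-diverse if it has more than one cluster at 35%.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues over aligned columns where neither
    sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def greedy_clusters(aligned: Dict[str, str], threshold: float) -> List[List[str]]:
    """Each sequence joins the first cluster whose representative (first member)
    it matches at >= threshold % identity; otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    for name, seq in aligned.items():
        for cluster in clusters:
            if percent_identity(seq, aligned[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def is_sequence_diverse(aligned: Dict[str, str]) -> bool:
    """Diverse in sequence = more than one cluster at 35% identity."""
    return len(greedy_clusters(aligned, 35)) > 1


if __name__ == "__main__":
    family = {  # made-up, pre-aligned toy sequences
        "s1": "MKTAYIAKQR",
        "s2": "MKTAYIAKQK",
        "s3": "MKSAFIGKNR",
        "s4": "GDLCWPVHEE",
    }
    for level in (35, 60, 95, 100):
        print(f"{level}%: {len(greedy_clusters(family, level))} clusters")
    print("sequence-diverse:", is_sequence_diverse(family))
```

With these toy sequences the family splits into 2 clusters at 35% and 60% identity and 4 clusters at 95% and 100%, so it would count as sequence-diverse.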
Investigate protein function
Proteins are typically composed of one or more building blocks called domains. Protein domains are generally considered to be independently folding units of structure.
Within each superfamily, CATH-Gene3D creates Functional Families (FunFams) that aim to group together domains sharing the same function. Therefore, if a region of your protein sequence gives a highly significant match to a particular CATH-Gene3D FunFam, there is a good chance that it shares that FunFam's function.
To test this assertion, our function prediction pipeline (based on these FunFams) was submitted to the Critical Assessment of protein Function Annotation (CAFA). Our method was ranked highly for accuracy of function prediction according to a number of different scoring methods.
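CAFA scores methods with several metrics. One of them, the protein-centric Fmax, is sketched below under simplifying assumptions: predictions are per-protein dictionaries of term scores in [0, 1], the true annotations are plain sets, term names are placeholders, and GO terms are not propagated to their ancestors (the real CAFA evaluation does propagate them). Precision is averaged over proteins with at least one prediction at the threshold, recall over all benchmark proteins, and Fmax is the best F-measure over thresholds from 0.01 to 1.00.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]], truth: Dict[str, Set[str]]) -> float:
    """Protein-centric Fmax (simplified: no GO ancestor propagation).
    predictions: protein -> {term: score in [0, 1]}
    truth: protein -> non-empty set of true terms (the benchmark proteins)."""
    best = 0.0
    for step in range(1, 101):
        tau = step / 100  # thresholds 0.01, 0.02, ..., 1.00
        precisions = []   # one value per protein with >= 1 prediction at tau
        recall_sum = 0.0  # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            correct = len(predicted & true_terms)
            if predicted:
                precisions.append(correct / len(predicted))
            recall_sum += correct / len(true_terms)
        if not precisions:
            continue  # precision is undefined when nothing is predicted
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


if __name__ == "__main__":
    truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}  # placeholder term names
    predictions = {"P1": {"GO:A": 0.9, "GO:X": 0.3}, "P2": {"GO:C": 0.6}}
    print(round(fmax(predictions, truth), 3))
```

In this toy case the best thresholds lie above 0.3 and up to 0.6, where precision is 1.0 and recall is 0.75, giving an Fmax of about 0.857.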
The following instructions demonstrate how to find function annotations for your own protein sequence.
- Go to the "search by sequence" page
- Paste your protein sequence into the text box
- Click "Search"
- Wait for the search to finish
The output of the web server provides:
- the best-matching CATH-Gene3D domain superfamilies and FunFam IDs for the query sequence
- the Enzyme Commission (EC) and Gene Ontology (GO) annotations for the matching FunFam(s)
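Transferring annotations from a search result amounts to collecting the EC numbers and GO terms of the significant FunFam matches. A minimal sketch, assuming a hypothetical FunFamHit record and an illustrative E-value cut-off of 1e-3; neither is the web server's actual output format or threshold, and the FunFam IDs and annotations in the example are for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class FunFamHit:
    """Hypothetical record for one FunFam match to the query sequence."""
    funfam_id: str
    evalue: float
    ec_numbers: Set[str] = field(default_factory=set)
    go_terms: Set[str] = field(default_factory=set)


def transfer_annotations(hits: Iterable[FunFamHit], max_evalue: float = 1e-3) -> Dict[str, List[str]]:
    """Union of the EC and GO annotations of all sufficiently significant hits."""
    ec: Set[str] = set()
    go: Set[str] = set()
    for hit in hits:
        if hit.evalue <= max_evalue:
            ec |= hit.ec_numbers
            go |= hit.go_terms
    return {"EC": sorted(ec), "GO": sorted(go)}


if __name__ == "__main__":
    hits = [
        FunFamHit("funfam-1", 1e-30, {"2.7.11.1"}, {"GO:0004672", "GO:0005524"}),
        FunFamHit("funfam-2", 5.0, {"3.1.3.16"}, {"GO:0004722"}),  # too weak
    ]
    print(transfer_annotations(hits))
```

Only the first hit passes the cut-off, so only its EC number and GO terms are transferred.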
Investigate conserved sites
Protein domain superfamilies in CATH-Gene3D have been subclassified into functional families (FunFams), which are groups of protein sequences and structures with a high probability of sharing the same function(s). Because the members of a FunFam share function, the residues that are important for that function are expected to be highly conserved across the family.
Information on conserved positions in CATH-Gene3D FunFam alignments is shown in the Alignment tab of the FunFam web pages. Conservation scores are calculated using Scorecons, and the alignment columns are coloured using a rainbow colour scheme, from red for highly conserved positions through to blue for positions that are not conserved at all. The conservation scores are also mapped onto a representative protein domain structure.
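Scorecons uses a more sophisticated score that takes amino-acid similarity and sequence weighting into account, so it is not reproduced here. As a simpler stand-in that shows the same idea, the sketch below scores each alignment column as one minus its Shannon entropy (normalised by log 20) and bins the score into a six-colour rainbow from blue (variable) to red (conserved). Both the entropy measure and the six-bin palette are assumptions for illustration, not CATH's implementation.

```python
import math
from collections import Counter
from typing import List, Tuple

# Six colour bins from least (blue) to most (red) conserved; an assumed palette
RAINBOW = ["blue", "cyan", "green", "yellow", "orange", "red"]


def column_conservation(column: str) -> float:
    """1 minus the Shannon entropy of one alignment column, normalised by
    log(20); gaps ('-') are ignored. A fully conserved column scores 1.0."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log(c / n) for c in Counter(residues).values())
    return 1.0 - entropy / math.log(20)


def colour_for(score: float) -> str:
    """Map a score in [0, 1] to one of the rainbow bins."""
    index = min(int(score * len(RAINBOW)), len(RAINBOW) - 1)
    return RAINBOW[max(index, 0)]


def conservation_profile(alignment: List[str]) -> List[Tuple[float, str]]:
    """Score and colour every column of a gapped alignment of equal-length rows."""
    columns = ["".join(row[i] for row in alignment) for i in range(len(alignment[0]))]
    return [(s, colour_for(s)) for s in map(column_conservation, columns)]


if __name__ == "__main__":
    toy = ["MKLV", "MKIV", "MRLA", "MKLG"]  # made-up alignment
    for i, (score, colour) in enumerate(conservation_profile(toy), start=1):
        print(i, round(score, 2), colour)
```

In the toy alignment the invariant first column is coloured red, while the more variable last column falls into a lower (yellow) bin.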
To investigate putative conserved sites in your protein sequence, run a sequence search against the FunFams and open the Alignment tab of the matching FunFam.
What is CATH-B?
We aim to provide official releases of the CATH classification every 12 months. This release process is important because it allows us to carry out internal validation and to provide extra annotations and analysis. However, it means there can be a delay between new structures appearing in the PDB and their inclusion in an official CATH release.
To address this, CATH-B provides a limited amount of information (e.g. domain boundaries and superfamily classifications) for the very latest domain annotations.
We are currently working on generating the CATH-Plus database for v4.2, which comprises all the additional data derived from the classification. This includes: incorporating the latest Gene3D sequence and functional annotation data; updating the Functional Families (FunFams); creating new superfamily superpositions; and producing structural clusters for each superfamily. We will update the web pages when this data is ready.
What is CATH-Gene3D?
CATH is a classification of protein structures downloaded from the Protein Data Bank. We group protein domains into superfamilies when there is sufficient evidence they have diverged from a common ancestor.
Gene3D uses the information in CATH to predict the locations of structural domains on millions of protein sequences available in public databases. This allows us to add further annotations to the CATH-Gene3D database, such as functional information and active site residues.
Latest Release Statistics
|CATH-Plus 4.2.0||CATH (daily snapshot)|
|CATH Domain Predictions||95,665,487|
Citing this resource
If you find the information in this resource useful, please consider using the following citations:
The CATH and Gene3D resources have enjoyed generous funding from a number of research councils.