CATH Documentation
CATH Data Downloads
This page describes the data files available for download from the CATH FTP site.
See CATH Releases for more information on CATH and CATH-Plus.
CATH (daily snapshot)
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/
File name | Description |
---|---|
cath-b-newest-all.gz | Lists the latest domain boundaries and superfamily (C.A.T.H) annotations for all CATH domains |
cath-b-newest-names.gz | Provides the names for each node in the CATH hierarchy |
cath-b-newest-latest-release.gz | Lists the latest domain boundaries and superfamily annotations for CATH domains in the most recent release of CATH-Plus |
cath-b-newest-putative.gz | Lists the latest domain boundaries and superfamily annotations for CATH domains released since the most recent release of CATH-Plus |
cath-b-s35-newest.gz | Lists the latest domain boundaries and sequence family (C.A.T.H.S) annotations for all non-redundant sequence representatives |
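As a minimal sketch of how one of the daily snapshot files might be fetched, the Python below joins the directory URL above with a file name and decompresses a gzipped payload. The helper names `snapshot_url` and `read_gzip_lines` are illustrative and not part of any CATH tooling, and the sketch assumes the files are plain text once decompressed. The download step itself is left commented out because it needs network access.

```python
import gzip
import io

# Directory of the daily snapshot, as given on this page.
DAILY_BASE = "ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/daily-release/newest/"


def snapshot_url(file_name: str) -> str:
    """Return the full URL of a file in the daily snapshot directory."""
    return DAILY_BASE + file_name


def read_gzip_lines(payload: bytes) -> list[str]:
    """Decompress a gzipped payload and return its non-empty text lines.

    Assumption: the decompressed content is UTF-8 (or ASCII) text.
    """
    with gzip.open(io.BytesIO(payload), mode="rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


# Example (requires network access, so it is not run here):
# from urllib.request import urlopen
# with urlopen(snapshot_url("cath-b-newest-names.gz")) as response:
#     lines = read_gzip_lines(response.read())
```

Keeping URL construction separate from decompression means the parsing step can be tried on a locally saved copy of a file without touching the FTP site.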
CATH-Plus (full release)
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/
For statistics on specific releases, see the release notes.
CATH classification data
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/cath-classification-data/
File name | Description |
---|---|
cath-chain-list-<version>.txt | Lists all of the PDB chain IDs in CATH, whether they are chopped into domains or not |
cath-domain-boundaries-*-<version>.txt | Description of domain and segment boundaries for domains classified into CATH |
cath-domain-description-file-<version>.txt | Description of each protein domain in CATH |
cath-domain-list-<S35|S60|S95|S100|all>-<version>.txt | Lists of domains classified into CATH |
cath-domain-pdb-*-<version>.txt | Description of each domain PDB classified into CATH |
cath-names-<version>.txt | Name description of each node in the CATH hierarchy, along with an example domain |
cath-superfamily-list-<version>.txt | List of all the superfamilies in the CATH hierarchy |
cath-unclassified-list-<version>.txt | List of all unclassified protein chains and domains that are still being processed |
Non-redundant data sets
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/non-redundant-data-sets/
File name | Description |
---|---|
cath-dataset-nonredundant-S[20|40].atom.fa | The ATOM sequences of the domains in the dataset (which only contain residues that have ATOM records in the PDB file) |
cath-dataset-nonredundant-S[20|40].fa | The sequences of the domains in the dataset |
cath-dataset-nonredundant-S[20|40].list | A list of the domains in the dataset; one domain ID per line |
cath-dataset-nonredundant-S[20|40].pdb.tgz | A gzipped tar file containing the PDB files of the domains in the dataset |
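The `S[20|40]` notation in the table above appears to stand for two variants of each file, one for the S20 dataset and one for the S40 dataset. The sketch below, which assumes that reading of the pattern, expands it into concrete file names and parses the contents of a `.list` file (one domain ID per line, as described above). The function names are hypothetical, and the IDs in the usage comment are placeholders rather than real CATH domain IDs.

```python
DATASETS = ("S20", "S40")
SUFFIXES = (".atom.fa", ".fa", ".list", ".pdb.tgz")


def nonredundant_file_names(dataset: str) -> list[str]:
    """Return the four file names listed in the table for one dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"expected one of {DATASETS}, got {dataset!r}")
    return [f"cath-dataset-nonredundant-{dataset}{suffix}" for suffix in SUFFIXES]


def parse_domain_ids(text: str) -> list[str]:
    """Parse the contents of a .list file: one domain ID per line.

    Blank lines and surrounding whitespace are ignored.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


# Usage with placeholder IDs (not real CATH domain IDs):
# parse_domain_ids("id_one\nid_two\n")
```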
Sequence data
ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/latest-release/sequence-data/
File name | Description |
---|---|
cath-domain-seqs-*-<version>.fa | Sequences for each CATH domain |
cath-S35-<version>-hmm3.lib.gz | HMMs for each CATH representative domain from the sequence clusters at 35% sequence identity |
funfam-hmm3-<version>.lib.gz | HMMs for each functional family (FunFam) |
cath-superfamily-seqs-<superfamily>-<version>.fa | Sequences for each CATH superfamily in FASTA format |
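The `.fa` sequence files can be read with any FASTA parser. The sketch below is a small generic reader and makes no assumption about how CATH structures its header lines: it returns each header verbatim (minus the leading `>`) together with the joined sequence. The name `iter_fasta` is illustrative only.

```python
from typing import Iterable, Iterator


def iter_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file.

    The header is returned without the leading '>'; sequence lines that
    belong to one record are concatenated. Blank lines are skipped.
    """
    header = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


# Usage with a local, already downloaded file (the path is a placeholder):
# with open("downloaded-sequences.fa", encoding="utf-8") as handle:
#     for header, sequence in iter_fasta(handle):
#         print(header, len(sequence))
```

Because the reader is a generator, large sequence files can be processed record by record without loading the whole file into memory.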