CATH Data

This page provides information on the data files that are available to download from the CATH FTP site:

ftp://ftp.biochem.ucl.ac.uk

Further information on these data files can be found in README.txt on the FTP site.

For statistics on specific releases, see the release notes.
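
As a rough illustration of how the files might be located programmatically, the sketch below lists file names on an FTP server using Python's standard ftplib module. Only the host name comes from this page. The helper names, the assumption that the server accepts anonymous logins, and any directory passed in are illustrative and should be checked against README.txt.

```python
from ftplib import FTP

# Host name taken from the FTP address given on this page.
CATH_FTP_HOST = "ftp.biochem.ucl.ac.uk"


def list_remote_files(ftp, directory=".", suffix=""):
    """Return the sorted names in `directory` that end with `suffix`.

    `ftp` is any object with an ftplib-style `nlst(directory)` method.
    """
    return sorted(name for name in ftp.nlst(directory) if name.endswith(suffix))


def connect_anonymously(host=CATH_FTP_HOST, timeout=30):
    """Open an anonymous FTP session (assumes the server permits one)."""
    ftp = FTP(host, timeout=timeout)
    ftp.login()  # called without arguments, this attempts an anonymous login
    return ftp
```

A session could then be opened with `ftp = connect_anonymously()`, queried with `list_remote_files(ftp, ".", ".txt")`, and closed with `ftp.quit()`. The directory "." is a placeholder, since the layout of the site is not described here.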

Data related to the CATH classification

File name Description
cath-chain-list-<version>.txt Lists all of the PDB chain IDs in CATH, whether they are chopped into domains or not
cath-domain-boundaries-*-<version>.txt Description of domain and segment boundaries for domains classified into CATH
cath-domain-description-file-<version>.txt Description of each protein domain in CATH
cath-domain-list-<S35|S60|S95|S100|all>-<version>.txt Lists of domains classified into CATH
cath-domain-pdb-*-<version>.txt Description of each domain PDB classified into CATH
cath-names-<version>.txt Name description of each node in the CATH hierarchy, along with an example domain
cath-superfamily-list-<version>.txt List of all the superfamilies in the CATH hierarchy
cath-unclassified-list-<version>.txt List of all unclassified protein chains and domains that are still being processed

Data related to non-redundant data sets

File name Description
cath-dataset-nonredundant-S[20|40]-v4_1_0.atom.fa The ATOM sequences of the domains in the dataset (which only contain residues that have ATOM records in the PDB file)
cath-dataset-nonredundant-S[20|40]-v4_1_0.fa The sequences of the domains in the dataset
cath-dataset-nonredundant-S[20|40]-v4_1_0.list A list of the domains in the dataset; one domain ID per line
cath-dataset-nonredundant-S[20|40]-v4_1_0.pdb.tgz A gzipped tar file containing the PDB files of the domains in the dataset
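
The .list and .pdb.tgz files above can be read with the Python standard library alone. The following is a minimal sketch, assuming only what the table states: the list file holds one domain ID per line and the archive is a gzipped tar file. The function names are illustrative, and how the files are named inside the archive is not specified on this page.

```python
import tarfile


def read_domain_ids(list_path):
    """Return the domain IDs in a .list file (one ID per line, blanks skipped)."""
    with open(list_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def archive_member_names(tgz_path):
    """Return the names of the regular files stored in a gzipped tar archive."""
    with tarfile.open(tgz_path, mode="r:gz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]
```

Comparing the two returned lists is one way to check that a download is complete, provided the archive's member names can be mapped back to domain IDs.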

Data related to sequence data

File name Description
cath-domain-seqs-*-<version>.fa Sequences for each CATH domain
cath-S35-<version>-hmm3.lib.gz HMMs for each CATH representative domain from the sequence clusters at 35% sequence identity
funfam-hmm3-<version>.lib.gz HMMs for each functional family (FunFam)
cath-superfamily-seqs-<superfamily>-<version>.fa Sequences for each CATH superfamily in FASTA format
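
The .fa files listed above are in FASTA format, so they can be read with a small generic parser. The sketch below makes no assumption about how CATH lays out its header lines: it returns each header verbatim (without the leading ">") together with the joined sequence. The function name and the example file name are illustrative.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Because a file handle is itself an iterable of lines, a downloaded file can be passed in directly, for example `with open("sequences.fa") as handle: records = list(parse_fasta(handle))`.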