Superfamily Naming exercise (Last updated in Sept 2023)

Useful websites: https://www.cathdb.info/ http://sfam.cathdb.info/

Part I: Steps followed for naming a superfamily

  • View the representative domains in three ways: as ‘domain only’ to understand the common secondary structures; as ‘domain in chain’ to see where the domain sits in the chain; and as ‘domain in PDB’ to understand the domain’s function and location in the protein.
  • Check through FunFams/SwissProt/Keywords and refer to the most abundant name when naming.
  • Check through enzymes (EC number if available), GO terms and species to get a rough idea of domain function.
  • Refer to Pfam and InterPro entries for a general idea of the protein domain’s function and/or structure.
  • Check through the papers associated with the PDB entry for a better understanding of the structure and/or function of the protein and its domain.
  • In the ‘Description’ section, provide an overview of structure and function. In larger superfamilies, you may have to refer to specific PDB IDs.
  • Check that references are correct: [InterPro:] [Pfam:] [PMID:] [DOI:]
  • Check other names in the database, either to avoid duplicate names or to identify potential cross-hits.
  • Check names of other domains in the same chain to keep the name similar.
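
Two of the steps above (picking the most abundant name and checking the reference tags) can be sketched in Python. This is only an illustrative sketch: it assumes references are written as [Source:identifier], for example [PMID:12345678], and the function names are hypothetical. It checks the shape of each identifier only, not whether the reference actually supports the description.

```python
import re
from collections import Counter

# Assumed tag syntax: [Source:identifier], e.g. [PMID:12345678].
TAG_RE = re.compile(r"\[(InterPro|Pfam|PMID|DOI):([^\]\s]*)\]")

# Expected identifier shape for each source.
ID_PATTERNS = {
    "InterPro": re.compile(r"IPR\d{6}"),      # e.g. IPR000001
    "Pfam": re.compile(r"PF\d{5}"),           # e.g. PF00001
    "PMID": re.compile(r"\d+"),               # digits only
    "DOI": re.compile(r"10\.\d{4,9}/\S+"),    # e.g. 10.1000/xyz123
}


def malformed_references(description):
    """Return the reference tags whose identifier does not fit its source."""
    bad = []
    for match in TAG_RE.finditer(description):
        source, identifier = match.groups()
        if not ID_PATTERNS[source].fullmatch(identifier):
            bad.append(match.group(0))
    return bad


def most_abundant_name(names):
    """Return the most frequent name, ignoring case and surrounding spaces."""
    counts = Counter(n.strip().lower() for n in names if n.strip())
    return counts.most_common(1)[0][0] if counts else None
```

For example, calling malformed_references on a description containing [Pfam:PF0001] would flag that tag, because the accession has four digits instead of five. An empty tag such as [PMID:] is flagged too.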

Part II: General observations and tips

Dos

  • Check other names in CATH to avoid creating duplicates (i.e. make sure the assigned name is unique)
  • Make superfamily names consistent with those of other domains in the same protein
  • Start with smaller families until you get the hang of it
  • For larger superfamilies, it is a good idea to check the FunFams
  • When looking at a protein on InterPro, see if there are other domains on the same protein that don’t have a name yet; those will be easy to name as well
  • Work in groups for larger superfamilies
  • Choose superfamily entries that have associated FunFam, Pfam or InterPro annotations
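
The uniqueness check in the first tip can be sketched as a small Python helper. This is a minimal sketch under assumptions: the existing names are taken to be available as a dictionary mapping a superfamily identifier to its name (how that list is obtained is not covered here), and both function names are hypothetical.

```python
import re


def normalise(name):
    """Lower-case a name and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def find_name_clashes(candidate, existing_names):
    """Return IDs of superfamilies whose name equals the candidate
    once case, punctuation and spacing differences are ignored."""
    key = normalise(candidate)
    return [sf_id for sf_id, name in existing_names.items()
            if normalise(name) == key]
```

With placeholder data such as {"SF-A": "SH3 domain"}, the call find_name_clashes("sh3  Domain", ...) returns ["SF-A"], so the proposed name would need to be made more specific. Exact matching after normalisation will not catch near-duplicates with different wording, so a manual search of CATH is still needed.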

Don’ts

  • Write a description without citing references
  • Write a description without really understanding it
  • Spend 3 hours on a very small superfamily
  • Look at every single PDB entry in big superfamilies
  • Put too much confidence in InterPro/Pfam for smaller representative domains; it may be better to look at the PDB paper for the specific domain
  • Assume it is exactly the same domain just because it maps well to Pfam
  • Choose a superfamily entry with no annotations or too many annotations

(Last updated in September 2023. Written by summer interns from 2020 to 2023 (Barbara, Oliver, Natalie, Charling, Ruiqi, Lorna, Katie, Charlotte, Hazuki) and CATH curators (Vaishali Waman, Ian Sillitoe).)
