UniProtKB query fields

Supported query fields for searching specific data in UniProtKB (see also query syntax).

rest.uniprot.org field rest.uniprot.org example Description
accession accession:P62988 The old behaviour was to list all entries with primary or secondary accession number P62988. The new behaviour will list all primary / canonical isoform accessions P62988. To search over secondary accessions, we have introduced the sec_acc field.
active active:false Lists all obsolete entries.
Refer to the page: Sequence Annotations Lists all entries with:
  1. any general annotation (comments [CC])
  2. any sequence annotation (features [FT])
  3. at least one amino acid modified with a Pyrrolidone carboxylic acid group
lit_author lit_author:ashburner Lists all entries with at least one reference co-authored by Michael Ashburner.
protein_name protein_name:CD233 Lists all entries whose cluster of differentiation number is CD233 (see cdlist.txt).
chebi chebi:18420 Lists all entries which are associated with the small molecule corresponding to ChEBI identifier 18420, Mg(2+) (see How can I search UniProt for chemical or reaction data?).
uniprot_id (/uniref), then uniref_cluster_90 (/uniprotkb)
  1. uniprot_id:A5YMT3 to find cluster UniRef90_P00395
  2. uniref_cluster_90:UniRef90_P00395
Find all entries in the UniRef 90% identity cluster whose representative sequence is UniProtKB entry A5YMT3 (about UniRef).
xrefcount_pdb (or xref_count) xref_count_pdb:[20 TO *] Lists all entries with 20 or more cross-references to PDB
date_created date_created:[2012-10-01 TO *] Lists all entries created since October 1st 2012.
database, xref
  1. database:pfam
  2. xref:pdb-1aut
Lists all entries with:
  1. a cross-reference to the Pfam database
  2. a cross-reference to the PDB database entry 1aut
(see Databases cross-referenced in UniProtKB and Database mapping)
ec ec:3.2.1.23 Lists all beta-galactosidases (Enzyme nomenclature database).
Refer to the pages: Comments or Sequence Annotations Lists all entries with:
  1. a signal sequence whose positions have been experimentally proven
  2. experimentally proven phosphoserine sites
  3. a function manually asserted according to rules
(see Evidence attribution)
existence existence:3 See Protein existence criteria.
family family:serpin Lists all entries belonging to the Serpin family of proteins (Index of protein domains and families).
fragment fragment:true Lists all entries with an incomplete sequence.
gene gene:HPSE Lists all entries for proteins encoded by gene HPSE, but also by HPSE2.
gene_exact gene_exact:HPSE Lists all entries for proteins encoded by gene HPSE, but excluding variations like HPSE2 or HPSE_0.
go go:0015629) Lists all entries associated with the GO term Actin cytoskeleton and any subclasses
virus_host_name, virus_host_id virus_host_id:10090 Lists all entries for viruses infecting Mus musculus (Mouse)
accession_id accession_id:P00750 Returns the entry with the primary accession number P00750.
inchikey inchikey:WQZGKKKJIJFFOK-GASJEMHNSA-N Returns entries associated with the small molecule identified by the InChIKey WQZGKKKJIJFFOK-GASJEMHNSA-N, i.e. D-glucopyranose (see How can I search UniProt for chemical or reaction data?). To get the CHEBI identifier for an Inchikey value, one can now use the advanced search builder.
protein_name protein_name:Anakinra Lists all entries whose protein name includes the "International Nonproprietary Name" is Anakinra.
interactor interactor:P00520 Lists all entries describing interactions with the protein described by entry P00520.
keyword
  1. keyword:toxin
  2. keyword:KW-0800
  1. Lists all entries associated with a keyword matching "Toxin" in its name or description (UniProtKB Keywords).
  2. Lists all entries associated with the UniProtKB keyword Toxin.
length length:[500 TO 700] Lists all entries describing sequences of length between 500 and 700 residues.
mass mass:[500000 TO *] Lists all entries describing sequences with a mass of at least 500,000 Da.
cc_mass_spectrometry cc_mass_spectrometry:maldi Lists all entries for proteins identified by: matrix-assisted laser desorption/ionization (MALDI), crystallography (X-Ray). The method field searches names of physico-chemical identification methods in the 'Biophysicochemical properties' subsection of the 'Function' section, the 'Publications' and 'Cross-references' sections.
date_modified modified:[2012-01-01 TO 2019-03-01] AND active:true Lists all active entries that were last modified between January and March 2019.
protein_name protein_name:"prion protein" Lists all entries for prion proteins.
organelle organelle:Mitochondrion Lists all entries for proteins encoded by a gene of the mitochondrial chromosome.
organism_name, organism_id
  1. organism_name:"Ovis aries"
  2. organism_id:9940
  3. organism_name:sheep
Lists all entries for proteins expressed in sheep (first 2 examples) and organisms whose name contains the term "sheep" (UniProt taxonomy).
plasmid plasmid:ColE1 Lists all entries for proteins encoded by a gene of plasmid ColE1 (Controlled vocabulary of plasmids).
proteome proteome:UP000005640 Lists all entries from the human proteome.
proteomecomponent proteomecomponent:"chromosome 1" AND organism_id:9606 Lists all entries from the human chromosome 1.
sec_acc sec_acc:P02023 Lists all entries that were created from a merge with entry P02023 (see FAQ).
reviewed reviewed:true Lists all UniProtKB/Swiss-Prot entries (about UniProtKB).
scope scope:mutagenesis Lists all entries containing a reference that was used to gather information about mutagenesis (Entry view: "Cited for", See 'Publications' section of the user manual).
sec_acc sec_acc:P62988 Lists all entries containing a secondary accession P62988.
sequence accession:P05067-9 AND is_isoform:true Lists all entries containing a link to isoform 9 of the sequence described in entry P05067. Allows searching by specific sequence identifier.
date_sequence_modified
  1. date_sequence_modified:[2012-01-01 TO 2012-03-01]
  2. date_sequence_modified:[2012-01-01 TO 2012-03-01]
  1. Lists all entries whose sequences were last modified between January and March 2012.
  2. Lists all UniProtKB/Swiss-Prot entries whose sequences were modified after the start of 2012.
strain strain:wistar Lists all entries containing a reference relevant to strain wistar (Lists of strains in reference comments and Taxonomy help: organism strains).
taxonomy_name, taxonomy_id
  1. taxonomy_name:mammal
  2. taxonomy_id:40674
Lists all entries for proteins expressed in Mammals. This field is used to retrieve entries for all organisms classified below a given taxonomic node (taxonomy classification).
tissue tissue:liver Lists all entries containing a reference describing the protein sequence obtained from a clone isolated from liver (Controlled vocabulary of tissues).
cc_webresource cc_webresource:wikipedia Lists all entries for proteins that are described in Wikipedia.