Importing Genomic Variants from VCF

Jalview can annotate nucleotide sequences associated with genomic loci with features representing variants imported from VCF files. This feature, new in Jalview 2.11, is currently tuned to work best with tabix-indexed VCF files produced by the GATK Variant Annotation Pipeline (with or without annotation from the Ensembl Variant Effect Predictor), but VCF files from other sources should also work.

If your sequences have genomic loci, then a taxon name and chromosome location should be shown in the Sequence Details report and in the Sequence ID tooltip (if you have enabled it via the submenu in the View menu). Genomic loci are added by default to sequences retrieved from Ensembl. Jalview matches the assembly information provided in the VCF file to the taxon name using an internal lookup table. If a match is found, Jalview uses the Ensembl API's lift-over services to map your sequences' loci into the reference frame of the VCF file's assembly. If all goes well, after a VCF file is loaded, Jalview reports the number of variants added as sequence features in the alignment window's status bar.
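
The lift-over step can be illustrated with the Ensembl REST API's assembly-mapping endpoint (GET map/:species/:asm_one/:region/:asm_two). The Python sketch below only builds the request URL and reads the mappings array from a decoded JSON response. It assumes the public https://rest.ensembl.org server, and the function names are hypothetical; it is not necessarily how Jalview itself issues the request.

```python
from typing import Any, Dict, List, Tuple

# Assumption: the public Ensembl REST server; Jalview's configured endpoint may differ.
ENSEMBL_REST = "https://rest.ensembl.org"


def liftover_url(species: str, from_assembly: str, chrom: str, start: int,
                 end: int, to_assembly: str, strand: int = 1) -> str:
    """Build a GET request for Ensembl's map/:species/:asm_one/:region/:asm_two
    endpoint. The region is written chrom:start..end:strand (1-based, inclusive)."""
    region = f"{chrom}:{start}..{end}:{strand}"
    return (f"{ENSEMBL_REST}/map/{species}/{from_assembly}/{region}/"
            f"{to_assembly}?content-type=application/json")


def mapped_regions(response: Dict[str, Any]) -> List[Tuple[str, int, int, int]]:
    """Collect (chromosome, start, end, strand) for every block in the
    'mappings' array of a decoded JSON response. The 'mapped' side of each
    block is in the target assembly's coordinates."""
    regions = []
    for block in response.get("mappings", []):
        target = block["mapped"]
        regions.append((target["seq_region_name"], target["start"],
                        target["end"], target["strand"]))
    return regions


if __name__ == "__main__":
    # Lift a region on human chromosome X from GRCh38 to GRCh37.
    print(liftover_url("human", "GRCh38", "X", 1000000, 1000100, "GRCh37"))
    # https://rest.ensembl.org/map/human/GRCh38/X:1000000..1000100:1/GRCh37?content-type=application/json
```

Fetching that URL (for example with urllib.request) and decoding the body with json.loads gives a dictionary that can be passed to mapped_regions.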

Standard Variant Attributes

Jalview decorates variant features imported from VCF files with attributes that can be used to filter or shade variant annotation including the following:

Standard attributes were introduced in Jalview 2.11.1.0. VCF field semantics are highly dependent on the source of your VCF file. See https://www.internationalgenome.org/wiki/Analysis/vcf4.0 for more information.
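
Variant attributes ultimately come from the columns of the VCF file, notably the INFO column, whose layout is fixed by the VCF specification: semicolon-separated entries that are either KEY=VALUE or a bare flag. The Python sketch below splits an INFO column and, assuming the VEP convention of comma-separated blocks whose |-separated fields follow the order listed in the CSQ header's Format: description, a CSQ value. It illustrates the file format only; the function names are hypothetical and it does not reproduce Jalview's parser or its list of standard attributes.

```python
from typing import Dict, List, Union

InfoValue = Union[str, bool]


def parse_info(info: str) -> Dict[str, InfoValue]:
    """Split a VCF INFO column into attributes.

    Entries are separated by ';'. Each is either KEY=VALUE or a bare KEY
    (a Flag-type field, stored here as True). A lone '.' means no INFO data."""
    attrs: Dict[str, InfoValue] = {}
    if info in ("", "."):
        return attrs
    for entry in info.split(";"):
        if not entry:
            continue  # tolerate stray or trailing semicolons
        key, sep, value = entry.partition("=")
        attrs[key] = value if sep else True
    return attrs


def parse_csq(csq: str, field_names: List[str]) -> List[Dict[str, str]]:
    """Split a VEP CSQ value: one comma-separated block per allele/transcript,
    each block holding '|'-separated fields in the order given by the
    'Format:' part of the ##INFO=<ID=CSQ,...> header line."""
    return [dict(zip(field_names, block.split("|"))) for block in csq.split(",")]


if __name__ == "__main__":
    attrs = parse_info("AC=3;AF=0.25;DB;CSQ=A|missense_variant|MODERATE")
    print(attrs["AF"], attrs["DB"])
    print(parse_csq(attrs["CSQ"], ["Allele", "Consequence", "IMPACT"]))
```

Multi-valued INFO fields (such as AF=0.1,0.2) are kept as raw strings here; the Number and Type declared in each ##INFO header line say how to split and convert them.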

Working with variants without CSQ fields

The virtual features introduced in Jalview 2.11.1 mean that peptide sequences are no longer annotated directly with protein missense variants. This makes it harder to filter variants that do not already include a CSQ field (the consequence annotation added by the Ensembl Variant Effect Predictor). You can restore the pre-2.11.1 functionality by:

  1. Downloading the script from https://www.jalview.org/examples/groovy/ComputePeptideVariants.groovy
  2. Executing the script via the Groovy Console on a linked CDS/Protein view to create missense and synonymous peptide variant features.
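
The missense/synonymous distinction comes down to translating the affected codon before and after a substitution and comparing the residues. The Python sketch below illustrates that idea for single-base substitutions using the standard genetic code; the function names are hypothetical, it is not the code in ComputePeptideVariants.groovy, and it ignores indels, multi-base substitutions and alternative codon tables. It also reports stop_gained, which the steps above do not mention.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO)}


def peptide_effect(cds: str, cds_pos: int, alt: str) -> Tuple[int, str, str, str]:
    """Classify a single-base substitution at 1-based position cds_pos of a CDS.

    Returns (peptide_position, ref_residue, alt_residue, kind), where kind is
    'synonymous', 'missense' or 'stop_gained'."""
    cds, alt = cds.upper(), alt.upper()
    if not 1 <= cds_pos <= len(cds):
        raise ValueError("position outside the CDS")
    if alt not in ("A", "C", "G", "T"):
        raise ValueError("only single-base substitutions are handled")
    offset = cds_pos - 1
    start = offset - offset % 3          # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete trailing codon")
    phase = offset % 3                   # position of the change within the codon
    alt_codon = ref_codon[:phase] + alt + ref_codon[phase + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "stop_gained"
    else:
        kind = "missense"
    return start // 3 + 1, ref_aa, alt_aa, kind


if __name__ == "__main__":
    cds = "ATGGAAAAA"  # toy CDS encoding Met-Glu-Lys
    print(peptide_effect(cds, 4, "A"))  # (2, 'E', 'K', 'missense')
    print(peptide_effect(cds, 6, "G"))  # (2, 'E', 'E', 'synonymous')
```

Working per codon keeps the mapping from CDS position to peptide position explicit, which is the information a peptide variant feature needs.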

Working with variants from organisms other than H. sapiens

  1. Look in your VCF file for keywords in the ##reference header that identify the species and assembly the VCF was generated against.
  2. Use ensembl.org to find the species' short name and the assembly's unique ID.
  3. Add mappings to the VCF_SPECIES and VCF_ASSEMBLY properties in your .jalview_properties file. For example:
    VCF_SPECIES=1000genomes=homo_sapiens,c_elegans=celegans
    VCF_ASSEMBLY=assembly19=GRCh37,hs37=GRCh37

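    Each entry maps a keyword from the ##reference header to an Ensembl species name (VCF_SPECIES) or assembly ID (VCF_ASSEMBLY), and entries are separated by commas. These example settings allow annotations to be mapped from both human 1000 Genomes VCF files and C. elegans VCF files.
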
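
To check which keywords a file offers, and how keyword=name pairs like those in VCF_SPECIES and VCF_ASSEMBLY might resolve them, the following Python sketch reads the ##reference meta-line from a VCF header and looks up the first matching keyword. The function names are hypothetical, and the case-insensitive substring rule in match_keyword is an assumption made for illustration; Jalview's own matching logic may differ.

```python
from typing import Dict, Iterable, Optional


def parse_mapping_property(value: str) -> Dict[str, str]:
    """Parse a 'keyword=name,keyword=name' value such as VCF_SPECIES or VCF_ASSEMBLY."""
    mapping: Dict[str, str] = {}
    for pair in value.split(","):
        key, sep, name = pair.strip().partition("=")
        if sep and key:
            mapping[key] = name
    return mapping


def reference_header(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the first ##reference= meta-line.
    Stops at the first line not starting with '##' (normally #CHROM)."""
    for line in lines:
        if not line.startswith("##"):
            break
        if line.startswith("##reference="):
            return line[len("##reference="):].strip()
    return None


def match_keyword(reference: str, mapping: Dict[str, str]) -> Optional[str]:
    """Assumed rule: the first keyword found, case-insensitively, in the header value."""
    ref = reference.lower()
    for keyword, name in mapping.items():
        if keyword.lower() in ref:
            return name
    return None


if __name__ == "__main__":
    header = [
        "##fileformat=VCFv4.1",
        "##reference=file:///data/1000genomes/hs37d5.fa",  # illustrative path
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    ref = reference_header(header) or ""
    species = match_keyword(ref, parse_mapping_property("1000genomes=homo_sapiens,c_elegans=celegans"))
    assembly = match_keyword(ref, parse_mapping_property("assembly19=GRCh37,hs37=GRCh37"))
    print(species, assembly)  # homo_sapiens GRCh37
```

For a real file, pass the open file object (or gzip.open(path, "rt") for a compressed VCF) to reference_header, since it only reads until the header ends.
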
Work in Progress!

VCF support in Jalview is under active development. Please get in touch via our discussion forum if you have any questions or problems, or if you find it useful!