Genome to Function 3. Import VCF Files & Filter Variant Features

(A) Preparation - Download the following files by clicking on these links.

(B) Open a locus view for a gene and its linked CDS/Protein view

  1. Import ‘ENSG00000113924’ into Jalview from the Ensembl database via File ⇒ Fetch Sequences ⇒ ENSEMBL. Transcripts are shown aligned to the reference genome locus.

  2. Open the Protein-CDS split-screen view by selecting Calculate ⇒ Get Cross-References ⇒ UNIPROT.

  3. To save screen space, hide the annotation rows by un-ticking the option Show annotations in the Annotations menu. Do this for the protein and CDS panels.

(C) Overlay the DNA alignment’s features on the protein sequences

  1. Open the Sequence Feature Settings dialog box by clicking View ⇒ Feature Settings. Select the Protein tab, then tick the option Show CDS Features (below the colour transparency slider).

  2. Return to the CDS tab (centre top). In the Show column, deselect the exon option, then select OK. This leaves only sequence_variant selected.

(D) Import variants from VCF files

  1. In the upper DNA panel of the CDS/Protein alignment that is entitled ‘retrieved from Ensembl’, select File ⇒ Load VCF File.

  2. Use the file browser to locate the ENSG00000113924.hg19.vcf.gz.tbi file downloaded earlier. Load the file by selecting Open. This can take a few seconds. The alignment status bar reports how many variants are added.

  3. If the alignment isn’t coloured, select View ⇒ Show Sequence Features. This toggles the display of features on the alignment.

  4. Select View ⇒ Feature Settings to open the Sequence Feature Settings dialog box.

  5. A VCF group is listed in the upper region of the dialog box in the CDS tab. Deselect the other database groups that are listed (dbSNP, ensembl_havana, HGMD-PUBLIC, havana), but leave the VCF group ticked.

  6. In the Feature Type column of the Sequence Feature Settings dialog box, right-click the sequence_variant name. In the context menu that opens, select the option that hides all columns that do not contain a variant.

  7. Adjust the Colour transparency slider so that the features are almost transparent.

  8. Click OK to close the Sequence Feature Settings dialog box.

  9. Open a Feature details table: place the mouse cursor on a variant in the alignment, right-click, and select Feature details ⇒ the name of the feature.

Question: How do the VCF features differ from the previous features table?

(E) Select Variants from VCF files using Filters

  1. Place the mouse cursor on a blue triangle in the alignment ruler and right-click. In the context menu that opens, select Reveal All to undo the hidden column effect.

  2. Re-open the Sequence Feature Settings dialog box and select the CDS tab. In the Configuration column, click the blank sequence_variant cell. This opens the Display settings for sequence_variant features dialog box.

    • In the Filters panel (lower third), in the Label drop-down menu select AF_fin (allele frequency in Finnish population)

    • In the adjacent drop-down menu select the greater than (>) option

    • In the next cell enter the number 0.4

    • Click OK

  3. Scroll across the alignment, place the mouse cursor on a coloured variant, right-click to open the context menu, and select Feature details ⇒ the name of the feature.

  4. Scroll down the Feature details table to the AF_fin entry and check that the associated value is greater than 0.4.
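
The numeric filter set up in this section can be pictured as a test on each variant's VCF INFO field. The following is a minimal Python sketch of that idea, not Jalview's own implementation. The helper names (parse_info, passes_numeric_filter) and the three records are made up for illustration, and treating a multi-allelic record as passing when any of its values exceeds the threshold is an assumption of this sketch.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AC=3;AF_fin=0.45;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # flag entries carry no value
    return info


def passes_numeric_filter(vcf_line, key, threshold):
    """Return True if any numeric value of INFO[key] is greater than threshold."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th column of a VCF record
    raw = info.get(key)
    if raw is None or raw is True:
        return False  # attribute absent, or present only as a flag
    values = []
    for item in raw.split(","):  # comma-separated: one value per alternate allele
        try:
            values.append(float(item))
        except ValueError:
            pass  # skip missing values such as '.'
    return any(v > threshold for v in values)


# Made-up records for illustration only (not taken from the exercise file)
records = [
    "3\t1000\t.\tA\tG\t.\tPASS\tAC=10;AF_fin=0.45",
    "3\t1010\t.\tC\tT\t.\tPASS\tAC=2;AF_fin=0.01",
    "3\t1020\t.\tG\tA\t.\tPASS\tAC=5",
]

kept = [r for r in records if passes_numeric_filter(r, "AF_fin", 0.4)]
for record in kept:
    print(record)
```

Only the first record is kept: the second has an AF_fin below the threshold, and the third has no AF_fin attribute at all, which is why some variants lose their colouring once the filter is applied.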

(F) Select Features using Filters

  1. Open the Sequence Feature Settings dialog box and select the CDS tab, then tick the dbSNP and HGMD-PUBLIC database groups to enable them.

  2. In the Configuration column, click on the sequence_variant cell. This re-opens the Display settings dialog box.

    • In the Filters panel, remove the previous filters using the cross

    • Click on the Label drop-down menu and select consequence_type

    • Select the Contains option from the adjacent drop-down menu

    • In the blank cell enter the text stop_gained (use underscore)

    • Click OK

  3. View the coloured variants in the alignment. Only variants whose consequence_type contains stop_gained are now highlighted.
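
The Contains filter above amounts to a substring test on one attribute of each feature. The following minimal Python sketch shows that logic under stated assumptions: the feature dictionaries, their start positions and the contains_filter helper are hypothetical, and the comparison here is case-sensitive, whereas how Jalview treats letter case is not covered by this exercise.

```python
def contains_filter(features, attribute, text):
    """Keep features whose named attribute contains the given text."""
    return [f for f in features if text in str(f.get(attribute, ""))]


# Hypothetical sequence_variant features for illustration only
features = [
    {"type": "sequence_variant", "start": 120, "consequence_type": "missense_variant"},
    {"type": "sequence_variant", "start": 154, "consequence_type": "stop_gained"},
    {"type": "sequence_variant", "start": 201,
     "consequence_type": "stop_gained,splice_region_variant"},
    {"type": "sequence_variant", "start": 230},  # no consequence_type attribute
]

shown = contains_filter(features, "consequence_type", "stop_gained")
for feature in shown:
    print(feature["start"], feature["consequence_type"])
```

A substring match, rather than an exact match, also keeps variants that list stop_gained alongside other consequences. It also shows why the underscore matters: the text stop gained with a space would match nothing.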

(G) Explore human population variants in the context of orthologs and 3D structure

  1. In the protein panel, select the sequence name of ENSP00000283871 and right-click the mouse.

  2. In the context menu that opens, select 3D Structure Data.

  3. In the Structure Chooser dialog box select the structure 1ey2, then select New View to open the 3D structure.

  4. Examine the structural locations of the stop_gained features. (If the sequence is green, go to the Sequence Feature Settings dialog box, untick the Show box for RESNUM in the Protein tab, and click OK.)