Sequence Features File

The Sequence Features File provides a simple way of getting your own sequence features into Jalview. It also allows feature display styles and filters to be saved and imported to another alignment. Users familiar with the earliest versions of Jalview will know that features files were originally termed 'groups' files, and that the format was designed as a space-efficient way to allow sequence features to be rendered in the Jalview applet.

Features files are imported into Jalview in the following ways:

Sequence Features File Format

A features file is a simple ASCII text file, where each line contains tab separated text fields. No comments are allowed. Its structure consists of three blocks:

Feature Colours

The first set of lines contains feature type definitions and their colours:

<Feature Type>	<Feature Style>

Each feature type definition assigns a style to features of the given type. <Feature Style> can be either a simple colour, or a more complex Graduated Colour Scheme that shades features according to their description, score, or other attributes.

Assigning a colour for a <Feature Type>
A single colour, specified either as a red,green,blue 24-bit triplet in hexadecimal (e.g. 00ff00) or as comma-separated numbers (each ranging from 0 to 255).
(For help with colour values, see https://www.w3schools.com/colors/colors_converter.asp.)
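
As an illustration of the two notations above, the following minimal Python sketch converts a colour field into a red, green, blue tuple. The function name parse_colour is hypothetical and is not part of Jalview. The sketch handles only the hexadecimal and comma-separated forms described here; colour names such as the "red" used in the complete example further down are not covered.

```python
import string


def parse_colour(spec):
    """Convert a simple colour field to an (r, g, b) tuple of ints.

    Hypothetical helper: accepts either a 6-digit hexadecimal triplet
    (e.g. "00ff00") or three comma-separated numbers in 0..255.
    """
    spec = spec.strip()
    if "," in spec:
        # Comma-separated form, e.g. "0,105,215"
        parts = [int(p) for p in spec.split(",")]
        if len(parts) != 3 or any(not 0 <= p <= 255 for p in parts):
            raise ValueError(f"expected three values in 0..255: {spec!r}")
        return tuple(parts)
    if len(spec) == 6 and all(c in string.hexdigits for c in spec):
        # Hexadecimal form: two hex digits per channel
        value = int(spec, 16)
        return ((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)
    raise ValueError(f"unrecognised colour: {spec!r}")
```

For example, parse_colour("00ff00") and parse_colour("0,255,0") both describe pure green.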

Specifying a Graduated Colourscheme
Data dependent feature colourschemes are defined by a series of "|" separated fields:

[label or score or attribute|<attName>|]<mincolor>|<maxcolor>|[absolute|]<minvalue>|<maxvalue>[|<novalue>][|<thresholdtype>|[<threshold value>]]
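
As a rough illustration of the pattern above, the Python sketch below assembles the simplest score-based form: two colours, a value range, and an optional threshold. The function name graduated_colour and its parameters are hypothetical. The optional leading attribute fields, the "absolute" flag, and the <novalue> field are deliberately left out.

```python
def graduated_colour(min_colour, max_colour, min_value, max_value,
                     threshold_type=None, threshold_value=None):
    """Build a '|'-separated graduated colour field (score-based form only).

    Hypothetical helper: this sketch requires the threshold type and the
    threshold value to be given together, or not at all.
    """
    if (threshold_type is None) != (threshold_value is None):
        raise ValueError("give both threshold_type and threshold_value, or neither")
    fields = [min_colour, max_colour, str(min_value), str(max_value)]
    if threshold_type is not None:
        fields.extend([threshold_type, str(threshold_value)])
    return "|".join(fields)


# Reproduces the style field used for kdHydrophobicity in the complete example
style = graduated_colour("ccffcc", "333300", -3.9, 4.5, "above", -2.0)
```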

The fields are interpreted as follows:

Feature Filters

This section is optional, and allows one or more filters to be defined for each feature type.
Only features that satisfy the filter conditions will be displayed.
Begin with a line which is just STARTFILTERS, and end with a line which is just ENDFILTERS.
Each line has the format:

featureType <tab> (filtercondition1) [and|or] (filtercondition2) [and|or]...
The parentheses are not needed if there is only one condition. Combine multiple conditions with either and or or (but not a mixture).
Each condition is written as:
Label or Score or AttributeName condition [value]
where either the label (description), (numeric) score, or (text or numeric) attribute is tested against the condition.
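condition is not case sensitive, and should be one of the supported condition keywords (for example Contains, LT, or GE, as used in the complete example below).
A non-numeric value always fails a numeric test.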
If either attribute name, or value to compare, contains spaces, then enclose in single quotes: 'mutagenesis site' contains 'decreased affinity'
Tip: some examples of filter syntax are given below; or to see more, first configure colours and filters in Jalview, then File | Export Features to Textbox in Jalview Format.
Feature filters were added in Jalview 2.11.
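
The rules for parentheses, joining, and quoting can be sketched in Python as follows. The helper names quote and format_filter are hypothetical, and the sketch does not check that the condition keyword is one Jalview recognises; it only arranges the text of a filter line.

```python
def quote(token):
    """Wrap a token in single quotes if it contains a space."""
    return f"'{token}'" if " " in token else token


def format_filter(feature_type, conditions, joiner="and"):
    """Build one filter line for the STARTFILTERS ... ENDFILTERS block.

    conditions is a list of tuples such as ("Score", "LT", "1.5");
    the value is optional, so ("Label", "SomeCondition") is also accepted.
    All conditions are combined with the single joiner given (no mixing).
    """
    if joiner.lower() not in ("and", "or"):
        raise ValueError("joiner must be 'and' or 'or'")
    rendered = [" ".join(quote(part) for part in cond) for cond in conditions]
    if len(rendered) == 1:
        body = rendered[0]  # parentheses are not needed for one condition
    else:
        body = f" {joiner} ".join(f"({text})" for text in rendered)
    return f"{feature_type}\t{body}"


lines = [
    "STARTFILTERS",
    format_filter("metal ion-binding site", [("Label", "Contains", "sulfur")]),
    format_filter("kdHydrophobicity",
                  [("Score", "LT", "1.5"), ("Score", "GE", "2.8")], "OR"),
    "ENDFILTERS",
]
```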

Feature Definitions

The remaining lines in the file are sequence feature data. Features are either non-positional - attached to a whole sequence (as specified by its ID), or positional, so attached to a specific range on a sequence. In addition to a type, features can also include descriptive text and a score, and depending on the format used, many additional attributes.

Importing Generalised Feature Format (GFF) feature data

Jalview has its own tabular format (described below) for describing sequence features, which allows HTML descriptions (including URLs) to be defined for each feature. However, sequence feature definitions can also be provided in GFF2 (http://gmod.org/wiki/GFF2) or GFF3 (http://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md) format. To do this, a line containing only 'GFF' should precede any GFF data (this mixed format capability was added in Jalview 2.6).
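
To illustrate the mixed-format rule, the Python sketch below splits the text of a features file at the first line containing only 'GFF'. The function name split_at_gff_marker is hypothetical, and this is not how Jalview itself reads the file; it only shows where the boundary falls.

```python
def split_at_gff_marker(text):
    """Return (jalview_lines, gff_lines) for the text of a features file.

    Everything before the first line that contains only 'GFF' is treated
    as Jalview-format content; everything after it is treated as GFF data.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "GFF":
            return lines[:i], lines[i + 1:]
    # No marker: the whole file is in Jalview format
    return lines, []


sample = (
    "domain\tred\n"
    "Your Own description here\tFER_CAPAA\t-1\t3\t93\tdomain\n"
    "GFF\n"
    "FER_CAPAA\tGffGroup\tdomain\t3\t93\t.\t.\n"
)
jalview_part, gff_part = split_at_gff_marker(sample)
```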

Feature attributes can be included as name=value pairs in GFF3 column 9, including (since Jalview 2.11.1.0) 'nested' sub-attributes, for example:
alleles=G,A,C;AF=6;CSQ=SIFT=deleterious,tolerated,PolyPhen=possibly_damaging(0.907)
where SIFT and PolyPhen are sub-attributes of CSQ. This data is preserved if features are exported in GFF format (but not, currently, in Jalview format).
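
One possible way to read such a column 9 value is sketched below in Python. The function name parse_gff3_attributes and the splitting rule are assumptions made for illustration, not Jalview's own parser: a value containing '=' is treated as a set of sub-attributes, and a comma-separated token without '=' is taken to continue the previous sub-attribute's value.

```python
def parse_gff3_attributes(column9):
    """Parse 'name=value;...' pairs, with one level of nested sub-attributes.

    Hypothetical sketch: plain values are kept as strings; nested values
    become dictionaries of sub-attribute name to string.
    """
    attributes = {}
    for field in column9.split(";"):
        if not field:
            continue
        name, _, value = field.partition("=")
        if "=" not in value:
            attributes[name] = value
            continue
        sub_attributes = {}
        current = None
        for token in value.split(","):
            if "=" in token:
                # A new sub-attribute starts here
                current, _, sub_value = token.partition("=")
                sub_attributes[current] = sub_value
            elif current is not None:
                # No '=': continue the previous sub-attribute's value list
                sub_attributes[current] += "," + token
        attributes[name] = sub_attributes
    return attributes


parsed = parse_gff3_attributes(
    "alleles=G,A,C;AF=6;CSQ=SIFT=deleterious,tolerated,"
    "PolyPhen=possibly_damaging(0.907)"
)
```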

Jalview's sequence feature format

Each feature is specified as a tab-separated series of columns as defined below:

description	sequenceId	sequenceIndex	start	end	featureType	score (optional)
This format allows two alternative ways of referring to a sequence: either by its text ID, or by its zero-based index in an associated alignment. Normally, sequence features are associated with sequences rather than alignments, and the sequenceIndex field is given as "-1". To specify a sequence by its index in a particular alignment, give the sequenceId as "ID_NOT_SPECIFIED"; otherwise the sequenceId field will be used in preference to the sequenceIndex field.

The description may contain simple HTML document body tags if enclosed by "<html></html>" and these will be rendered as formatted tooltips in the Jalview Application (the Jalview applet is not capable of rendering HTML tooltips, so all formatting tags will be removed).
Attaching Links to Sequence Features
Any anchor tags in an html formatted description line will be translated into URL links. A link symbol will be displayed adjacent to any feature which includes links, and these are made available from the links submenu of the popup menu which is obtained by right-clicking when a link symbol is displayed in the tooltip.
Non-positional features
Specify the start and end for a feature to be 0 in order to attach it to the whole sequence. Non-positional features are shown in a tooltip when the mouse hovers over the sequence ID panel, and any embedded links can be accessed from the popup menu.
Scores
Scores can be associated with sequence features, and used to sort sequences or shade the alignment (this was added in Jalview 2.5). The score field is optional, and malformed scores will be ignored.
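
The column layout and the rules for non-positional features and scores can be sketched in Python as follows. The Feature class and parse_feature_line function are hypothetical names, not Jalview code. The sketch treats a malformed score as absent, matching the statement above, and does not interpret HTML in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    description: str
    sequence_id: str
    sequence_index: int
    start: int
    end: int
    feature_type: str
    score: Optional[float] = None

    @property
    def is_non_positional(self):
        # start and end of 0 attach the feature to the whole sequence
        return self.start == 0 and self.end == 0


def parse_feature_line(line):
    """Parse one tab-separated feature definition line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    description, sequence_id, index, start, end, feature_type = fields[:6]
    score = None
    if len(fields) > 6:
        try:
            score = float(fields[6])
        except ValueError:
            score = None  # malformed scores are ignored
    return Feature(description, sequence_id, int(index),
                   int(start), int(end), feature_type, score)


feature = parse_feature_line(
    "Hydrophobicity score by kD\tQ93XJ9_SOLTU\t-1\t48\t48\tkdHydrophobicity\t1.8"
)
```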

Feature annotations can be collected into named groups by prefixing definitions with lines of the form:

startgroup	groupname
... and subsequently closing the group with:
endgroup	groupname
Feature grouping was introduced in version 2.08, and is used to control whether a set of features is hidden or shown together in the Sequence Feature Settings dialog box.
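
When features files are generated by a script, wrapping the lines programmatically keeps the opening and closing group names identical. A minimal Python sketch, assuming a hypothetical helper named wrap_in_group and the upper-case keywords used in the complete example:

```python
def wrap_in_group(group_name, feature_lines):
    """Surround feature definition lines with matching group markers."""
    return [
        f"STARTGROUP\t{group_name}",
        *feature_lines,
        f"ENDGROUP\t{group_name}",
    ]


grouped = wrap_in_group(
    "kd",
    ["Hydrophobicity score by kD\tQ93XJ9_SOLTU\t-1\t48\t48\tkdHydrophobicity\t1.8"],
)
```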

A complete example is shown below:

domain	red
metal ion-binding site	00ff00
transit peptide	0,105,215
chain	225,105,0
modified residue	105,225,35
signal peptide	0,155,165
helix	ff0000
strand	00ff00
coil	cccccc
kdHydrophobicity	ccffcc|333300|-3.9|4.5|above|-2.0

STARTFILTERS
metal ion-binding site	Label Contains sulfur
kdHydrophobicity	(Score LT 1.5) OR (Score GE 2.8)
ENDFILTERS

Your Own description here	FER_CAPAA	-1	3	93	domain
Your Own description here	FER_CAPAN	-1	48	144	chain
Your Own description here	FER_CAPAN	-1	50	140	domain
Your Own description here	FER_CAPAN	-1	136	136	modified residue
Your Own description here	FER1_LYCES	-1	1	47	transit peptide
Your Own description here	Q93XJ9_SOLTU	-1	1	48	signal peptide
Your Own description here	Q93XJ9_SOLTU	-1	49	144	chain

STARTGROUP	secondarystructure
PDB secondary structure annotation	FER1_SPIOL	-1	52	59	strand
PDB secondary structure annotation	FER1_SPIOL	-1	74	80	helix
ENDGROUP	secondarystructure

STARTGROUP	kd
Hydrophobicity score by kD	Q93XJ9_SOLTU	-1	48	48	kdHydrophobicity	1.8
ENDGROUP	kd

GFF
FER_CAPAA	GffGroup	domain	3	93	.	.