Jalview Documentation

Menus

File
Edit
Font
View
Colour
Calculate
Align

Command line parameters
Applet parameters
Input/Output
Selecting/editing operations
Colour schemes
Fonts
Calculations
Multiple alignments
SRS access
The Alignment Window
The status bar
The ID panel

Menus

File menu

Input from the command line
Input from a URL
Input from a local file (application only)
Save to a local file (application only)
Save postscript to a local file (application only)
Input via a text box (cut and paste)
Alignment output in a text box (cut and paste)
File formats supported
Applet parameters

Input from the command line
When run as an application jalview takes the name of an alignment file on the command line. The format is :
    java jalview.AlignFrame <alignfile> File|URL <format> [-mail <mailserver>
                  -srsserver <srsserver> -database <srsdatabase>]
For people using the script file jalview.bat or Jalview the syntax is
    Jalview <alignfile> <format>
<alignfile> is the name of the alignment file, which must be in one of the supported formats listed under <format> below.
File|URL If you are reading from a local file, use the File switch here. If you are reading from a URL, use the URL switch.

<format> This must be one of MSF, FASTA, PIR, CLUSTAL, BLC, PFAM or MSP.

Examples:

1) For a URL:
           java jalview.AlignFrame http://circinus.ebi.ac.uk:6543/jalview/llym.msf URL MSF
2) For a local file
           Jalview 1lym.blc BLC
           java jalview.AlignFrame 1lym.pir File PIR
<srsserver> The default is the EBI SRS server :
            srs.ebi.ac.uk/srs7bin/cgi-bin/
If you use your own SRS server then this option takes the location (minus the http://) of the wgetz program. In most cases this will be
            your.hostname/srs7bin/cgi-bin/
<database> The default database is swall, which is probably EBI specific. Change this to your relevant database.
See the SRS access section for more details.

Example for access to the Sanger SRS site and the Pfam database :
        java jalview.AlignFrame myfile.pfam File PFAM -srsserver www.sanger.ac.uk/srs6bin/cgi-bin/ -database pfamseq
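
The command line syntax described above can be assembled programmatically. The following is only a sketch: build_jalview_command is a hypothetical helper (it is not part of jalview), and it simply follows the argument order and the option names (-mail, -srsserver, -database) given in this section.

```python
from typing import List, Optional

# Formats listed for the <format> argument in this document.
FORMATS = {"MSF", "FASTA", "PIR", "CLUSTAL", "BLC", "PFAM", "MSP"}


def build_jalview_command(alignfile: str, source: str, fmt: str,
                          mailserver: Optional[str] = None,
                          srsserver: Optional[str] = None,
                          database: Optional[str] = None) -> List[str]:
    """Return the argument list for 'java jalview.AlignFrame ...'.

    Hypothetical helper: it only mirrors the syntax shown in the text.
    """
    if source not in ("File", "URL"):
        raise ValueError("source must be 'File' or 'URL'")
    if fmt not in FORMATS:
        raise ValueError("unsupported format: " + fmt)
    args = ["java", "jalview.AlignFrame", alignfile, source, fmt]
    # Optional switches, in the order shown in the syntax line.
    if mailserver is not None:
        args += ["-mail", mailserver]
    if srsserver is not None:
        args += ["-srsserver", srsserver]
    if database is not None:
        args += ["-database", database]
    return args


cmd = build_jalview_command("myfile.pfam", "File", "PFAM",
                            srsserver="www.sanger.ac.uk/srs6bin/cgi-bin/",
                            database="pfamseq")
print(" ".join(cmd))
```

The printed line reproduces the Sanger example above. Returning a list rather than a single string keeps each argument separate, which avoids quoting problems if the command is later run from a script.
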
Input from a URL

See 'Input from the command line' for details of how to do this.

Input from a local file (application only)

Selecting this option brings up a window where you can type in the name of your local file and select the right format. Pressing the 'Apply' button loads the alignment in a new window. Input formats allowed are MSF, FASTA, PIR, BLC, CLUSTAL, PFAM and MSP. Further details about formats are given under 'File formats supported'.

Output to a local file (application only)

This option allows you to save your alignment as text to a local file using much the same procedure as reading in a file. WARNING: files can be overwritten with no prompting.

Outputting postscript to a local file (application only)

You can save your coloured alignment as postscript using this option. A window appears where you can select the font and font size you wish to use, as well as whether the output orientation is portrait or landscape.

Input via a text box

As applets can't read or write local files, I have provided a way of inputting alignments by cutting and pasting.
Select your alignment in your local text editor or xterm and paste it directly into jalview. You then have to select the format your alignment text is in and click 'Apply' to tell jalview to interpret the pasted text. If the format is correct, a new alignment window will appear.
Unix selection: 'cat myalignfile' will display the alignment file on screen. Select the alignment text with the mouse. Move the mouse over the text input window and press the middle button. You should now have transferred your alignment to the jalview input box.

Windows selection: Open up your alignment in a text editor (notepad, wordpad or whatever). Select all the text with the mouse and type CTRL-C to copy it. Move your mouse over the jalview text input window and type CTRL-V. The alignment text should now be transferred to the input window.

Alignment output via a text box

As with the text box input option, the text version of your alignment can be output via a Java text box. See the 'Input via a text box' description for how to cut and paste your alignments.

File formats supported

The formats supported are :

MSF (GCG output with no checksum)
CLUSTAL (ClustalW output)
FASTA (common and simple format)
PIR (less common but almost as simple a format)
BLC (AMPS output)
PFAM (simple and has the advantage of including start-end points)

Applet parameters

If you are running jalview via the button applet provided in the distribution then there are a number of parameters you can set to define different sequence groups and colour schemes. These are described in a separate document.

The Edit menu

Selecting sequences
Selecting residues
Inserting/deleting gaps
Copying/moving sequences
Deleting sequences
Selecting columns
Trimming the alignment (Removing sequence -> and <- of selected columns)
Changing the gap insert character
Removing gapped columns
Editing sequence groups (Groups...)
Editing the alignment by group (Group editing mode)

Selecting sequences

Selected sequences are used in the Colour and Calculate menus. The Consensus option in the Calculate menu only uses the selected sequences in its calculation and will display an error in the status bar if none are selected. Selecting a colour scheme in the Colour menu applies that scheme only to the selected sequences. If none are selected, the colour scheme is applied to all sequences.

Sequences can also be selected in other displays such as the tree display window and the PCA results window. If a sequence is selected in one window it will also be selected in all the other windows.

Selecting residues

Inserting/deleting gaps

Copying/Moving sequences

Deleting sequences

Selecting columns

Trimming the Alignment

All columns may be deselected by choosing the 'Deselect all columns' option from the Edit menu.

Setting the gap character

The gap insert character can be changed from - to . and back again by selecting the 'Set gap character to ...' option in the Edit menu.

Removing gapped columns

It can often happen that you need to remove whole columns consisting of gaps to tidy up your alignment. This can arise after deleting sequences from an alignment that had large insertions. Selecting this option from the Edit menu will remove all columns in your alignment that consist entirely of '-', '.' or ' '.
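
As an illustration of what this option does, here is a minimal sketch that drops every column made up entirely of the three gap characters named above. The function name and the use of a plain list of strings for the alignment are assumptions made for this example only, and are not how jalview stores alignments internally.

```python
from typing import List

# The three characters treated as gaps in the text above.
GAP_CHARS = {"-", ".", " "}


def remove_gapped_columns(seqs: List[str]) -> List[str]:
    """Drop every column in which all sequences have a gap character."""
    if not seqs:
        return []
    width = max(len(s) for s in seqs)
    # Pad short rows with spaces so every row has the same width.
    rows = [s.ljust(width) for s in seqs]
    # Keep a column if at least one row has a non-gap character there.
    keep = [i for i in range(width)
            if any(row[i] not in GAP_CHARS for row in rows)]
    return ["".join(row[i] for i in keep) for row in rows]


print(remove_gapped_columns(["AC--GT", "A.--GT", "AC- -T"]))
```

Columns 3 and 4 of the example contain only gap characters and are removed. The second column is kept because two of the three sequences have a residue there.
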

Editing sequence groups (Groups...)

To generate a new group or delete a group, use the buttons in the bottom right of the window.

Tip: To generate a new group quickly :

1) Select your sequences in the ID panel of the main alignment window using the mouse.
2) Bring up the Edit->Groups window.
3) Generate a new group by selecting 'Add new group'.
4) Add the selected IDs to the new group by pressing 'Add selected IDs'.

Group editing (Edit->Group edit mode)

To create a group from selected sequences or to see the available sequence groups, refer to the previous entry. By default, when jalview is first started all sequences are in the same group.

Fastdraw mode

If Fastdraw is switched off, other proportional fonts can be used (Helvetica and Times) and the residues appear more spaced out on the screen. Screen updates will also be slower (typically by a factor of 3).

Different font styles

Smaller font sizes (probably < 6) are of most use if the text is switched off and only the coloured residue boxes are displayed (see the View menu).

View menu

If redrawing is too slow on your system, turning off the boxes option and colouring the text black will speed it up considerably.

View boxes

View text

Colour text

Black text

Colour menu

Zappo colours
Taylor colours
Hydrophobicity colours
Colouring residues above a percentage identity threshold
User defined colour schemes
PID (Percentage Identity) Colours
BLOSUM62 colours
View sequence features

Zappo Colours

Residues	Description	Colour
ILVAM	Aliphatic/hydrophobic residues	pink
FWY	Aromatic	orange
KRH	Positive	red
DE	Negative	green
STNQ	Hydrophilic	mid blue
PG	Proline/Glycine (conformationally special)	magenta
C	Cysteine	yellow
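
The table above amounts to a lookup from residue to colour name. The sketch below encodes it as a Python dictionary. The zappo_colour function and the white default for anything not in the table (gaps, for instance) are assumptions for illustration.

```python
# Residue groups and colour names copied from the Zappo table above.
ZAPPO_GROUPS = {
    "ILVAM": "pink",
    "FWY": "orange",
    "KRH": "red",
    "DE": "green",
    "STNQ": "mid blue",
    "PG": "magenta",
    "C": "yellow",
}

# Expand the groups into a one-letter-code -> colour lookup.
ZAPPO = {residue: colour
         for residues, colour in ZAPPO_GROUPS.items()
         for residue in residues}


def zappo_colour(residue: str, default: str = "white") -> str:
    """Return the Zappo colour name for a residue (default is an assumption)."""
    return ZAPPO.get(residue.upper(), default)


print([zappo_colour(r) for r in "MKC-"])
```
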

Taylor Colours

Residues	Full name	Colour
V	Valine
I	Isoleucine
L	Leucine
F	Phenylalanine
Y	Tyrosine
W	Tryptophan
H	Histidine
R	Arginine
K	Lysine
N	Asparagine
Q	Glutamine
E	Glutamate
D	Aspartate
S	Serine
T	Threonine
G	Glycine
P	Proline
C	Cysteine

Hydrophobicity colours

This option colours the residues in the alignment according to the hydrophobicity table of Kyte, J., and Doolittle, R.F., J. Mol. Biol. 157, 105-132, 1982. The most hydrophobic residues according to this table are coloured red and the most hydrophilic ones are coloured blue. The residues in between are coloured in varying shades of purple according to their position on the scale.
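
A minimal sketch of such a colouring is given below. The hydropathy values are those of the Kyte and Doolittle table. The linear blend between blue and red is an assumption made for illustration, since the text does not say exactly how jalview computes the intermediate shades, and hydrophobicity_rgb is a hypothetical function name.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

LOWEST = min(KYTE_DOOLITTLE.values())
HIGHEST = max(KYTE_DOOLITTLE.values())


def hydrophobicity_rgb(residue: str) -> tuple:
    """Map a residue to (red, green, blue), each in the range 0-255.

    Assumed scheme: a linear blend from blue (most hydrophilic)
    to red (most hydrophobic), passing through purple.
    """
    value = KYTE_DOOLITTLE[residue.upper()]
    # 0.0 = most hydrophilic, 1.0 = most hydrophobic.
    fraction = (value - LOWEST) / (HIGHEST - LOWEST)
    red = round(255 * fraction)
    blue = round(255 * (1 - fraction))
    return (red, 0, blue)


for residue in "IGR":
    print(residue, hydrophobicity_rgb(residue))
```

Isoleucine, at the top of the scale, comes out pure red and arginine, at the bottom, pure blue. Glycine lies near the middle and gets a purple with both red and blue components.
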

User Colours

Choosing a colour (the colour selector)

Underneath the list of colours and residues is a panel where you can select the RGB values of any colour you wish to use. You can either move the scrollbars to change the RGB values or type the values (0-255) into the text boxes. The new colour will be displayed in the panel to the right of the scrollbars.

Changing colours (the residue panels)

Clicking on the colour assigned to a residue with the left mouse button will cause whichever colour is displayed in the colour selector to appear in that residue panel. If you wish to modify an existing residue panel colour, right clicking that colour will change the colour selector's colour to the residue panel colour. The colour selector colour can then be modified.

Changing residues

For each colour present in that scheme, a list of the residues it is applied to appears to its right. Residues can be moved, deleted or added to group them in a different way. For instance, you may just want to display the charged residues in one colour and the rest in another to highlight the charged ones, or you may want to colour only the cysteines differently from the others.

If you wish to change the residues associated with a colour edit the residue string in the text field and press the 'Apply' button to its right. If any residues have been deleted from the text field they will be assigned a white colour and appear in the bottom residue panel. If any residues have been transferred from another colour panel they will be deleted from the old one. The main jalview alignment window will be automatically updated.
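
The rules in the paragraph above can be sketched as a small function. Everything here is illustrative: the scheme is represented as a dictionary from colour name to a set of residues, and assign_residues is a hypothetical name, not a jalview method.

```python
from typing import Dict, Set


def assign_residues(scheme: Dict[str, Set[str]], colour: str,
                    residue_string: str) -> Dict[str, Set[str]]:
    """Return a new scheme after editing the residue string of one colour."""
    new = set(residue_string.upper())
    # Residues deleted from this colour's text field.
    dropped = set(scheme.get(colour, set())) - new
    # Residues transferred to this colour are deleted from their old panels.
    updated = {name: set(residues) - new for name, residues in scheme.items()}
    updated[colour] = new
    # Deleted residues are assigned white.
    updated["white"] = updated.get("white", set()) | dropped
    return updated


scheme = {"red": {"K", "R", "H"}, "green": {"D", "E"}}
print(assign_residues(scheme, "red", "KRD"))
```

In the example, D moves from green to red and H, which was deleted from the red string, ends up white.
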

Any modifications to the colour scheme will only apply to sequences that are selected in the main alignment window. This allows the user to have multiple colour schemes in one alignment. If no sequences are selected then the colour scheme applies to all sequences. BEWARE: there is NO UNDO function.

Colouring above a percentage identity threshold

This option depends on a consensus calculation having been performed. If no consensus exists (e.g. after a copy or a ClustalW alignment) then no residues are coloured.

PID Colours

The PID option colours the residues (boxes and/or text) according to the percentage of the residues in each column that agree with the consensus sequence. Only the residues that agree with the consensus residue for each column are coloured.

Percentage agreement	Colour
> 80 %	Mid blue
> 60 %	Light blue
> 40 %	Light grey
<= 40%	White
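
The thresholds in the table can be written as a short function. The sketch below also shows one way of getting the percentage for a column. Treating the most common non-gap residue as the consensus and dividing by the total number of sequences are assumptions, as the text does not define these details, and leaving non-consensus residues white is likewise only a guess at the uncoloured state.

```python
from collections import Counter

GAP_CHARS = "-. "


def consensus_column(column: str):
    """Return (consensus residue, percentage of rows agreeing with it)."""
    residues = [c for c in column if c not in GAP_CHARS]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, 100.0 * count / len(column)


def pid_colour(percent: float) -> str:
    """Apply the thresholds from the table above."""
    if percent > 80:
        return "mid blue"
    if percent > 60:
        return "light blue"
    if percent > 40:
        return "light grey"
    return "white"


def colour_column(column: str) -> list:
    """Colour only the residues that agree with the column consensus."""
    residue, percent = consensus_column(column)
    shade = pid_colour(percent)
    return [shade if c == residue else "white" for c in column]


print(colour_column("AAAAG"))
```

Four of the five residues in the example column agree, which is exactly 80 % and therefore falls into the '> 60 %' band rather than the '> 80 %' one.
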

View sequence features

When the features have finished transferring, they are displayed on the alignment with different colours for different features. The colours are as follows :

Sequence feature	Colour
CHAIN	White
DOMAIN	White
TRANSMEM	Dark red
SIGNAL	Cyan
HELIX	Magenta
TURN	Green
SHEET	Yellow
STRAND	Yellow
CARBOHYD	Pink
ACT_SITE	Red
TRANSIT	Orange
VARIANT	Dark orange
BINDING	Blue
DISULFID	Dark yellow
anything else	Light gray

Once the features have been displayed on the alignment, selecting a residue will change the display in the sequence feature console. The console displays details of any feature at the selected residue and, underneath, a list of all features for that sequence.

There are at the moment a few limitations on the sequence feature display:

Jalview is heavily protein biased and can only parse SWISSPROT features.

Your alignment MUST have the correct start-end positions in the IDs. For example,

>HBA_HUMAN/3-45

means the Swissprot ID HBA_HUMAN starting at position 3 and ending at position 45. If your alignment doesn't have the correct start-end positions, the sequence feature overlay is at best meaningless.

A good example of the usage of the start-end positions is the Pfam database of protein alignments.
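
To make the ID convention concrete, here is a small sketch that splits an ID of this form into its name and start-end positions. The regular expression and the parse_id function are assumptions written for this example. They are not taken from the jalview source.

```python
import re

# name/start-end, e.g. HBA_HUMAN/3-45
ID_PATTERN = re.compile(r"^(?P<name>[^/\s]+)/(?P<start>\d+)-(?P<end>\d+)$")


def parse_id(seq_id: str):
    """Return (name, start, end); start and end are None if absent."""
    # Drop a leading '>' (as in FASTA headers) and surrounding whitespace.
    text = seq_id.lstrip(">").strip()
    match = ID_PATTERN.match(text)
    if match is None:
        return text, None, None
    return match.group("name"), int(match.group("start")), int(match.group("end"))


print(parse_id(">HBA_HUMAN/3-45"))
print(parse_id("HBA_HUMAN"))
```

An ID without positions is returned with None for both numbers, which is the case where a feature overlay cannot be placed reliably.
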

If everything is configured correctly (SRS server, database and alignment IDs) then you should get output like the following :

The main window is coloured using all the features in the Pfam pancreatic trypsin inhibitor alignment and the sequence feature console shows details of all features at the selected residue (which is in between 2 disulphide bonds and at the active site in this case). In the background can be seen the mini web browser showing the contents of a Swissprot entry.

Calculate menu

Consensus sequence
Sorting sequence by pairwise identity to consensus
Sorting by tree order
Sorting by group order
Removing redundancy of sequences
Smith-Waterman pairwise alignment
Principal component analysis
UPGMA tree using percentage identity distances
Neighbour joining tree using percentage identity distances
Conservation

Consensus sequence

When the editor first starts up the consensus sequence is automatically calculated using all the sequences in the alignment and the PID colour scheme is used as default. If the consensus option is selected again only the currently selected sequences are used to calculate it and all sequences in the alignment are coloured according to that consensus.

Sorting sequences

Once a consensus calculation has been done, selecting this option will sort the selected sequences by their percentage identity to the consensus sequence. The most similar sequence is put at the top. If no sequences are selected then the whole alignment is sorted.
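
The sort can be sketched as follows. The way percentage identity is computed here (matches divided by the number of positions where neither the sequence nor the consensus has a gap) is an assumption, as are the function names. The text does not give the exact formula jalview uses.

```python
from typing import Dict, List

GAP_CHARS = "-. "


def percent_identity(seq: str, consensus: str) -> float:
    """Percentage of compared positions where seq matches the consensus."""
    pairs = [(a, b) for a, b in zip(seq, consensus)
             if a not in GAP_CHARS and b not in GAP_CHARS]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def sort_by_consensus(seqs: Dict[str, str], consensus: str) -> List[str]:
    """Return sequence IDs ordered with the most similar sequence first."""
    return sorted(seqs,
                  key=lambda name: percent_identity(seqs[name], consensus),
                  reverse=True)


alignment = {"seq3": "AAAA", "seq2": "ACDF", "seq1": "ACDE"}
print(sort_by_consensus(alignment, "ACDE"))
```
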

Sorting by tree order

Sorting by group order

If the sequences have been grouped either by hand or by selecting a point on the tree then this option will reorder the alignment so all sequences in the same group are together. The largest group is shown at the top of the alignment and the smallest at the bottom.
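
As a sketch of this reordering, assume the groups are held as a list of lists of sequence IDs (an illustrative data structure, not jalview's own):

```python
from typing import List


def sort_by_group(groups: List[List[str]]) -> List[str]:
    """Return all sequence IDs with the largest group first."""
    # sorted() is stable, so groups of equal size keep their original order.
    ordered = sorted(groups, key=len, reverse=True)
    return [seq_id for group in ordered for seq_id in group]


groups = [["s1"], ["s2", "s3", "s4"], ["s5", "s6"]]
print(sort_by_group(groups))
```
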

Removing redundancy

Selecting this option brings up a window asking you to select a threshold. If the percentage identity between two sequences exceeds this value one of the sequences (the shorter) is discarded. The redundancy calculation is done when the Apply button is pressed. For large numbers of sequences this can take a long time as all pairs have to be compared.
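
The procedure can be sketched as an all-against-all comparison. The identity measure (matches over positions where neither sequence has a gap) and the use of ungapped length to decide which sequence is shorter are assumptions for this example, and the function names are hypothetical.

```python
from typing import Dict

GAP_CHARS = "-. "


def ungapped_length(seq: str) -> int:
    """Number of residues in an aligned sequence, ignoring gaps."""
    return sum(1 for c in seq if c not in GAP_CHARS)


def percent_identity(a: str, b: str) -> float:
    """Percentage identity over positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b)
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def remove_redundancy(seqs: Dict[str, str], threshold: float) -> Dict[str, str]:
    """Discard the shorter of any pair whose identity exceeds the threshold."""
    names = list(seqs)
    removed = set()
    # Every pair is compared, so the cost grows with the square of the count.
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if first in removed or second in removed:
                continue
            if percent_identity(seqs[first], seqs[second]) > threshold:
                shorter = min(first, second,
                              key=lambda name: ungapped_length(seqs[name]))
                removed.add(shorter)
    return {name: seq for name, seq in seqs.items() if name not in removed}


alignment = {"a": "ACDEFG", "b": "ACDEF-", "c": "WWWWWW"}
print(list(remove_redundancy(alignment, 90.0)))
```

The nested loop is the reason the calculation slows down for large numbers of sequences: n sequences need n(n-1)/2 comparisons.
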

Pairwise alignment (Proteins only)

For each pair of sequences the best global alignment is found using BLOSUM62 as the scoring matrix. The scores reported are the raw scores. The sequences are aligned using a dynamic programming technique and using the following gap penalties :

Gap open : 12
Gap extend : 2
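
For readers unfamiliar with this kind of dynamic programming, the sketch below computes the score of the best global alignment of two sequences with the gap penalties above. It is not jalview's code, and it makes two loud assumptions: a toy scoring function (match +5, mismatch -4) stands in for the BLOSUM62 matrix, and a gap of length k is charged 12 for its first position and 2 for each further position.

```python
NEG_INF = float("-inf")


def toy_score(a: str, b: str) -> int:
    """Stand-in for BLOSUM62 (assumption): +5 for a match, -4 otherwise."""
    return 5 if a == b else -4


def global_score(a: str, b: str, score=toy_score,
                 gap_open: int = 12, gap_extend: int = 2):
    """Best global alignment score with affine gap penalties."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = (max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
                       + score(a[i - 1], b[j - 1]))
            # Opening a gap costs gap_open; continuing one costs gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(global_score("ACDE", "ACDE"))
print(global_score("ACDE", "ACE"))
```

With the toy scores, the second example aligns A, C and E (3 x 5) and pays 12 for the single gap opposite D. Only the score is computed here. Recovering the alignment itself would need a traceback through the three tables.
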

When you select the pairwise alignment option a new window will come up which will display the alignments in a text format as they are calculated. Also displayed is information about the alignment such as alignment score, length and percentage identity between the sequences.

If you want to save that pairwise alignment (it's not in any known format I'm afraid) you can cut and paste it from the text window with the mouse. You can also press the 'View in alignment editor' button to bring up another editor window.

Principal Component Analysis

SeqSpace

The version implemented here only looks at the clustering of whole sequences and not individual positions in the alignment to help identify functional residues. For large alignments plans are afoot to use the CORBA server written by Chris Dodge to do this 'residue space' PCA remotely.

When the Calculate->Principal component analysis option is selected all the sequences (not just the selected ones) are used in the calculation, and for large numbers of sequences this could take quite some time. When the calculation is finished a new window is displayed showing the projections of the sequences along the 2nd, 3rd and 4th vectors, giving a 3-dimensional view of how the sequences cluster.

This 3d view can be rotated by holding the left mouse button down in the PCA window and moving it. The user can also zoom in and out by using the up and down arrow keys.

Individual points can be selected using the mouse and selected sequences show up green in the PCA window and the usual grey background/white text in the alignment and tree windows.

Different eigenvectors can be used to do the projection by changing the selected dimensions in the 3 menus underneath the 3d window.

UPGMA tree

When the tree has been calculated a new window is displayed showing the tree with labels on the leaves showing the sequence ids. The user can select the ids with the mouse and the selected sequences will also be selected in the alignment window and the PCA window if that analysis has been calculated.

Selecting the 'show distances' checkbox will put branch lengths on the branches. These branch lengths are the percentage mismatch between two nodes.

Postscript output can be generated for this tree and mailed to you by clicking the Output button. This will bring up a window asking you for your email address and you can set font options and the page orientation. Clicking the Apply button will generate the postscript and send the email.

Neighbour Joining tree

Selection and output options are the same as for the UPGMA tree.

Conservation

Hierarchical analysis is based on each residue having certain physico-chemical properties listed as follows:

In brief go about it like this :

This link provides an example of the output after grouping for Pfam family rnaseH:

The grouping by tree may not be satisfactory and the user may want to edit the groups (Edit->Groups...) to put any outliers together.

Before selecting the conservation option change the colour scheme to something sensible (Taylor or hydrophobicity for example). When the conservation is done the existing colour scheme is modified so that the most conserved columns in each group have the most intense colours and the least conserved are the palest.
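
The effect on the colours can be pictured with the sketch below, which blends a residue's colour towards white as conservation falls. This is only an illustration: the conservation value in the range 0 to 1, the linear blend and the fade function are all assumptions, and the text does not describe the formula jalview actually applies.

```python
def fade(rgb: tuple, conservation: float) -> tuple:
    """Blend an (r, g, b) colour towards white as conservation drops.

    conservation is assumed to lie between 0.0 (least conserved, palest)
    and 1.0 (most conserved, full colour).
    """
    c = min(max(conservation, 0.0), 1.0)
    return tuple(round(255 - (255 - channel) * c) for channel in rgb)


red = (255, 0, 0)
for conservation in (1.0, 0.6, 0.0):
    print(conservation, fade(red, conservation))
```
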

This link shows the results of first colouring the alignment by hydrophobicity (Colour->by hydrophobicity) then performing conservation analysis (Calculate->Conservation). Conserved hydrophobic columns are shown with predominantly red residues and conserved hydrophilic columns with blue. The most conserved regions have the brightest colours.

Here is shown the same conservation but with Taylor colours instead of hydrophobicity (Colour->Taylor).

The conservation analysis is done on each sequence group. This highlights differences and similarities in conserved residue properties between groups.

Multiple alignment

Local alignment using clustalw
Remote alignment using clustalw at the EBI

Local Alignment (Unix applications only)

When this option is selected a window will appear giving you a message about whether your process is running and the time elapsed since the job was started. The cancel button will kill your process at any time.

The text box below should show the progress of your job but at the moment doesn't. I haven't been able to devise a simple way of displaying stdout as the alignment is progressing, but I'm working on this. The stdout will appear in the xterm from which you started jalview.

When the alignment is finished, a new alignment window is created containing the aligned sequences. No consensus calculation is done on these sequences by default, so to see the similarity select Calculate->Consensus.

Due to applet security restrictions this option can only be used from an application.

Remote Alignment

The Cancel button will cancel your job. The output is sent back to the text box below as the alignment progresses. As this application is written in Java 1.0 (pretty much) so that it can be used in older versions of Netscape, this display is somewhat flickery.

SRS Access

sequence features

SRS server and database

The default SRS server and database are srs.ebi.ac.uk and swall at the EBI. To change to your own SRS server, either use the -srsserver and -database options on the command line (see command line parameters) or use the following applet parameters :

<param name="srsServer"  value="srs.ebi.ac.uk/srs7bin/cgi-bin/">

<param name="database" value="swall">

Also, for entries to be fetched correctly, the sequence IDs in your alignment file must be of the right form.

The IDs must be :

HBA_HUMAN/6-20

HBA_HUMAN

I urge everyone to include these numbers.

The application version now allows access to SRS through its own mini web browser, but at present none of the SRS links work (I'm trying to resist rewriting Netscape :)

The alignment window

The Status Bar

The right hand side of the status bar is mostly for development purposes but displays in milliseconds the time taken for the last redraw of the central sequence panel.
