[Jalview-discuss] pca
Jim Procter
jprocter at compbio.dundee.ac.uk
Wed Feb 17 15:20:06 GMT 2010
Dear Hadas, thanks for your email.
On 15/02/2010 14:04, Hadas Ner Gaon wrote:
> Dear members
>
> Can you please help me understand why the BLOSUM score in the PCA
> analysis is not reciprocal?
>
> These are the output values from a PCA of 4 protein sequences:
>
>          seq1     seq2     seq3     seq4
> seq1  1292.00   930.00   643.00   631.00
> seq2   931.00  1289.00   589.00   633.00
> seq3   622.00   567.00  1338.00   768.00
> seq4   629.00   630.00   785.00  1303.00
>
> Why is the seq1-seq3 score 643, while the seq3-seq1 score is 622?
>
A good question! In the matrix used for the PCA calculation, each
element e(i,j) is the sum of the substitution scores for mutating each
symbol in the i'th sequence into the corresponding symbol in the
j'th sequence. For proteins, the substitution matrix used is the
BLOSUM62 matrix, and because the scores applied in this calculation are
not symmetric (ie the score for mutating an R to a G can differ from the
score for mutating a G to an R), the upper and lower triangles of the
similarity matrix often differ. It's simplest to consider each triangle
as representing the 'forward' or 'backward' mutation cost for each pair
of sequences in the alignment.
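The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Jalview's own code: the scoring table is a hypothetical toy (it is NOT BLOSUM62) with one deliberately asymmetric R/G pair, and the sequences are assumed to be gap-free and of equal length. It shows how asymmetric scores alone are enough to make e(i,j) differ from e(j,i).

```python
# Hypothetical toy scores (NOT real BLOSUM62 values).
# TOY_SCORES[(a, b)] is the score for mutating symbol a (in sequence i)
# into symbol b (in sequence j); the R/G pair is deliberately asymmetric.
TOY_SCORES = {
    ("A", "A"): 4, ("G", "G"): 6, ("R", "R"): 5,
    ("A", "G"): 0, ("G", "A"): 0,
    ("A", "R"): -1, ("R", "A"): -1,
    ("R", "G"): -2, ("G", "R"): -1,
}


def similarity_matrix(seqs, scores):
    """Return e where e[i][j] sums scores[(seqs[i][k], seqs[j][k])] over columns k.

    Assumes equal-length, gap-free sequences (zip silently truncates otherwise).
    """
    n = len(seqs)
    return [
        [sum(scores[(a, b)] for a, b in zip(seqs[i], seqs[j])) for j in range(n)]
        for i in range(n)
    ]


seqs = ["ARA", "AGA"]
e = similarity_matrix(seqs, TOY_SCORES)
print(e)  # [[13, 6], [7, 14]]
```

Here e[0][1] (R mutated to G) is 6 while e[1][0] (G mutated to R) is 7, so the upper and lower triangles hold the 'forward' and 'backward' costs for the same pair of sequences.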
As you may be aware, the matrix that I just described differs slightly
from the one given in the 'SeqSpace' paper cited in the Jalview PCA
documentation (http://www.jalview.org/help/html/calculations/pca.html).
In the original paper (Casari, Sander and Valencia, 1995:
http://novacripta.cbm.uam.es/bioweb/courses/MasterBiofis0708/tema03/Casari_NatStructBiol_95.pdf),
the matrix used for PCA analysis is called the comparison matrix, and
is defined as the product of a matrix representation of the alignment
with its transpose:
C = F x T(F)
Here, T(F) denotes the transpose of F, and C is a symmetric n by n matrix
(n being the number of sequences), because each element is the number of
aligned positions at which the corresponding pair of sequences share an
identical symbol. Jalview's slightly different comparison
matrix calculation should, in theory, reflect favourable mutations
between sequences in addition to conservation. However, in my limited
tests, the resulting PCA plot often resembles that produced by the
original algorithm's projection, so this refinement probably doesn't
offer a great improvement over the SeqSpace approach.
Thanks for the question, and happy Jalviewing!
Jim.
PS. This difference between SeqSpace and Jalview is not made clear in
the documentation; this will be rectified in a future release.
--
-------------------------------------------------------------------
J. B. Procter (JALVIEW/ENFIN) Barton Bioinformatics Research Group
Phone/Fax: +44(0)1382 388734/345764   http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.