Modeller PIR Format IO

The homology modelling program Modeller uses a special form of the PIR format, in which information about sequence numbering and chain codes is written into the 'description' line between the PIR protein tag and the protein alignment entry:

>P1;Q93Z60_ARATH
sequence:Q93Z60_ARATH:1:.:118:.:.
----MASTALSSAIVSTSFLRRQQTPISLRSLPFANT-QSLFGLKS-STARGGRVTAMATYKVKFITPEGEQ
EVECEEDVYVLDAAEEAGLDLPYSCRAGSCSSCAGKVVSGSIDQSD------QSFLD-D-------------
---------------------*
>P1;PDB|1FER|_
structureX:1FER:1:.:105:.:.
----------------------------------------------------AFVVTDNCIKCKY---TDCV
EV-CPVDCFY----EGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELAEVWPNITEK
KDPLPDAEDWDGVKGKLQHLE*
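
The layout of the entries above (a '>' tag line, one description line, then sequence lines ending in '*') can be read with a few lines of code. The following is a minimal Python sketch under that assumption, not Jalview's own reader; the function name read_pir_entries and the dictionary keys are hypothetical.

```python
def read_pir_entries(text):
    """Split PIR text into entries (hypothetical helper, not Jalview code)."""
    entries = []
    lines = iter(text.splitlines())
    for line in lines:
        if not line.startswith(">"):
            continue  # skip anything outside an entry
        # Tag line looks like ">P1;Q93Z60_ARATH": type code, ';', sequence ID.
        type_code, _, seq_id = line[1:].partition(";")
        # The line directly after the tag is the description line.
        description = next(lines, "").strip()
        chunks = []
        for seq_line in lines:
            seq_line = seq_line.strip()
            if seq_line.endswith("*"):
                # '*' terminates the aligned sequence.
                chunks.append(seq_line[:-1])
                break
            chunks.append(seq_line)
        entries.append({
            "type": type_code,
            "id": seq_id.strip(),
            "description": description,
            "sequence": "".join(chunks),
        })
    return entries
```

Gap characters ('-') are kept in the returned sequence, since they are part of the alignment.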

Jalview will attempt to parse any PIR entries conforming to the Modeller/PIR format, in order to extract the sequence start and end numbering and (possibly) a PDB file reference. The description line information is always stored in the sequence description string, so no information is lost if this parsing process fails.
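
As an illustration of this kind of parsing, here is a minimal Python sketch that splits a description line on ':' and returns None when the line does not fit the pattern, so the caller can fall back to keeping the raw text. It is an assumption-laden example, not Jalview's actual parser: the names ModellerDescription and parse_modeller_description are hypothetical, and only the two type fields named on this page (sequence and structureX) are accepted.

```python
from typing import NamedTuple, Optional


class ModellerDescription(NamedTuple):
    """Fields of a Modeller-style description line (hypothetical type)."""
    kind: str                   # "sequence" or "structureX"
    reference: str              # PDB reference or sequence ID
    start: int                  # start residue number
    start_chain: Optional[str]  # None when the chain code is "."
    end: int                    # end residue number
    end_chain: Optional[str]
    rest: str                   # trailing "." plus any description text


def parse_modeller_description(line: str) -> Optional[ModellerDescription]:
    """Return the parsed fields, or None if the line is not Modeller style."""
    fields = line.strip().split(":", 6)
    if len(fields) < 6 or fields[0] not in ("sequence", "structureX"):
        return None
    try:
        start, end = int(fields[2]), int(fields[4])
    except ValueError:
        return None  # residue numbers must be integers

    def chain(code: str) -> Optional[str]:
        code = code.strip()
        return None if code in ("", ".") else code

    rest = fields[6] if len(fields) > 6 else ""
    return ModellerDescription(fields[0], fields[1], start,
                               chain(fields[3]), end, chain(fields[5]), rest)
```

For example, parse_modeller_description("structureX:1FER:1:.:105:.:.") yields a reference of 1FER with start 1 and end 105, while an ordinary free-text description yields None.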

The 'Modeller Output' flag in the 'Output' tab of the Jalview Preferences dialog box controls whether Jalview will also output Modeller-style PIR files. When the flag is set, any existing 'non-modeller PIR' header information in the description string of a sequence is appended to an automatically generated Modeller description line for that sequence.

The general format used for generating the Modeller/PIR sequence description line is shown below:

>P1;Primary_Sequence_ID
sequence or structureX:pdb-reference if available:start residue:start chain code:end residue:end chain code:. description text
  
The first field is either sequence or structureX, depending upon the presence of a PDB database ID for the sequence. If the protein has no PDB reference, then the chain code is not specified, unless one already existed when the sequence was imported into Jalview.
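
The rule for choosing the first field can be sketched in Python as follows. This is an illustrative example rather than Jalview's implementation: the function name build_modeller_description and its parameters are hypothetical, and using the sequence ID as the second field when there is no PDB reference is an assumption based on the Q93Z60_ARATH example above.

```python
def build_modeller_description(seq_id, start, end,
                               pdb_id=None, chain=None, description=""):
    """Build a Modeller-style description line (hypothetical helper)."""
    # structureX only when a PDB database ID is known for the sequence.
    kind = "structureX" if pdb_id else "sequence"
    # Assumption: fall back to the sequence ID when there is no PDB ID.
    reference = pdb_id if pdb_id else seq_id
    # "." stands for an unspecified chain code.
    chain_code = chain if chain else "."
    line = f"{kind}:{reference}:{start}:{chain_code}:{end}:{chain_code}:."
    if description:
        # Any existing description text follows the final "." field.
        line += " " + description
    return line
```

Called with the ID Q93Z60_ARATH, start 1 and end 118 and no PDB ID, this returns the description line shown for that sequence in the example at the top of this page.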