Manual¶
Basic usage¶
Getting started¶
Given an existing sequence alignment in a file data.dat in FASTA format,
the simplest usage might be:
mview -in fasta data.dat
Similarly, if the input file contained a CLUSTAL alignment:
mview -in clustal data.dat
In either case, the output would be a stacked alignment with extra columns added to show row numbers and percent identities (with respect to the first sequence), looking something like this, regardless of the input format:
Reference sequence (1): EGFR_HUMAN
Identities normalised by aligned length.
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
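The identity column can be read as a pairwise comparison of each row against the reference row. The Python sketch below is not MView's code. It assumes that "aligned length" means the number of columns in which both rows carry a residue rather than a gap, which is only one plausible reading of the header line. The function name percent_identity and the toy sequences are made up for illustration.

```python
GAP = "-"


def percent_identity(reference, other):
    """Percent of identical residues over columns where both rows have a residue.

    Assumption: columns with a gap in either row are ignored entirely.
    """
    if len(reference) != len(other):
        raise ValueError("aligned rows must have the same length")
    aligned = 0
    identical = 0
    for ref_sym, sym in zip(reference, other):
        if ref_sym == GAP or sym == GAP:
            continue  # not part of the aligned length under this assumption
        aligned += 1
        if ref_sym.upper() == sym.upper():
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Toy alignment, first row is the reference.
rows = ["ACDE-FG", "ACDQ-FG", "A-DEKFG"]
for row in rows:
    print(f"{percent_identity(rows[0], row):5.1f}%  {row}")
```

Other normalisations (for example by the reference length or by the full alignment width) would give different percentages for the same rows.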
To process the output of a BLAST run use something like:
mview -in blast blastresults.dat
while to process the output of a FASTA run (the database search program, not the simple FASTA/Pearson data format) use something like:
mview -in uvfasta fastaresults.dat
The -in option isn’t always necessary. If the filename extension, or the
filename itself minus any directory path begins with or contains the first few
letters of the valid -in options (e.g., mydata.msf or mydata.fasta
or tfastx_run.dat), MView tries to choose a sensible input format,
allowing multiple files in mixed formats to be supplied on the command
line. The -in option will always override this mechanism but requires that
all input files be of the same format.
Attaching a ruler¶
Add a ruler along the top, with -ruler on, for example:
mview -in fasta -ruler on data.dat
gives:
Reference sequence (1): EGFR_HUMAN
Identities normalised by aligned length.
1 [ . . . . : . . ] 80
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
Only one kind of ruler is currently provided, numbering the columns of the final alignment from M to N (incrementing) or N to M (decrementing) based on the input sequence numbering, if any. For multiple alignments like the one above with no numbering the ruler runs from 1 to the length of the alignment.
For database searches that translate nucleotide sequences to protein, such as TBLASTX, the rulers differ slightly in that the native query numbering is given in nucleotide units, but MView reports amino acid units instead (using modulo 3 arithmetic).
Changing the reference sequence¶
One can colour and compute identities with respect to a sequence other than
the first/query sequence using the -reference option. This takes either
the sequence identifier or an integer argument corresponding to the ranking or
ordering of a sequence usually shown in the first labelling column of MView
output. For multiple alignment input formats, sequences are numbered from 1,
while for searches the hits are numbered from 1, but the query itself is 0, so
beware.
Command line options¶
All available options can be listed using:
mview -help
There are a lot of options, but the main ones are described in this manual.
Adding HTML¶
Basic HTML¶
To add some HTML markup a few extra options are needed, for example:
mview -in fasta -html head data.dat > data.html
produces a complete page of HTML and you can load this into your Web browser
with a URL like file:///full/path/to/the/folder/data.html.
To colour all the residues using the default built-in colourmap for proteins:
mview -in fasta -ruler on -html head -coloring any data.dat > data.html
produces:
Colored by: property
1 [ . . . . : . . ] 80
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
To make the letters stand out use the -bold option:
mview -in fasta -ruler on -html head -bold -coloring any data.dat > data.html
giving:
Colored by: property
1 [ . . . . : . . ] 80
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
Or change the colouring to use blocked letters with -css on instead:
mview -in fasta -ruler on -html head -css on -coloring any data.dat > data.html
giving:
Colored by: property
1 [ . . . . : . . ] 80
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
You can combine -css on with -bold to make the blocks and letters even
more prominent.
If your data are DNA or RNA, add the option -moltype dna (or rna or
na for “nucleic acid”) to change to the default nucleotide
colourmap. Here’s an MView run on some BLASTN data demonstrating some other
options as well:
mview -in blast -ruler on -html head -css on -coloring identity -moltype dna -top 5 -range 250:310 blastn.dat
which (slightly edited to reduce space) produced:
HSP processing: ranked
Query orientation: +
250 [ . . . . 3 ] 310
EMBOSS_001 bits E-value N qy ht 100.0% 1:521 TGAAGCCTGCACTTACTCAGGACTCATCATGACTGCGTACCAATTCGTCTTACTCAGGACT
1 EM_EST:GT222018.2 gh1574... 1033 0.0 1 + + 100.0% 1:521 4:524 TGAAGCCTGCACTTACTCAGGACTCATCATGACTGCGTACCAATTCGTCTTACTCAGGACT
2 EM_EST:GT222017.1 gh1572... 186 4e-43 1 + + 98.2% 256:372 205:318 ------CTGCACTTACTCAGGACTCATCATGACTGCGTACCAATTCGT-TTACTCAGGACT
3 EM_EST:GT222024.2 gh1633... 182 7e-42 1 + + 95.9% 262:372 96:209 ------------TTACTCAGGACTCATCATGACTGCGTACCAATTCGTCttACTCAGGACT
4 EM_EST:GT222023.2 gh1631... 182 7e-42 1 + + 95.9% 262:372 96:209 ------------TTACTCAGGACTCATCATGACTGCGTACCAATTCGTCttACTCAGGACT
5 EM_EST:GT222054.2 gh721 ... 178 1e-40 1 + + 100.0% 279:372 4:97 -----------------------------TGACTGCGTACCAATTCGTCTTACTCAGGACT
showing scoring and sequence range information parsed from the BLASTN run, and using the default nucleotide colouring scheme (purines, dark blue; pyrimidines, light blue). Notice the lower-cased pairs of thymines near the end of sequences 3 and 4, columns 299–300 indicating where a segment of hit sequence has been excised to close a gap in the query (see Why are some symbols lowercased?).
Controlling the amount of HTML¶
There are several values that can be passed to the -html option: head,
body, data, full, off.
Mode head
Produces a complete web page. Output includes the style sheet if -css on
was given. The most common situation.
Mode body
Produces just the <BODY></BODY> part of the web page. Note: the style
sheet produced by -css on will be missing.
Mode data
Produces just the alignment part of the web page. Note: any style sheet
produced by -css on will be missing.
Mode full
Produces a complete web page with the MIME-type "text/html", suitable for
serving directly from a web server. Output includes the style sheet if -css
on was given.
Mode off
Switches off HTML (default).
Using an external CSS style sheet¶
The option -listcss dumps the style sheet to stdout, so you can share that
across MView invocations from a web server. Each would be of the form:
mview -css URL ...
where the URL specifies the location of the style sheet as seen by the web
server (i.e., file:///some/path or http://server/path).
If you build a new colourmap you can load it into MView and save the new CSS
file. Suppose you have a new colourmap in newcolmap.dat:
mview -colorfile newcolmap.dat -listcss
will dump the new style sheet for use as before.
Consensus sequences¶
Clustal conservation line¶
A Clustal-style conservation line of *:. symbols can be added to any
alignment (not just one from CLUSTAL itself) using -conservation on, like
this:
1 [ . . . . : . . ] 80
1 DMD401_1-640 100.0% LQLDTVLGEGEFGQVLKGFATEIAG---------LPGITTVAVKMLKKGSNSV------------EYMALLSEFQLLQEV
2 CER09D1_11-435 22.2% DTFNRKLGKGKFGIINKGLLTLRICKTNE------VVQVNVAVKKMVDPTDEK------------QDKLIYDEIKLMEYN
3 EGFR_HUMAN 26.7% FKKIKVLGSGAFGTVYKGLWIPEGEK----------VKIPVAIKELREATSPK------------ANKEILDEAYVMASV
4 DMDPR2_1-384 25.4% ISVNKQLGTGEFGIVQQGVWSNGNE------------RIQVAIKCLCRERMQS------------NPMEFLKEAAIMHSI
5 ITK_HUMAN-620 22.0% LTFVQEIGSGQFGLVHLGYWLN---------------KDKVAIKTIREGAMS--------------EEDFIEEAEVMMKL
clustal :* * ** : * **:* : : * ::
The symbols are * for full column identity, and : or . for strong
and weak amino acid grouping, respectively, as defined in CLUSTAL.
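The rule behind the three symbols can be sketched in a few lines of Python. This is an illustration, not MView's implementation. The strong and weak groups are the ones listed in the Clustal W/X documentation and should be checked against your Clustal version. Leaving a blank under any column that contains a gap is an assumption, as are the function names.

```python
# Residue groups as given in the Clustal W/X documentation (verify before relying on them).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assumption: gapped columns are never marked
    if len(residues) == 1:
        return "*"  # full column identity
    if any(residues <= set(group) for group in STRONG):
        return ":"  # all residues fall inside one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."  # all residues fall inside one weak group
    return " "


def conservation_line(rows):
    """Build the conservation line for equal-length aligned rows."""
    return "".join(conservation_symbol("".join(col)) for col in zip(*rows))


rows = ["GSTCW-", "GTASDA", "GSVAKA"]
print(repr(conservation_line(rows)))  # prints '*:..  '
```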
For DNA or RNA sequences, if the molecule type was set to nucleic acid with
-moltype na or dna or rna, then the clustal conservation line will
show only the column identities.
Note: these conservation lines can be generated for any subset of rows extracted using the various row filtering options (see Filtering rows).
Consensus lines¶
Consensus lines can be added beneath the alignment using -consensus on. By
default, this adds four extra lines of consensus sequences computed at various
thresholds of percentage composition of the columns.
There are default consensus patterns for protein and nucleotide (either DNA or RNA) sequences. MView starts up with the default protein consensus pattern, for example:
mview ... -consensus on ...
gives:
1 [ . . . . : . . ] 80
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
consensus/100% hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls
consensus/90% hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls
consensus/80% lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls
consensus/70% lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls
Changing consensus thresholds¶
The default consensus mechanism displays consensus lines calculated at four
levels of identity (100%, 90%, 80%, 70%). This can be changed to show any
number of consensus lines, each at a percent identity between 50% and 100%,
using the -con_threshold option with a comma-separated list of identities:
mview ... -consensus on -con_threshold 80 ...
would give a single consensus line calculated at 80% identity, while:
mview ... -consensus on -con_threshold 80,65 ...
would produce two lines at 80% and 65% identity.
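If you generate MView command lines from a script, it can help to validate the threshold list before invoking the program. The helper below is a hypothetical Python sketch, not part of MView. It only encodes the two facts stated above: the value is a comma-separated list, and each entry lies between 50 and 100.

```python
def parse_thresholds(text):
    """Parse a comma-separated threshold list such as '80,65'.

    Raises ValueError for entries outside the 50..100 range described
    in the manual, or for entries that are not numbers.
    """
    thresholds = []
    for field in text.split(","):
        value = float(field.strip())
        if not 50 <= value <= 100:
            raise ValueError(f"threshold {value} is outside 50..100")
        thresholds.append(value)
    return thresholds


print(parse_thresholds("80,65"))  # prints [80.0, 65.0]
```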
Consensus pattern definitions¶
Consensus patterns are based on equivalence classes, that is, sets of residues that share some predefined property. These classes are not mutually exclusive and the consensus mechanism will choose the most specific class that summarizes a given column at the desired percent identity.
The default for protein alignments is called P1 and is defined by
physicochemical property as follows:
[P1]
#Protein consensus: conserved physicochemical classes, derived from
#the Venn diagrams of: Taylor W. R. (1986). The classification of amino acid
#conservation. J. Theor. Biol. 119:205-218.
#description => symbol members
. => .
A => A { A }
C => C { C }
D => D { D }
E => E { E }
F => F { F }
G => G { G }
H => H { H }
I => I { I }
K => K { K }
L => L { L }
M => M { M }
N => N { N }
P => P { P }
Q => Q { Q }
R => R { R }
S => S { S }
T => T { T }
V => V { V }
W => W { W }
Y => Y { Y }
alcohol => o { S, T }
aliphatic => l { I, L, V }
aromatic => a { F, H, W, Y }
charged => c { D, E, H, K, R }
hydrophobic => h { A, C, F, G, H, I, K, L, M, R, T, V, W, Y }
negative => - { D, E }
polar => p { C, D, E, H, K, N, Q, R, S, T }
positive => + { H, K, R }
small => s { A, C, D, G, N, P, S, T, V }
tiny => u { A, G, S }
turnlike => t { A, C, D, E, G, H, K, N, Q, R, S, T }
stop => * { * }
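To make "the most specific class that summarizes a column" concrete, here is a small Python sketch using the P1 classes. It is not MView's algorithm. It assumes that "most specific" means the class with the fewest members, that a class summarizes a column when at least the threshold percentage of the column's residues belong to it, and that ties between classes of equal size are broken by definition order. Gap handling is left out, so only gap-free columns are used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Symbol -> members, copied from the P1 definition.
P1_CLASSES = {
    "o": "ST",
    "l": "ILV",
    "a": "FHWY",
    "c": "DEHKR",
    "h": "ACFGHIKLMRTVWY",
    "-": "DE",
    "p": "CDEHKNQRST",
    "+": "HKR",
    "s": "ACDGNPSTV",
    "u": "AGS",
    "t": "ACDEGHKNQRST",
}


def build_classes():
    """Return (symbol, members) pairs ordered from most to least specific."""
    classes = [(aa, {aa}) for aa in AMINO_ACIDS]
    classes += [(symbol, set(members)) for symbol, members in P1_CLASSES.items()]
    # sorted() is stable, so classes of equal size keep their definition order.
    return sorted(classes, key=lambda item: len(item[1]))


def consensus_symbol(column, threshold, classes):
    """Pick the smallest class covering at least `threshold` percent of a column."""
    total = len(column)
    if total == 0:
        return "."
    for symbol, members in classes:
        hits = sum(1 for residue in column if residue in members)
        if hits * 100 >= threshold * total:  # integer-friendly comparison
            return symbol
    return "."


def consensus(rows, threshold):
    classes = build_classes()
    return "".join(consensus_symbol(col, threshold, classes) for col in zip(*rows))


# First five (gap-free) columns of the example alignment.
rows = ["FKKIK", "ISVNK", "LTFVQ", "IREVK", "VELTK"]
print(consensus(rows, 80))   # prints lphsK
print(consensus(rows, 100))  # prints hp..p
```

Under these assumptions the sketch gives the same symbols as the first five columns of the consensus/80% and consensus/100% lines in the example output above. That agreement on five columns does not show that MView uses the same procedure in general.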
The default nucleotide consensus pattern is D1, which groups bases by ring type
(purine, pyrimidine). It is selected when a nucleotide molecule type is set
with -moltype na (for “nucleic acid”; also dna or rna), for
example:
mview ... -consensus on -moltype dna ...
and has the following definition:
[D1]
#DNA consensus: conserved ring types
#Ambiguous base R is purine: A or G
#Ambiguous base Y is pyrimidine: C or T or U
#description => symbol members
. => .
A => A { A }
C => C { C }
G => G { G }
T => T { T }
U => U { U }
purine => r { A, G, R }
pyrimidine => y { C, T, U, Y }
Changing consensus patterns¶
The available list of built-in patterns can be seen with -listgroups.
Alternative equivalence classes can be selected using -con_groupmap. For
example, to select the CYS built-in consensus pattern to show only
conserved cysteines you would use an invocation like:
mview ... -consensus on -con_groupmap CYS ...
New groups can be defined in the same format and read in from a file using
the -groupfile option.
Showing conserved symbols or conserved classes¶
Two options -con_ignore and -con_gaps can be used to tune the
consensus lines. Consider the following alignment:
1 [ . . . . : . . ] 80
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
The default consensus pattern for proteins, with these options:
mview ... -consensus on -con_threshold 80 ...
would add this consensus line:
consensus/80% lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls
comprising a mixture of conserved residue classes and residues, whichever is more specific.
If you just want to see the conserved physicochemical classes, use -con_ignore singleton:
consensus/80% lphs+plusutauhlhhuhhhs..............hh.lul+pl+.ts.s....p-hhc-utlhtplp+.pllplhuls
Alternatively, to see just the conserved residues, use -con_ignore class:
consensus/80% ....K..G.G.FG.V..G.....................VA.K.................EA..M....H...V.L.G..
Lastly, the default consensus computation counts gap characters in each
column, so that gapped regions are diluted and may not show up in the
consensus. Building on the last example, setting -con_gaps off prevents
this:
consensus/80% ....K..G.G.FG.V..G........LPKGSMN......VA.K.........E.......EA..M....H...V.L.G..
The consensus sequence now runs the full length of the alignment because the insert in sequence 4 spanning the gap has been added to the consensus. This is a little contrived in this case, but is sometimes useful when you want to preserve as much of the alignment as possible.
These options work similarly with nucleotide alignments and with any other consensus pattern you choose.
Note: it is possible to colour the consensus sequences independently of the alignment (see Consensus colouring).
Colouring modes¶
Alignment colouring¶
There are several basic ways to colour the alignment using the -coloring
option which takes five modes: any, identity, mismatch,
consensus, group. These all have default associated colour schemes,
but you can supply a different one or just a single colour by name (see the
description for the mismatch mode for an example).
Mode any
The simplest is to colour every residue according to the currently selected colourmap:
mview ... -coloring any ...
gives:
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
Mode identity
You can colour only those residues that are identical to some reference sequence (usually the query or first row) with:
mview ... -coloring identity ...
to produce:
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
or with respect to another row (let’s use row 4):
mview ... -coloring identity -ref 4 ...
giving:
1 EGFR_HUMAN 21.2% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 25.0% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 26.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 100.0% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 22.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
Note that the percent identities have also been recomputed with respect to the new reference row.
Mode mismatch
Behaves like identity mode, but colours only those residues that differ
from the reference sequence (the query or first row unless specified
otherwise) with:
mview ... -coloring mismatch ...
to produce:
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
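Because the colours do not survive in plain text, the following Python sketch shows which positions the identity and mismatch modes would pick out, by keeping the selected residues and replacing the rest with dots. It is an illustration only. The function name highlight is made up, and leaving gap characters in the row untouched is an assumption.

```python
def highlight(reference, row, mode="identity"):
    """Keep residues selected by the colouring mode, dot out the others.

    mode="identity": keep residues equal to the reference residue.
    mode="mismatch": keep residues that differ from the reference residue.
    Gaps in the row are never selected (assumption) and are shown as '-'.
    """
    if mode not in ("identity", "mismatch"):
        raise ValueError(f"unknown mode: {mode}")
    shown = []
    for ref_sym, sym in zip(reference, row):
        if sym == "-":
            shown.append("-")
            continue
        same = sym.upper() == ref_sym.upper()
        selected = same if mode == "identity" else not same
        shown.append(sym if selected else ".")
    return "".join(shown)


# First eight columns of rows 1 (reference) and 2 of the example alignment.
reference = "FKKIKVLG"
row = "ISVNKQLG"
print(highlight(reference, row, "identity"))  # prints ....K.LG
print(highlight(reference, row, "mismatch"))  # prints ISVN.Q..
```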
Using a single colour
That’s using the default protein colourmap and rather difficult to see, so let’s mark all mismatched residues in red:
mview ... -coloring mismatch -colormap red
to produce:
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
This approach of using a single colour works with any of the coloring modes
listed above. To see colours and colormaps use mview -listcolors.
You can add new colours or build your own multicoloured specialist colourmap. For example, you might want certain important mismatched residues showing up in one colour against a ground colour of all the other mismatches (see Colours).
Mode consensus
This mode uses the currently selected alignment colourmap to colour only those
residues assigned to a consensus class for each column (see
Consensus sequences). The consensus threshold defaults to 70% and
may be set to another value with the -threshold option. In the
following example we add a single consensus line and set the same threshold
for both consensus calculations (they are independent) to 90%:
mview ... -coloring consensus -threshold 90 ... -consensus on -con_threshold 90 ...
gives:
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
consensus/90% hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls
Notice that the coloured columns correspond to the consensus features (i.e., not wildcards or gaps). In each column, the residues that contribute to that consensus class have been coloured using the prevailing alignment colourmap (see Alignment colouring), which is the default one used in the other examples in this section.
Mode group
The last mode works like the consensus colouring mode, but gives the residues in a column a uniform colour defined for that consensus class (see Consensus colouring):
mview ... -coloring group -threshold 90 ... -consensus on -con_threshold 90 ...
yields:
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
consensus/90% hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls
As in the last example, the coloured columns correspond to the consensus features (i.e., not wildcards or gaps). In each column, the residues that contribute to that consensus class have been coloured using a single colour defined for that consensus class (see Consensus colouring), and conserved residues (at least 90% of a column) are given a solid coloured background for emphasis.
The choice of consensus classes may be changed using the -groupmap name
option, where name is the name of a consensus pattern. These can be listed
using the -listgroups option (see Changing consensus patterns).
Note: it is also possible to colour the consensus sequences themselves independently of the alignment (see Consensus colouring).
Colouring only conserved residues¶
The colouring of an alignment under the consensus or group colouring
modes (see Alignment colouring) can be tuned to ignore the
consensus classes with -ignore class for the purposes of colouring:
mview ... -coloring group -threshold 90 ... -consensus on -con_threshold 90 ... -ignore class
gives:
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
consensus/90% hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls
which highlights the conserved residues (at least 90% of a column) in the alignment by applying the default consensus group colouring scheme to them.
Consensus colouring¶
The consensus lines may be coloured independently of the alignment using the
-con_coloring option which takes two modes: any, identity.
Consider the following alignment:
1 [ . . . . : . . ] 80
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
Mode any
The simplest is to colour every consensus symbol according to the currently selected consensus colourmap:
mview ... -consensus on -con_coloring any ...
gives:
consensus/100% hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls
consensus/90% hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls
consensus/80% lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls
consensus/70% lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls
Mode identity
You can colour only those consensus symbols that are identical to some reference sequence (usually the query or first row) with:
mview ... -consensus on -con_coloring identity ...
to produce:
consensus/100% hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls
consensus/90% hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls
consensus/80% lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls
consensus/70% lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls
Tuning
These modes work with the -con_ignore and -con_gaps options to tune
the consensus symbols displayed (see Showing conserved symbols or conserved classes). For example, the consensus symbols can be switched off, to leave
only conserved residues:
mview ... -consensus on -con_coloring identity -con_ignore class ...
to produce:
consensus/100% .......G.G.FG.V........................VA.K.................E...M...............
consensus/90% .......G.G.FG.V........................VA.K.................E...M...............
consensus/80% ....K..G.G.FG.V..G.....................VA.K.................EA..M....H...V.L.G..
consensus/70% ....K..G.G.FG.V..G.....................VA.K.................EA..M....H...V.L.G..
Finding and colouring patterns and motifs¶
Occurrences of a string or pattern defined by a regular expression can be
coloured using the -find 'pattern' option. This will cause all instances
of the pattern to be highlighted using the user selected colourmap. Patterns
are case-insensitive.
1. Patterns may be exact strings:
mview ... -html head -css on -find VAIK
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
2. Patterns may be regular expressions enclosed in quotes:
mview ... -find 'VA[IV]K'
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
3. Patterns are unaffected by gaps in the sequence:
mview ... -find '.{4}VA[IV]K'
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
where you can see that the pattern (any 4 residues followed by V A [I or V] K) has been found even though it spans a gap in two rows.
4. Patterns will find all possible matches including overlapping matches:
mview ... -find '.V.'
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
where overlapping instances of the pattern merge together.
5. Multiple alternative patterns are allowed, separated by | characters:
mview ... -find 'VAIK|VAVK|[LI]G.G|[DKER]E[AI]..M|[VIL]V[QR]L'
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
6. Alternative patterns can be given different colours by changing the
delimiter from a | to a : character:
mview ... -find 'VAIK:VAVK:[LI]G.G:[DKER]E[AI]..M:[VIL]V[QR]L'
1 EGFR_HUMAN 100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 PR2_DROME 35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 ITK_HUMAN 32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 PTK7_HUMAN 21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 KIN31_CAEEL 31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
and | and : delimiters may be combined so that patterns joined by |
still form a single discrete pattern and will have one colour.
If you specify more patterns than the number of colours available (currently 20) the colours are simply cycled.
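Three of the behaviours described in this section (matching across gaps, merging of overlapping matches, and colour cycling) can be mimicked in a short Python sketch. It is not MView's implementation. It assumes that gaps are removed before matching and that only residue columns, not the gap columns in between, are highlighted. The function names are hypothetical, and Python's regular expression syntax may differ in detail from the syntax MView accepts.

```python
import re


def find_columns(aligned, pattern):
    """Return the alignment columns covered by any match of `pattern`.

    Matching is case-insensitive and done on the ungapped sequence; a
    lookahead is used so that overlapping matches are all reported.
    """
    # Alignment column of every residue, in sequence order.
    columns = [i for i, sym in enumerate(aligned) if sym != "-"]
    ungapped = "".join(aligned[i] for i in columns)
    regex = re.compile(f"(?=({pattern}))", re.IGNORECASE)
    hits = set()
    for match in regex.finditer(ungapped):
        start, end = match.span(1)
        hits.update(columns[start:end])  # map back to gapped coordinates
    return sorted(hits)


def colour_slot(pattern_index, palette_size=20):
    """Colour slot for the n-th pattern, cycling when patterns outnumber colours."""
    return pattern_index % palette_size


print(find_columns("AV-AIK", "vaik"))  # prints [1, 3, 4, 5]
print(find_columns("AVVA", ".V."))     # prints [0, 1, 2, 3]
print(colour_slot(20))                 # prints 0
```

In the first call the match spans the gap at column 2, which is skipped. In the second, the two overlapping matches merge into one highlighted run.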
Colours¶
Colourmaps¶
There are default colourmaps for protein and nucleotide (either DNA or RNA)
alignments and consensus lines. MView starts up with the default protein
colourmap selected, as if you had specified a molecule type with -moltype
aa (for “amino acid”).
Alignments
Colourmaps have names, e.g., the default protein alignment colourmap is called
P1 and the default nucleotide colourmap is D1. Alternative alignment
colouring colourmaps are explicitly selected using the -colormap
option. For example, another built-in colouring scheme can be specified with
-colormap CLUSTAL.
Here is the default colormap for proteins:
[P1]
#Protein: highlight amino acid physicochemical properties
#symbols =>  color         #comment
.        ->  dark-gray     #wildcard/mismatch
Aa       =>  bright-green  #hydrophobic
Bb       =>  dark-gray     #D or N
Cc       =>  yellow        #cysteine
Dd       =>  bright-blue   #negative charge
Ee       =>  bright-blue   #negative charge
Ff       =>  dark-green    #large hydrophobic
Gg       =>  bright-green  #hydrophobic
Hh       =>  dark-green    #large hydrophobic
Ii       =>  bright-green  #hydrophobic
Kk       =>  bright-red    #positive charge
Ll       =>  bright-green  #hydrophobic
Mm       =>  bright-green  #hydrophobic
Nn       =>  purple        #polar
Pp       =>  bright-green  #hydrophobic
Qq       =>  purple        #polar
Rr       =>  bright-red    #positive charge
Ss       =>  dull-blue     #small alcohol
Tt       =>  dull-blue     #small alcohol
Vv       =>  bright-green  #hydrophobic
Ww       =>  dark-green    #large hydrophobic
Yy       =>  dark-green    #large hydrophobic
Zz       =>  dark-gray     #E or Q
Xx       ->  dark-gray     #unknown
?        ->  light-gray    #unknown
*        ->  black         #stop
The default alignment colormap for nucleotide sequences can be selected using
the -moltype na (or dna or rna) option:
mview ... -moltype na ...
or by specifying:
mview ... -colormap D1 ...
and is defined as:
[D1]
#DNA: highlight nucleotide types
#symbols =>  color         #comment
.        ->  dark-gray     #wildcard/mismatch
Aa       =>  bright-blue   #adenosine
Cc       =>  dull-blue     #cytosine
Gg       =>  bright-blue   #guanine
Tt       =>  dull-blue     #thymine
Uu       =>  dull-blue     #uracil
Mm       =>  dark-gray     #amino: A or C
Rr       =>  dark-gray     #purine: A or G
Ww       =>  dark-gray     #weak: A or T
Ss       =>  dark-gray     #strong: C or G
Yy       =>  dark-gray     #pyrimidine: C or T
Kk       =>  dark-gray     #keto: G or T
Vv       =>  dark-gray     #not T: A or C or G
Hh       =>  dark-gray     #not G: A or C or T
Dd       =>  dark-gray     #not C: A or G or T
Bb       =>  dark-gray     #not A: C or G or T
Nn       =>  dark-gray     #any: A or C or G or T
Xx       ->  dark-gray     #any
?        ->  light-gray    #unknown
Consensus lines
In addition, the consensus lines optionally displayed below an alignment can
be coloured, and they have their own consensus colourmaps; the default for
proteins is PC1 and for nucleotides it is DC1. Alternative consensus
colouring colourmaps are explicitly selected using the -con_colormap option.
Here is the default consensus colormap for proteins:
[PC1]
#Protein consensus: highlight equivalence class
#symbols =>  color         #comment
.        ->  dark-gray     #unconserved
+        ->  bright-red    #positive charge
-        ->  bright-blue   #negative charge
a        ->  dark-green    #aromatic
c        ->  purple        #charged
h        ->  bright-green  #hydrophobic
l        ->  bright-green  #aliphatic
o        ->  dull-blue     #alcohol
p        ->  dull-blue     #polar
s        ->  bright-green  #small
t        ->  bright-green  #turnlike
u        ->  bright-green  #tiny
The default consensus colormap for nucleotide sequences can be selected using
the -moltype na (or dna or rna) option:
mview ... -moltype na ...
or by specifying:
mview ... -con_colormap DC1 ...
and is defined as:
[DC1]
#DNA consensus: highlight ring type
#symbols =>  color         #comment
.        ->  dark-gray     #unconserved
r        ->  purple        #purine
y        ->  orange        #pyrimidine
Creating new colourmaps¶
The built-in colour palette and colourmaps built from it can be listed from
the command line with -listcolors, and new colour schemes can be loaded
from a file using the -colorfile option.
Predefined colours are defined as in the following short segment of the built
in colour palette obtained with -listcolors -html head to wrap the output
in HTML and display the actual colours:
#color         : #RGB
color black    : #000000
color white    : #ffffff
color red      : #ff0000
color green    : #00ff00
color blue     : #0000ff
color cyan     : #00ffff
color magenta  : #ff00ff
color yellow   : #ffff00
...
Here’s an example of a short protein colouring scheme built from the predefined colour names, which is used to explain the syntax:
[CYS]
#Protein: highlight cysteines
#symbols =>  color         #comment
.        ->  dark-gray     #wildcard/mismatch
Cc       =>  yellow        #cysteine
Xx       ->  dark-gray     #unknown
?        ->  light-gray    #unknown
When writing a new protein/nucleotide colouring scheme, scheme names
introduced in square brackets ([CYS], above) are case-insensitive.
Any line or part of a line beginning with a # character is a comment.
Colourings are defined one per line by symbol(s) at the left, an arrow, then
the colour name or RGB code, followed by an optional comment. The symbols at
the left are case-sensitive, and can be given as single characters X or as
a character pair like Xx, where we want both upper- and lowercase to have
the same colour. The special wildcard . symbol sets the base colour to be
used for all symbols in the alignment. Other lines define specific colourings
for sequence symbols of interest. The ordering of lines is not important.
In the example, C or c will be painted yellow; an explicit X or
x residue will be dark grey; an explicit unknown residue ? will be
light grey; any other residue will match the . wildcard and be painted
dark grey.
A symbol-to-colour mapping can use a predefined colour name (as above) or an explicit hexadecimal RGB code like those in the colour palette.
The arrow separating the symbol(s) from the colour code can be a double =>
or a single -> arrow, selecting background or foreground colouring:
- If style sheets are not being used, the choice of arrow is unimportant: the supplied colour is used for the foreground, i.e., the output symbol.
- If style sheets are in use with -css on, then => means that the colour should be applied to the background of the symbol, while -> means it should be applied to the foreground, i.e., the symbol itself.
So, in the CYS example, the symbols ?, X, x and anything matching the
. wildcard will be coloured in the foreground (i.e., the symbols
themselves), and C or c will be displayed as coloured symbols unless
-css on is set, in which case they will appear as coloured blocks.
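To make the syntax concrete, here is a minimal Python sketch of a reader for this colourmap format. It is not MView's own parser; the names `parse_colourmap` and `colour_of` are hypothetical, and the sketch assumes one rule per line with the scheme name on a line of its own.

```python
import re

# symbols, then an arrow, then a colour name or #RRGGBB code
RULE = re.compile(r"^(\S+)\s*(=>|->)\s*(\S+)")


def parse_colourmap(text):
    """Return (scheme_name, {symbol: (colour, layer)}) for one scheme."""
    name, rules = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        header = re.match(r"\[([^\]]+)\]", line)
        if header:
            name = header.group(1).upper()  # scheme names are case-insensitive
            continue
        rule = RULE.match(line)
        if rule:
            symbols, arrow, colour = rule.groups()
            layer = "background" if arrow == "=>" else "foreground"
            for symbol in symbols:  # "Cc" sets both C and c
                rules[symbol] = (colour, layer)
    return name, rules


def colour_of(rules, symbol):
    """Look up a symbol, falling back to the '.' wildcard."""
    return rules.get(symbol, rules.get("."))


CYS = """\
[CYS]
#Protein: highlight cysteines
.   ->  dark-gray   #wildcard/mismatch
Cc  =>  yellow      #cysteine
Xx  ->  dark-gray   #unknown
?   ->  light-gray  #unknown
"""
name, rules = parse_colourmap(CYS)
print(name, colour_of(rules, "c"), colour_of(rules, "K"))
```

The recorded layer only matters when style sheets are in use; without them, every colour would be applied to the symbol itself, as described above.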
Layout and filtering¶
Pagination¶
The default layout is a single unbroken horizontal band of alignment, which
is fine if you are scrolling inside Firefox. However, you may prefer to break
the alignment into vertically stacked panes. For example, for panes 80
columns wide, set -width 80. Widths refer to the alignment, not to the whole
displayed output.
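The effect of a pane width can be sketched in a few lines of Python. This is only an illustration of the idea, not MView's layout code; `paginate` is a hypothetical name and rows are assumed to be (label, sequence) pairs of equal length.

```python
def paginate(rows, width):
    """Split alignment rows into vertically stacked panes of `width` columns."""
    length = max((len(seq) for _, seq in rows), default=0)
    panes = []
    for start in range(0, length, width):
        # The width applies to the alignment only; labels repeat in each pane.
        panes.append([(label, seq[start:start + width]) for label, seq in rows])
    return panes


rows = [("a", "ABCDEFGHIJ"), ("b", "KLMNOPQRST")]
for pane in paginate(rows, 4):
    for label, chunk in pane:
        print(label, chunk)
    print()
```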
Column ranges¶
It is possible to narrow (or expand) the displayed range of columns of the
alignment, for example, -range 10:78 would select only that column range
using the numbering scheme reported when -ruler on is set (see
Attaching a ruler). Note: the range setting is not related to the sequence
position labelling for blast/fasta database search input; it’s just the
position along the ruler.
The order of the numbers is unimportant, which makes it simpler to state interest in a region of the alignment that might actually be reversed in the output (e.g., a BLASTN search hit matching the reverse complement of the query strand).
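As a rough illustration of order-independent ranges, the Python sketch below normalises a `start:end` string and slices each row. The function name `select_columns` is hypothetical, and the sketch assumes ruler positions are 1-based and inclusive; it does not attempt the "expand" case.

```python
def select_columns(rows, range_spec):
    """Keep only the ruler columns named by 'M:N', in either order."""
    first, last = sorted(int(n) for n in range_spec.split(":"))
    # Assumption: ruler positions are 1-based and the range is inclusive.
    return [(label, seq[first - 1:last]) for label, seq in rows]


rows = [("a", "ABCDEFGHIJ")]
print(select_columns(rows, "3:5"))
print(select_columns(rows, "5:3"))  # same columns, order of numbers ignored
```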
Filtering rows¶
Showing only the top N rows
Usually, specifying a limited number of hits to view from a long search
alignment speeds things up a lot as there’s less parsing and less formatting
to be generated, so to get the best 10 hits, use the option -top 10.
Filtering by percent identity
You can also squeeze more out of a deep alignment and get a less biased view
if a threshold on the pairwise sequence identity is set using -maxident N,
where N is some value between 0 and 100.
Similarly, -minident N will report only those hits above some threshold
percent identity; useful for looking for close matches to the query or some
reference sequence.
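The two thresholds amount to a simple window on percent identity, as in this Python sketch. It is an illustration only: `filter_by_identity` is a hypothetical name, the bounds are assumed to be inclusive, and the reference row is treated like any other row, which may not match MView's behaviour. The identity values are those from the sample output earlier in this manual.

```python
def filter_by_identity(rows, minident=0.0, maxident=100.0):
    """Keep (identifier, percent_identity) rows inside the identity window."""
    # Assumption: both bounds are inclusive.
    return [row for row in rows if minident <= row[1] <= maxident]


rows = [
    ("EGFR_HUMAN", 100.0),
    ("PR2_DROME", 35.7),
    ("ITK_HUMAN", 32.9),
    ("PTK7_HUMAN", 21.2),
    ("KIN31_CAEEL", 31.5),
]
print(filter_by_identity(rows, maxident=33))  # drop the closest matches
print(filter_by_identity(rows, minident=33))  # keep only the closest matches
```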
Showing and hiding sets of rows
Rows can be dropped explicitly using the -hide option. This takes a
comma-separated list of row identifiers, rank numbers, rank number
ranges (1,2,3, 1..3, 1:3 are all equivalent), regular expressions (case
insensitive, enclosed between // characters) to match against row identifiers,
or the * symbol meaning all rows.
Likewise, the -show option specifies a list of rows to keep in the
alignment. The -show option overrides -hide whenever a row is common
to both.
For example, the options:
-hide all -show '2,3,6..10,/^pdb/'
or even:
-hide '/.*/' -show '2,3,6:10,/^pdb/'
would hide everything except rows 2, 3, 6 through 10 inclusive, and any hits beginning with the string ‘pdb’.
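The selection rules can be mimicked with a short Python sketch. It is not MView's implementation: the names `matches` and `visible` are hypothetical, both `*` and `all` are accepted because the text and the example use one each, and splitting on commas means a regular expression containing a comma is not handled.

```python
import re


def matches(selector, rank, identifier):
    """Test one row against a comma-separated -hide/-show style list."""
    for item in selector.split(","):
        item = item.strip()
        if item in ("*", "all"):
            return True
        if len(item) > 1 and item.startswith("/") and item.endswith("/"):
            # Regular expressions match identifiers, ignoring case.
            if re.search(item[1:-1], identifier, flags=re.IGNORECASE):
                return True
            continue
        span = re.fullmatch(r"(\d+)(?:\.\.|:)(\d+)", item)
        if span:
            low, high = sorted(int(n) for n in span.groups())
            if low <= rank <= high:
                return True
        elif item == str(rank) or item == identifier:
            return True
    return False


def visible(rank, identifier, hide="", show=""):
    """A row is displayed unless hidden; -show overrides -hide."""
    if show and matches(show, rank, identifier):
        return True
    return not (hide and matches(hide, rank, identifier))


hide, show = "all", "2,3,6..10,/^pdb/"
for rank, ident in [(1, "EGFR_HUMAN"), (2, "PR2_DROME"), (7, "x"), (11, "pdb|1abc")]:
    print(rank, ident, visible(rank, ident, hide, show))
```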
Note: the currently set reference row is still used for percent identity and
colouring operations, even though the row may have been dropped from display
by the -hide list (see Changing the reference sequence).
Data format specific filters
Other filters specific to BLASTP, FASTA, etc., input formats allow cutoffs on
scores or p-values, etc. In particular, it is possible to apply some control
over the selection of HSPs used in building the MView alignment using the
-hsp filtering option.
Some search programs produce DNA strand-directional output (e.g., BLASTN) and you can extract or output the results separately. For example, to see just the plus strand matches:
mview -in blast -strand p blastn_results.dat
The choices are p, m, both.
Of interest to anyone using PSI-BLAST, you can display alignments for any/all iterations of a PSI-BLAST run using, say:
mview -in blast -cycle 1,last psiblast_results.dat
to get just those two iterations. The default is to display only the last
iteration. If you want all output, use -cycle all.
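How such a cycle list resolves to iterations can be sketched as below. This is illustrative Python rather than MView's code; `select_cycles` is a hypothetical name, iterations are assumed to be numbered from 1, and out-of-range numbers are silently dropped as a simplifying assumption.

```python
def select_cycles(spec, n_cycles):
    """Return the iteration numbers named by a -cycle style list."""
    if spec == "all":
        return list(range(1, n_cycles + 1))
    chosen = []
    for item in spec.split(","):
        number = n_cycles if item == "last" else int(item)
        # Assumption: unknown iterations are ignored, duplicates collapse.
        if 1 <= number <= n_cycles and number not in chosen:
            chosen.append(number)
    return chosen


print(select_cycles("1,last", 5))
print(select_cycles("last", 5))  # the documented default
print(select_cycles("all", 3))
```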
Keeping rows, but ignoring them in calculations
Another control option can be used to prevent MView from using rows for
colouring or for calculation of percent identities although these rows will
still be displayed. Use -nop to specify a list (comma-separated as usual)
of identifiers or row numbers to flag for “NO Processing”. This is useful for
displaying non-alignment data (e.g., secondary structure predictions)
alongside the alignment.
Labels and annotations¶
The labelling information at the left of the alignment can be too wide, so you can switch some of them off. Labels are in blocks numbered from zero (perverse, but the original reasoning was that the input data starts with the sequence identifiers in column 1 and MView tacks on a rank number in front, so make that column 0).
Column  Description
0       rank
1       identifier
2       description
3       score block (may contain many score columns)
4       percent identities
5       query sequence positions (blast or fasta searches)
6       hit sequence positions (blast or fasta searches)
Any of the label types can be switched off with an option like
-label2 to remove the descriptions label at column 2, and so on.
Data formats¶
Input formats¶
MView supports a variety of input formats covering common sequence database search and multiple alignment formats. Alternatively, if you can convert some strange alignment to one of the simpler input formats (FASTA, PIR, MSF, plain) you can then read it into MView. See input_formats.
Output formats¶
The default output is plain text showing the alignment together with some header information. HTML markup will be added if any of the HTML-specific or colouring options are set.
However, a number of alternative output formats allow format conversions (e.g., convert a BLAST search to FASTA sequence format) for subsequent processing. See output_formats.
Linking identifiers to external resources¶
Using the -srs on option with HTML output, it is possible to convert
sequence identifiers into links to a sequence database:
1 sp|P00533.2|EGFR_HUMAN  100.0% FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC
2 sp|Q9I7F7.3|PR2_DROME    35.7% ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV
3 sp|Q08881.1|ITK_HUMAN    32.9% LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC
4 sp|Q13308.2|PTK7_HUMAN   21.2% IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC
5 sp|P34265.4|KIN31_CAEEL  31.5% VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA
  consensus/90%                  .......G.G.FG.V........................VA.K.................E...M...............
  consensus/80%                  ....K..G.G.FG.V..G.....................VA.K.................EA..M....H...V.L.G..
The identifiers need to conform to patterns such as:
database|accession|identifier
database:identifier
like those produced by the NCBI, EBI or other blast services.
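The two identifier shapes can be recognised with a pair of regular expressions, as in the Python sketch below. It does not reproduce the patterns in SRS.pm (which is Perl): the names `parse_identifier` and `link_for`, the example label `pdb:1abc` and the `example.org` URL template are all hypothetical.

```python
import re

PIPE = re.compile(
    r"^(?P<database>[^|:]+)\|(?P<accession>[^|]+)\|(?P<identifier>[^|]+)$")
COLON = re.compile(r"^(?P<database>[^|:]+):(?P<identifier>[^|:]+)$")

# Hypothetical URL templates keyed by database name; real URLs will differ.
URL_TEMPLATES = {"sp": "https://example.org/entry/{key}"}


def parse_identifier(label):
    """Split 'db|acc|id' or 'db:id' into named fields, else return None."""
    match = PIPE.match(label) or COLON.match(label)
    return match.groupdict() if match else None


def link_for(label):
    """Build a link for a recognised label, or None if no pattern applies."""
    fields = parse_identifier(label)
    if fields is None:
        return None
    template = URL_TEMPLATES.get(fields["database"].lower())
    if template is None:
        return None
    # Prefer the accession when the label carries one.
    return template.format(key=fields.get("accession") or fields["identifier"])


print(parse_identifier("sp|P00533.2|EGFR_HUMAN"))
print(parse_identifier("pdb:1abc"))
print(link_for("sp|P00533.2|EGFR_HUMAN"))
```

Labels that match neither shape are left as plain text, which mirrors the statement that links are only constructed for listed patterns.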
Links will be constructed if the patterns are listed in the SRS.pm
library, which is part of this software. You can modify and extend this file
to include more patterns if you know some Perl and the format of the URLs
needed to access the sequence databases of interest.
Of course, this linking mechanism works for any recognised input data format, not just blast results.
Memory usage¶
MView can use a great deal of memory, particularly if you try to process
complete sets of PSI-BLAST cycles, each containing thousands of hits, all at
once. Most filtering options should reduce memory requirements by
cutting down the number of internal data structures created. Likewise,
processing each alignment separately will save memory, or you can use the
option -register off to cause each alignment to be output when ready (by
default all alignments are saved until the end so they can be printed with
fields in register). Finally, the choice of malloc library compiled into your
Perl may affect memory use.