Manual
======

Basic usage
-----------

Getting started
^^^^^^^^^^^^^^^

Given an existing sequence alignment in a file ``data.dat`` in FASTA format,
the simplest usage might be::

  mview -in fasta data.dat

Similarly, if the input file contained a CLUSTAL alignment::

  mview -in clustal data.dat

In either case, the output would be a stacked alignment with extra columns
added to show row numbers and percent coverage and percent identity (with
respect to the first sequence, see :ref:`ref_cov_pid`), looking something
like this, regardless of the input format:

.. raw:: html

  Reference sequence (1): EGFR_HUMAN
  Identities normalised by aligned length.

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
To process the output of a BLAST run use something like::

  mview -in blast blastresults.dat

while to process the output of a FASTA run (the sequence database search
program from U. of Virginia, not the simple FASTA/Pearson data format) use
something like::

  mview -in uvfasta fastaresults.dat

The ``-in`` option isn't always necessary. If the filename extension, or the
filename itself minus any directory path, begins with or contains the first
few letters of one of the valid ``-in`` options (e.g., ``mydata.msf`` or
``mydata.fasta`` or ``tfastx_run.dat``), MView tries to choose a sensible
input format, allowing multiple files in mixed formats to be supplied on the
command line. The ``-in`` option will always override this mechanism but
requires that all input files be of the same format.

.. _ref_rulers:

The ruler line
^^^^^^^^^^^^^^

The ruler line at the top is enabled by default. To remove it, use the
``-ruler off`` option::

  mview -in fasta -ruler off data.dat

which gives:

.. raw:: html

  Reference sequence (1): EGFR_HUMAN
  Identities normalised by aligned length.

  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
Removing the ruler also removes some column header information, which could
cause confusion.

Only one kind of ruler is currently provided, numbering the columns of the
final alignment from M to N (incrementing) or N to M (decrementing) based on
the input sequence numbering, if any. For multiple alignments like the one
above with no numbering the ruler runs from 1 to the length of the alignment.

For database searches that translate nucleotide sequences to protein, such as
TBLASTX, the rulers differ slightly in that the native query numbering is
given in nucleotide units, but MView reports amino acid units instead (using
modulo 3 arithmetic).

Command line options
^^^^^^^^^^^^^^^^^^^^

All available options can be listed using::

  mview -help

There are a lot of options, but the main ones are described in this manual.
Many of them can be abbreviated as long as they are unambiguous, for example,
the option ``-reference`` can be shortened to ``-ref``.

Layout and filtering
--------------------

Pagination
^^^^^^^^^^

The default layout is a single unbroken horizontal band of alignment, which
is fine if you are scrolling inside a browser such as Firefox. However, you
may prefer to break the alignment into vertically stacked panes. For panes
that are, for example, 80 columns wide, set ``-width 80``. Widths refer to
the alignment, not to the whole displayed output.

Column ranges
^^^^^^^^^^^^^

It is possible to narrow (or expand) the displayed range of columns of the
alignment, for example, ``-range 10:78`` would select only that column range
using the numbering scheme reported when ``-ruler on`` is set (see
:ref:`ref_rulers`).

Note: the range setting is not related to the sequence position labelling for
blast/fasta database search input; it's just the position along the ruler.
The order of the numbers is unimportant, making it simpler to state interest
in a region of the alignment that might actually be reversed in the output
(e.g., a BLASTN search hit matching the reverse complement of the query
strand).

.. _ref_reference_row:

Changing the reference sequence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

One can colour and compute identities with respect to a sequence other than
the first/query sequence using the ``-reference`` option. This takes either
the sequence identifier or an integer argument corresponding to the ranking
or ordering of a sequence usually shown in the first labelling column of
MView output.

For multiple alignment input formats, sequences are numbered from 1, while
for searches the hits are numbered from 1, but the query itself is 0, so
beware.

.. _ref_sorting:

Sorting by percent coverage or percent identity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default MView outputs sequences in the same order they were read in and
computes percent coverage and percent identity values (see
:ref:`ref_cov_pid`).

You can change the ordering with the ``-sort`` option::

  -sort cov
  -sort pid
  -sort cov:pid
  -sort pid:cov

to sort the output (descending) by coverage, percent identity, coverage then
percent identity, or percent identity then coverage, respectively. Rows that
coincide in the sort retain their original local ordering (ascending row
number).

You can also change the reference sequence (see :ref:`ref_reference_row`) and
apply one of these sorts like this::

  -ref 2 -sort cov:pid

which would compute all the coverage and percent identities with respect to
row 2, then sort by coverage and percent identity, placing row 2 first.

Default ordering taken from the input alignment:

.. raw:: html

  Reference sequence (1): EGFR_HUMAN
  Identities normalised by aligned length.

  1 EGFR_HUMAN  100.0% 100.0%  FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC 
  2 PR2_DROME    97.1%  35.7%  ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV 
  3 ITK_HUMAN    90.0%  32.9%  LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC 
  4 PTK7_HUMAN   97.1%  21.2%  IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC 
  5 KIN31_CAEEL 100.0%  31.5%  VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA 
  
After sorting by coverage and percent identity with respect to row 2:

.. raw:: html

  Reference sequence (2): PR2_DROME
  Identities normalised by aligned length.

  2 PR2_DROME   100.0% 100.0%  ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV 
  1 EGFR_HUMAN  100.0%  35.7%  FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC 
  5 KIN31_CAEEL 100.0%  30.1%  VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA 
  4 PTK7_HUMAN   97.1%  25.0%  IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC 
  3 ITK_HUMAN    92.6%  33.8%  LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC 
  
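The ordering rules described in this section can be illustrated with a short
Python sketch. This is not MView's own code: the ``Row`` type and the
``sort_rows`` function are hypothetical names introduced here, and the sketch
assumes only what the text states (descending sort, ties keep input order,
reference row placed first). The coverage and identity figures are copied
from the sorted output above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    rank: int    # original row number in the input alignment
    name: str
    cov: float   # percent coverage relative to the reference row
    pid: float   # percent identity relative to the reference row


def sort_rows(rows, ref_rank, keys=("cov", "pid")):
    """Return rows with the reference first, the rest sorted descending by keys.

    sorted() is stable, so rows with equal keys keep ascending input order.
    """
    ref = [r for r in rows if r.rank == ref_rank]
    rest = [r for r in rows if r.rank != ref_rank]
    rest = sorted(rest, key=lambda r: tuple(-getattr(r, k) for k in keys))
    return ref + rest


# Values relative to row 2, as reported in the sorted example output.
rows = [
    Row(1, "EGFR_HUMAN", 100.0, 35.7),
    Row(2, "PR2_DROME", 100.0, 100.0),
    Row(3, "ITK_HUMAN", 92.6, 33.8),
    Row(4, "PTK7_HUMAN", 97.1, 25.0),
    Row(5, "KIN31_CAEEL", 100.0, 30.1),
]

order = [r.rank for r in sort_rows(rows, ref_rank=2)]
print(order)  # [2, 1, 5, 4, 3]
```

The resulting order matches the row numbers in the sorted output shown above.
Passing ``keys=("pid", "cov")`` would correspond to ``-sort pid:cov``.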
.. _ref_filtering_rows:

Filtering rows
^^^^^^^^^^^^^^

**Showing only the top N rows**

Usually, specifying a limited number of hits to view from a long search
alignment speeds things up a lot as there's less parsing and less formatting
to be generated, so to get the best 10 hits, use the option ``-top 10``.

**Filtering by percent identity**

The ``-minident N`` option will report only those hits above some threshold
percent identity N compared to the reference row; useful for looking for
close matches to the query or other reference sequence. Similarly, you can
exclude the strong matches using ``-maxident N``. Both options can be
combined.

**Showing and hiding sets of rows**

Rows can be dropped explicitly using the ``-hide`` option. This can be
supplied a comma-separated list of row identifiers, rank numbers, rank number
ranges (``1,2,3``, ``1..3``, ``1:3`` are all equivalent), regular expressions
(case insensitive, enclosed between ``//`` characters) to match against row
identifiers, or the ``*`` symbol meaning all rows.

Likewise, the ``-show`` option specifies a list of rows to keep in the
alignment. The ``-show`` option overrides ``-hide`` whenever a row is common
to both.

For example, the options::

  -hide all -show '2,3,6..10,/^pdb/'

or even::

  -hide '/.*/' -show '2,3,6:10,/^pdb/'

would hide everything except rows 2, 3, 6 through 10 inclusive, and any hits
beginning with the string 'pdb'.

Note: the currently set reference row is still used for percent identity and
colouring operations, even though the row may have been dropped from display
by the ``-hide`` list (see :ref:`ref_reference_row`).

**Data format specific filters**

Other filters specific to BLASTP, FASTA, etc., input formats allow cutoffs on
scores or p-values, etc. In particular, it is possible to apply some control
over the selection of HSPs used in building the MView alignment using the
``-hsp`` filtering option.

Some search programs produce DNA strand-directional output (e.g., BLASTN) and
you can extract or output the results separately. For example, to see just
the plus strand matches::

  mview -in blast -strand p blastn_results.dat

The choices are ``p``, ``m``, ``both``.

Of interest to anyone using PSI-BLAST, you can display alignments for any/all
iterations of a PSI-BLAST run using, say::

  mview -in blast -cycle 1,last psiblast_results.dat

to get just those two iterations. The default is to display only the last
iteration. If you want all output, use ``-cycle all``.

**Keeping rows, but ignoring them in calculations**

Another control option can be used to prevent MView from using rows for
colouring or for calculation of percent identities although these rows will
still be displayed. Use ``-nop`` to specify a list (comma-separated as usual)
of identifiers or row numbers to flag for "NO Processing".

**Comment lines, another way to ignore rows**

If a sequence identifier starts with the ``#`` character, it is treated as a
``nop`` row, i.e., as a comment. This is useful for displaying other textual
information (e.g., secondary structure predictions) within the alignment.
Here is an example with two lines of input data beginning with dummy
identifiers ``#residues`` and ``#properties``:

.. raw:: html

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    residues                     ------------------------------------------^Lys--------------^Glu----------------   
    properties                   ------------------------------------^^^^^^hydrophobic---------------------------   
  
Note 1: MView elides the hash character in the output and also the row
numbers. However, the name or row number based filtering mechanism described
above still works.

Note 2: Comment lines like this cannot contain whitespace in the sequence
region - they look like sequences and are processed as such by MView.

Labels, annotations and sequences
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can switch off some of the displayed columns, referred to here as labels.
Labels are in blocks numbered from zero (perverse, but the original reasoning
was that the input data starts with the sequence identifiers in column 1 and
MView tacks on a rank number in front, so make that column 0).

====== ==================================================
Column Description
====== ==================================================
0      rank
1      identifier
2      description
3      score block (may contain many score columns)
4      percent coverage
5      percent identity
6      query sequence positions (blast or fasta searches)
7      hit sequence positions (blast or fasta searches)
8      trailing optional fields (blast -outfmt 7 data)
====== ==================================================

Any of the label types can be switched off with an option like ``-label2`` to
remove the descriptions label at column 2, and so on.

You can also disable processing and display of the sequences with the
``-sequences off`` option. This would be useful if you only want to see
summarized scoring information from many blast runs, for example.

Adding HTML
-----------

Basic HTML
^^^^^^^^^^

To add some HTML markup a few extra options are needed, for example::

  mview -in fasta -html head data.dat > data.html

produces a complete page of HTML and you can load this into your Web browser
with a URL like ``file:///full/path/to/the/folder/data.html``.

To colour all the residues using the default built-in colourmap for
proteins::

  mview -in fasta -html head -coloring any data.dat > data.html

produces:

.. raw:: html

  Reference sequence (1): EGFR_HUMAN
  Identities normalised by aligned length.
  Colored by: property

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
To make the letters stand out use the ``-bold`` option::

  mview -in fasta -html head -bold -coloring any data.dat > data.html

giving:

.. raw:: html

  Reference sequence (1): EGFR_HUMAN
  Identities normalised by aligned length.
  Colored by: property

  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
Or change the colouring to use blocked letters with ``-css on`` instead::

  mview -in fasta -html head -css on -coloring any data.dat > data.html

giving:

.. raw:: html

  Reference sequence (1): EGFR_HUMAN
  Identities normalised by aligned length.
  Colored by: property

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
You can combine ``-css on`` with ``-bold`` to make the blocks and letters even
more prominent.

If your data are DNA or RNA, add the option ``-moltype dna`` (or ``rna`` or
``na`` for "nucleic acid") to change to the default nucleotide colourmap.
Here's an MView run on some BLASTN data demonstrating some other options as
well::

  mview -in blast -html head -css on -coloring identity -moltype dna -top 5 -range 250:310 blastn.dat

which (slightly edited to reduce space) produced:

.. raw:: html

  HSP processing: ranked
  Query orientation: +

                                                cov    pid   query   sbjct 250 [         .         .         .         .         3         ] 310
    EMBOSS_001         bits E-value  N qy ht 100.0% 100.0%   1:521             TGAAGCCTGCACTTACTCAGGACTCATCATGACTGCGTACCAATTCGTCTTACTCAGGACT    
  1 EM_EST:GT222018.2  1033     0.0  1  +  + 100.0% 100.0%   1:521   4:524     TGAAGCCTGCACTTACTCAGGACTCATCATGACTGCGTACCAATTCGTCTTACTCAGGACT    
  2 EM_EST:GT222017.1   186   4e-43  1  +  +  88.5%  98.2% 256:372 205:318     ------CTGCACTTACTCAGGACTCATCATGACTGCGTACCAATTCGT-TTACTCAGGACT    
  3 EM_EST:GT222024.2   182   7e-42  1  +  +  80.3% 100.0% 262:372  96:209     ------------TTACTCAGGACTCATCATGACTGCGTACCAATTCGTCttACTCAGGACT    
  4 EM_EST:GT222023.2   182   7e-42  1  +  +  80.3% 100.0% 262:372  96:209     ------------TTACTCAGGACTCATCATGACTGCGTACCAATTCGTCttACTCAGGACT    
  5 EM_EST:GT222054.2   178   1e-40  1  +  +  52.5% 100.0% 279:372    4:97     -----------------------------TGACTGCGTACCAATTCGTCTTACTCAGGACT    
  
showing scoring and sequence range information parsed from the BLASTN run,
and using the default nucleotide colouring scheme (purines, dark blue;
pyrimidines, light blue). Notice the lower-cased pairs of thymines near the
end of sequences 3 and 4, columns 299--300, indicating where a segment of hit
sequence has been excised to close a gap in the query (see
:ref:`ref_funny_sequences`).

Controlling the amount of HTML
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are several values that can be passed to the ``-html`` option:
``head``, ``body``, ``data``, ``full``, ``off``.

**Mode** ``head``

Produces a complete web page. Output includes the style sheet if ``-css on``
was given. This is the most common situation.

**Mode** ``body``

Produces just the ``<body>`` part of the web page. Note: the style sheet
produced by ``-css on`` will be missing.

**Mode** ``data``

Produces just the alignment part of the web page. Note: any style sheet
produced by ``-css on`` will be missing.

**Mode** ``full``

Produces a complete web page with the ``MIME-type "text/html"``, suitable for
serving directly from a web server. Output includes the style sheet if
``-css on`` was given.

**Mode** ``off``

Switches off HTML (default).

Using an external CSS style sheet
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The option ``-listcss`` dumps the style sheet to stdout, so you can share
that across MView invocations from a web server. Each would be of the form::

  mview -css URL

where the URL specifies the location of the style sheet as seen by the web
server (i.e., ``file:///some/path`` or ``http://server/path``).

If you build a new colourmap you can load it into MView and save the new CSS
file. Suppose you have a new colourmap in ``newcolmap.dat``::

  mview -colorfile newcolmap.dat -listcss

will dump the new style sheet for use as before.

.. _ref_consensus_sequences:

Consensus sequences
-------------------

Clustal conservation line
^^^^^^^^^^^^^^^^^^^^^^^^^

A Clustal-style conservation line of ``*:.`` symbols can be added to any
alignment (not just one from CLUSTAL itself) using the ``-conservation on``
option::

  mview ... -conservation on

giving:

.. raw:: html

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    clustal                      .   : :* * ** *  .                     **:*             .:  *  .* .:.:  : :: .:    

  
The symbols are ``*`` for full column identity, and ``:`` or ``.`` for strong
and weak amino acid grouping, respectively, as defined in CLUSTAL.

For DNA or RNA sequences, if the molecule type was set to nucleic acid with
``-moltype na`` or ``dna`` or ``rna``, then the clustal conservation line will
show only the column identities.

Note: these conservation lines can be generated for any subset of rows
extracted using the various row filtering options (see
:ref:`ref_filtering_rows`).

Consensus lines
^^^^^^^^^^^^^^^

Consensus lines can be added beneath the alignment using ``-consensus on``.
By default, this adds four extra lines of consensus sequences computed at
various thresholds of percentage composition of the columns.

There are default consensus patterns for protein and nucleotide (either DNA
or RNA) sequences. MView starts up with the default protein consensus
pattern, for example::

  mview ... -consensus on

gives:

.. raw:: html

                      cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN     100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME       97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN       90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN      97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL    100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    consensus/100%                  hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
    consensus/90%                   hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
    consensus/80%                   lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls   
    consensus/70%                   lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls   
  
Changing consensus thresholds
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default consensus mechanism displays consensus lines calculated at four
levels of identity (100%, 90%, 80%, 70%). This can be changed to show as many
or as few consensus lines at any level of percent identity between 50 and
100% using the ``-con_threshold`` option and a comma-separated list of
identities::

  mview ... -consensus on -con_threshold 80

would give a single consensus line calculated at 80% identity, while::

  mview ... -consensus on -con_threshold 80,65

would produce two lines at 80% and 65% identity.

Consensus pattern definitions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Consensus patterns are based on equivalence classes, that is, sets of
residues that share some predefined property. These classes are not mutually
exclusive and the consensus mechanism will choose the most specific class
that summarizes a given column at the desired percent identity.

The default for protein alignments is called ``P1`` and is defined by
physicochemical property as follows:

.. raw:: html

  [P1]
  #Protein consensus: conserved physicochemical classes, derived from
  #the Venn diagrams of: Taylor W. R. (1986). The classification of amino acid
  #conservation. J. Theor. Biol. 119:205-218.
  #description =>  symbol  members
  .            =>  .     
  A            =>  A       { A }
  C            =>  C       { C }
  D            =>  D       { D }
  E            =>  E       { E }
  F            =>  F       { F }
  G            =>  G       { G }
  H            =>  H       { H }
  I            =>  I       { I }
  K            =>  K       { K }
  L            =>  L       { L }
  M            =>  M       { M }
  N            =>  N       { N }
  P            =>  P       { P }
  Q            =>  Q       { Q }
  R            =>  R       { R }
  S            =>  S       { S }
  T            =>  T       { T }
  V            =>  V       { V }
  W            =>  W       { W }
  Y            =>  Y       { Y }
  alcohol      =>  o       { S, T }
  aliphatic    =>  l       { I, L, V }
  aromatic     =>  a       { F, H, W, Y }
  charged      =>  c       { D, E, H, K, R }
  hydrophobic  =>  h       { A, C, F, G, H, I, K, L, M, R, T, V, W, Y }
  negative     =>  -       { D, E }
  polar        =>  p       { C, D, E, H, K, N, Q, R, S, T }
  positive     =>  +       { H, K, R }
  small        =>  s       { A, C, D, G, N, P, S, T, V }
  tiny         =>  u       { A, G, S }
  turnlike     =>  t       { A, C, D, E, G, H, K, N, Q, R, S, T }
  stop         =>  *       { * }
  
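As a rough illustration of how a column might be summarised using the ``P1``
classes listed above, here is a minimal Python sketch. It is not taken from
MView's source. The function name ``consensus_symbol`` is hypothetical, and
the sketch assumes that "most specific" means the class with the fewest
members, that ties go to the class defined first, and that gap characters
count towards the column total.

```python
# Equivalence classes copied from the P1 definition above: symbol -> members.
P1 = {
    "o": "ST",
    "l": "ILV",
    "a": "FHWY",
    "c": "DEHKR",
    "h": "ACFGHIKLMRTVWY",
    "-": "DE",
    "p": "CDEHKNQRST",
    "+": "HKR",
    "s": "ACDGNPSTV",
    "u": "AGS",
    "t": "ACDEGHKNQRST",
}
# Every residue is also a singleton class whose symbol is the residue itself.
for aa in "ACDEFGHIKLMNPQRSTVWY":
    P1[aa] = aa


def consensus_symbol(column, threshold, groups=P1):
    """Pick the smallest class covering >= threshold percent of the column.

    Assumptions (not confirmed against MView): smallest class wins, ties go
    to the class defined first, and gaps count towards the column total.
    """
    best = "."  # wildcard symbol when no class reaches the threshold
    best_size = None
    for symbol, members in groups.items():
        hits = sum(1 for residue in column if residue in members)
        # Integer comparison avoids floating point rounding at the boundary.
        if hits * 100 >= threshold * len(column):
            if best_size is None or len(members) < best_size:
                best, best_size = symbol, len(members)
    return best


# First three columns of the five-sequence example alignment.
for column in ("FILIV", "KSTRE", "KVFEL"):
    print(column, consensus_symbol(column, 100), consensus_symbol(column, 80))
```

Under these assumptions the three columns give ``h``, ``p``, ``.`` at 100% and
``l``, ``p``, ``h`` at 80%, which agrees with the first three symbols of the
``consensus/100%`` and ``consensus/80%`` lines in the earlier example.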
The default nucleotide consensus pattern is ``D1`` grouping bases by ring type
(purine, pyrimidine). It is selected when any of the nucleotide molecule
types is set ``-moltype na`` (for "nucleic acid"; also ``dna`` or ``rna``),
for example::

  mview ... -consensus on -moltype dna

and has the following definition:

.. raw:: html

  [D1]
  #DNA consensus: conserved ring types
  #Ambiguous base R is purine: A or G
  #Ambiguous base Y is pyrimidine: C or T or U
  #description =>  symbol  members
  .            =>  .     
  A            =>  A       { A }
  C            =>  C       { C }
  G            =>  G       { G }
  T            =>  T       { T }
  U            =>  U       { U }
  purine       =>  r       { A, G, R }
  pyrimidine   =>  y       { C, T, U, Y }
  
.. _ref_changeing_consensus_patterns:

Changing consensus patterns
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The available list of built-in patterns can be seen with ``-listgroups``.

Alternative equivalence classes can be selected using ``-con_groupmap``. For
example, to select the ``CYS`` built-in consensus pattern to show only
conserved cysteines you would use an invocation like::

  mview ... -consensus on -con_groupmap CYS

New groups can be defined in the same format and read in from a file using
the ``-groupfile`` option.

.. _ref_conserved_symbols or conserved classes:

Showing conserved residues or conserved classes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Two options ``-con_ignore`` and ``-con_gaps`` can be used to tune the
consensus lines. Consider the following alignment:

.. raw:: html

                      cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN     100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME       97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN       90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN      97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL    100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
The default consensus pattern for proteins, with these options::

  mview ... -consensus on -con_threshold 80

would add this consensus line:

.. raw:: html

    consensus/80%                  lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls   
  
comprising a mixture of conserved residue classes and residues, whichever is
more specific. If you just want to see the conserved physicochemical classes,
use ``-con_ignore singleton``:

.. raw:: html

    consensus/80%                  lphs+plusutauhlhhuhhhs..............hh.lul+pl+.ts.s....p-hhc-utlhtplp+.pllplhuls   
  
Alternatively, to see just the conserved residues, use ``-con_ignore class``:

.. raw:: html

    consensus/80%                  ....K..G.G.FG.V..G.....................VA.K.................EA..M....H...V.L.G..   
  
Lastly, the default consensus computation counts gap characters in each
column, so that gapped regions are diluted and may not show up in the
consensus. Building on the last example, setting ``-con_gaps off`` prevents
this:

.. raw:: html

    consensus/80%                  ....K..G.G.FG.V..G........LPKGSMN......VA.K.........E.......EA..M....H...V.L.G..   
  
The consensus sequence now runs the full length of the alignment because the
insert in sequence 4 spanning the gap has been added to the consensus. This
is a little contrived in this case, but is sometimes useful when you want to
preserve as much of the alignment as possible.

These options work similarly with nucleotide alignments and with any other
consensus pattern you choose.

Note: it is possible to colour the consensus sequences independently of the
alignment (see :ref:`ref_consensus_colouring`).

Colouring modes
---------------

.. _ref_alignment_colouring:

Alignment colouring
^^^^^^^^^^^^^^^^^^^

There are several basic ways to colour the alignment using the ``-coloring``
option which takes five modes: ``any``, ``identity``, ``mismatch``,
``consensus``, ``group``. These all have default associated colour schemes,
but you can supply a different one or just a single colour by name (see the
description for the ``mismatch`` mode for an example).

**Mode** ``any``

The simplest is to colour every residue according to the currently selected
colourmap::

  mview -html head -css on -coloring any

gives:

.. raw:: html

  Colored by: property

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
**Mode** ``identity``

You can colour only those residues that are identical to some reference
sequence (usually the query or first row) with::

  mview ... -coloring identity

to produce:

.. raw:: html

  Colored by: identity

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
or with respect to another row (let's use row 4)::

    mview ... -coloring identity -ref 4

giving:

.. raw:: html

  Colored by: identity

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN   87.2%  21.2%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    84.6%  25.0%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    80.8%  26.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN  100.0% 100.0%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL  91.0%  22.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
in which you can also see that the coverage and percent identity values have been recomputed with respect to the new reference row.

**Mode** ``mismatch``

This mode behaves like ``identity`` mode, but colours only those residues that differ from the reference sequence (the query or first row unless specified otherwise) with::

    mview ... -coloring mismatch

to produce:

.. raw:: html

  Colored by: mismatch

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
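The difference between the ``identity`` and ``mismatch`` modes can be illustrated with a short Python sketch. This is not MView's own code: the function name ``columns_to_colour`` is hypothetical, and the rule that gap characters are never coloured is an assumption made for the illustration.

```python
def columns_to_colour(reference, row, mode, gap="-"):
    """Return one boolean per column: True where `row` would be coloured."""
    if mode not in ("identity", "mismatch"):
        raise ValueError("mode must be 'identity' or 'mismatch'")
    mask = []
    for ref_symbol, symbol in zip(reference, row):
        if symbol == gap:
            mask.append(False)  # assumption: gap characters are never coloured
        elif mode == "identity":
            mask.append(symbol == ref_symbol)
        else:
            mask.append(symbol != ref_symbol)
    return mask


reference = "FKKIKVLGSG"  # first ten columns of EGFR_HUMAN
row = "ISVNKQLGTG"        # first ten columns of PR2_DROME
identity = columns_to_colour(reference, row, "identity")
mismatch = columns_to_colour(reference, row, "mismatch")
```

For ungapped columns the two masks are exact complements, which is why the two modes highlight opposite sets of residues in the examples.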
**Using a single colour**

The example above uses the default protein colourmap and is rather difficult to read, so let's mark all mismatched residues in red::

    mview ... -coloring mismatch -colormap red

to produce:

.. raw:: html

  Colored by: mismatch

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
This approach of using a single colour works with any of the colouring modes listed above.

To list the available colours and colourmaps, use ``mview -listcolors``. You can add new colours or build your own multicoloured specialist colourmap. For example, you might want certain important mismatched residues showing up in one colour against a ground colour of all the other mismatches (see :ref:`ref_colourmaps`).

**Mode** ``consensus``

This mode uses the currently selected alignment colourmap to colour only those residues assigned to a consensus class for each column (see :ref:`ref_consensus_sequences`). The consensus threshold defaults to 70% and may be set to another value with the ``-threshold`` option.

In the following example we add a single consensus line and set the same threshold for both consensus calculations (they are independent) to 90%::

    mview ... -coloring consensus -threshold 90 -consensus on -con_threshold 90

gives:

.. raw:: html

  Colored by: consensus/90%

                     cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN    100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME      97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN      90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN     97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL   100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    consensus/90%                  hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
  
Notice that the coloured columns correspond to the consensus features (i.e., not wildcards or gaps). In each column, the residues that contribute to that consensus class have been coloured using the prevailing alignment colourmap (see :ref:`ref_alignment_colouring`), which is the default one used in the other examples in this section.

**Mode** ``group``

The last mode works like the consensus colouring mode, but gives the residues in a column a uniform colour defined for that consensus class (see :ref:`ref_consensus_colouring`)::

    mview ... -coloring group -threshold 90 -consensus on -con_threshold 90

yields:

.. raw:: html

  Colored by: consensus group/90%

                     cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN    100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME      97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN      90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN     97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL   100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    consensus/90%                  hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
  
As in the last example, the coloured columns correspond to the consensus features (i.e., not wildcards or gaps). In each column, the residues that contribute to that consensus class have been coloured using a single colour defined for that consensus class (see :ref:`ref_consensus_colouring`), and conserved residues (at least 90% of a column) are given a solid coloured background for emphasis.

The choice of consensus classes may be changed using the ``-groupmap name`` option, where ``name`` is the name of a consensus pattern. These can be listed using the ``-listgroups`` option (see :ref:`ref_changeing_consensus_patterns`).

Note: it is also possible to colour the consensus sequences themselves independently of the alignment (see :ref:`ref_consensus_colouring`).

Conserved residue colouring
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The colouring of an alignment under the ``consensus`` or ``group`` colouring modes (see :ref:`ref_alignment_colouring`) can be tuned to ignore the consensus classes with ``-ignore class`` for the purposes of colouring::

    mview ... -coloring group -threshold 90 -consensus on -con_threshold 90 -ignore class

gives:

.. raw:: html

  Colored by: consensus group/90%

                     cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN    100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME      97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN      90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN     97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL   100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    consensus/90%                  hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
  
which highlights the conserved residues (at least 90% of a column) in the alignment by applying the default consensus group colouring scheme to them.

Finding and colouring motifs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Occurrences of a string or pattern defined by a regular expression can be coloured using the ``-find 'pattern'`` option. This will cause all instances of the pattern to be highlighted using the user-selected colourmap. Patterns are case-insensitive.

**Exact strings**

::

    mview ... -html head -css on -find VAIK

.. raw:: html

  Colored by: search pattern 'VAIK'

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
**Regular expressions**

Regular expressions containing any of the shell metacharacters in the set ``[ ] { } ( ) | ^ $ ? *`` must be enclosed in quotes. This example finds ``VA`` followed by either ``I`` or ``V``, then ``K``::

    mview ... -find 'VA[IV]K'

.. raw:: html

  Colored by: search pattern 'VA[IV]K'

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
**Patterns can span gaps in the sequence**

The pattern (any 5 residues followed by ``VA``) has intact instances in the last two rows, but also spans a gap in the first three rows::

    mview ... -find '.{5}VA'

.. raw:: html

  Colored by: search pattern '.{5}VA'

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
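One way to picture a match that spans a gap is to search the ungapped sequence and then map the match back onto alignment columns. The Python sketch below does exactly that. It assumes gaps are written as ``-`` and is only an illustration, not a description of how MView implements ``-find``. The helper name ``find_gapped`` is hypothetical.

```python
import re


def find_gapped(aligned, pattern, gap="-"):
    """Return (start, end) alignment-column spans, end exclusive, of matches
    found in the ungapped sequence."""
    # columns[i] is the alignment column of the i-th ungapped residue
    columns = [i for i, symbol in enumerate(aligned) if symbol != gap]
    ungapped = "".join(aligned[i] for i in columns)
    spans = []
    for match in re.finditer(pattern, ungapped, flags=re.IGNORECASE):
        if match.end() > match.start():  # skip empty matches
            spans.append((columns[match.start()], columns[match.end() - 1] + 1))
    return spans


# Fragment of the EGFR_HUMAN row around the long gap
fragment = "EGEK---------VKIPVAIKEL"
spans = find_gapped(fragment, ".{5}VA")
start, end = spans[0]
print(fragment[start:end])  # K---------VKIPVA
```

The reported span covers the gap columns as well, which is how the highlighted region appears in the first three rows of the example.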
**Patterns find the longest match**

Patterns are greedy and try to find the longest possible match. This looks for a subsequence starting with ``V`` and ending with ``G``::

    mview ... -find 'V.*G'

.. raw:: html

  Colored by: search pattern 'V.*G'

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
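Greedy matching is a general property of regular expressions, so it can be demonstrated outside MView. The following Python sketch applies the same pattern to the first 20 residues of ``EGFR_HUMAN`` using Python's ``re`` module. The non-greedy form ``V.*?G`` is shown only for contrast; whether MView accepts it is not covered by this manual section.

```python
import re

sequence = "FKKIKVLGSGAFGTVYKGLW"  # first 20 residues of EGFR_HUMAN

# Greedy: '.*' consumes as much as possible, so the match ends at the last G
greedy = re.search("V.*G", sequence, flags=re.IGNORECASE)

# Non-greedy: '.*?' consumes as little as possible, so the match ends at the first G
lazy = re.search("V.*?G", sequence, flags=re.IGNORECASE)

print(greedy.group())  # VLGSGAFGTVYKG
print(lazy.group())    # VLG
```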
**Alternative patterns**

Alternative patterns separated by ``|`` characters receive the same colouring::

    mview ... -find 'VAIK|VAVK|[LI]G.G|[DKER]E[AI]..M|[VIL]V[QR]L'

.. raw:: html

  Colored by: search pattern 'VAIK|VAVK|[LI]G.G|[DKER]E[AI]..M|[VIL]V[QR]L'

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
**Alternative patterns with different colours**

Alternative patterns separated by ``:`` characters receive a different colouring::

    mview ... -find 'VAIK:VAVK:[LI]G.G:[DKER]E[AI]..M:[VIL]V[QR]L'

.. raw:: html

  Colored by: search pattern 'VAIK:VAVK:[LI]G.G:[DKER]E[AI]..M:[VIL]V[QR]L'

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
Note: if you specify more patterns than the number of colours available (currently 20) the colours are cycled.

**Combining both types of alternative pattern**

Patterns containing both ``|`` and ``:`` are allowed, with ``|`` taking precedence. That is, all neighbouring patterns separated by ``|`` are treated as one colour group, and groups separated by ``:`` are given different colours::

    mview ... -find 'VAIK|VAVK|[LI]G.G:[DKER]E[AI]..M|[VIL]V[QR]L'

.. raw:: html

  Colored by: search pattern 'VAIK|VAVK|[LI]G.G:[DKER]E[AI]..M|[VIL]V[QR]L'

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
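The grouping rule for ``|`` and ``:`` can be sketched in Python as follows. The function name ``compile_find_groups`` is hypothetical, and splitting naively on ``:`` is a simplifying assumption (it would mishandle a ``:`` inside a character class). The palette size of 20 is taken from the note on colour cycling.

```python
import re


def compile_find_groups(find_spec, palette_size=20):
    """Return a list of (colour_index, compiled_pattern), one per ':'-separated group."""
    groups = []
    for index, group in enumerate(find_spec.split(":")):
        # colours are cycled when there are more groups than colours
        colour_index = index % palette_size
        # '|' stays inside the group, so all its alternatives share one colour
        groups.append((colour_index, re.compile(group, re.IGNORECASE)))
    return groups


groups = compile_find_groups("VAIK|VAVK|[LI]G.G:[DKER]E[AI]..M|[VIL]V[QR]L")
for colour_index, pattern in groups:
    print(colour_index, pattern.pattern)
```

With the specification from the example this yields two colour groups: the first holds three alternatives and the second holds two.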
Combining alignment and motif colouring
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It is possible to combine the alignment colouring mode with a motif search. For example, this shows mismatches to the query in a light grey overlaid with two coloured motifs::

    mview ... -coloring mismatch -colormap gray5 -find 'G.G.FG.V:VA.K'

.. raw:: html

  Colored by: mismatch; search pattern 'G.G.FG.V:VA.K'

                   cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
.. _ref_comment_colouring:

Comment row colouring
^^^^^^^^^^^^^^^^^^^^^

Individual rows can be coloured following a particular scheme by embedding the name of a colourmap in the sequence identifier of a comment line (see above). The sequence identifier must begin with a ``#`` character followed by the name of a colourmap and, if further identifier text follows, a colon separator before it. For example, ``#MAP`` or ``#MAP:comment`` would cause the MAP colourmap to be used if it exists. Otherwise, the row will be displayed uncoloured.

One use of this might be to colour rows of protein secondary structure assignments. Here is a fragment of an HSSP multiple sequence alignment, which also shows a DSSP-derived secondary structure assignment under the dummy identifier ``#DSSP``:

.. raw:: html

                                 cov    pid   1 [        .         .         .         .         :         .         .         . 80 
  1 0:9wga                    100.0% 100.0%     XRCGEQGSNMECPNNLCCSQYGYCGMGGDYCGKGCQNGACWTSKRCGSQAGGATCPNNHCCSQYGHCGFGAEYCGAGCQG    
  2 2:uniprot|P02876|agi2_whe  99.4% 100.0%     -RCGEQGSNMECPNNLCCSQYGYCGMGGDYCGKGCQNGACWTSKRCGSQAGGATCPNNHCCSQYGHCGFGAEYCGAGCQG    
  3 4:uniprot|P10968|agi1_whe  98.8%  97.6%     -RCGEQGSNMECPNNLCCSQYGYCGMGGDYCGKGCQNGACWTSKRCGSQAGGATCTNNQCCSQYGYCGFGAEYCGAGCQG    
    DSSP                                        ---BGGGTTB--GGG-EE-TTS-EE-SHHHHSTT--SSS-SS--B-GGGGTT---GGG-EE-TTSBEE-SHHHHSTT--B    
  
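The identifier convention for comment rows is simple enough to express as a small Python sketch. The helper name ``parse_comment_identifier`` is hypothetical and the code only mirrors the rule as stated in this section; it does not check whether the named colourmap exists.

```python
def parse_comment_identifier(identifier):
    """Split '#MAP' or '#MAP:comment' into (colourmap_name, comment).

    Returns None for identifiers that do not follow the convention.
    """
    if not identifier.startswith("#"):
        return None  # an ordinary sequence identifier
    name, _, comment = identifier[1:].partition(":")
    if not name:
        return None  # '#' with no colourmap name
    return name, comment


print(parse_comment_identifier("#DSSP"))         # ('DSSP', '')
print(parse_comment_identifier("#MAP:comment"))  # ('MAP', 'comment')
print(parse_comment_identifier("EGFR_HUMAN"))    # None
```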
Note 1: MView strips the ``#`` character from the row identifier in the display, but leaves the name of the colourmap. The intention is that the user can define various specialized colourmaps with informative names, as here with DSSP.

Note 2: Any number of these rows is allowed, using the same or different colourmaps, and they can be placed anywhere in the alignment.

.. _ref_consensus_colouring:

Consensus pattern colouring
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The consensus lines may be coloured independently of the alignment. The ``-con_coloring`` option takes two modes: ``any``, ``identity``.

**Mode** ``any``

The simplest is to colour every consensus symbol according to the currently selected consensus colourmap::

    mview ... -consensus on -con_coloring any

gives:

.. raw:: html

                      cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN     100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME       97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN       90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN      97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL    100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    consensus/100%                  hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
    consensus/90%                   hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
    consensus/80%                   lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls   
    consensus/70%                   lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls   
  
Alternative colourmaps can be set using the ``-con_colormap`` option. In particular, a predefined uniform colour can be chosen::

    mview ... -consensus on -con_coloring any -con_colormap brown

gives:

.. raw:: html

                      cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN     100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME       97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN       90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN      97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL    100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    consensus/100%                  hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
    consensus/90%                   hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
    consensus/80%                   lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls   
    consensus/70%                   lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls   
  
**Mode** ``identity``

Instead of colouring all the consensus symbols, we can just colour those that are identical to the reference sequence. Here we colour the consensus identities with respect to the first alignment row::

    mview ... -consensus on -con_coloring identity

gives:

.. raw:: html

                      cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN     100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME       97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN       90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN      97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL    100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    consensus/100%                  hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
    consensus/90%                   hp..p.lG.GtFG.V..u.h...................VAlKphp.t........ph.cEh.hM.plpp.plsphhuls   
    consensus/80%                   lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls   
    consensus/70%                   lphsKplGsGtFGhVhhGhhhs..............hh.VAlKpl+.ts.s....p-hhcEAtlMtplpH.plVpLhGls   
  
These modes work with the ``-con_gaps`` and ``-con_ignore`` options in various combinations to tune the consensus symbols displayed (see :ref:`ref_conserved_symbols or conserved classes`).

**Make consensus span low coverage regions**

The strict consensus calculation can be relaxed to ignore gaps, so that the consensus is also computed across lower coverage regions::

    mview ... -consensus on -con_coloring any -con_gaps off

to produce:

.. raw:: html

                      cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN     100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME       97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN       90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN      97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL    100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    consensus/100%                  hp..p.lG.GtFG.V..u.h....-.LPKGSMNsc....VAlKphp.t...pE...ph.cEh.hM.plpp.plsphhuls   
    consensus/90%                   hp..p.lG.GtFG.V..u.h....-.LPKGSMNsc....VAlKphp.t...pE...ph.cEh.hM.plpp.plsphhuls   
    consensus/80%                   lphsKplGsGtFGhVhhGhhhs..-.LPKGSMNsc.hh.VAlKpl+.ts.spE..p-hhcEAtlMtplpH.plVpLhGls   
    consensus/70%                   lphsKplGsGtFGhVhhGhhhshs-.LPKGSMNsc.hh.VAlKpl+.ts.spEs.p-hhcEAtlMtplpH.plVpLhGls   
  
**Make consensus span low coverage regions, switch off conserved residues**

The strict consensus calculation can be relaxed to ignore gaps, so that the consensus is also computed across lower coverage regions. In addition, conserved residue names can be replaced with their consensus classes::

    mview ... -consensus on -con_coloring any -con_gaps off -con_ignore singleton

to produce:

.. raw:: html

                      cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN     100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME       97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN       90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN      97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL    100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    consensus/100%                  hp..p.lu.utau.l..u.h....-.ls+uohssc....lul+php.t...p-...ph.c-h.hh.plpp.plsphhuls   
    consensus/90%                   hp..p.lu.utau.l..u.h....-.ls+uohssc....lul+php.t...p-...ph.c-h.hh.plpp.plsphhuls   
    consensus/80%                   lphs+plusutauhlhhuhhhs..-.ls+uohssc.hh.lul+pl+.ts.sp-..p-hhc-utlhtplp+.pllplhuls   
    consensus/70%                   lphs+plusutauhlhhuhhhshs-.ls+uohssc.hh.lul+pl+.ts.sp-s.p-hhc-utlhtplp+.pllplhuls   
  
**Switch off consensus classes, show only conserved residues**

The consensus classes can be switched off, to leave only conserved residues::

    mview ... -consensus on -con_coloring any -con_ignore class

to produce:

.. raw:: html

                      cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 EGFR_HUMAN     100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 PR2_DROME       97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 ITK_HUMAN       90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 PTK7_HUMAN      97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 KIN31_CAEEL    100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
    consensus/100%                  .......G.G.FG.V........................VA.K.................E...M...............   
    consensus/90%                   .......G.G.FG.V........................VA.K.................E...M...............   
    consensus/80%                   ....K..G.G.FG.V..G.....................VA.K.................EA..M....H...V.L.G..   
    consensus/70%                   ....K..G.G.FG.V..G.....................VA.K.................EA..M....H...V.L.G..   
  
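The "conserved residues only" lines above can be approximated with a few lines of Python. This is a sketch under stated assumptions rather than MView's algorithm: the function name ``conserved_residues`` is hypothetical, gaps count towards the column total (the strict calculation), and consensus classes are not modelled at all.

```python
from collections import Counter


def conserved_residues(rows, threshold):
    """Return one symbol per column: the residue that reaches `threshold`
    percent of the column, or '.' if none does."""
    consensus = []
    for column in zip(*rows):
        symbol, count = Counter(column).most_common(1)[0]
        # integer arithmetic avoids rounding surprises at exactly 80%, 90%, ...
        if symbol != "-" and count * 100 >= threshold * len(column):
            consensus.append(symbol)
        else:
            consensus.append(".")
    return "".join(consensus)


# First 20 columns of the example alignment
rows = [
    "FKKIKVLGSGAFGTVYKGLW",  # EGFR_HUMAN
    "ISVNKQLGTGEFGIVQQGVW",  # PR2_DROME
    "LTFVQEIGSGQFGLVHLGYW",  # ITK_HUMAN
    "IREVKQIGVGQFGAVVLAEM",  # PTK7_HUMAN
    "VELTKKLGEGAFGEVWKGKL",  # KIN31_CAEEL
]
print(conserved_residues(rows, 80))   # ....K..G.G.FG.V..G..
print(conserved_residues(rows, 100))  # .......G.G.FG.V.....
```

For these 20 columns the results agree with the start of the ``consensus/80%`` and ``consensus/100%`` lines shown above. With five sequences, a residue present in four rows reaches exactly 80%.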
.. _ref_colourmaps:

Colours
-------

Colourmaps
^^^^^^^^^^

There are default colourmaps for protein and nucleotide (either DNA or RNA) alignments and consensus lines. MView starts up with the default protein colourmap selected, as if you had specified a molecule type with ``-moltype aa`` (for "amino acid").

**Alignments**

Colourmaps have names, e.g., the default protein alignment colourmap is called ``P1`` and the default nucleotide colourmap is ``D1``. Alternative alignment colourmaps are explicitly selected using the ``-colormap`` option. For example, another built-in colouring scheme can be specified with ``-colormap CLUSTAL``.

Here is the default colourmap for proteins:

.. raw:: html

  [P1]
  #Protein: highlight amino acid physicochemical properties
  #symbols =>  color                #comment
  .        ->  dark-gray            #wildcard/mismatch
  Aa       =>  bright-green         #hydrophobic
  Bb       =>  dark-gray            #D or N
  Cc       =>  yellow               #cysteine
  Dd       =>  bright-blue          #negative charge
  Ee       =>  bright-blue          #negative charge
  Ff       =>  dark-green           #large hydrophobic
  Gg       =>  bright-green         #hydrophobic
  Hh       =>  dark-green           #large hydrophobic
  Ii       =>  bright-green         #hydrophobic
  Kk       =>  bright-red           #positive charge
  Ll       =>  bright-green         #hydrophobic
  Mm       =>  bright-green         #hydrophobic
  Nn       =>  purple               #polar
  Pp       =>  bright-green         #hydrophobic
  Qq       =>  purple               #polar
  Rr       =>  bright-red           #positive charge
  Ss       =>  dull-blue            #small alcohol
  Tt       =>  dull-blue            #small alcohol
  Vv       =>  bright-green         #hydrophobic
  Ww       =>  dark-green           #large hydrophobic
  Yy       =>  dark-green           #large hydrophobic
  Zz       =>  dark-gray            #E or Q
  Xx       ->  dark-gray            #unknown
  ?        ->  light-gray           #unknown
  *        ->  black                #stop
  
The default alignment colourmap for nucleotide sequences can be selected using the ``-moltype na`` (or ``dna`` or ``rna``) option::

    mview ... -moltype na

or by specifying::

    mview ... -colormap D1

and is defined as:

.. raw:: html

  [D1]
  #DNA: highlight nucleotide types
  #symbols =>  color                #comment
  .        ->  dark-gray            #wildcard/mismatch
  Aa       =>  bright-blue          #adenosine
  Cc       =>  dull-blue            #cytosine
  Gg       =>  bright-blue          #guanine
  Tt       =>  dull-blue            #thymine
  Uu       =>  dull-blue            #uracil
  Mm       =>  dark-gray            #amino:      A or C
  Rr       =>  dark-gray            #purine:     A or G
  Ww       =>  dark-gray            #weak:       A or T
  Ss       =>  dark-gray            #strong:     C or G
  Yy       =>  dark-gray            #pyrimidine: C or T
  Kk       =>  dark-gray            #keto:       G or T
  Vv       =>  dark-gray            #not T: A or C or G
  Hh       =>  dark-gray            #not G: A or C or T
  Dd       =>  dark-gray            #not C: A or G or T
  Bb       =>  dark-gray            #not A: C or G or T
  Nn       =>  dark-gray            #any: A or C or G or T
  Xx       ->  dark-gray            #any
  ?        ->  light-gray           #unknown
  
**Consensus lines**

In addition, the consensus lines optionally displayed below an alignment can be coloured, and they have their own consensus colourmaps; the default for proteins is ``PC1`` and for nucleotides it is ``DC1``. Alternative consensus colourmaps are explicitly selected using the ``-con_colormap`` option.

Here is the default consensus colourmap for proteins:

.. raw:: html

  [PC1]
  #Protein consensus: highlight equivalence class
  #symbols =>  color                #comment
  .        ->  dark-gray            #unconserved
  +        ->  bright-red           #positive charge
  -        ->  bright-blue          #negative charge
  a        ->  dark-green           #aromatic
  c        ->  purple               #charged
  h        ->  bright-green         #hydrophobic
  l        ->  bright-green         #aliphatic
  o        ->  dull-blue            #alcohol
  p        ->  dull-blue            #polar
  s        ->  bright-green         #small
  t        ->  bright-green         #turnlike
  u        ->  bright-green         #tiny
  
The default consensus colourmap for nucleotide sequences can be selected using
the ``-moltype na`` (or ``dna`` or ``rna``) option::

    mview ... -moltype na

or by specifying::

    mview ... -con_colormap DC1

and is defined as:

.. raw:: html

  [DC1]
  #DNA consensus: highlight ring type
  #symbols =>  color                #comment
  .        ->  dark-gray            #unconserved
  r        ->  purple               #purine
  y        ->  orange               #pyrimidine
  
Creating new colourmaps
^^^^^^^^^^^^^^^^^^^^^^^

The built-in colour palette and the colourmaps built from it can be listed
from the command line with ``-listcolors``, and new colour schemes can be
loaded from a file using the ``-colorfile`` option.

Predefined colours are defined as in the following short segment of the
built-in colour palette, obtained with ``-listcolors -html head`` to wrap the
output in HTML and display the actual colours:

.. raw:: html

  #color                     : #RGB 
  color black                : #000000
  color white                : #ffffff
  color red                  : #ff0000
  color green                : #00ff00
  color blue                 : #0000ff
  color cyan                 : #00ffff
  color magenta              : #ff00ff
  color yellow               : #ffff00    
  ...
  
Here is a short protein colouring scheme, built from colours in the built-in
palette, which is used below to explain the syntax:

.. raw:: html

  [CYS]
  #Protein: highlight cysteines
  #symbols =>  color                #comment
  .        ->  dark-gray            #wildcard/mismatch
  Cc       =>  yellow               #cysteine
  Xx       ->  dark-gray            #unknown
  ?        ->  light-gray           #unknown
  
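As a rough illustration of the line syntax used in a scheme such as ``CYS``
(a ``[NAME]`` header, then lines of symbols, an arrow, a colour and an
optional ``#`` comment), the following Python sketch reads a scheme into a
lookup table. It is not MView's own parser (MView is written in Perl): the
function names are invented for this example, and it assumes that a trailing
comment never runs into a colour name and that RGB codes are six hexadecimal
digits.

.. code-block:: python

    import re

    # Illustrative only: hypothetical helper names, not MView's own parser.
    SCHEME_RE = re.compile(r"^\[(\S+)\]$")
    # symbols, arrow, a colour name or #RRGGBB code, then an optional comment
    RULE_RE = re.compile(
        r"^(\S{1,2})\s+(=>|->)\s+(#[0-9A-Fa-f]{6}|[^\s#]+)\s*(?:#.*)?$"
    )


    def parse_colormaps(text):
        """Return {SCHEME: {symbol: (arrow, colour)}} for colour-file text."""
        schemes, current = {}, None
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # blank line or whole-line comment
            header = SCHEME_RE.match(line)
            if header:
                # scheme names are case-insensitive: store them uppercased
                current = schemes.setdefault(header.group(1).upper(), {})
                continue
            rule = RULE_RE.match(line)
            if rule is None or current is None:
                raise ValueError(f"cannot parse line: {raw!r}")
            symbols, arrow, colour = rule.groups()
            for symbol in symbols:  # 'Cc' colours both 'C' and 'c'
                current[symbol] = (arrow, colour)
        return schemes


    def colour_for(scheme, symbol):
        """Look up a case-sensitive symbol, falling back to the '.' wildcard."""
        return scheme.get(symbol, scheme.get("."))


    def css_property(arrow, css_on):
        """'=>' paints the background only when style sheets are in use."""
        return "background-color" if css_on and arrow == "=>" else "color"


    CYS_TEXT = """\
    [CYS]
    #Protein: highlight cysteines
    .        ->  dark-gray            #wildcard/mismatch
    Cc       =>  yellow               #cysteine
    Xx       ->  dark-gray            #unknown
    ?        ->  light-gray           #unknown
    """

    cys = parse_colormaps(CYS_TEXT)["CYS"]
    for residue in "CcXA?":
        arrow, colour = colour_for(cys, residue)
        print(residue, colour, css_property(arrow, css_on=True))

Here ``A`` has no line of its own, so it falls through to the ``.`` wildcard,
while ``C`` and ``c`` are the only symbols that would be given a background
colour when style sheets are in use.
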
When writing a new protein/nucleotide colouring scheme, scheme names
introduced in square brackets (``[CYS]``, above) are case-insensitive. Any
line or part of a line beginning with a ``#`` character is a comment.
Colourings are defined one per line by symbol(s) at the left, an arrow, then
the colour name or RGB code, followed by an optional comment.

The symbols at the left are case-sensitive, and can be given as a single
character ``X`` or as a character pair like ``Xx`` when both upper- and
lowercase should have the same colour. The special wildcard symbol ``.`` sets
the base colour to be used for all symbols in the alignment. Other lines
define specific colourings for sequence symbols of interest. The ordering of
lines is not important.

In the example, ``C`` or ``c`` will be painted yellow; an explicit ``X`` or
``x`` residue will be dark grey; an explicit unknown residue ``?`` will be
light grey; any other residue will match the ``.`` wildcard and be painted
dark grey.

A symbol-to-colour mapping can use a predefined colour name (as above) or an
explicit hexadecimal RGB code like those in the colour palette.

The arrow separating the symbol(s) from the colour can be double ``=>`` or
single ``->``, selecting background or foreground colouring:

* If style sheets are not being used, the choice of arrow is unimportant: the
  supplied colour is used for the foreground, i.e., the output symbol.

* If style sheets are in use with ``-css on``, then ``=>`` means that the
  colour should be applied to the background of the symbol, while ``->``
  means it should be applied to the foreground, i.e., the symbol itself.

So, in the ``CYS`` example, the wildcard ``.`` and the symbols ``?``, ``X``,
``x`` will be coloured in the foreground (i.e., the symbols themselves), and
``C`` or ``c`` will be displayed as coloured symbols unless ``-css on`` is
set, in which case they will appear as coloured blocks.

.. _ref_cov_pid:

Coverage and identity
---------------------

MView calculates percent coverage and percent identity of every sequence with
respect to a reference sequence, by default the query sequence of a search,
or the first row of a sequence alignment.

You can change the reference sequence against which identities are calculated
using the ``-reference`` option, which requires either a row number or a
sequence identifier (see :ref:`ref_reference_row`). In addition, you can
re-sort the alignment using the ``-sort`` option (see :ref:`ref_sorting`).

The coverage and identity calculations are defined below.

Percent coverage
^^^^^^^^^^^^^^^^

Percent coverages reported in each alignment row are calculated with respect
to the reference sequence (by default, the query or first row):

.. math::

   \frac{\mathrm{number~of~residues~in~row~aligned~with~reference~row}}
        {\mathrm{length~of~ungapped~reference~row}}
   \times 100

Percent identity
^^^^^^^^^^^^^^^^

Percent identities reported in each alignment row are calculated with respect
to the aligned portion of the reference sequence (usually the query or first
row):

.. math::

   \frac{\mathrm{number~of~identical~residues}}
        {\mathrm{length~of~ungapped~reference~row~over~aligned~region}}
   \times 100

This default behaviour is also obtained using the option ``-pcid aligned``.
Two other calculation possibilities are available: ``-pcid reference``
normalises by the ungapped length of the query or reference sequence, and
``-pcid hit`` normalises by the ungapped length of the hit sequence.

Note: when MView processes BLAST output, minor deviations from the
percentages reported by BLAST are due to (1) different rounding, and (2) the
way MView assembles a single pseudo-sequence (see
:ref:`ref_funny_sequences`) for a hit composed of multiple HSPs, giving an
averaged percent identity.

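
The coverage and identity formulas can be tried out on a toy pair of rows.
The following Python sketch is only an illustration, not MView's own (Perl)
implementation: the function names are invented here, ``-`` is assumed to be
the only gap character, residues are compared case-insensitively, and the
"aligned region" is assumed to be the span of columns from the first to the
last residue of the row being scored.

.. code-block:: python

    GAP = "-"  # assumption: '-' is the only gap character


    def ungapped_length(seq):
        """Number of residues in a row, ignoring gap characters."""
        return sum(1 for ch in seq if ch != GAP)


    def percent_coverage(row, ref):
        """Residues in row aligned with reference residues / reference length."""
        aligned = sum(1 for r, q in zip(row, ref) if r != GAP and q != GAP)
        return 100.0 * aligned / ungapped_length(ref)


    def percent_identity(row, ref, mode="aligned"):
        """Percent identity with the three normalisations named by -pcid."""
        identical = sum(
            1 for r, q in zip(row, ref)
            if r != GAP and q != GAP and r.upper() == q.upper()
        )
        if mode == "aligned":
            # assumed meaning of "aligned region": first to last residue of row
            columns = [i for i, ch in enumerate(row) if ch != GAP]
            if not columns:
                return 0.0
            norm = ungapped_length(ref[columns[0]:columns[-1] + 1])
        elif mode == "reference":
            norm = ungapped_length(ref)
        elif mode == "hit":
            norm = ungapped_length(row)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        return 100.0 * identical / norm if norm else 0.0


    REF = "ACDEFGHIKL"  # toy reference row, 10 residues
    ROW = "--DEYGH-K-"  # toy hit: 6 residues, 5 identical to the reference

    print(f"cov {percent_coverage(ROW, REF):.1f}%")  # cov 60.0%
    for mode in ("aligned", "reference", "hit"):
        # prints 71.4%, 50.0% and 83.3% respectively
        print(f"pid ({mode}) {percent_identity(ROW, REF, mode):.1f}%")

In this toy case the five identities are divided by 7 (reference residues
spanned by the hit), 10 (all reference residues) or 6 (hit residues),
depending on the normalisation.
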
Data formats
------------

Input formats
^^^^^^^^^^^^^

MView supports a variety of input formats covering common sequence database
search and multiple alignment formats. Alternatively, if you can convert an
unsupported alignment to one of the simpler input formats (FASTA, PIR, MSF,
plain), you can then read it into MView.

See `input_formats`_.

.. _input_formats: formats_in.html

Output formats
^^^^^^^^^^^^^^

The default output is plain text showing the alignment together with some
header information. HTML markup will be added if any of the HTML-specific or
colouring options are set. However, a number of alternative output formats
allow format conversions (e.g., convert a BLAST search to FASTA sequence
format) for subsequent processing.

See `output_formats`_.

.. _output_formats: formats_out.html

Linking identifiers to external resources
-----------------------------------------

Using the ``-srs on`` option with HTML output, it is possible to convert
sequence identifiers into links to a sequence database::

    mview -in fasta -html head -css on -coloring id -colormap clustal -srs on data.dat

.. raw:: html

  Colored by: identity
                               cov    pid  1 [        .         .         .         .         :         .         .         ] 80
  1 sp|P00533.2|EGFR_HUMAN  100.0% 100.0%    FKKIKVLGSGAFGTVYKGLWIPEGEK---------VKIPVAIKELREATSPK-ANKEILDEAYVMASVDNPHVCRLLGIC   
  2 sp|Q9I7F7.3|PR2_DROME    97.1%  35.7%    ISVNKQLGTGEFGIVQQGVWSNGNE-----------RIQVAIKCLCRERMQS-NPMEFLKEAAIMHSIEHENIVRLYGVV   
  3 sp|Q08881.1|ITK_HUMAN    90.0%  32.9%    LTFVQEIGSGQFGLVHLGYWLN--------------KDKVAIKTIREGAMS---EEDFIEEAEVMMKLSHPKLVQLYGVC   
  4 sp|Q13308.2|PTK7_HUMAN   97.1%  21.2%    IREVKQIGVGQFGAVVLAEMTGLS-XLPKGSMNADGVALVAVKKLKPDVSD-EVLQSFDKEIKFMSQLQHDSIVQLLAIC   
  5 sp|P34265.4|KIN31_CAEEL 100.0%  31.5%    VELTKKLGEGAFGEVWKGKLLKILDA-------NHQPVLVAVKTAKLESMTKEQIKEIMREARLMRNLDHINVVKFFGVA   
  
The identifiers need to conform to patterns such as::

    database|accession|identifier
    database:identifier

like those produced by the NCBI, EBI or other BLAST services. Links will be
constructed if the patterns are listed in the ``SRS.pm`` library, which is
part of this software. You can modify and extend this file to include more
patterns if you know some Perl and the format of the URLs needed to access
the sequence databases of interest.

This linking mechanism works for any recognised input data format, not just
BLAST results.

Memory usage and speed
----------------------

MView can use a great deal of memory, particularly if you try to process
complete sets of PSI-BLAST cycles, each containing thousands of hits, all at
once. Most filtering options should reduce memory requirements by cutting
down the number of internal data structures created. In particular, if you
only want to see the scoring information and don't care about the sequence
alignments, you can switch these off with the ``-sequences off`` option,
which will also speed up the program.

Likewise, processing each alignment (for example, plus strands and then minus
strands) separately will save memory, or you can use the option
``-register off`` to cause each alignment to be output when ready (by default
all alignments are saved until the end so they can be printed with fields in
vertical register).

.. END