SEQ, SEQUENCE
NAME
SEQ, SEQUENCE - manipulate the content of the sequence buffer.
SYNOPSIS
SEQ = three_letters_code
SEQ LOAD filename
SEQ READ filename
SEQ FROM structure_identifier
SEQ COPY
SEQ SAVE filename
SEQ SWN filename
SEQ RESET
DESCRIPTION
The command SEQ (long form: SEQUENCE) manipulates the content of the main
sequence buffer. Garlic maintains two sequence buffers: the main buffer and
the reference buffer. The main sequence buffer is used to prepare the average
hydrophobicity plot, the hydrophobic moment plot and the helical wheel plot,
and for some other operations which require sequence information. The
reference sequence buffer is used for sequence comparison and other operations
which require two sequences.
Both buffers store the following sequence information:
(1) The number of residues.
(2) The sequence in the form of three-letter codes. Uppercase letters are used.
(3) Disulfide bond flag, if information about disulfide bonds is available.
(4) Residue serial numbers.
(5) Raw hydrophobicity values (replaced by average value for exotic residues).
In addition, the main sequence buffer contains the following information:
(6) The average hydrophobicity.
(7) The hydrophobic moment.
As sequence information may be given independently of any structure, atomic
coordinates are not required for most sequence manipulation routines. Thus,
garlic may be used as a sequence analysis tool.
All versions of the command SEQ, except one, manipulate the content of the
main sequence buffer. The only exception is SEQ COPY, which copies the content
of the main sequence buffer to the reference buffer. This is the only way to
store information in the reference buffer.
SEQ = three_letters_code
The command SEQ may be used with the keyword = (equal sign) to define a
sequence at the garlic command prompt. This is practical for defining a short
sequence fragment. The fragment may be used for a helical wheel plot, or to
locate the given sequence fragment in a structure which is being investigated.
The syntax:
SEQ = three_letters_code
Example:
seq = ala phe tyr trp asn
The sequence fragment will be converted to uppercase. The sequence is not
checked for exotic residues, so you can use non-standard codes. However,
the routine which assigns the hydrophobicity values will fail to recognize
them. The average hydrophobicity value (calculated for the current scale)
will be assigned to these residues. At present, 23 codes are recognized.
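The fallback for unrecognized codes can be sketched in Python as follows.
This is only an illustration, not garlic's code: the function names are
hypothetical, the table holds a few Kyte-Doolittle values as placeholders
rather than garlic's current scale, and it assumes that the "average" is the
plain mean over all entries of the scale.

```python
# Placeholder scale: a few Kyte-Doolittle values, NOT garlic's built-in table.
SCALE = {"ALA": 1.8, "PHE": 2.8, "TYR": -1.3, "TRP": -0.9, "ASN": -3.5}


def parse_sequence(text):
    """Split a command-line fragment into uppercase residue codes."""
    return [code.upper() for code in text.split()]


def assign_hydrophobicity(codes, scale=SCALE):
    """Look up each code; unknown codes get the scale average (assumption)."""
    average = sum(scale.values()) / len(scale)
    return [scale.get(code, average) for code in codes]


codes = parse_sequence("ala phe xyz")   # "xyz" stands for an exotic residue
values = assign_hydrophobicity(codes)   # third value is the scale average
```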
 
SEQ LOAD filename
The keyword LOAD (or READ; short forms LOA and REA) may be used to read the
sequence from the specified file. Garlic recognizes two input file formats:
FASTA files (one-letter code) and files which contain three-letter codes in
a free format.
If the input file contains the symbol > (greater than) in the first column of
the first useful line, the file is treated as one-letter protein code in
FASTA format. Empty lines are ignored. Lines beginning with the symbol
# (number sign) in the first column are treated as comments and ignored too.
Thus, lines which are not empty and do not contain the symbol # in the
first column are treated as useful.
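The format test described above can be sketched in Python as follows. The
function names are hypothetical and this is not garlic's implementation. It
also assumes that a line containing only whitespace counts as empty, which
the text does not state.

```python
def useful_lines(lines):
    """Yield lines that are neither empty nor comments (# in column one)."""
    for line in lines:
        if line.strip() == "" or line.startswith("#"):
            continue
        yield line


def looks_like_fasta(lines):
    """Return True if the first useful line has '>' in the first column."""
    first = next(useful_lines(lines), None)
    return first is not None and first.startswith(">")
```

For example, looks_like_fasta(["# note", "", ">header", "MKV"]) is true,
because the comment and the empty line are skipped before the test is made.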
If the input file is not recognized as a FASTA file, it is expected to contain
three-letter codes in a free format. Empty lines and all lines which
contain # in the first column are ignored. All other lines are treated as
useful. Digits (serial numbers, for example) are ignored.
The following characters are treated as separators:
(1) space
(2) tab
(3) comma (,)
(4) semicolon (;)
(5) newline (line feed)
If the input file contains at least one bad code (a residue name which
consists of four letters, for example), reading will fail. The hard-coded
maximum number of residues is 20000, but it may be easily changed (see
MAXRESIDUES in the header file defines.h).
Example:
seq load sample.fasta
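The rules for the free three-letter format can be sketched in Python as
follows. This is a hypothetical reader, not garlic's code. It assumes that
any token which is not one to three letters counts as a bad code, that codes
are stored in uppercase, and that exceeding the residue limit is an error
(the text does not say whether garlic fails or truncates in that case).

```python
import re

MAX_RESIDUES = 20000  # mirrors the documented default of MAXRESIDUES


def parse_three_letter(lines, max_residues=MAX_RESIDUES):
    """Collect residue codes from free-format lines."""
    codes = []
    for line in lines:
        # Skip empty lines and comments (# in the first column).
        if line.strip() == "" or line.startswith("#"):
            continue
        line = re.sub(r"[0-9]", "", line)            # digits are ignored
        for token in re.split(r"[ \t,;\n]+", line):  # documented separators
            if token == "":
                continue
            # Assumption: a valid code is one to three letters.
            if len(token) > 3 or not token.isalpha():
                raise ValueError(f"bad residue code: {token!r}")
            codes.append(token.upper())
            if len(codes) > max_residues:
                raise ValueError("too many residues")
    return codes
```

With this sketch, the lines "1 ala, phe;TYR" and "4 trp" give the codes
ALA, PHE, TYR and TRP, while a line containing ALAN raises an error.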
SEQ FROM structure_identifier
The keyword FROM (short form: FRO) may be used to copy the sequence from
the specified structure to the main sequence buffer. Only selected residues
are copied. A residue is treated as selected if its first atom is selected.
For proteins, this is typically N (nitrogen). Residue insertion codes are
ignored! Thus, the same residue serial number may appear more than once in
the array of residue serial numbers.
Example:
seq from 1
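The selection rule and the effect of dropping insertion codes can be sketched
in Python as follows. The atom records and the function name are hypothetical
and the data are made up for illustration; garlic's internal structures are
not shown in this document.

```python
from itertools import groupby

# Hypothetical atom records:
# (serial number, insertion code, residue name, selected)
atoms = [
    (52, "",  "GLY", True),  (52, "",  "GLY", True),
    (52, "A", "SER", True),  (52, "A", "SER", False),
    (53, "",  "ALA", False), (53, "",  "ALA", True),
]


def sequence_from_atoms(atoms):
    """Copy residues whose first atom is selected; drop insertion codes."""
    names, serials = [], []
    # Consecutive atoms sharing serial number and insertion code
    # are taken to form one residue.
    for (serial, _icode), group in groupby(atoms, key=lambda a: (a[0], a[1])):
        first = next(group)
        if first[3]:                  # only the first atom decides
            names.append(first[2])
            serials.append(serial)    # insertion code is not stored
    return names, serials


names, serials = sequence_from_atoms(atoms)
```

Here residues 52 and 52A are both copied and both end up with serial number
52, while residue 53 is skipped because its first atom is not selected.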
SEQ COPY
The command SEQ COPY (short form: SEQ COP) copies the sequence from the main
sequence buffer to the reference buffer. This is the only way to initialize
the reference buffer. This command must be executed (i.e., the keyword COPY
must be used) before executing commands which require two sequences for proper
operation. The main sequence buffer may be initialized prior to SEQ COPY by
using one of the keywords described above (=, LOAD or FROM).
Example:
seq copy
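Conceptually, SEQ COPY takes a snapshot of the main buffer. A minimal Python
sketch, with a hypothetical SequenceBuffer class holding only some of the
fields listed in the description and placeholder hydrophobicity values:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SequenceBuffer:
    codes: list = field(default_factory=list)           # three-letter codes
    serials: list = field(default_factory=list)         # residue serial numbers
    hydrophobicity: list = field(default_factory=list)  # raw values


main = SequenceBuffer(["ALA", "PHE"], [1, 2], [1.8, 2.8])
reference = copy.deepcopy(main)   # the step SEQ COPY performs, in spirit

main.codes.append("TYR")          # a later change to the main buffer
```

Because the copy is independent in this sketch, the main buffer can be
refilled with a second sequence while the reference keeps the first one,
which is the situation commands that need two sequences rely on.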
SEQ SAVE filename
The command SEQ SAVE (short form: SEQ SAV) saves the sequence to the
specified file. Ten codes (each consisting of up to three letters) are
written per line, separated by spaces. Serial numbers are not included
(but see the keyword SWN).
Example:
seq save 9pap.seq
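The output layout can be sketched in Python as follows. The function name is
hypothetical, and details such as the trailing newline are assumptions rather
than a description of garlic's exact output.

```python
def format_sequence(codes, per_line=10):
    """Join residue codes with spaces, ten codes per output line."""
    lines = []
    for start in range(0, len(codes), per_line):
        lines.append(" ".join(codes[start:start + per_line]))
    return "\n".join(lines) + "\n"


text = format_sequence(["ALA"] * 12)   # ten codes on line one, two on line two
```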
SEQ SWN filename
The command SEQ SWN saves the sequence to the specified file. Both residue
names and serial numbers are written to the output file. Insertion codes will
be missing! Five serial numbers and residue names are written per line,
separated by spaces.
Example:
seq swn 9pap.seq
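A Python sketch of such a layout follows. The function name is hypothetical,
and the order within each pair (serial number first, then residue name) is an
assumption based on the wording above, not a verified description of
garlic's output.

```python
def format_with_numbers(codes, serials, per_line=5):
    """Write 'serial name' pairs, five pairs per output line."""
    pairs = [f"{serial} {code}" for serial, code in zip(serials, codes)]
    rows = [pairs[i:i + per_line] for i in range(0, len(pairs), per_line)]
    return "\n".join(" ".join(row) for row in rows) + "\n"


out = format_with_numbers(
    ["ALA", "PHE", "TYR", "TRP", "ASN", "GLY"], [1, 2, 3, 4, 5, 6]
)
```

Since the free-format reader ignores digits, a file laid out this way should
in principle be readable again with SEQ LOAD, at the cost of losing the
serial numbers.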
SEQ RESET
Reset (clear) the main sequence buffer. The command SEQ RESET (short form:
SEQ RES) sets the number of residues in the main sequence buffer to zero.
The storage is not freed, so the buffer may be used again later.
Example:
seq reset
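The idea of clearing the buffer without freeing it can be sketched in Python
as follows. MainBuffer is a hypothetical class used only for illustration;
garlic's own storage layout is not described here.

```python
class MainBuffer:
    """Fixed-capacity buffer where reset only clears the residue counter."""

    def __init__(self, capacity):
        self.codes = [None] * capacity   # storage allocated once
        self.count = 0                   # number of residues in use

    def append(self, code):
        self.codes[self.count] = code
        self.count += 1

    def reset(self):
        self.count = 0                   # storage is kept for later reuse

    def sequence(self):
        return self.codes[:self.count]


buffer = MainBuffer(4)
buffer.append("ALA")
buffer.reset()          # buffer is logically empty, storage still there
buffer.append("PHE")    # reuses the same storage
```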
RELATED COMMANDS
PLOT prepares the average hydrophobicity and/or hydrophobic moment plot.
COMPARE compares two sequences. VENN draws a Venn diagram. WHEEL draws a
helical wheel plot. SEL SEQ selects portions of the structure which contain
the sequence stored in the main sequence buffer. To use any of these commands,
the main sequence buffer (for COMPARE, both buffers) must be initialized by
using the command SEQ. STR defines the secondary structure and CREATE may be
used to create a new peptide.