Transmutator user manual

Transmutator is a small utility that translates a DNA mutation into a protein mutation. It follows the syntax recommendations of den Dunnen JT and Antonarakis SE (2000), Hum. Mutat. 15:7-12 (which are somewhat outdated), and later updates from the Human Genome Variation Society.

Transmutator is freeware. It can be downloaded from http://www.nouspikel.com/transmute.zip and its source code is available upon request. Please report bugs to: nouspikel(at)yahoo(dot)com

Overview

Copy-paste the cDNA sequence of your gene of interest into the top text box. If the sequence does not start with the ATG, you can enter the position of the ATG in the dedicated field.
Enter the description of the DNA mutation in the "DNA mutation" field.
Click [Transmute]. The protein mutation appears in the "Protein mutation" field.
Optionally, the "Output" field gives you the wild-type or mutant DNA sequence, or the wild-type or mutant protein sequence.

You can enter a different DNA mutation and click [Transmute] again if you are checking several mutations on the same gene. Or click [Reset] to start over with a new gene.

The main window

Below is a detailed description of the various fields in the main application window.

cDNA sequence: Here you copy-paste the sequence of the gene you are interested in. Only A, T, G, C and U are considered, in lower or upper case. This means that numbering is ignored, and so are ambiguity codes (e.g. 'N').
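
To illustrate this filtering rule, here is a minimal Python sketch. It is not Transmutator's actual code: the function name clean_cdna is made up for this example, and converting everything to upper case is an assumption.

```python
def clean_cdna(raw: str) -> str:
    """Keep only A, C, G, T and U (either case); drop everything else.

    Upper-casing the result is an assumption made for this sketch.
    """
    return "".join(ch for ch in raw.upper() if ch in "ACGTU")


# Line numbers, blanks and the ambiguity code 'n' are all discarded.
print(clean_cdna("1 atgaaa nCG\n11 uaa"))  # ATGAAACGUAA
```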

ATG: If the sequence you pasted does not start at the ATG, you can enter the position of the ATG in this field.

Name: This optional field allows you to assign a name to your sequence. If you terminate the program by clicking on [Save & Exit], the current sequence is saved and will be loaded next time you start the program. The name field may thus come in handy to remember what the sequence was.

DNA mutation: Here you enter the DNA mutation, according to the syntax rules suggested by den Dunnen and Antonarakis. Note that the initial "c." is optional. Upper- and lower-case characters are accepted.

Substitutions: c.76A>C or c.76_77AC>CT
Deletions: c.76_78del or c.76_78delACT or c.76del or c.76delA
Insertions: c.76_77insT or c.76_77insACGT
Indels: c.112_117delinsTG or c.112_117delAGGTCAinsTG or c.113delinsTACTAGC
Duplications: c.77_79dup or c.77_79dupCTG or c.77_79dup3
Inversions: c.77_79inv or c.77_79inv3
Repeats: c.1210T[5] or c.1209_1211[14] or c.1209GGC[79]

Although it is formally illegal, the program will accept syntaxes in which the endpoint is missing but the sequence is provided: c.76delACT, c.77dupCTG, c.77inv3, etc.
From version 1.4, Transmutator also accepts deletions specifying the start point and the number of deleted nucleotides: c.76del3
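
For readers who want to process such descriptions in their own scripts, the following Python sketch parses a subset of the syntaxes listed above (substitutions, deletions, insertions, indels, duplications and inversions with spelled-out bases). It is an illustration only, not Transmutator's parser: the names Mutation and parse_mutation are hypothetical, and the numeric forms (dup3, del3) and repeats (T[5]) are deliberately left out.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    start: int   # first affected nucleotide (1 = A of the ATG)
    end: int     # last affected nucleotide
    kind: str    # "sub", "del", "ins", "delins", "dup" or "inv"
    old: str     # reference bases quoted in the description, if any
    new: str     # substituted or inserted bases, if any


_SYNTAX = re.compile(
    r"""^(?:c\.)?                               # the leading 'c.' is optional
        (?P<start>\d+)(?:_(?P<end>\d+))?        # position or start_end range
        (?:
            (?P<sub_old>[ACGT]+)>(?P<sub_new>[ACGT]+)     # 76A>C
          | del(?P<di_old>[ACGT]*)ins(?P<di_new>[ACGT]+)  # 112_117delinsTG
          | ins(?P<ins_new>[ACGT]+)                       # 76_77insT
          | (?P<kind>del|dup|inv)(?P<seq>[ACGT]*)         # 76_78del, 77_79dup
        )$""",
    re.VERBOSE | re.IGNORECASE,
)


def parse_mutation(text: str) -> Mutation:
    """Parse one cDNA mutation; raise ValueError for anything unsupported."""
    m = _SYNTAX.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported mutation syntax: {text!r}")
    start = int(m["start"])
    end = int(m["end"]) if m["end"] else start
    if m["sub_old"]:
        return Mutation(start, end, "sub", m["sub_old"].upper(), m["sub_new"].upper())
    if m["di_new"]:
        return Mutation(start, end, "delins", m["di_old"].upper(), m["di_new"].upper())
    if m["ins_new"]:
        return Mutation(start, end, "ins", "", m["ins_new"].upper())
    seq = m["seq"].upper()
    if m["end"] is None and len(seq) > 1:
        # Tolerated shortcut such as 76delACT: infer the endpoint from the bases.
        end = start + len(seq) - 1
    return Mutation(start, end, m["kind"].lower(), seq, "")


print(parse_mutation("c.76A>C"))
print(parse_mutation("112_117delinsTG"))
print(parse_mutation("c.76delACT"))
```

Positions written with "+", "-", "*" or "?" do not match the pattern, so such entries raise ValueError in this sketch.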

The program cannot predict the consequences of intronic mutations, and will thus issue an error message for such entries:

Intronic substitutions: c.-14G>C c.88+1G>T c.89-2A>C c.*46T>A
Intronic deletions: c.88-?_923+?del c.(?_-30)_(*220_?)del c.88+101_oGJB2
Intronic insertions: c.123+54_123+55insT

In addition, the program does not handle large insertions or repeats for which the inserted (or repeated) sequence is provided as a link to an online sequence: c.76_77insAB012345.2:g.76_420

It is possible to enter several mutations on the same allele, by separating them with semicolons (with or without enclosing brackets):

c.[7G>A ; 11_13delTCG] or 7G>A;11_13delTCG

Be warned that no attempt is made to detect absurd combinations, such as a substitution occurring inside a deletion...
In line with current recommendations, the program always describes the protein mutation as a single event, no matter how many DNA mutations you combined. If you wish to encode two widely separated DNA mutations as two distinct protein mutations, enter them one by one.
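
If you prepare such combined descriptions in a script, splitting them back into individual mutations is straightforward. The Python sketch below shows one way to do it; the function name split_allele is hypothetical and this is not how Transmutator is implemented.

```python
from typing import List


def split_allele(description: str) -> List[str]:
    """Split 'c.[7G>A ; 11_13delTCG]' or '7G>A;11_13delTCG' into its parts."""
    text = description.strip()
    if text.lower().startswith("c."):
        text = text[2:]
    # The enclosing brackets are optional, and so are blanks around ';'.
    text = text.strip().strip("[]")
    return [part.strip() for part in text.split(";") if part.strip()]


print(split_allele("c.[7G>A ; 11_13delTCG]"))  # ['7G>A', '11_13delTCG']
print(split_allele("7G>A;11_13delTCG"))        # ['7G>A', '11_13delTCG']
```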

Numbered from: Indicates the numbering scheme used in the mutation description. By convention, numbering should start from the ATG, but some papers (particularly those published before the guidelines were established) number nucleotides with respect to a sequence deposited in a database. This sequence does not necessarily begin with the ATG, which means that the mutation numbering must be adjusted. You can type the adjustment into this field, or pick an option from the drop list:

ATG (by convention). This will be converted to 1 when you leave the field. It is the established numbering convention and is used by default when starting the program.
Start of cDNA sequence. To be used when numbering refers to a sequence that does not start with the ATG. You should cut-and-paste the sequence in the "cDNA" box, enter the position of the ATG in the "ATG" field, and select "Start of cDNA sequence" in the drop-list. When you leave the field, the negative of the ATG value will appear here (e.g. if the ATG is at 83, the field will read as -83).
Upstream of ATG (-number). Use this option if numbering starts from an arbitrary position, upstream of the ATG. The number should be negative, with -1 being the first nt before the ATG (Note that the ATG itself is nucleotide 1, as there is no nucleotide number 0, by convention).
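
The "no nucleotide 0" convention is easy to get wrong when converting by hand. The Python sketch below converts between a plain 1-based position in the pasted sequence and ATG-based numbering. The function names are hypothetical, and the sketch only illustrates the convention; it does not claim to reproduce the value Transmutator displays in the "Numbered from" field.

```python
def to_atg_numbering(seq_pos: int, atg_pos: int) -> int:
    """Convert a 1-based position in the pasted sequence to ATG-based numbering.

    atg_pos is the 1-based position of the A of the ATG in the same sequence.
    """
    offset = seq_pos - atg_pos
    # Positions at or after the ATG count 1, 2, 3...; upstream ones -1, -2...
    return offset + 1 if offset >= 0 else offset


def from_atg_numbering(c_pos: int, atg_pos: int) -> int:
    """Inverse conversion; c_pos must not be 0."""
    if c_pos == 0:
        raise ValueError("there is no nucleotide number 0")
    return atg_pos + c_pos - 1 if c_pos > 0 else atg_pos + c_pos


# With the ATG at position 83 of the pasted sequence:
print(to_atg_numbering(83, 83))    # 1   (the A of the ATG)
print(to_atg_numbering(82, 83))    # -1  (first nucleotide before the ATG)
print(to_atg_numbering(158, 83))   # 76
```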

[Transmute] Click this button to introduce the mutation into the DNA, translate it to a protein, and get a description of the protein mutation.
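
The two steps behind this button (introduce the mutation, then translate) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the program's source: the names replace_range and translate are hypothetical, the sequence is assumed to start at the ATG, and the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def replace_range(seq: str, start: int, end: int, new: str) -> str:
    """Replace nucleotides start..end (1-based, inclusive) with `new`.

    An empty `new` gives a deletion; a non-empty one a substitution or indel.
    """
    return seq[:start - 1] + new + seq[end:]


def translate(cdna: str, end_at_first_stop: bool = True) -> str:
    """Translate from the first nucleotide, in 1-letter code ('*' = stop)."""
    cdna = cdna.upper().replace("U", "T")
    protein = []
    # Incomplete trailing codons (1 or 2 leftover nucleotides) are ignored.
    for i in range(0, len(cdna) - 2, 3):
        aa = CODONS[cdna[i:i + 3]]
        protein.append(aa)
        if aa == "*" and end_at_first_stop:
            break
    return "".join(protein)


wild_type = "ATGAAACGTTAA"                        # Met Lys Arg Stop
mutant = replace_range(wild_type, 7, 7, "A")      # c.7C>A
print(translate(wild_type), translate(mutant))    # MKR* MKS*
```

In this toy example, c.7C>A changes codon 3 from CGT (Arg) to AGT (Ser), which would be reported as p.Arg3Ser.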

Output: With the drop list, you can select the type of output that you wish: wild-type DNA, mutant DNA, wild-type protein, or mutant protein.

[Copy] Allows you to copy the contents of the "Output" field to the clipboard in a single click. This could also be done by selecting the whole field and pressing Ctrl-C.

[Options] Calls up a dialog box that lets you specify various formatting options.

Protein mutation: This field will contain the consequences of the DNA mutation at the protein level, described according to the syntax of den Dunnen and Antonarakis. The "Options" dialog box lets you select various alternative formats. One important notion is that protein mutations must be described without assuming knowledge of the DNA mutation. So for instance, if an insertion in the DNA causes a frameshift that happens to match the end of the protein, it will be listed as a deletion. We know it's a frameshift from the DNA sequence, but at the protein level it looks like a deletion and must be listed as such.

[Copy] Copies the contents of the "Protein mutation" field to the clipboard.

[Reset] This button clears the input fields and resets all sequences.

[Help] This button opens the help file (which you're reading now) into your default browser.

[Save & Exit] Quits the program after saving your options and the current wild-type DNA sequence. These are saved in a file called "Transmutator.opt", from which they will be reloaded next time you start the program.
Important: if this file gets corrupted, it can prevent the program from starting normally. In such a case, delete the .OPT file and start over.

Note that nothing gets saved if you leave the program by closing the main window, which allows you to leave without changing your options.

The Options dialog box

This dialog box lets you define the output format for DNA, protein, and protein mutation.

DNA display format

Upper case: Check this box for a sequence in upper case (ACGT), leave it unchecked for lower case (acgt).

Group: Nucleotides can be displayed in groups separated with spaces, to facilitate reading. Select the group size from the drop box: 3 nucleotides (i.e. one codon), 10 nucleotides, or the whole line.

Nt per line: Number of nucleotides to display on each line.

Number lines: Check this box to have each line start with the number of its first nucleotide.
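
To give an idea of how these four options combine, here is a small Python sketch that formats a sequence in the same spirit. The function name format_dna and the exact layout (for instance the width of the line-number column) are assumptions, not a description of Transmutator's real output.

```python
def format_dna(seq: str, upper: bool = True, group: int = 10,
               per_line: int = 60, number_lines: bool = True) -> str:
    """Lay out a DNA sequence in groups, with optional line numbering."""
    seq = seq.upper() if upper else seq.lower()
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        if 0 < group < per_line:
            # Separate groups with a blank; group >= per_line means "whole line".
            chunk = " ".join(chunk[i:i + group] for i in range(0, len(chunk), group))
        # Each line may start with the number of its first nucleotide.
        prefix = f"{start + 1:>6} " if number_lines else ""
        lines.append(prefix + chunk)
    return "\n".join(lines)


print(format_dna("atgaaacgttaa", group=3, per_line=6))
```

With groups of 3 and 6 nucleotides per line, this prints two lines, "ATG AAA" and "CGT TAA", numbered 1 and 7.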

Protein display format

You can display amino acids in 1-letter (MSAX) or 3-letter code (MetSerAlaStop). Pick the desired value from the drop list.

Group: Amino acids can be displayed in groups separated with spaces, to facilitate reading. Enter the desired group size, or 999 for the whole line.

Nt per line: Number of amino acids to display on each line.

Number lines: Check this box to have each line start with the number of its first amino acid.

End at first stop codon: Check this box to have the protein end at the first stop codon, even if the DNA sequence is longer.

Display stops as: These two fields let you decide how to display stop codons in 3-letter and 1-letter code respectively. For instance, you could use "Xxx" or "***" or "Stop" in 3-letter mode, and "X" or "*" in 1-letter mode.

Protein mutation syntax

This box lets you decide between various ways of reporting the different types of protein mutation. First, you can decide whether you'd like the mutation description to use 1-letter or 3-letter amino acids.

Stop: Here you can enter the string to be used to represent a stop codon. For instance, "*" or "Stop".

ATG change: Altering the initial ATG can have various consequences: no protein produced (coded as: p.0), translation starting at a downstream ATG or even an upstream ATG. Without experimental evidence, the recommendation is to use the "unknown consequences" syntax: p.Met1? However, you have the option to use p.Met1Stop whenever the change would lead to a stop, or to use p.0.

Stop lost: Allows different syntaxes to represent mutations of the final stop codon to a valid amino acid, resulting in an extension of the protein. *110Alaext*17 means that the original protein now contains an extra 17 amino acids, whereas Nostop110stop127 indicates that the mutated protein now contains a stop codon at position 127. Both notations are equivalent, but den Dunnen & Antonarakis recommend the first one.

Substitutions: Gives you the choice between the official syntax listing the old and new amino acid (p.Arg20Ser) or a short form that only lists the new amino acid (Ser20).

Frameshift: Allows different syntaxes to represent frameshifts. Arg20fs*25 is the original recommendation, listing the first modified amino-acid and the new position of the stop codon (counting from the ATG). This recommendation was later modified to include the new identity of the first frameshifted amino acid and to number the stop codon from the start of the frameshift (not from the ATG): Arg20Profs*6. Finally, you can use a short form that does not reflect the size of the frameshift: Arg20fs.
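
The difference between the three notations is mostly a matter of where the stop codon is counted from. The Python sketch below derives all three from a wild-type and a mutant protein. It uses 1-letter codes to stay short, the function name describe_frameshift is hypothetical, and it is an illustration rather than the program's actual algorithm.

```python
def describe_frameshift(wt: str, mut: str, style: str = "new") -> str:
    """Describe a frameshift from wild-type and mutant proteins (1-letter code).

    style: "original" -> R20fs*25, "new" -> R20Pfs*6, "short" -> R20fs
    """
    # Index of the first amino acid that differs between the two proteins.
    first = next((i for i, (a, b) in enumerate(zip(wt, mut)) if a != b), None)
    if first is None:
        raise ValueError("no amino acid change found")
    pos = first + 1                      # 1-based position, counted from Met1
    if style == "short":
        return f"{wt[first]}{pos}fs"
    stop = mut.find("*", first)
    if stop == -1:
        raise ValueError("no stop codon in the shifted frame")
    if style == "original":
        # Stop position counted from the ATG.
        return f"{wt[first]}{pos}fs*{stop + 1}"
    if style == "new":
        # Stop position counted from the first frameshifted amino acid.
        return f"{wt[first]}{pos}{mut[first]}fs*{stop - first + 1}"
    raise ValueError(f"unknown style: {style!r}")


wt = "M" + "A" * 18 + "R" + "K" * 10 + "*"   # Arg at position 20
mut = "M" + "A" * 18 + "PGGGG*"              # Pro at 20, new stop at 25
for style in ("original", "new", "short"):
    print(describe_frameshift(wt, mut, style))
```

With these toy sequences the three calls print R20fs*25, R20Pfs*6 and R20fs: the stop at position 25 from the ATG is the 6th codon counted from position 20.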

Minimum end-of-mutant match for indels: Each and every frameshift can be considered as an indel: insertion of the frameshifted bit, deletion of the rest of the protein. It is even more tempting to do so when the last frameshifted amino acid(s) match the sequence of the wild-type protein (in the extreme case, if the entire frameshift matches the WT protein, it looks like a simple deletion). Conversely, a deletion might be considered as a frameshift that matches the end of the WT protein out of sheer luck. Statistically, a 1-aa match will happen about 1 time in 20, a 2-aa match 1 time in 400, etc. The only way we can be sure that it's a frameshift is if we know what happened at the DNA level, but we are not supposed to use DNA information when describing a protein mutation... So Transmutator offers you a compromise solution: you can enter a minimum number of amino acids that must be matched between the end of the mutant protein and the WT sequence. If this cutoff is met, the mutation will be described as an indel (or a deletion, as the case might be). Otherwise, it will be considered a frameshift. Thus, if you enter 0, frameshifts will never be reported. If you enter 1, frameshifts will only be reported if no match is found between the frameshifted bit and the WT protein. If you enter 2, you demand a 2-amino-acid match to report an indel (this is the default). Entering a very high number here will cause all deletions to be diagnosed as frameshifts.
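
The cutoff logic and the odds quoted above can be sketched in Python as follows. This is one possible reading of the rule, not the program's code: the sketch assumes that "match" means a common C-terminal stretch between the two proteins (stop codons excluded), that all 20 amino acids are equally likely, and the function names are hypothetical.

```python
from fractions import Fraction


def common_suffix_length(wt: str, mut: str) -> int:
    """Number of identical amino acids at the C-terminal end of both proteins."""
    wt, mut = wt.rstrip("*"), mut.rstrip("*")   # ignore the stop codons
    n = 0
    while n < min(len(wt), len(mut)) and wt[-1 - n] == mut[-1 - n]:
        n += 1
    return n


def classify(wt: str, mut: str, min_end_match: int = 2) -> str:
    """Report an indel if the end-of-protein match reaches the cutoff."""
    if common_suffix_length(wt, mut) >= min_end_match:
        return "indel"
    return "frameshift"


def chance_match_probability(n: int) -> Fraction:
    """Odds that n amino acids match by luck, assuming 20 equiprobable residues."""
    return Fraction(1, 20) ** n


print(common_suffix_length("MKRLSTV*", "MKPQTV*"))   # 2 (the final T and V)
print(classify("MKRLSTV*", "MKPQTV*"))               # indel
print(classify("MKRLSTV*", "MKPQTV*", 3))            # frameshift
print(chance_match_probability(2))                   # 1/400
```

With a cutoff of 0 the test is always satisfied, so nothing is ever reported as a frameshift, which is consistent with the behaviour described above.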

Insertions: Allows different options to represent insertions. With Lys2_Met3insGlnSerLys the inserted amino acids are spelled out, whereas Lys2_Met3ins3 only indicates the number of inserted amino acids. You can select "Use number if more than" to switch from one syntax to another, depending on the size of the insertion. Enter the threshold value in the nearby box: if the insertion is larger than the specified value, the numeric format will be used, otherwise the inserted amino acids will be listed.
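
As a quick illustration of the "Use number if more than" rule, here is a Python sketch that builds both forms. The function and parameter names are hypothetical; only the two output formats come from the description above.

```python
from typing import Sequence


def describe_insertion(left: str, left_pos: int, right: str,
                       inserted: Sequence[str], use_number_above: int) -> str:
    """Describe an insertion between residues left_pos and left_pos + 1.

    `inserted` holds the new amino acids in 3-letter code.
    """
    flank = f"{left}{left_pos}_{right}{left_pos + 1}ins"
    if len(inserted) > use_number_above:
        # Larger than the threshold: only give the number of amino acids.
        return flank + str(len(inserted))
    return flank + "".join(inserted)


new_aa = ["Gln", "Ser", "Lys"]
print(describe_insertion("Lys", 2, "Met", new_aa, 5))  # Lys2_Met3insGlnSerLys
print(describe_insertion("Lys", 2, "Met", new_aa, 2))  # Lys2_Met3ins3
```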

Detect duplications: If this box is not checked, duplications will be listed as insertions (a duplication is a special type of insertion). If the box is checked, perfect repeats will be listed as duplications: p.His7_Gln8dup (or p.Gly4dup if only one amino acid is involved). Be aware that duplications in DNA sometimes result in a codon change at the junction between the two repeats. In such cases, since there isn't a perfect repeat at the protein level, the mutation will always be listed as an insertion.
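
The "perfect repeat" test can be sketched in Python as shown below. This is a simplified illustration with hypothetical names, using 1-letter codes: it only checks whether the inserted amino acids repeat the residues immediately before the insertion point, which may be less thorough than what the program does.

```python
def describe_insertion_or_dup(wt: str, after: int, inserted: str,
                              detect_duplications: bool = True) -> str:
    """Describe amino acids `inserted` after residue `after` (1-based) of `wt`."""
    n = len(inserted)
    # A duplication is an insertion that exactly repeats the preceding residues.
    if detect_duplications and 0 < n <= after and wt[after - n:after] == inserted:
        first = after - n + 1
        if n == 1:
            return f"{wt[after - 1]}{after}dup"
        return f"{wt[first - 1]}{first}_{wt[after - 1]}{after}dup"
    return f"{wt[after - 1]}{after}_{wt[after]}{after + 1}ins{inserted}"


wt = "MKRLSTHQV"   # His7, Gln8
print(describe_insertion_or_dup(wt, 8, "HQ"))          # H7_Q8dup
print(describe_insertion_or_dup(wt, 8, "HQ", False))   # Q8_V9insHQ
print(describe_insertion_or_dup("MKRGST", 4, "G"))     # G4dup
```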

Version 1.0 23/2/2012. (c) Thierry Nouspikel www.nouspikel.com

Version 1.1 3/4/2012. Added "Numbered from" correction, for papers not respecting the recommended scheme.

Version 1.2 17/4/2012. Added "Minimum end match" to discriminate between indels and frameshifts.

Version 1.3 8/11/2012. Added illegal syntaxes for cDNA mutations, changed default options, changed contents of .opt file.

Version 1.4 24/1/2013. Added 76_del8 syntax. Corrected numbering bug for delins longer than 1 aa, handled stop deleted when cDNA stops at stop codon.