A reductive dehalogenase amino acid sequence is considered part of an ortholog group (OG) if it shares at least 90% pairwise identity with all current members of that OG. New sequences that cannot be sorted into a group are added as singletons. A more detailed breakdown is given below:
We first ascertain the full-length pairwise amino acid identity of your novel sequence(s) with the best-match protein from the reductive dehalogenase database. This is done using BLASTP searches of your sequence queried against the current reductive dehalogenase amino acid sequence database.
a) If your sequence has less than 90% identity at the amino acid level to any sequence in the current database, it is given a singleton identifier number.
b) If your sequence has at least 90% pairwise identity at the amino acid level to a sequence in the current database, we identify the group that your sequence belongs to.
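The grouping rule in steps a) and b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the database's actual implementation: the function name `assign_ortholog_group`, the input layout (a dictionary mapping each OG identifier to the percent identities between the query and that OG's members), and the identifiers `OG_A` and `OG_B` are hypothetical. The tie-break used when more than one OG qualifies is also an assumption, since the text above does not specify one.

```python
from typing import Dict, List, Optional

IDENTITY_THRESHOLD = 90.0  # percent identity, taken from the rule described above


def assign_ortholog_group(
    identities_by_og: Dict[str, List[float]],
    threshold: float = IDENTITY_THRESHOLD,
) -> Optional[str]:
    """Return the OG whose members all meet the threshold, or None for a singleton.

    identities_by_og maps a (hypothetical) OG identifier to the pairwise
    percent identities between the query and every current member of that OG.
    """
    best_og: Optional[str] = None
    best_min = -1.0
    for og_id, identities in identities_by_og.items():
        if not identities:
            continue  # an OG with no reported identities cannot be evaluated
        lowest = min(identities)
        # The query must reach the threshold against ALL members of the OG.
        # Assumption: if several OGs qualify, keep the one with the highest
        # minimum identity.
        if lowest >= threshold and lowest > best_min:
            best_og, best_min = og_id, lowest
    return best_og


# Made-up identities for illustration only
hits = {"OG_A": [95.2, 91.0], "OG_B": [92.5, 88.9]}
print(assign_ortholog_group(hits))  # OG_A
```

A return value of `None` corresponds to case a), where the sequence would receive a singleton identifier.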
| Cost to open a gap: | 11 |
| Cost to extend a gap: | 1 |
| Penalty for mismatch: | -3 |
| Reward for match: | 2 |
| Expected Value (E-Value): | 0.000001 |
You can construct a phylogenetic tree that includes your sequence from the BLAST result page. This neighbor-joining tree is calculated from an alignment of your sequence with either the full database or a selection of sequences. Both the alignment and the tree are constructed by MUSCLE.
Note that the alignment is not edited, so the resulting tree should only be used as a first approximation of the phylogenetic position of your sequence.
To construct phylogenetic trees for publications, we recommend that you download the sequences you want to include, align them with MUSCLE or MAFFT, inspect and edit the alignment manually, and then build the tree with your favourite software (e.g. IQ-TREE, FastTree, RAxML).
For automatic submissions, we only accept publicly available sequences that have an accession number from GenBank or IMG and that have been annotated as a reductive dehalogenase. You can submit up to 20 sequences per file.
The sequences must be in FASTA format with the following header:
> Self-assigned ID or Name | Accession No. | Organism | Citation (optional) |
Your Amino Acid Sequence/Nucleotide Sequence ...
> 8657042VS | WP_012881438 | Dehalococcoides mccartyi VS | |
MEMNIYHSTISRRNFMKGLGLSGAALGAATASA...
> ...
...
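Before submitting, it can help to check a file against the header layout and the 20-sequence limit described above. The following Python sketch is one possible way to do that and is not the validation the database itself performs. The names `Record` and `parse_submission` are hypothetical, and treating the name, accession number, and organism fields as mandatory is an assumption (only the citation is marked optional above). The example sequence is just the truncated fragment shown above, not a full protein.

```python
from typing import List, NamedTuple

MAX_SEQUENCES = 20  # per-file limit stated above


class Record(NamedTuple):
    name: str
    accession: str
    organism: str
    citation: str
    sequence: str


def parse_submission(text: str) -> List[Record]:
    """Parse a submission file with '> name | accession | organism | citation |' headers."""
    entries = []  # each entry: (header_fields, list_of_sequence_lines)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = [f.strip() for f in line[1:].split("|")]
            # Pad or trim to exactly four fields; the trailing '|' in the
            # header produces an extra empty field that is dropped here.
            fields = (fields + [""] * 4)[:4]
            # Assumption: name, accession and organism are required.
            if not all(fields[:3]):
                raise ValueError(f"incomplete header: {line!r}")
            entries.append((fields, []))
        elif entries:
            entries[-1][1].append(line)
        else:
            raise ValueError("sequence data found before the first header")
    if len(entries) > MAX_SEQUENCES:
        raise ValueError(
            f"{len(entries)} sequences found; at most {MAX_SEQUENCES} are allowed per file"
        )
    return [Record(*fields, "".join(parts)) for fields, parts in entries]


example = (
    "> 8657042VS | WP_012881438 | Dehalococcoides mccartyi VS | |\n"
    "MEMNIYHSTISRRNFMKGLGLSGAALGAATASA\n"
)
for record in parse_submission(example):
    print(record.accession, record.organism, len(record.sequence))
```

The sketch checks only the file layout. It does not verify that an accession number actually exists in GenBank or IMG, or that the sequence is annotated as a reductive dehalogenase.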