Input

Query type

MtDNA haplotypes can be queried in two input formats: as differences to the rCRS or as a FASTA/sequence string.
  • differences to the rCRS
    The sequence is entered as differences to the revised Cambridge Reference Sequence (rCRS; Andrews et al. 1999).
  • FASTA/sequence string
    The sequence is entered as a consecutive string of bases (e.g. copied and pasted from a text file or from a consensus produced by sequence analysis software). Do not enter header information as used in FASTA format; enter nucleotides only.
In either case, the query haplotype is internally converted into a consecutive string of nucleotides (FASTA-like format) and compared to the database sequences, which are also converted to strings.
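As a rough illustration of this conversion, the sketch below applies a difference-coded profile to a reference and returns the resulting string. The reference TOY_REFERENCE is a made-up eight-base sequence, not the rCRS, and the function name differences_to_string is hypothetical; the database's internal conversion is not documented here and may differ.

```python
import re

# Hypothetical toy reference (NOT the rCRS); position 1 is the first base.
TOY_REFERENCE = "ACGTACGT"

# position, optional insertion index, then a base/code, "DEL"/"del" or "-"
TOKEN = re.compile(r"^(\d+)(?:\.(\d+))?([A-Za-z]+|-)$")


def differences_to_string(differences: str, reference: str = TOY_REFERENCE) -> str:
    """Apply blank-separated differences to a reference and return the string."""
    slots = list(reference)   # one slot per reference position
    inserts = {}              # position -> [(insertion index, base), ...]
    for token in differences.split():
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        position = int(match.group(1))
        call = match.group(3)
        if match.group(2) is not None:
            # e.g. 4.1C: base inserted after position 4
            inserts.setdefault(position, []).append((int(match.group(2)), call.upper()))
        elif call.upper() == "DEL" or call == "-":
            slots[position - 1] = ""            # deleted base leaves no character
        else:
            slots[position - 1] = call.upper()  # substitution
    pieces = []
    for position, base in enumerate(slots, start=1):
        pieces.append(base)
        pieces.extend(b for _, b in sorted(inserts.get(position, [])))
    return "".join(pieces)
```

With the toy reference, the profile 3A 4.1C 6DEL and the plain string ACATCAGT describe the same haplotype, so both input formats end up as the same string before comparison.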

Sample Info

The sample-specific information identifies a search and is the reference under which the query is reported. In addition, the query is given a unique result identification number (RID, see search output).

Sequence range

It is important to specify the sequence range of your query haplotype.
The stringvalidation database holds information for the entire control region (CR; 16024-576; see overview for details). Narrower sequence ranges, including single positions, can also be defined as the sequence range.

mtDNA profile

Profile
MtDNA sequence data can be entered
  • as differences to rCRS: specify the mutations separated by blanks following the standard forensic annotation, e.g. 263G 309.1C 309.2C. Deletions can be annotated as 16193DEL, 16193del or 16193-. Note that 16193D is not read as a deletion: D is the IUB code for a mixture of A, G, and T.
  • or in string format (FASTA-like sequence without header information): you can copy&paste sequence stretches from analysis software or text strings. Each sequence string requires the respective sequence range.
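The annotation rules above can be sketched as a small parser. This is only an illustration: the names Difference and parse_profile are hypothetical, and the actual validation performed by the search form is not described here. The point it shows is that DEL, del and - all denote a deletion, whereas a lone D is kept as an IUB mixture code.

```python
import re
from typing import List, NamedTuple, Optional


class Difference(NamedTuple):
    position: int
    insertion: Optional[int]  # 1 for "309.1C"; None when there is no insertion index
    kind: str                 # "substitution", "insertion" or "deletion"
    base: str                 # called base or IUB code; "" for deletions


# The deletion spellings are tried before the single-letter alternative,
# so "16193DEL" is a deletion while "16193D" falls through to the IUB code D.
_TOKEN = re.compile(r"^(\d+)(?:\.(\d+))?(DEL|del|-|[A-Za-z])$")


def parse_profile(profile: str) -> List[Difference]:
    """Parse a blank-separated profile such as '263G 309.1C 309.2C'."""
    parsed = []
    for token in profile.split():
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        position = int(match.group(1))
        insertion = int(match.group(2)) if match.group(2) else None
        call = match.group(3)
        if call in ("DEL", "del", "-"):
            if insertion is not None:
                raise ValueError(f"deletion with insertion index: {token!r}")
            parsed.append(Difference(position, None, "deletion", ""))
        elif insertion is not None:
            parsed.append(Difference(position, insertion, "insertion", call.upper()))
        else:
            parsed.append(Difference(position, None, "substitution", call.upper()))
    return parsed
```

For example, parse_profile("16193DEL 16193D") yields one deletion at 16193 followed by one substitution whose base is the mixture code D.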
Sequence range
e.g. 16024-16365, 16024-576 (full control region) or Control Region SNPs, such as 16519. Insertions after the last position of the sequence range are considered out of range, e.g. 455.1C is considered to lie outside the range 450-455. To include such insertions, extend the sequence range to the subsequent position, e.g. 450-456.
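The range rule can be sketched as follows. The helper names parse_range, position_in_range and difference_in_range are hypothetical, and the sketch assumes a single range per call; it also assumes that a range whose start is larger than its end (such as 16024-576) wraps around the end of the circular numbering, which is how the full control region is written above.

```python
import re


def parse_range(text: str):
    """'16024-576' -> (16024, 576); a single position '16519' -> (16519, 16519)."""
    start_text, _, end_text = text.strip().partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start
    return start, end


def position_in_range(position: int, start: int, end: int) -> bool:
    if start <= end:
        return start <= position <= end
    # start > end: the range runs past the last position and continues at 1
    return position >= start or position <= end


def difference_in_range(token: str, range_text: str) -> bool:
    """Check one difference (e.g. '455.1C') against one sequence range."""
    start, end = parse_range(range_text)
    match = re.match(r"(\d+)(\.\d+)?", token)
    if match is None:
        raise ValueError(f"unrecognised token: {token!r}")
    position = int(match.group(1))
    is_insertion = match.group(2) is not None
    if not position_in_range(position, start, end):
        return False
    # An insertion sits after its position, so one after the last position
    # of the range is out of range.
    if is_insertion and position == end:
        return False
    return True
```

Under these assumptions, difference_in_range("455.1C", "450-455") is False, while the same insertion checked against 450-456 is True.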

Options

  • Match type: Select how the bases of the query profile and the database profiles are matched.
    • Pattern match: mixture designations match their individual components (Y={C,T,Y}).
      Example: 152Y matches 152T and 152C.
    • Literal match: mixture designations are considered exclusive to all other nucleotide designations (Y={Y}).
      Example: 152Y matches only 152Y.
      This is helpful for determining the observed heteroplasmic mutations in the stringvalidation database.
  • Number of differences displayed
    Select the maximum number of differences between the query profile and the database profiles (range 1 - 5). Database profiles with more than the selected number of differences are not shown individually in the results.
    Default value is 5 (3 for unregistered users).
  • Disregard InDels in length variants at positions 16193, 309, 455, 573
    Length variants that are known hotspots for indels can be ignored in a search. This involves the C-runs around positions 16193, 309 and 573 and the T-run around position 455 relative to the rCRS. You can select these positions individually in order to exclude mutations there from a search. To optimize the run time of a search the number of ignored mutations has been limited to 5 per queried region (e.g. HVS-I).
    Note that the interrupted C-run around position 16189 may collapse into a single uninterrupted C-run if the respective transition (16189C) occurs without other transitions in its vicinity.
    Example 1:
    Consider the following profile of the stringvalidation database
    16189C 16193.1C 16519C 263G 315.1C
    and the query profile
    16189C 16519C 263G 315.1C
    The database profile harbors an uninterrupted C-run in HVS-I (16189C). The length variation at 16193.1C in the database profile is ignored in a search if the hotspot 16193 was selected from the list, so the two profiles match.
    Example 2:
    Consider the following profile of the stringvalidation database
    16189C 16192T 16270T 16398A 73G 150T 263G 315.1C
    and the query profile
    16189C 16191.1C 16192T 16270T 16398A 73G 150T 263G 315.1C
    The database profile harbors an interrupted C-run in HVS-I that is shifted with respect to the rCRS (16189C 16192T). The length variation at 16191.1C in the query profile is not ignored in a search, even if hotspot 16193 was selected from the list.
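The match types and the hotspot option above can be sketched together. This is a simplified illustration, not the database's algorithm: the names IUB_CODES, bases_match and drop_hotspot_indels are hypothetical, pattern matching is assumed to work in both directions (query against database and vice versa), and an indel is dropped only when its position equals a selected hotspot exactly, whereas the real search refers to the runs around those positions and limits the number of ignored mutations.

```python
import re

# IUB mixture codes and the bases they stand for.
IUB_CODES = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def bases_match(query: str, database: str, match_type: str = "pattern") -> bool:
    """Compare two single-letter designations under the chosen match type."""
    if query == database:
        return True
    if match_type == "literal":
        return False  # Y matches only Y
    # Pattern match: a mixture code matches each of its components.
    return (database in set(IUB_CODES.get(query, ""))
            or query in set(IUB_CODES.get(database, "")))


def drop_hotspot_indels(profile: str, selected_hotspots) -> list:
    """Remove insertions/deletions located at the selected hotspot positions."""
    kept = []
    for token in profile.split():
        match = re.match(r"^(\d+)(\.\d+)?(.*)$", token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        position = int(match.group(1))
        call = match.group(3)
        is_indel = match.group(2) is not None or call.upper() == "DEL" or call == "-"
        if is_indel and position in selected_hotspots:
            continue  # ignored in the comparison
        kept.append(token)
    return kept
```

Applied to Example 1 with hotspot 16193 selected, the database profile loses 16193.1C and equals the query profile. Applied to Example 2, 16191.1C is kept because it is not located at the selected position, so the profiles still differ.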

Source

The stringvalidation database uses a structure for the haplotypes that suits forensic demands. Haplotypes are grouped with respect to the
  • geographical affiliation, therein classified into continent, UN region, country, and locality and
  • population-specific affiliation, therein classified into a hierarchical metapopulation system consisting of 4 levels.
This information can be retrieved in the output of a search.
Validationdata: Haplotypes stored under "Validationdata" are represented by their raw data and they are permanently linked to the database entries. The minimum requirement is full double-strand coverage of high quality sequences within the sequence range.

Output

Please note that tabs can be used to change between input and output (do not use the "BACK"-function of your browser).
The Summary tab provides
  • an overview of the query settings chosen.
  • summary statistics of the number of haplotypes found in the database, grouped by the number of differences to the queried profile. The third column displays the cumulative number of haplotypes found; the lowermost number therefore indicates the total number of haplotypes that met the search criteria. This table is accompanied by a graphical representation of the number of haplotypes found. To list the individual haplotypes in a new tab, either click on one of the numbers in the first column of the table or select the appropriate column in the bar chart.
    Matching profiles can further be restricted to a specific geographical affiliation and/or a (sub)metapopulation. You can choose from either of the two drop-down lists independently. Only those options remain in the two lists that meet the actual restrictions, that is, any selection in one list also affects the other list. To remove a restriction choose "All" in the appropriate list.
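The cumulative column of the summary table can be illustrated with a short sketch. The function name summarize and the row layout (differences, count, cumulative count) are assumptions made for this example, as is the inclusion of a row for zero differences; the actual column order of the output table may differ.

```python
from collections import Counter


def summarize(differences_per_haplotype, max_differences=5):
    """Return rows of (differences, haplotypes found, cumulative haplotypes)."""
    # Haplotypes with more differences than the selected maximum are not listed.
    counts = Counter(d for d in differences_per_haplotype if d <= max_differences)
    rows = []
    cumulative = 0
    for n in range(max_differences + 1):
        cumulative += counts.get(n, 0)
        rows.append((n, counts.get(n, 0), cumulative))
    return rows
```

For instance, summarize([0, 0, 1, 3, 3, 3, 7], max_differences=3) gives the rows (0, 2, 2), (1, 1, 3), (2, 0, 3) and (3, 3, 6); the last cumulative value, 6, is the number of haplotypes that met the search criteria.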
To change the query, activate the tab labeled "Input", change the input options and press SEARCH again.