High-resolution haplogrouping with control/coding regions

High-resolution mtDNA haplogroup prediction from Sanger sequencing of short control region fragments, followed by tracking with coding region sequences to confirm the prediction or to resolve sub-haplogroups by narrowing down and differentiating the top-ranked haplogroups. DOI: 10.1101/2020.04.23.057646

High accuracy using control region sequences

Highly accurate haplogroup (HG) prediction from control region (CR) or hypervariable region sequence fragments.

  • Uses a novel algorithm based on PhyloTree definitions and a scoring system built from a large haplotype dataset (n = 118,869).
  • Extensively evaluated on 54,538 CR sequence samples in comparison with HaploGrep 2.
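The scoring system itself is not described here. As a rough illustration of ranking PhyloTree-style haplogroups against a CR fragment, the Python sketch below substitutes a plain Dice overlap score for the authors' scoring. The function names, the toy profiles, and the rule of counting only variants inside the sequenced ranges are all assumptions for illustration, not the tool's method.

```python
from typing import Dict, List, Set, Tuple

# A variant is (rCRS position, derived base), e.g. (16223, "T").
Variant = Tuple[int, str]
Range = Tuple[int, int]  # inclusive, non-wrapping position range


def in_ranges(pos: int, ranges: List[Range]) -> bool:
    """True if pos lies inside any sequenced range."""
    return any(start <= pos <= end for start, end in ranges)


def rank_haplogroups(
    observed: Set[Variant],
    profiles: Dict[str, Set[Variant]],
    ranges: List[Range],
) -> List[Tuple[str, float]]:
    """Rank HGs by Dice overlap between observed variants and each HG's
    expected variants within the sequenced ranges (hypothetical scoring)."""
    # Only observed variants inside the fragment count toward the score.
    seen = {v for v in observed if in_ranges(v[0], ranges)}
    results = []
    for hg, defining in profiles.items():
        expected = {v for v in defining if in_ranges(v[0], ranges)}
        if not expected and not seen:
            continue  # nothing to compare for this HG in this fragment
        hits = len(expected & seen)
        score = 2 * hits / (len(expected) + len(seen))
        results.append((hg, score))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Toy profiles for illustration only; they are not PhyloTree definitions.
profiles = {
    "HG_A": {(16223, "T"), (16290, "T"), (16319, "A")},
    "HG_B": {(16223, "T"), (16189, "C")},
    "HG_C": {(16311, "C"), (73, "G")},
}
observed = {(16223, "T"), (16290, "T"), (16319, "A")}
print(rank_haplogroups(observed, profiles, [(16024, 16365)]))
# [('HG_A', 1.0), ('HG_B', 0.4), ('HG_C', 0.0)]
```

Restricting expected variants to the sequenced ranges matters for short fragments: HG_C's variant at position 73 lies outside HVS-I, so it neither helps nor penalizes HG_C here.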

A novel tracking solution

Established through repeated blind simulation tests, the tracking solution:

  • Accepts sequence fragments through a unique, user-friendly interface.
  • Minimizes the number of tests needed for HG tracking by merging the predicted HGs into their MRCAs (most recent common ancestors), narrowing down the candidates.
  • Proposes variants that differentiate between the candidate HGs.
  • Re-tracks with fragments covering the proposed variants to confirm the HGs or to track sub-HGs.
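The narrowing and differentiation steps in the list above could be sketched as follows, assuming the PhyloTree topology is available as a child-to-parent map and each HG's defining variants as a set. The tree, variant positions, and helper names are hypothetical, and back mutations (which PhyloTree does record) are ignored for simplicity.

```python
from typing import Dict, List, Optional, Set, Tuple

Variant = Tuple[int, str]          # (rCRS position, derived base)
Tree = Dict[str, Optional[str]]    # child HG -> parent HG (root maps to None)
Defs = Dict[str, Set[Variant]]     # HG -> variants defining that branch


def lineage(hg: str, parent: Tree) -> List[str]:
    """Path from the root down to hg, inclusive."""
    path: List[str] = []
    node: Optional[str] = hg
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def mrca(hgs: List[str], parent: Tree) -> str:
    """Deepest node shared by the lineages of all candidate HGs."""
    paths = [lineage(hg, parent) for hg in hgs]
    common = paths[0][0]
    for nodes in zip(*paths):
        if len(set(nodes)) != 1:
            break  # lineages diverge below this depth
        common = nodes[0]
    return common


def profile(hg: str, parent: Tree, defining: Defs) -> Set[Variant]:
    """Cumulative variants from the root to hg (back mutations ignored)."""
    result: Set[Variant] = set()
    for node in lineage(hg, parent):
        result |= defining[node]
    return result


def differential(a: str, b: str, parent: Tree, defining: Defs) -> Tuple[List[Variant], List[Variant]]:
    """Variants present only in a and only in b, sorted by position."""
    pa, pb = profile(a, parent, defining), profile(b, parent, defining)
    return sorted(pa - pb), sorted(pb - pa)


# Hypothetical toy tree and defining variants.
parent: Tree = {"root": None, "A": "root", "A1": "A", "A2": "A", "A1a": "A1", "A1b": "A1"}
defining: Defs = {
    "root": set(), "A": {(100, "G")}, "A1": {(200, "T")}, "A2": {(300, "C")},
    "A1a": {(400, "A"), (16100, "T")}, "A1b": {(500, "G")},
}
print(mrca(["A1a", "A1b"], parent))                    # A1
print(differential("A1a", "A1b", parent, defining))
# ([(400, 'A'), (16100, 'T')], [(500, 'G')])
```

In this toy case the top candidates A1a and A1b collapse to their MRCA A1, and positions 400, 500, and 16100 would be the targets for the next round of fragment sequencing.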

Additional tools

  • 'HG tracking by fragment variant profiles': accepts variant profiles instead of sequences, for samples whose fragments have already been sequenced and whose variants were extracted with the main tool.
  • 'Conserved region mapping tool for primer design': locates conserved regions for primer design to ensure successful PCR, which is needed to obtain additional sequences for further tracking.
  • 'HG database': explores PhyloTree HGs and their sub-haplogroups along with their defining variant profiles.
  • 'Differentiation between HGs': finds differential variants among user-specified HGs.
  • QC tools: analysis of potential artificial recombination (included in the main tools) and of potential phantom mutations in a dataset.
  • Variant format conversion tool for PhyloTree, MitoTool, HaploGrep 2, and EMPOP.
  • Major HG-specific variants.
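For the primer-design tool, one simple way to look for conserved stretches is to scan a region for runs that contain no known variable position. The sketch below assumes a hypothetical list of variable positions (for example, positions observed to be polymorphic in a haplotype dataset) and a default minimum primer length of 20 bp; it ignores the circularity of mtDNA and is not the tool's actual mapping procedure.

```python
from typing import Iterable, List, Tuple


def conserved_windows(
    variable_positions: Iterable[int], start: int, end: int, min_len: int = 20
) -> List[Tuple[int, int]]:
    """Return maximal runs inside [start, end] that contain no variable
    position and are at least min_len bases long (inclusive coordinates)."""
    var = sorted(p for p in set(variable_positions) if start <= p <= end)
    windows: List[Tuple[int, int]] = []
    prev = start - 1
    # A sentinel at end + 1 closes the final run.
    for p in var + [end + 1]:
        run_start, run_end = prev + 1, p - 1
        if run_end - run_start + 1 >= min_len:
            windows.append((run_start, run_end))
        prev = p
    return windows


print(conserved_windows([110, 150, 155], start=100, end=200, min_len=20))
# [(111, 149), (156, 200)]
```

Candidate primers would then be placed inside the returned windows, so that known polymorphisms do not fall under the primer binding site.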

Super-HG prediction rates with CR sequences

