gnomAD Ancestry Estimation (BETA)

Block Bootstrapping

We use block bootstrapping to estimate the error in the ancestry proportion estimates. The plots and confidence intervals shown here come from resampling 3,357 centiMorgan blocks 1,000 times.
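
As a rough illustration of the block bootstrap idea, the sketch below resamples whole blocks of SNPs with replacement, re-estimates a proportion on each resample, and reads a percentile interval off the sorted estimates. Everything in it is an assumption made for illustration: the function names, the (observed, reference A, reference B) tuple layout, the percentile interval, and the two-group closed-form estimator, which stands in for whatever estimator the app actually uses.

```python
import random


def two_way_proportion(snps):
    """Least-squares share of group A in a two-group mixture, clipped to [0, 1].

    snps: iterable of (observed_af, ref_af_a, ref_af_b) tuples (hypothetical layout).
    """
    snps = list(snps)
    num = sum((o - b) * (a - b) for o, a, b in snps)
    den = sum((a - b) ** 2 for _, a, b in snps)
    if den == 0:
        raise ValueError("reference groups are identical at every SNP")
    return min(1.0, max(0.0, num / den))


def block_bootstrap_ci(blocks, n_reps=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling blocks with replacement.

    blocks: list of blocks, each a list of SNP tuples; SNPs inside a block
    stay together so that local correlation is preserved in every resample.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        resampled = rng.choices(blocks, k=len(blocks))  # draw blocks with replacement
        snps = [snp for block in resampled for snp in block]
        estimates.append(two_way_proportion(snps))
    estimates.sort()
    lo_idx = max(0, round(alpha / 2 * n_reps))
    hi_idx = min(n_reps - 1, round((1 - alpha / 2) * n_reps) - 1)
    return estimates[lo_idx], estimates[hi_idx]
```

Resampling blocks rather than individual SNPs keeps nearby, correlated SNPs together, which is the usual reason for preferring a block bootstrap on genomic data.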

Numeric
Visual
Proportion Estimates for Block Bootstrapping

Distribution Plots and 95% Confidence Intervals

Random SNP Sample

We estimate ancestry proportions from N SNPs sampled at random across the 22 autosomes, repeating the sampling 1,000 times for the plots and confidence intervals shown here. Varying N shows how the method performs with different numbers of SNPs.
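
The repeated random-SNP procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the app's code: the function names and tuple layout are hypothetical, SNPs are assumed to be drawn without replacement within each repetition, and a two-group closed-form estimator is used as a stand-in for the real one.

```python
import random
import statistics


def two_way_proportion(snps):
    """Least-squares share of group A in a two-group mixture, clipped to [0, 1].

    snps: list of (observed_af, ref_af_a, ref_af_b) tuples (hypothetical layout).
    """
    num = sum((o - b) * (a - b) for o, a, b in snps)
    den = sum((a - b) ** 2 for _, a, b in snps)
    if den == 0:
        raise ValueError("reference groups are identical at every sampled SNP")
    return min(1.0, max(0.0, num / den))


def random_snp_estimates(snps, n_snps, n_reps=1000, seed=0, estimator=two_way_proportion):
    """Estimate the proportion n_reps times, each time from n_snps random SNPs."""
    rng = random.Random(seed)
    # rng.sample draws without replacement within one repetition (an assumption).
    return [estimator(rng.sample(snps, n_snps)) for _ in range(n_reps)]


def summarize(estimates):
    """Mean and standard deviation of the repeated estimates."""
    return statistics.mean(estimates), statistics.stdev(estimates)
```

Calling `random_snp_estimates` for several values of `n_snps` and comparing the spread returned by `summarize` is one way to see how the number of SNPs affects the stability of the estimate.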

N Random SNPs

Numeric
Visual
Proportion Estimates for Random SNP Sample

Distribution Plots and 95% Confidence Intervals

Chromosome

Estimated ancestry proportions by chromosome using all SNPs.

Numeric
Visual
Proportion Estimates by Chromosome

Ancestry Adjusted Allele Frequency

An example of the adjusted allele frequency function in the Summix package. It adjusts allele frequencies to match a target ancestral population (homogeneous or admixed), using either user-provided allele frequencies or frequencies from the gnomAD database.
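
One way to reason about such an adjustment is to treat the observed allele frequency as a mixture of per-ancestry frequencies weighted by the estimated proportions. Under that assumption, the frequency of one ancestry group (called the pivot here) can be solved for from the observed value and the other groups' reference frequencies, and the result can then be re-mixed using the target proportions. The sketch below follows that derivation. The function name, argument names, and the clipping step are hypothetical, and this is not confirmed to match the Summix implementation.

```python
def adjusted_af(observed_af, ref_afs, est_props, target_props, pivot):
    """Re-weight an observed allele frequency to a target ancestry mix.

    observed_af:  allele frequency in the observed (admixed) sample
    ref_afs:      dict of reference allele frequencies per ancestry group;
                  the pivot group's entry is not used
    est_props:    dict of estimated ancestry proportions in the observed sample
    target_props: dict of desired ancestry proportions
    pivot:        the group whose frequency is inferred from the observed data
    """
    if est_props[pivot] <= 0:
        raise ValueError("pivot group must have a positive estimated proportion")
    others = [k for k in est_props if k != pivot]
    # Mixture assumption: observed = sum_k est_props[k] * af[k]; solve for the pivot.
    implied_pivot_af = (
        observed_af - sum(est_props[k] * ref_afs[k] for k in others)
    ) / est_props[pivot]
    # Re-mix the per-group frequencies using the target proportions.
    adjusted = target_props[pivot] * implied_pivot_af + sum(
        target_props[k] * ref_afs[k] for k in others
    )
    # Clip, since noisy inputs can push the algebra outside [0, 1].
    return min(1.0, max(0.0, adjusted))
```

Setting the pivot's target proportion to 1 and all others to 0 gives a homogeneous target; any other set of target proportions summing to 1 gives an admixed target.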

User Input

Enter a position on the genome.

Allele Frequency Output Table

Reference Allele:

Alternate Allele:

Allele Frequencies

AFR AF:

AMR AF:

EAS AF:

NFE AF:

Estimated Ancestry Proportions

AFR Estimated Proportion:

AMR Estimated Proportion:

EAS Estimated Proportion:

NFE Estimated Proportion:

Target Ancestry Proportions

AFR Target Proportion:

AMR Target Proportion:

EAS Target Proportion:

NFE Target Proportion:

Target Ancestry

Allele Frequencies

Ancestry 1 AF:

Ancestry 2 AF:

Ancestry 3 AF:

Ancestry 4 AF:

Ancestry 5 AF:

Estimated Ancestry Proportions

Ancestry 1 Estimated Proportion:

Ancestry 2 Estimated Proportion:

Ancestry 3 Estimated Proportion:

Ancestry 4 Estimated Proportion:

Ancestry 5 Estimated Proportion:

Target Ancestry Proportions

Ancestry 1 Target Proportion:

Ancestry 2 Target Proportion:

Ancestry 3 Target Proportion:

Ancestry 4 Target Proportion:

Ancestry 5 Target Proportion:

Target Ancestry

Unadjusted Allele Frequency:

Adjusted Allele Frequency:

ReadMe

Purpose
Estimate the proportion of reference ancestry groups in summary genotype frequency data.
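
To make the estimation task concrete, the sketch below fits mixture proportions by least squares: it looks for non-negative proportions summing to 1 such that the weighted sum of reference allele frequencies is as close as possible to the observed frequencies. The least-squares objective and the projected gradient descent solver are assumptions chosen to keep the example dependency-free; they are not presented as the optimizer used by this site, and the function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) == 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:  # the last j satisfying this fixes the shift
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(observed, reference, n_iter=2000):
    """Least-squares ancestry proportions via projected gradient descent.

    observed:  list of observed allele frequencies, one per SNP
    reference: list of rows, one per SNP, each holding one AF per reference group
    """
    n_groups = len(reference[0])
    props = [1.0 / n_groups] * n_groups  # start from equal proportions
    trace = sum(x * x for row in reference for x in row)
    if trace == 0:
        raise ValueError("reference allele frequencies are all zero")
    step = 1.0 / (2.0 * trace)  # conservative step: trace bounds the curvature
    for _ in range(n_iter):
        resid = [
            o - sum(p * r for p, r in zip(props, row))
            for o, row in zip(observed, reference)
        ]
        grad = [
            -2.0 * sum(e * row[k] for e, row in zip(resid, reference))
            for k in range(n_groups)
        ]
        props = project_to_simplex([p - step * g for p, g in zip(props, grad)])
    return props
```

Because only allele frequencies enter the objective, the approach needs summary data alone and no individual-level genotypes.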

Data
Our reference panel combines 1000 Genomes Project (GRCh37/hg19) superpopulations (African, Non-Finnish European, East Asian, South Asian) with an Indigenous American population (616,568 SNPs and 43 individuals, GRCh37/hg19). After removing tri-allelic SNPs and SNPs with missing allele frequency information, 613,298 SNPs remained across the 22 autosomes.

We estimate the ancestry proportions in gnomAD v2 (GRCh37/hg19). After merging with our reference panel, we checked for allele matching and strand flips. Our final dataset had 582,550 genome SNPs and 9,835 exome SNPs across the 22 autosomes.
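
The allele matching and strand-flip checks mentioned above are not described in detail, so the following is only a sketch of one common, conservative way to harmonize a dataset's allele frequency with a reference panel. The function name, the decision to drop strand-ambiguous (A/T and C/G) SNPs, and the handling of swapped alleles are all assumptions and may differ from what the authors did.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonize_af(panel_alleles, data_alleles, data_alt_af):
    """Return the dataset AF expressed for the panel's alternate allele, or None.

    panel_alleles: (ref, alt) alleles in the reference panel
    data_alleles:  (ref, alt) alleles in the dataset being merged
    data_alt_af:   frequency of the dataset's alternate allele
    """
    panel = tuple(a.upper() for a in panel_alleles)
    data = tuple(a.upper() for a in data_alleles)
    if any(a not in COMPLEMENT for a in panel + data):
        return None  # not a simple single-nucleotide variant
    if COMPLEMENT[panel[0]] == panel[1]:
        return None  # A/T or C/G: strand cannot be inferred from alleles alone
    if data == panel:
        return data_alt_af  # same strand, same allele order
    if data == panel[::-1]:
        return 1.0 - data_alt_af  # same strand, ref and alt swapped
    flipped = tuple(COMPLEMENT[a] for a in data)
    if flipped == panel:
        return data_alt_af  # opposite strand, same allele order
    if flipped == panel[::-1]:
        return 1.0 - data_alt_af  # opposite strand, ref and alt swapped
    return None  # alleles do not match under any orientation
```

SNPs for which the function returns `None` would simply be left out of the merged dataset in this scheme.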

Disclaimer

Under no circumstances shall the authors of this website and ancestry estimation algorithm be liable for any indirect, incidental, consequential, special or exemplary damages arising out of or in connection with your access or use of or inability to access the ancestry estimation website or any associated software and tools and any third-party content and services, whether or not the damages were foreseeable and whether or not the authors were advised of the possibility of such damages. By using the ancestry estimation platform, you agree to use it to promote scientific research, learning or health.

Acknowledgements

This work was a collaborative effort by:
Ian S. Arriaga MacKenzie, Gregory M. Matesi, Alexandria Ronco, Ryan Scherenberg, Andrew Zerwick, Yinfei Wu, James Vance, Sam Chen, Kaichao Chang, Katie Marker, Jordan R. Hall, Christopher R. Gignoux, Megan Null, Audrey E. Hendricks

Additional Funding
CU Denver Undergraduate Research Opportunity Program (UROP)
CU Denver Education through Undergraduate Research and Creative Activities program (EUReCA)

Shiny App
Ian S. Arriaga MacKenzie
IAN.ARRIAGAMACKENZIE@ucdenver.edu

Principal Investigator
Audrey E. Hendricks, Ph.D.
AUDREY.HENDRICKS@ucdenver.edu