Mendelian Precision

Mendelian Precision (MP) is the core quality metric in MCNV2, quantifying the proportion of CNV calls that follow Mendelian inheritance patterns in parent–offspring trios.

Note

Mendelian Precision is defined as:

\[\text{MP} = \frac{I}{N} = 1 - \frac{E}{N}\]

where:

N = Total number of CNVs detected in offspring
I = Number of inherited CNVs (found in at least one parent)
E = Number of non-inherited CNVs (not found in parents)

Rationale:

Given the low expected rate of genuine de novo CNVs (typically 1.92% per individual), non-inherited CNVs are predominantly false positives. MP therefore provides a biologically grounded estimate of CNV call precision without requiring an external reference callset.
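
As a minimal sketch of the definition above, the following Python snippet computes MP from the two counts and checks that both forms of the formula agree. The function name `mendelian_precision` and the example counts are hypothetical and are not part of MCNV2.

```python
def mendelian_precision(n_total: int, n_inherited: int) -> float:
    """Return MP = I / N for one callset (illustrative helper, not the MCNV2 API)."""
    if n_total <= 0:
        raise ValueError("N must be positive")
    if not 0 <= n_inherited <= n_total:
        raise ValueError("I must lie between 0 and N")
    return n_inherited / n_total


# Toy counts: 1000 offspring CNVs, 850 of them found in at least one parent.
n, i = 1000, 850
e = n - i  # non-inherited CNVs
mp = mendelian_precision(n, i)
print(f"MP = I/N = {mp:.2%}; 1 - E/N = {1 - e / n:.2%}")
```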

Interpretation

High MP (≥80%):

Most CNVs are inherited → High call quality
Few false positives
Callset suitable for downstream analyses

Moderate MP (50% to <80%):

Mixed quality
May benefit from additional filtering
Investigate size-specific or quality-score-specific patterns

Low MP (<50%):

High false positive rate
Requires aggressive filtering or caller parameter tuning
Consider alternative CNV calling methods

Transmission types

MCNV2 calculates MP using two complementary approaches:

CNV-level matching (coordinate-based) — Based on genomic coordinate overlap
Gene-level matching (gene-based) — Based on shared affected genes
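
The two approaches can give different answers for the same CNV. The sketch below illustrates this under stated assumptions: the `CNV` record, the function names, the 50% reciprocal-overlap criterion, and the requirement that parent and child CNVs share the same type are all hypothetical choices made for illustration, since the exact matching rules used by MCNV2 are not specified here.

```python
from typing import Iterable, NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int       # 0-based, inclusive
    end: int         # exclusive
    cnv_type: str    # "DEL" or "DUP"
    genes: frozenset = frozenset()


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 0.0 when no bases are shared."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def inherited_by_coordinates(child: CNV, parents: Iterable[CNV], min_ro: float = 0.5) -> bool:
    """Coordinate-based: any same-type parental CNV with enough reciprocal overlap."""
    return any(
        p.cnv_type == child.cnv_type and reciprocal_overlap(child, p) >= min_ro
        for p in parents
    )


def inherited_by_genes(child: CNV, parents: Iterable[CNV]) -> bool:
    """Gene-based: any same-type parental CNV sharing at least one affected gene."""
    return any(
        p.cnv_type == child.cnv_type and bool(child.genes & p.genes)
        for p in parents
    )


# Toy trio: the parental call is shifted, so coordinates overlap only partially.
child = CNV("chr1", 1_000, 2_000, "DEL", frozenset({"GENE_A"}))
parent = CNV("chr1", 1_600, 3_000, "DEL", frozenset({"GENE_A", "GENE_B"}))
print(inherited_by_coordinates(child, [parent]), inherited_by_genes(child, [parent]))
```

In this toy case the reciprocal overlap is below 50%, so the CNV counts as non-inherited by coordinates but as inherited by genes.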

MP stratification

To identify quality issues and optimization strategies, MP should be computed across multiple dimensions:

By CNV size

Size ranges:

1-30 kb
30-50 kb
50-100 kb
100-200 kb
200-500 kb
500 kb-1 Mb
>1 Mb

Typical pattern:

Small CNVs (<30 kb): Low MP (high false positive rate)
Medium CNVs (50-200 kb): Moderate to high MP
Large CNVs (>500 kb): High MP (more reliable)

Interpretation:

If MP increases dramatically with size, consider applying a minimum size filter.
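
Size stratification can be sketched as follows. The bin edges come from the size ranges listed above; the function names, the toy calls, and the choice to treat each upper bound as inclusive are assumptions made for illustration, not MCNV2 behavior.

```python
from bisect import bisect_left

# Upper bounds (bp) of every bin except the last, taken from the size ranges above.
EDGES = [30_000, 50_000, 100_000, 200_000, 500_000, 1_000_000]
LABELS = ["1-30 kb", "30-50 kb", "50-100 kb", "100-200 kb",
          "200-500 kb", "500 kb-1 Mb", ">1 Mb"]


def size_bin(length_bp: int) -> str:
    """Map a CNV length to its size range (upper bounds inclusive by assumption)."""
    return LABELS[bisect_left(EDGES, length_bp)]


def mp_by_size(calls):
    """calls: iterable of (length_bp, inherited) pairs. Returns {label: (mp, n)}."""
    counts = {label: [0, 0] for label in LABELS}  # [inherited, total]
    for length_bp, inherited in calls:
        slot = counts[size_bin(length_bp)]
        slot[0] += int(inherited)
        slot[1] += 1
    # Empty bins are omitted because MP is undefined for N = 0.
    return {label: (i / n, n) for label, (i, n) in counts.items() if n > 0}


# Toy offspring calls: (length in bp, found in at least one parent?)
calls = [(5_000, False), (12_000, True), (20_000, False), (75_000, True),
         (150_000, True), (150_000, False), (800_000, True), (2_000_000, True)]
for label, (mp, n) in mp_by_size(calls).items():
    print(f"{label:>12}: MP = {mp:.0%} (n = {n})")
```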

By CNV type

DEL vs DUP:

MP is calculated separately for deletions and duplications, as they often have different precision profiles.

By quality score

Quality metrics:

MP can be stratified by various quality scores:

Score — Caller-specific quality score
SNP — Number of supporting SNPs (array data)
% Overlap — Reciprocal overlap percentage between CNVs detected by multiple algorithms

Typical pattern:

MP increases with quality score, often plateauing at a certain threshold. This plateau identifies the optimal filtering threshold.

Example:

Score ≥10: MP = 65%
Score ≥30: MP = 80%
Score ≥50: MP = 90%
Score ≥200: MP = 92% (plateau)

Optimal threshold: 50 (filtering beyond this point yields only a marginal MP gain)
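
A threshold table of this kind can be produced by recomputing MP on the CNVs that pass each cutoff. The following is a minimal sketch; the function name `mp_at_thresholds` and the toy scores are hypothetical and do not reproduce the figures above.

```python
def mp_at_thresholds(calls, thresholds):
    """calls: (score, inherited) pairs. Returns [(threshold, mp, n_retained)]."""
    rows = []
    for t in thresholds:
        kept = [inherited for score, inherited in calls if score >= t]
        if kept:  # skip thresholds that remove every CNV
            rows.append((t, sum(kept) / len(kept), len(kept)))
    return rows


# Toy offspring calls: (caller quality score, found in at least one parent?)
calls = [(5, False), (12, True), (15, False), (35, True),
         (40, True), (45, False), (60, True), (80, True)]
for t, mp, n in mp_at_thresholds(calls, [10, 30, 50]):
    print(f"Score >= {t}: MP = {mp:.0%} (n = {n})")
```

Reporting the number of retained CNVs next to each MP value makes the trade-off between precision and callset size visible.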

Filtering strategies

To maximize MP while retaining biologically relevant CNVs, MCNV2 supports multiple filtering approaches:

CNV-level filters

Size filters:

Minimum size (bp)
Maximum size (bp)
Target specific size ranges

Quality score filters:

Minimum quality score
Minimum number of supporting probes/reads
Minimum caller concordance

Important

Highly recommended: Apply problematic region filters to all datasets.

CNVs in these regions are enriched for false positives due to:

Read mismapping (segmental duplications)
Poor mappability (centromeres, telomeres)
Genuine polymorphism (HLA region)

Recommended threshold: Exclude CNVs with >50% overlap with these regions.
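
The overlap rule can be sketched as below, assuming that "overlap" means the fraction of the CNV's own length covered by problematic regions and that the regions on a chromosome have already been merged so they do not overlap each other. The function name and coordinates are hypothetical.

```python
def overlap_fraction(start, end, regions):
    """Fraction of the CNV [start, end) covered by non-overlapping regions
    on the same chromosome."""
    covered = sum(
        max(0, min(end, r_end) - max(start, r_start))
        for r_start, r_end in regions
    )
    return covered / (end - start)


# Toy problematic regions and CNVs on one chromosome (half-open intervals, bp).
problematic = [(1_000, 2_000), (5_000, 9_000)]
cnvs = [(1_500, 2_500), (4_000, 8_000), (10_000, 12_000)]

# Keep CNVs whose covered fraction does not exceed 50%.
kept = [c for c in cnvs if overlap_fraction(c[0], c[1], problematic) <= 0.5]
print(kept)
```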

Gene-level filters

Exclusion lists:

Upload a list of genes to exclude (e.g., immunoglobulin genes, olfactory receptors)
Useful for removing genes prone to technical artifacts

Gene constraint filters (LOEUF):

Exclude CNVs affecting highly constrained genes (LOEUF < threshold)
Rationale: CNVs in constrained genes may be genuine de novo events, reducing technical MP
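
Both gene-level filters can be sketched as a single predicate. The function name, the placeholder gene names, and the LOEUF values below are hypothetical. Treating genes without a LOEUF value as unconstrained is also an assumption of this sketch.

```python
def passes_gene_filters(cnv_genes, excluded_genes, loeuf, loeuf_min=None):
    """Return True if the CNV survives the exclusion list and the LOEUF filter."""
    if cnv_genes & excluded_genes:
        return False  # touches a gene on the exclusion list
    if loeuf_min is not None:
        # Genes missing from the LOEUF table are treated as unconstrained.
        if any(loeuf.get(g, float("inf")) < loeuf_min for g in cnv_genes):
            return False  # affects a highly constrained gene
    return True


# Toy inputs: placeholder gene names with made-up LOEUF values.
loeuf = {"GENE_A": 0.2, "GENE_B": 1.1, "GENE_C": 0.9}
excluded = {"GENE_C"}
cnv_genes = {
    "cnv1": {"GENE_A"},            # constrained gene
    "cnv2": {"GENE_B"},            # unconstrained gene
    "cnv3": {"GENE_B", "GENE_C"},  # hits the exclusion list
    "cnv4": set(),                 # intergenic
}
kept = [name for name, genes in cnv_genes.items()
        if passes_gene_filters(genes, excluded, loeuf, loeuf_min=0.6)]
print(kept)
```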

Important

LOEUF filter for MP calculation only

Excluding CNVs in constrained genes (low LOEUF) helps distinguish:

Technical MP — Precision excluding likely de novo events
Overall MP — Precision including all non-inherited CNVs

CNVs affecting constrained genes are enriched for genuine de novo events, which reduce MP but are biologically real. Excluding them from MP calculation provides a cleaner assessment of technical false positive rate.

Critical: These CNVs must be retained in your final dataset for downstream analyses (disease association, burden tests) as they may represent pathogenic variants.

See Filtering strategies for detailed filtering strategies and optimization approaches.

Optimal threshold identification

Strategy:

Plot MP versus quality score threshold (line plot)
Identify where MP plateaus
Use the lowest threshold at which MP reaches plateau

Example:

For deletions 50-100 kb:
- Score ≥30 → MP = 75% (n = 500)
- Score ≥50 → MP = 85% (n = 300)
- Score ≥70 → MP = 92% (n = 150)  ← Plateau starts
- Score ≥90 → MP = 92% (n = 140)  ← No further MP gain

Optimal threshold: 70 (plateau reached, retains 150 CNVs)
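
The plateau rule can be expressed as: pick the lowest threshold after which no stricter threshold improves MP by more than a small tolerance. The sketch below applies this to the deletion example. The function name and the one-percentage-point tolerance are hypothetical, and the final row is assumed to correspond to a stricter threshold of 90.

```python
def plateau_threshold(rows, tol=0.01):
    """rows: (threshold, mp, n) tuples sorted by ascending threshold.

    Returns the first row whose MP is within `tol` of the best MP reached
    at that threshold or any stricter one, or None if rows is empty.
    """
    for i, (threshold, mp, n) in enumerate(rows):
        best_from_here = max(r[1] for r in rows[i:])
        if best_from_here - mp <= tol:
            return threshold, mp, n
    return None


rows = [
    (30, 0.75, 500),
    (50, 0.85, 300),
    (70, 0.92, 150),
    (90, 0.92, 140),  # assumed threshold for the last row of the example
]
print(plateau_threshold(rows))
```

Choosing the lowest threshold on the plateau keeps as many CNVs as possible, since stricter cutoffs only shrink the callset.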

Technical vs biological non-inheritance

Challenge:

Non-inherited CNVs include both:

Technical false positives (reduce MP, should be filtered)
Genuine *de novo* CNVs (reduce MP, should be retained)

Solution:

Use gene constraint (LOEUF) to distinguish these two categories:

CNVs affecting constrained genes (low LOEUF) → Enriched for de novo events
CNVs affecting unconstrained genes → Enriched for false positives

Workflow:

Calculate MP (all CNVs) → Includes both technical and biological non-inheritance
Calculate MP (excluding LOEUF < 0.6) → Focuses on technical precision only
Compare the two values:
- Large difference (>10 percentage points): High de novo rate in constrained genes
- Small difference (<5 percentage points): Most non-inherited CNVs are false positives
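
The workflow above can be sketched as follows. The function name and the toy callset are hypothetical. Here a CNV counts as "constrained" when it affects at least one gene with LOEUF < 0.6, the cutoff used in the workflow.

```python
def overall_and_technical_mp(calls):
    """calls: (inherited, in_constrained_gene) boolean pairs.

    Returns (overall MP on all CNVs, technical MP on CNVs outside constrained genes).
    """
    calls = list(calls)
    overall = sum(inherited for inherited, _ in calls) / len(calls)
    unconstrained = [inherited for inherited, constrained in calls if not constrained]
    technical = sum(unconstrained) / len(unconstrained)
    return overall, technical


# Toy callset: 2000 CNVs, 50 of which affect a constrained gene.
calls = (
    [(True, False)] * 1690     # inherited, unconstrained genes
    + [(False, False)] * 260   # non-inherited, unconstrained genes
    + [(True, True)] * 10      # inherited, constrained genes
    + [(False, True)] * 40     # non-inherited, constrained genes
)
overall, technical = overall_and_technical_mp(calls)
diff_points = (technical - overall) * 100
print(f"overall = {overall:.1%}, technical = {technical:.1%}, "
      f"difference = {diff_points:.1f} percentage points")
```

In this toy callset the difference is under 5 percentage points, so most non-inherited CNVs would be read as false positives.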

Important

LOEUF exclusion is for assessment only

Example:

1000 CNVs total:
- MP (all CNVs) = 85% → 150 non-inherited
- MP (LOEUF ≥ 0.6) ≈ 88% → 120 non-inherited (970 CNVs after excluding 30 non-inherited CNVs in constrained genes)

Interpretation:
- ~30 CNVs (3%) likely genuine *de novo* in constrained genes
- ~120 CNVs (12%) technical false positives
- Keep all 1000 CNVs for downstream analysis
- Apply quality filters to remove the 120 false positives

Global vs stratified MP

Global MP:

Single value for all CNVs (or all DEL, all DUP)
Simple summary metric
May mask quality issues in specific size ranges or quality bins

Stratified MP:

MP computed for each size range and/or quality threshold
Reveals patterns (e.g., low MP for small CNVs)
Enables targeted filtering strategies

Best practice: Always examine stratified MP before applying filters.
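
The relationship between the two can be written out explicitly. As a sketch, let \(k\) index the strata (for example, size ranges) and let \(I_k\) and \(N_k\) be the inherited and total counts in stratum \(k\); these symbols are introduced here for illustration. Global MP is then the size-weighted mean of the stratum values:

\[\text{MP}_{\text{global}} = \frac{\sum_k I_k}{\sum_k N_k} = \sum_k \frac{N_k}{N}\,\text{MP}_k, \qquad N = \sum_k N_k\]

For instance, with toy numbers, a stratum of 600 small CNVs at MP = 50% and a stratum of 400 large CNVs at MP = 95% give a global MP of (300 + 380) / 1000 = 68%, a figure that describes neither stratum well.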

Use cases for Mendelian Precision

1. CNV caller evaluation

Compare MP across different callers or parameter settings to identify the best configuration.

2. Quality control

Assess CNV call quality in a new dataset.

3. Filter optimization

Systematically evaluate the impact of different filters on MP to maximize precision while retaining CNVs.

4. Method development

Use MP as an optimization criterion when developing new CNV calling methods.

5. Publication

Report MP alongside standard metrics (sensitivity, specificity) to provide a biologically grounded quality assessment.

Limitations

Assumptions:

Parents are biological parents (not step-parents or adoptive parents)
Trio relationships are correctly specified in pedigree file
De novo CNV rate is low (~1.92%)

Caveats:

MP does not distinguish between technical false positives and genuine de novo events
MP is not a measure of sensitivity (cannot detect false negatives)
