Preprocessing

The Preprocessing tab is the first step in the MCNV2 workflow. It performs CNV annotation and inheritance status calculation.

Overview

This tab allows you to:

  1. Upload input files (CNV calls, pedigree, problematic regions)

  2. Set inheritance parameters (overlap threshold, genome build)

  3. Annotate CNVs with genes, LOEUF scores, and problematic regions

  4. Calculate inheritance status (transmitted vs non-transmitted)

  5. View annotated results

Input files

CNV file (mandatory)

Tab-delimited file with columns: CHR, START, STOP, TYPE (DEL/DUP), SAMPLE_ID

See Input formats for detailed specifications.
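
Before uploading, it can help to check the CNV file locally. The following is a minimal sketch, not MCNV2's own parser: it assumes the file has a header row with exactly the column names listed above, and the function name `read_cnv_table` and the example rows are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ["CHR", "START", "STOP", "TYPE", "SAMPLE_ID"]

def read_cnv_table(handle):
    """Read a tab-delimited CNV table and run basic sanity checks.

    Assumes a header row is present (an assumption of this sketch).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    # Data lines start at line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        start, stop = int(row["START"]), int(row["STOP"])
        if start > stop:
            raise ValueError(f"line {line_no}: START is greater than STOP")
        if row["TYPE"] not in ("DEL", "DUP"):
            raise ValueError(f"line {line_no}: TYPE must be DEL or DUP")
        rows.append({**row, "START": start, "STOP": stop})
    return rows

# Hypothetical two-row example file held in memory.
example = io.StringIO(
    "CHR\tSTART\tSTOP\tTYPE\tSAMPLE_ID\n"
    "chr1\t1000\t5000\tDEL\tchild1\n"
    "chr2\t200\t900\tDUP\tfather1\n"
)
cnvs = read_cnv_table(example)
```

Extra columns such as quality scores pass through unchanged, since only the five mandatory columns are checked.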

Pedigree file (mandatory)

Three-column file: SAMPLE_ID, FATHER_ID, MOTHER_ID

Only complete trios are analyzed.
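
As an illustration of what "complete trio" can mean in practice, the sketch below keeps only pedigree rows where both parents are given. The missing-value codes (`""`, `"0"`, `"NA"`) and the function name `complete_trios` are assumptions made for this example; MCNV2's actual rules may differ.

```python
# Codes treated as "parent not available" (an assumption of this sketch).
MISSING_CODES = {"", "0", "NA"}

def complete_trios(pedigree_rows):
    """Return (child, father, mother) tuples where both parent IDs are present."""
    return [
        (child, father, mother)
        for child, father, mother in pedigree_rows
        if father not in MISSING_CODES and mother not in MISSING_CODES
    ]

# Hypothetical pedigree: only the first row is a complete trio.
pedigree = [
    ("child1", "father1", "mother1"),
    ("child2", "father2", "0"),
    ("child3", "NA", "mother3"),
]
trios = complete_trios(pedigree)
```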

Problematic regions file (optional, BED format)

BED file with problematic genomic regions:

  • Segmental duplications

  • Centromeres

  • Telomeres

  • HLA region

Note

A default file is provided. You can replace it with your own BED file if needed.

Parameters for inheritance calculation

Inheritance threshold (child CNV proportion)

Default: 0.5 (50%)

This parameter defines the minimum proportion of a child CNV that must overlap a parental CNV for the child CNV to be considered inherited.

  • Value range: 0.01 to 1.0 (1% to 100%)

  • Interpretation: If ≥X% of a child CNV overlaps with a parental CNV (father or mother), the CNV is classified as inherited (True)

  • Example: With threshold=0.5, a child CNV is inherited if at least 50% of it overlaps with a CNV in at least one parent
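
The threshold test described above can be sketched as the fraction of the child CNV covered by a parental CNV. This is an illustrative sketch, not MCNV2's implementation: it assumes the parental CNVs passed in are already restricted to the same chromosome and CNV type as the child CNV, treats coordinates as half-open intervals, and compares the child against one parental CNV at a time. The function names are hypothetical.

```python
def child_overlap_fraction(child, parent):
    """Fraction of the child interval (start, stop) covered by the parent interval."""
    c_start, c_stop = child
    p_start, p_stop = parent
    overlap = max(0, min(c_stop, p_stop) - max(c_start, p_start))
    return overlap / (c_stop - c_start)

def is_inherited(child, parental_cnvs, threshold=0.5):
    """True if any single parental CNV covers at least `threshold` of the child CNV."""
    return any(
        child_overlap_fraction(child, parent) >= threshold
        for parent in parental_cnvs
    )

child_cnv = (1000, 2000)     # 1,000 bp child CNV
father_cnv = (1500, 3000)    # covers 500 bp of the child CNV (fraction 0.5)
mother_cnv = (1900, 2500)    # covers 100 bp of the child CNV (fraction 0.1)

inherited_default = is_inherited(child_cnv, [father_cnv, mother_cnv])                # threshold 0.5
inherited_strict = is_inherited(child_cnv, [father_cnv, mother_cnv], threshold=0.6)  # stricter
```

In this example the child CNV counts as inherited at the default threshold of 0.5 but not at 0.6, which shows how raising the threshold makes the coordinate-based classification stricter.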

Genome build

  • Default: GRCh38/hg38

  • Planned: GRCh37/hg19 (not yet implemented)

Workflow

Step 1: Upload files

Upload your CNV file, pedigree file, and optionally a custom problematic regions file.

Step 2: Set parameters

  • Inheritance threshold: Adjust the minimum proportion of the child CNV that must overlap a parental CNV (default 0.5, i.e. 50%)

  • Genome build: Select GRCh38/hg38 (currently the only implemented build)

Step 3: Submit

Click Submit to start annotation and inheritance calculation.

Annotation process

MCNV2 annotates each CNV with:

Gene annotation

Each CNV is intersected with gene coordinates (Gencode v45):

  • GeneName — HGNC gene symbol

  • GeneID — Ensembl gene ID

  • Transcript — Ensembl transcript ID

If a CNV overlaps multiple genes, one row is created per gene (CNV-gene pairs).
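
The one-row-per-gene expansion can be sketched as a simple interval intersection. This is a simplified illustration under stated assumptions: the gene records, their field names (`chr`, `start`, `end`, `name`) and the gene names are hypothetical, coordinates are treated as half-open, and the handling of CNVs that overlap no gene is not covered here.

```python
def cnv_gene_pairs(cnv, genes):
    """Return one output row per gene overlapping the CNV (CNV-gene pairs)."""
    rows = []
    for gene in genes:
        same_chrom = gene["chr"] == cnv["CHR"]
        overlaps = gene["start"] < cnv["STOP"] and cnv["START"] < gene["end"]
        if same_chrom and overlaps:
            rows.append({**cnv, "GeneName": gene["name"]})
    return rows

# Hypothetical CNV and gene coordinates.
cnv = {"CHR": "chr1", "START": 1000, "STOP": 5000, "TYPE": "DEL", "SAMPLE_ID": "child1"}
genes = [
    {"chr": "chr1", "start": 500, "end": 1500, "name": "GENE_A"},   # overlaps
    {"chr": "chr1", "start": 4000, "end": 6000, "name": "GENE_B"},  # overlaps
    {"chr": "chr1", "start": 7000, "end": 8000, "name": "GENE_C"},  # no overlap
    {"chr": "chr2", "start": 1000, "end": 5000, "name": "GENE_D"},  # other chromosome
]
pairs = cnv_gene_pairs(cnv, genes)
```

Here the single CNV yields two rows, one for each overlapping gene, so row counts in the annotated table can exceed the number of CNVs in the input file.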

LOEUF scores (gnomAD v4)

LOEUF (Loss-of-function Observed/Expected Upper bound Fraction) quantifies gene constraint:

  • Low LOEUF (≤0.6): Highly constrained genes (intolerant to loss-of-function)

  • High LOEUF (>0.6): Less constrained genes

LOEUF is used for:

  • Stratifying Mendelian Precision by gene constraint

  • Optional filtering (exclude constrained genes to focus on technical precision)
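
As a small illustration of the 0.6 cutoff described above, the sketch below assigns each gene to a constraint category. The function name, the category labels and the treatment of genes without a LOEUF score are assumptions made for this example.

```python
def loeuf_category(loeuf, cutoff=0.6):
    """Classify a gene by LOEUF; genes without a score are labelled 'unknown'."""
    if loeuf is None:
        return "unknown"
    return "constrained" if loeuf <= cutoff else "less constrained"

# Hypothetical scores, including the boundary value and a missing score.
categories = [loeuf_category(x) for x in (0.25, 0.6, 0.61, None)]
```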

Problematic regions overlap

Percentage of each CNV's length that overlaps problematic regions:

  • Segmental duplications

  • Centromeres

  • Telomeres

  • HLA region

Note

This percentage is used in the filtering step to exclude CNVs with high overlap (e.g., >50%).
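
One way to compute such a percentage is to merge overlapping problematic regions first, so that no base is counted twice, and then sum the covered length. This is a sketch under assumptions, not MCNV2's code: all intervals are taken to be on the same chromosome with half-open coordinates, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, stop) intervals."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return merged

def problematic_overlap_percent(cnv, regions):
    """Percentage of the CNV length covered by the union of the regions."""
    c_start, c_stop = cnv
    covered = sum(
        max(0, min(c_stop, r_stop) - max(c_start, r_start))
        for r_start, r_stop in merge_intervals(regions)
    )
    return 100.0 * covered / (c_stop - c_start)

# Hypothetical example: the first two regions overlap each other.
cnv = (1000, 2000)
regions = [(900, 1200), (1100, 1300), (1800, 2500)]
percent = problematic_overlap_percent(cnv, regions)  # 300 bp + 200 bp of 1,000 bp
```

With a filter of the form ">50%", this CNV (exactly 50% overlap) would be kept.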

Annotated CNV table

After clicking Submit, the first table displays annotated CNVs:

Columns:

  • Original CNV file columns (CHR, START, STOP, TYPE, SAMPLE_ID, quality scores, etc.)

  • GeneName — Overlapping gene name

  • GeneID — Ensembl gene ID

  • Transcript — Ensembl transcript ID

  • LOEUF — Constraint score

  • problematic_region_overlap — Percentage overlap with problematic regions

Table navigation:

  • Show 10, 50, 100, or all entries

  • Scroll horizontally to view all columns

File path:

The file path where the annotated table is saved is displayed below the table.

Inheritance status calculation

Click Proceed to inheritance status to calculate transmission for each CNV.

MCNV2 uses two complementary approaches to determine inheritance:

  1. Transmitted_CNV (coordinate-based) — Based on genomic coordinate overlap

  2. Transmitted_gene (gene-based) — Based on shared affected genes

See also

See Inheritance status for a comprehensive explanation of the two inheritance matching approaches.

Column values in the inheritance table:

  • Transmitted_CNV: True / False

  • Transmitted_gene: True / False / intergenic

Inheritance status table

The second table displays inheritance results:

Columns:

  • All columns from the annotated CNV table

  • Transmitted_CNV — True/False (coordinate-based inheritance)

  • Transmitted_gene — True/False/intergenic (gene-based inheritance)

Interpretation:

  • True (both columns): CNV is inherited from at least one parent

  • False (Transmitted_CNV): Candidate de novo CNV (no parental CNV reaches the overlap threshold)

  • False (Transmitted_gene): Gene not affected in parents

  • “intergenic” (Transmitted_gene): CNV does not overlap any gene

File path:

The file path where the inheritance table is saved is displayed below the table.

Next step: Mendelian Precision analysis

Once inheritance status is calculated, click Go to Mendelian Precision analysis to:

  • Compute Mendelian Precision across size ranges and quality thresholds

  • Apply filters (quality scores, problematic regions, LOEUF, caller concordance)

  • Generate publication-ready plots

See Mendelian Precision for details on the MP analysis workflow.

Tips

Understanding the two inheritance approaches

Transmitted_CNV (coordinate-based) and Transmitted_gene (gene-based) have different strengths:

  • Transmitted_CNV requires an overlap threshold but works for all CNVs

  • Transmitted_gene is more robust to breakpoint variability and CNV fragmentation but only works for genic CNVs

Recommendation: Use both approaches to get a comprehensive view of CNV inheritance.

See Inheritance status for detailed comparison, advantages, limitations, and use cases.

Filtering intergenic CNVs

Intergenic CNVs can only be evaluated using coordinate-based matching (Transmitted_CNV).

Troubleshooting

Error: “No complete trios found”

Check that:

  • Pedigree file has correct format (SAMPLE_ID, FATHER_ID, MOTHER_ID)

  • All three IDs (child, father, mother) are present in the pedigree file

  • Sample IDs are consistent between CNV and pedigree files
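
The ID consistency check in the list above can be done offline with a short script. The sketch below reports, for each trio, which pedigree IDs never appear in the CNV file; it is a hypothetical helper, not part of MCNV2. An absent ID is not necessarily an error, but it is a quick way to spot naming mismatches such as `child_2` versus `child2`.

```python
def ids_missing_from_cnv_file(pedigree_rows, cnv_sample_ids):
    """Map each child ID to the trio member IDs that are absent from the CNV file."""
    report = {}
    for child, father, mother in pedigree_rows:
        report[child] = [
            sample for sample in (child, father, mother)
            if sample not in cnv_sample_ids
        ]
    return report

# Hypothetical data: the second child is spelled differently in the two files.
pedigree = [
    ("child1", "father1", "mother1"),
    ("child_2", "father2", "mother2"),
]
cnv_sample_ids = {"child1", "father1", "mother1", "child2", "father2"}
report = ids_missing_from_cnv_file(pedigree, cnv_sample_ids)
```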

Warning: “Some samples in pedigree not found in CNV file”

This is normal. If a sample has no detected CNVs, it won’t appear in the CNV file. The trio is still valid.

Table too large to display

Use pagination (show 10/50/100 entries) or download the full table for offline analysis.