Input formats

MCNV2 requires two main input files: a CNV callset and a pedigree file.

Important

Column order is critical. Column names are flexible, but the position of each column must be respected exactly as described below.

CNV callset file

Format: Tab-delimited file

The file extension does not matter (.txt, .tsv, .bed, .xls, or any other extension), as long as the content is plain tab-delimited text.

Required columns (in order):

  1. CHR — Chromosome (e.g., chr1, 1, chrX)

  2. START — CNV start position (integer)

  3. STOP — CNV end position (integer)

  4. TYPE — CNV type: DEL (deletion) or DUP (duplication)

  5. SAMPLE_ID — Sample identifier

Optional columns (after the required 5):

  • Quality score (highly recommended; place it in column 6): caller-specific likelihood score (e.g., PennCNV quality score)

  • Number of probes

  • Caller concordance percentage (if CNVs are merged from multiple callers)

  • Any other metadata

Important

Quality score (column 6) is highly recommended.

While optional, the quality score enables generation of Mendelian Precision (MP) versus quality score curves, which are essential for identifying optimal quality thresholds. Without this column, only size-stratified bar plots will be available.

Example:

chr1    1000000    1050000    DEL    child001    32.5    150
chr1    2000000    2100000    DUP    child001    45.2    220
chr2    5000000    5080000    DEL    child002    28.1    95

# Column 6 (32.5, 45.2, 28.1) = Quality scores ← Highly recommended!

Note

  • Column names can be anything (e.g., chromosome, chr, chrom are all acceptable)

  • Column order cannot change — position 1 = chromosome, position 2 = start, etc.

  • Optional columns can include quality scores, probe counts, or any other metadata

  • Caller concordance: If CNVs from multiple callers are merged, you can include an overlap percentage (e.g., 70% concordance between two callers) as an optional column
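The column rules above can be checked row by row before uploading a callset. The following is a minimal sketch in Python and is not part of MCNV2: the function name check_cnv_line is hypothetical, it assumes the header row (if any) has already been skipped, and the start-versus-stop comparison is an extra sanity check that this page does not state as a requirement.

```python
def _is_integer(text):
    """True if text is a non-empty string of ASCII digits."""
    return text.isascii() and text.isdigit()


def check_cnv_line(line):
    """Return a list of problems found in one tab-delimited CNV row (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    problems = []
    if len(fields) < 5:
        problems.append(
            f"expected at least 5 tab-separated columns, found {len(fields)}"
        )
        return problems

    chrom, start, stop, cnv_type, sample_id = fields[:5]
    if not chrom:
        problems.append("column 1 (chromosome) is empty")
    if not (_is_integer(start) and _is_integer(stop)):
        problems.append("columns 2 and 3 (start, stop) must be integers")
    elif int(start) > int(stop):
        # Assumed sanity check, not a documented MCNV2 rule.
        problems.append("start is greater than stop")
    if cnv_type.upper() not in {"DEL", "DUP"}:
        problems.append(f"column 4 (type) must be DEL or DUP, found {cnv_type!r}")
    if not sample_id:
        problems.append("column 5 (sample ID) is empty")
    return problems
```

Columns after the fifth are ignored by this sketch, which matches the rule that optional columns may hold any metadata. A comma-separated row is reported as having a single column, which makes the most common delimiter mistake easy to spot.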

Pedigree file

Format: Tab-delimited text file with exactly 3 columns

Required columns (in order):

  1. SAMPLE_ID — Offspring/proband sample ID (must be first column)

  2. FATHER_ID — Father sample ID

  3. MOTHER_ID — Mother sample ID

Important

  • SAMPLE_ID MUST be in the first column

  • FATHER_ID and MOTHER_ID (columns 2 and 3) may appear in either order

  • The file must have exactly 3 columns (no additional columns)

  • All three IDs must form a complete trio (no missing parents or offspring)

Example (correct format):

child001    father001    mother001
child002    father002    mother002
child003    father003    mother003

PLINK .fam and KING .kin files:

If your pedigree file comes from PLINK (.fam) or KING (.kin), you must reformat it to 3 columns before using MCNV2.

PLINK .fam files contain 6 columns (family ID, sample ID, father ID, mother ID, sex, phenotype):

FAM001    child001    father001    mother001    1    2
FAM002    child002    father002    mother002    2    1

To use with MCNV2, extract only columns 2-4:

# Extract columns 2, 3, 4 from a PLINK .fam file (works for space- or tab-delimited input)
awk 'BEGIN { OFS = "\t" } { print $2, $3, $4 }' pedigree.fam > pedigree_mcnv2.txt

Result (correct 3-column format):

child001    father001    mother001
child002    father002    mother002
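A .fam file often also lists founders, whose father and mother fields are coded as 0 (the PLINK convention for a parent who is not in the dataset). Such rows cannot form a complete trio. The sketch below shows one way to do the conversion in Python while dropping those rows; the function names fam_to_trios and write_pedigree are hypothetical and not part of MCNV2 or PLINK.

```python
def fam_to_trios(fam_lines):
    """Convert PLINK .fam rows to (offspring, father, mother) tuples.

    Rows with a missing parent (coded "0" in PLINK) are skipped because
    they cannot form a complete trio.
    """
    trios = []
    for line in fam_lines:
        fields = line.split()  # .fam files may be space- or tab-delimited
        if len(fields) < 4:
            continue  # blank or malformed row
        _family, sample, father, mother = fields[:4]
        if father == "0" or mother == "0":
            continue
        trios.append((sample, father, mother))
    return trios


def write_pedigree(trios, path):
    """Write trios as a tab-delimited, 3-column pedigree file."""
    with open(path, "w") as out:
        for trio in trios:
            out.write("\t".join(trio) + "\n")
```

Typical use would be to read pedigree.fam line by line, pass the lines to fam_to_trios, and write the result with write_pedigree.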

Note

  • Column names (headers) are optional

  • Only complete trios are analyzed

  • Incomplete trios in the pedigree file are excluded with a warning
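To see in advance which pedigree rows would be treated as incomplete, the rows can be split into complete trios and rejected rows. This is a sketch under stated assumptions, not MCNV2's own logic: the function name split_trios is hypothetical, the set of placeholders treated as a missing ID (empty string, 0, NA) is an assumption, and a header row, if present, is expected to be removed beforehand.

```python
MISSING = {"", "0", "NA"}  # assumed placeholders for a missing ID


def split_trios(lines):
    """Separate pedigree rows into complete trios and rejected rows."""
    complete, rejected = [], []
    for line in lines:
        row = line.rstrip("\n")
        if not row.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in row.split("\t")]
        # A complete trio has exactly 3 columns and no missing ID.
        if len(fields) == 3 and not any(field in MISSING for field in fields):
            complete.append(tuple(fields))
        else:
            rejected.append(row)
    return complete, rejected
```

Rows with an extra column are rejected as well, in line with the exactly-3-columns rule.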

Sample ID matching

Critical requirement: Each row of the pedigree file must describe a complete trio. Rows that do not are excluded with a warning.

Complete trio definition:

A trio is complete when the pedigree file contains all three entries:

  1. Offspring sample ID

  2. Father sample ID

  3. Mother sample ID

Important

Incomplete trios are excluded:

  • ❌ Father and mother without offspring

  • ❌ Offspring with only one parent

  • ❌ Any trio missing one of the three IDs

All three IDs must be present in the pedigree file.

CNV file presence:

It is normal for a sample to be absent from the CNV file. This simply means the caller did not detect any CNVs for that individual.

Example scenario:

# Pedigree file (complete trio)
child001    father001    mother001

# CNV file (only father and child have detected CNVs)
chr1    1000000    1050000    DEL    child001
chr2    2000000    2100000    DUP    father001
# mother001 has NO CNVs detected → this is NORMAL

✅ This trio is valid because all three IDs are present in the pedigree file, even though the mother has no detected CNVs.

Note

The absence of CNVs for a sample in the CNV file indicates that the caller found no variants for that individual. This does not invalidate the trio.
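The distinction between pedigree membership and CNV file presence can be made explicit with a small cross-check. The sketch below is illustrative only; the function name summarize_trio_coverage is hypothetical. It lists, for each trio, the members that have no rows in the CNV file (which is normal), and separately reports CNV sample IDs that appear in no trio. How MCNV2 itself treats the latter is not described on this page, so the sketch only reports them.

```python
def summarize_trio_coverage(trios, cnv_sample_ids):
    """Cross-check pedigree trios against the sample IDs seen in the CNV file.

    Returns a pair:
      - dict mapping each offspring ID to the trio members with no CNV rows
      - sorted list of CNV sample IDs that belong to no trio
    """
    cnv_ids = set(cnv_sample_ids)
    pedigree_ids = {sample for trio in trios for sample in trio}
    without_cnvs = {
        trio[0]: [sample for sample in trio if sample not in cnv_ids]
        for trio in trios
    }
    outside_pedigree = sorted(cnv_ids - pedigree_ids)
    return without_cnvs, outside_pedigree
```

Applied to the scenario above, the trio for child001 is reported with mother001 as the only member without CNV rows, and no CNV sample falls outside the pedigree.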

Reference annotations

MCNV2 uses the following reference files (provided with the package):

Problematic regions (UCSC-based, BED format):

  • Segmental duplications

  • Centromeres

  • Telomeres

  • HLA region

Gene annotations (tab-delimited or BED):

  • Gene coordinates (from GTF/GFF)

  • LOEUF scores (gnomAD v4) — loss-of-function constraint metric

Gene exclusion list (optional, tab-delimited):

  • Ensembl gene IDs to exclude from Mendelian Precision calculation

Note

Genome build: only hg38 (GRCh38) is supported in this release. Support for hg19 (GRCh37) is planned.

File validation

Before running MCNV2, verify:

  1. ✅ CNV file has correct column order (chr, start, stop, type, sample_id, optional…)

  2. ✅ Pedigree file has exactly 3 columns (offspring, father, mother)

  3. ✅ Pedigree file contains only complete trios (all 3 IDs present for each trio)

  4. ✅ CNV file is tab-delimited (not comma-separated)

  5. ✅ Chromosome names are consistent (e.g., always chr1 or always 1)

  6. ✅ CNV types are DEL or DUP (case-insensitive)
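Checks 4 and 5 in the list above are easy to overlook by eye. A minimal sketch of both follows, assuming the chromosome values have already been read from column 1; the function names chromosome_style and looks_comma_separated are hypothetical and not part of MCNV2.

```python
def chromosome_style(chrom_values):
    """Classify chromosome names as "chr", "bare", "mixed", or "empty"."""
    flags = {str(value).lower().startswith("chr") for value in chrom_values}
    if not flags:
        return "empty"
    if len(flags) == 2:
        return "mixed"  # some names have the chr prefix, others do not
    return "chr" if True in flags else "bare"


def looks_comma_separated(first_line):
    """True if a line contains commas but no tab characters."""
    return "\t" not in first_line and "," in first_line
```

A result of "mixed" means the file uses both naming styles (for example chr1 and 2) and should be harmonized before upload.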

Tip

Use the Shiny interface to upload files — MCNV2 will automatically validate formats and report any issues.