Input formats
MCNV2 requires two main input files: a CNV callset and a pedigree file.
Important
Column order is critical. Column names are flexible, but the position of each column must be respected exactly as described below.
CNV callset file
Format: Tab-delimited file
File extension can be .txt, .tsv, .bed, .xls, or any other extension, as long as the content is tab-delimited.
Required columns (in order):
CHR — Chromosome (e.g., chr1, 1, chrX)
START — CNV start position (integer)
STOP — CNV end position (integer)
TYPE — CNV type: DEL (deletion) or DUP (duplication)
SAMPLE_ID — Sample identifier
Optional columns (after the required 5):
Quality score (highly recommended) — Caller-specific likelihood score (e.g., PennCNV quality score)
Number of probes
Caller concordance percentage (if CNVs are merged from multiple callers)
Any other metadata
Important
Quality score (column 6) is highly recommended.
While optional, the quality score enables generation of Mendelian Precision (MP) versus quality score curves, which are essential for identifying optimal quality thresholds. Without this column, only size-stratified bar plots will be available.
Example:
chr1 1000000 1050000 DEL child001 32.5 150
chr1 2000000 2100000 DUP child001 45.2 220
chr2 5000000 5080000 DEL child002 28.1 95
# Column 6 (32.5, 45.2, 28.1) = Quality scores ← Highly recommended!
Note
Column names can be anything (e.g., chromosome, chr, chrom are all acceptable)
Column order cannot change: position 1 = chromosome, position 2 = start, etc.
Optional columns can include quality scores, probe counts, or any other metadata
Caller concordance: If CNVs from multiple callers are merged, you can include an overlap percentage (e.g., 70% concordance between two callers) as an optional column
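The column rules above can be checked before upload with a small script. The following is a minimal sketch, not part of MCNV2: the function name check_cnv_callset and its optional second argument (1 if the file starts with a header row) are hypothetical, and the file is assumed to be tab-delimited as described above.

```shell
# Hypothetical helper (not part of MCNV2): report malformed rows in a CNV callset.
# Usage: check_cnv_callset calls.tsv [has_header]   (has_header: 1 = skip first line)
check_cnv_callset() {
    awk -F'\t' -v has_header="${2:-0}" '
        has_header == 1 && NR == 1 { next }   # skip the header row if one was declared
        NF < 5 { print "line " NR ": fewer than 5 columns"; bad = 1; next }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print "line " NR ": START/STOP are not integers"; bad = 1 }
        toupper($4) != "DEL" && toupper($4) != "DUP" { print "line " NR ": TYPE is not DEL or DUP"; bad = 1 }
        $5 == "" { print "line " NR ": empty SAMPLE_ID"; bad = 1 }
        END { exit bad }                      # exit status 1 if any row failed a check
    ' "$1"
}
```

The function prints one message per problem and returns a non-zero exit status if anything failed, so it can be used as `check_cnv_callset calls.tsv && echo "looks OK"`. Optional columns (6 and beyond) are deliberately not inspected.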
Pedigree file
Format: Tab-delimited text file with exactly 3 columns
Required columns (in order):
SAMPLE_ID — Offspring/proband sample ID (must be first column)
FATHER_ID — Father sample ID
MOTHER_ID — Mother sample ID
Important
SAMPLE_ID MUST be in the first column
The order of FATHER_ID and MOTHER_ID (columns 2-3) can be swapped if needed
The file must have exactly 3 columns (no additional columns)
All three IDs must form a complete trio (no missing parents or offspring)
Example (correct format):
child001 father001 mother001
child002 father002 mother002
child003 father003 mother003
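The pedigree rules (exactly 3 tab-delimited columns, no missing IDs) can be verified with a short check. This is a sketch under those stated rules only; check_pedigree is a hypothetical name, not an MCNV2 command.

```shell
# Hypothetical helper (not part of MCNV2): flag pedigree rows that are not complete trios.
# Usage: check_pedigree pedigree.txt
check_pedigree() {
    awk -F'\t' '
        NF != 3 { print "line " NR ": expected 3 columns, found " NF; bad = 1; next }
        $1 == "" || $2 == "" || $3 == "" { print "line " NR ": empty ID (incomplete trio)"; bad = 1 }
        END { exit bad }   # exit status 1 if any row failed a check
    ' "$1"
}
```

A header row, if present, passes as long as it also has three non-empty columns.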
PLINK .fam and KING .kin files:
If your pedigree file comes from PLINK (.fam) or KING (.kin), you must reformat it to 3 columns before using MCNV2.
PLINK .fam files typically contain 6 columns:
FAM001 child001 father001 mother001 1 2
FAM002 child002 father002 mother002 2 1
To use with MCNV2, extract only columns 2-4:
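# Extract columns 2, 3, 4 from the PLINK .fam file.
# awk splits on any whitespace (.fam files may be space- or tab-delimited) and writes tab-delimited output.
awk 'BEGIN { OFS = "\t" } { print $2, $3, $4 }' pedigree.fam > pedigree_mcnv2.txt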
Result (correct 3-column format):
child001 father001 mother001
child002 father002 mother002
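PLINK .fam files encode a missing parent as 0 in columns 3 and 4, so founders and samples with only one recorded parent would become incomplete trios after conversion. The sketch below drops those rows during extraction; fam_to_trios is a hypothetical function name, and the 0 convention comes from the PLINK format rather than from MCNV2.

```shell
# Hypothetical helper (not part of MCNV2): convert a PLINK .fam file to the
# 3-column trio format, keeping only rows where both parents are recorded.
# Usage: fam_to_trios pedigree.fam > pedigree_mcnv2.txt
fam_to_trios() {
    # Default awk field splitting handles both space- and tab-delimited input.
    awk 'BEGIN { OFS = "\t" } $3 != "0" && $4 != "0" { print $2, $3, $4 }' "$1"
}
```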
Note
Column names (headers) are optional
Only complete trios are analyzed
Incomplete trios in the pedigree file are excluded with a warning
Sample ID matching
Critical requirement: The pedigree file must contain complete trios only.
Complete trio definition:
A trio is complete when the pedigree file contains all three entries:
Offspring sample ID
Father sample ID
Mother sample ID
Important
Incomplete trios are excluded:
❌ Father and mother without offspring
❌ Offspring with only one parent
❌ Any trio missing one of the three IDs
All three IDs must be present in the pedigree file.
CNV file presence:
It is normal for a sample to be absent from the CNV file. This simply means the caller did not detect any CNVs for that individual.
Example scenario:
# Pedigree file (complete trio)
child001 father001 mother001
# CNV file (only father and child have detected CNVs)
chr1 1000000 1050000 DEL child001
chr2 2000000 2100000 DUP father001
# mother001 has NO CNVs detected → this is NORMAL
✅ This trio is valid because all three IDs are present in the pedigree file, even though the mother has no detected CNVs.
Note
The absence of CNVs for a sample in the CNV file indicates that the caller found no variants for that individual. This does not invalidate the trio.
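To see which pedigree members have no rows in the CNV file (expected and harmless, as described above, but worth a glance in case of ID typos), a comparison along these lines may help. samples_without_cnvs is a hypothetical helper; it assumes both files are tab-delimited, have no header row, and that the CNV file is not empty.

```shell
# Hypothetical helper (not part of MCNV2): list pedigree IDs with no CNV rows.
# Usage: samples_without_cnvs pedigree.txt calls.tsv
samples_without_cnvs() {
    awk -F'\t' '
        NR == FNR { has_cnv[$5] = 1; next }   # first file read: CNV callset, column 5 = SAMPLE_ID
        {                                     # second file read: pedigree, columns 1-3 are IDs
            for (i = 1; i <= 3; i++)
                if (!($i in has_cnv) && !reported[$i]++) print $i
        }
    ' "$2" "$1"
}
```

For the example scenario above, this would print mother001.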
Reference annotations
MCNV2 uses the following reference files (provided with the package):
Problematic regions (UCSC-based, BED format):
Segmental duplications
Centromeres
Telomeres
HLA region
Gene annotations (tab-delimited or BED):
Gene coordinates (from GTF/GFF)
LOEUF scores (gnomAD v4) — loss-of-function constraint metric
Gene exclusion list (optional, tab-delimited):
Ensembl gene IDs to exclude from Mendelian Precision calculation
Note
Genome build: hg38 is supported in this release. hg19/GRCh37 support is planned.
File validation
Before running MCNV2, verify:
✅ CNV file has correct column order (chr, start, stop, type, sample_id, optional…)
✅ Pedigree file has exactly 3 columns (offspring, father, mother)
✅ Pedigree file contains only complete trios (all 3 IDs present for each trio)
✅ CNV file is tab-delimited (not comma-separated)
✅ Chromosome names are consistent (e.g., always chr1 or always 1)
✅ CNV types are DEL or DUP (case-insensitive)
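The chromosome-naming item in the checklist can be tested by counting how many rows use the chr prefix and how many do not. This is an illustrative sketch; chrom_styles and its optional second argument (1 if the file starts with a header row) are hypothetical and not part of MCNV2.

```shell
# Hypothetical helper (not part of MCNV2): detect mixed chromosome naming styles.
# Usage: chrom_styles calls.tsv [has_header]   (has_header: 1 = skip first line)
chrom_styles() {
    awk -F'\t' -v has_header="${2:-0}" '
        has_header == 1 && NR == 1 { next }   # skip the header row if one was declared
        $1 ~ /^chr/ { prefixed++; next }      # e.g. chr1, chrX
        { bare++ }                            # e.g. 1, X
        END {
            printf "chr-prefixed: %d, unprefixed: %d\n", prefixed, bare
            exit (prefixed > 0 && bare > 0)   # exit status 1 if both styles occur
        }
    ' "$1"
}
```

A non-zero exit status means the file mixes both styles and should be normalized to one of them.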
Tip
Use the Shiny interface to upload files — MCNV2 will automatically validate formats and report any issues.