CLI tutorial

MCNV2 provides R functions for reproducible batch processing and integration into automated pipelines.

Note

These functions are R wrappers that call Python scripts internally. You must have the MCNV2 R package installed and a Python environment configured (see Installation).

Overview

MCNV2 provides three main functions for command-line workflows:

annotate() — Annotate CNVs with genes, LOEUF scores, and problematic regions
compute_inheritance() — Calculate inheritance status (Transmitted_CNV and Transmitted_gene)
compute_mp() — Calculate Mendelian Precision with filtering options

Typical workflow

library(MCNV2)

# 1. Annotate CNVs
annotate(
  cnvs_file = "data/cnvs.tsv",
  prob_regions_file = "data/problematic_regions.bed",
  output_file = "results/cnvs_annotated.tsv",
  genome_version = 38,
  bedtools_path = "/usr/local/bin/bedtools"
)

# 2. Compute inheritance status
compute_inheritance(
  cnvs_file = "results/cnvs_annotated.tsv",
  pedigree_file = "data/pedigree.tsv",
  output_file = "results/cnvs_inheritance.tsv",
  overlap = 0.5
)

# 3. Compute Mendelian Precision
# Note: MP is stratified by CNV type (DEL vs DUP) by default
compute_mp(
  inheritance_file = "results/cnvs_inheritance.tsv",
  output_file = "results/mp_summary.tsv",
  transmission_type = "cnv",
  min_size = 30000,
  max_prob_regions = 0.5
)

Function reference

annotate()

Annotate CNVs with gene information, LOEUF scores, and problematic region overlap.

Usage:

annotate(
  cnvs_file,
  prob_regions_file,
  output_file,
  genome_version = 38,
  bedtools_path
)

Parameters:

cnvs_file (character) — Path to CNV file (tab-delimited)

Required columns: CHR, START, STOP, TYPE, SAMPLE_ID
prob_regions_file (character) — Path to problematic regions BED file

Default file provided in package if not specified
output_file (character) — Path for output annotated file
genome_version (numeric) — Genome build: 38 (GRCh38/hg38) or 37 (GRCh37/hg19)

Default: 38
bedtools_path (character) — Path to bedtools executable

Example: "/usr/local/bin/bedtools"
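
Before calling annotate(), it can help to confirm that the CNV file header contains the required columns listed above. The following is a minimal sketch in base R; missing_cnv_columns() is a hypothetical helper written for this tutorial, not a function exported by MCNV2, and the toy file only exists to make the sketch runnable.

```r
# Column names taken from the "Required columns" list for cnvs_file.
required_cols <- c("CHR", "START", "STOP", "TYPE", "SAMPLE_ID")

# Hypothetical helper (not part of MCNV2): returns the required columns
# that are missing from the header of a tab-delimited file.
missing_cnv_columns <- function(path, required = required_cols) {
  header <- names(read.table(path, header = TRUE, sep = "\t", nrows = 1,
                             check.names = FALSE, quote = "",
                             comment.char = ""))
  setdiff(required, header)
}

# Toy input written to a temporary file so the sketch runs on its own.
toy <- data.frame(CHR = "chr1", START = 100000L, STOP = 150000L,
                  TYPE = "DEL", SAMPLE_ID = "child1")
toy_path <- tempfile(fileext = ".tsv")
write.table(toy, toy_path, sep = "\t", row.names = FALSE, quote = FALSE)

missing_cols <- missing_cnv_columns(toy_path)
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}
```

With a real dataset, the same check would be run on the path passed as cnvs_file, for example "data/cnvs.tsv".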

Returns:

0 if successful
1 if output file was not created (error)

Output columns:

All input columns plus:

GeneName — HGNC gene symbol
GeneID — Ensembl gene ID
Transcript — Ensembl transcript ID
LOEUF — Loss-of-function constraint score (gnomAD v4)
problematic_region_overlap — Percentage overlap with problematic regions

Example:

library(MCNV2)

# Annotate CNVs
status <- annotate(
  cnvs_file = "data/cnvs.tsv",
  prob_regions_file = system.file("resources", "problematic_regions.bed",
                                  package = "MCNV2"),
  output_file = "results/cnvs_annotated.tsv",
  genome_version = 38,
  bedtools_path = "/usr/local/bin/bedtools"
)

if (status == 0) {
  message("Annotation completed successfully")
} else {
  stop("Annotation failed")
}

compute_inheritance()

Calculate inheritance status for each CNV based on parental data.

Usage:

compute_inheritance(
  cnvs_file,
  pedigree_file,
  output_file,
  overlap = 0.5
)

Parameters:

cnvs_file (character) — Path to annotated CNV file (output from annotate())
pedigree_file (character) — Path to pedigree file

Three columns: SAMPLE_ID, FATHER_ID, MOTHER_ID (tab-delimited, no header)
output_file (character) — Path for output file with inheritance status
overlap (numeric) — Minimum overlap for a CNV to be considered as inherited

Range: 0.01 to 1.0

Default: 0.5 (50%)
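
Because the pedigree file has no header, a quick look at it in R before running compute_inheritance() can catch formatting mistakes early. The sketch below reads a toy pedigree in the documented three-column layout; the checks are illustrative and are not MCNV2's own validation.

```r
# Toy pedigree: tab-delimited, no header, columns in the documented order.
ped_path <- tempfile(fileext = ".tsv")
writeLines(c("child1\tfather1\tmother1",
             "child2\tfather2\tmother2"), ped_path)

# Read without a header and name the columns explicitly.
# read.table() raises an error if a row has more than three fields.
ped <- read.table(ped_path, header = FALSE, sep = "\t",
                  col.names = c("SAMPLE_ID", "FATHER_ID", "MOTHER_ID"),
                  colClasses = "character", quote = "")

# Basic sanity checks (illustrative only).
if (anyDuplicated(ped$SAMPLE_ID) > 0) {
  stop("Duplicated SAMPLE_ID entries in pedigree file")
}
if ("SAMPLE_ID" %in% ped$SAMPLE_ID) {
  stop("Pedigree file appears to contain a header line")
}
print(ped)
```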

Returns:

0 if successful
1 if output file was not created (error)

Output columns:

All input columns plus:

Transmitted_CNV — True/False (coordinate-based inheritance)
Transmitted_gene — True/False/intergenic (gene-based inheritance)

Example:

library(MCNV2)

# Compute inheritance with 50% overlap threshold
status <- compute_inheritance(
  cnvs_file = "results/cnvs_annotated.tsv",
  pedigree_file = "data/pedigree.tsv",
  output_file = "results/cnvs_inheritance.tsv",
  overlap = 0.5
)

if (status == 0) {
  message("Inheritance calculation completed")

  # Read results
  cnvs <- read.table("results/cnvs_inheritance.tsv",
                     header = TRUE, sep = "\t", quote = "")

  # Count inherited vs non-inherited
  table(cnvs$Transmitted_CNV)
}

compute_mp()

Calculate Mendelian Precision with optional filtering.

Important

MP is stratified by CNV type (DEL vs DUP) by default.

This is the recommended approach as deletions and duplications have different quality profiles. Set stratify_by_type = FALSE only if you specifically need a single global MP value (not recommended).

Important

This function requires the output file from compute_inheritance() as input.

Usage:

compute_mp(
  inheritance_file,
  output_file,
  transmission_type = "cnv",
  min_size = NULL,
  max_size = NULL,
  min_score = NULL,
  max_prob_regions = NULL,
  min_loeuf = NULL,
  stratify_by_size = FALSE,
  stratify_by_type = TRUE
)

Parameters:

inheritance_file (character) — Path to inheritance file

Must be the output from compute_inheritance()

Required columns: Transmitted_CNV, Transmitted_gene
output_file (character) — Path for MP summary output
transmission_type (character) — Transmission matching type
- "cnv": Use Transmitted_CNV (coordinate-based)
- "gene": Use Transmitted_gene (gene-based)
Default: "cnv"
min_size (numeric) — Minimum CNV size in bp (optional)

Example: 30000 (30 kb)
max_size (numeric) — Maximum CNV size in bp (optional)
min_score (numeric) — Minimum quality score (optional)
max_prob_regions (numeric) — Maximum problematic regions overlap (0-1, optional)

Example: 0.5 (exclude CNVs with >50% overlap)
min_loeuf (numeric) — Minimum LOEUF threshold (optional)

Example: 0.6 (exclude constrained genes)
stratify_by_size (logical) — Stratify MP by size ranges

Default: FALSE
stratify_by_type (logical) — Stratify MP by CNV type (DEL/DUP)

Default: TRUE (recommended)

Keep TRUE unless you specifically need a single global MP value. Deletions and duplications have different quality profiles and should be evaluated separately.

Returns:

0 if successful
1 if output file was not created (error)

Output:

Tab-delimited file with MP statistics:

CNV_type (DEL/DUP, or All if stratify_by_type=FALSE)
Size_range (All, or specific ranges if stratify_by_size=TRUE)
Total_CNVs
Inherited_CNVs
Non_inherited_CNVs
MP (Mendelian Precision %)

Example output (default: stratified by type):

CNV_type  Size_range  Total_CNVs  Inherited_CNVs  Non_inherited_CNVs   MP
DEL       All         5000        4250            750                  85.0
DUP       All         3000        2400            600                  80.0

Example output (stratified by type AND size):

CNV_type  Size_range   Total_CNVs  Inherited_CNVs  Non_inherited_CNVs   MP
DEL       1-30kb       800         600             200                  75.0
DEL       30-50kb      600         540             60                   90.0
DEL       50-100kb     900         855             45                   95.0
DUP       1-30kb       600         420             180                  70.0
DUP       30-50kb      500         425             75                   85.0
DUP       50-100kb     700         665             35                   95.0
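
The figures in the example tables above are consistent with MP being the percentage of CNVs that are inherited, that is 100 × Inherited_CNVs / Total_CNVs. This relationship is inferred from the example values rather than from the package source, so treat the following as a sketch for checking a summary file by hand.

```r
# Counts copied from the "stratified by type" example output.
mp_example <- data.frame(
  CNV_type = c("DEL", "DUP"),
  Total_CNVs = c(5000, 3000),
  Inherited_CNVs = c(4250, 2400)
)

# Non-inherited count and MP recomputed from the two count columns.
mp_example$Non_inherited_CNVs <-
  mp_example$Total_CNVs - mp_example$Inherited_CNVs
mp_example$MP <-
  round(100 * mp_example$Inherited_CNVs / mp_example$Total_CNVs, 1)

print(mp_example)
```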

Example 1: Basic MP calculation (stratified by type)

library(MCNV2)

# Calculate MP (CNV-level, stratified by DEL/DUP, no filters)
compute_mp(
  inheritance_file = "results/cnvs_inheritance.tsv",
  output_file = "results/mp_summary.tsv",
  transmission_type = "cnv"
)

# Read MP results
mp <- read.table("results/mp_summary.tsv", header = TRUE, sep = "\t")
print(mp)
#   CNV_type  Size_range  Total_CNVs  Inherited_CNVs  Non_inherited_CNVs   MP
#   DEL       All         5000        4250            750                  85.0
#   DUP       All         3000        2400            600                  80.0

Example 2: MP with filtering

# Calculate MP with size and quality filters
# Still stratified by DEL/DUP (default)
compute_mp(
  inheritance_file = "results/cnvs_inheritance.tsv",
  output_file = "results/mp_filtered.tsv",
  transmission_type = "cnv",
  min_size = 30000,           # ≥30 kb
  max_prob_regions = 0.5,     # ≤50% prob regions overlap
  min_score = 100             # Score ≥100
)

# Output shows DEL and DUP separately after filtering
#   CNV_type  Size_range  Total_CNVs  Inherited_CNVs  Non_inherited_CNVs   MP
#   DEL       All         3000        2775            225                  92.5
#   DUP       All         1800        1620            180                  90.0

Example 3: Technical MP (excluding constrained genes)

# Calculate technical MP (excluding LOEUF < 0.6)
compute_mp(
  inheritance_file = "results/cnvs_inheritance.tsv",
  output_file = "results/mp_technical.tsv",
  transmission_type = "gene",
  min_loeuf = 0.6
)

# Output:
#   CNV_type  Size_range  Total_CNVs  Inherited_CNVs  Non_inherited_CNVs   MP
#   DEL       All         4500        4185            315                  93.0
#   DUP       All         2700        2511            189                  93.0

Example 4: MP stratified by size

# Calculate MP for each size range
compute_mp(
  inheritance_file = "results/cnvs_inheritance.tsv",
  output_file = "results/mp_by_size.tsv",
  transmission_type = "cnv",
  stratify_by_size = TRUE,
  stratify_by_type = TRUE
)

# Output shows DEL and DUP for each size range
#   CNV_type  Size_range   Total_CNVs  Inherited_CNVs  Non_inherited_CNVs   MP
#   DEL       1-30kb       800         600             200                  75.0
#   DEL       30-50kb      600         540             60                   90.0
#   DEL       50-100kb     900         855             45                   95.0
#   DEL       100-200kb    700         686             14                   98.0
#   DUP       1-30kb       600         420             180                  70.0
#   DUP       30-50kb      500         425             75                   85.0
#   DUP       50-100kb     700         665             35                   95.0
#   DUP       100-200kb    500         490             10                   98.0

Example 5: Global MP (not recommended)

# Calculate single global MP value (not stratified by type)
# NOT RECOMMENDED: loses information about DEL vs DUP differences
compute_mp(
  inheritance_file = "results/cnvs_inheritance.tsv",
  output_file = "results/mp_global.tsv",
  transmission_type = "cnv",
  stratify_by_type = FALSE
)

# Output (single row):
#   CNV_type  Size_range  Total_CNVs  Inherited_CNVs  Non_inherited_CNVs   MP
#   All       All         8000        6650            1350                 83.1
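
# The global row above matches the pooled DEL and DUP counts from Example 1.
# The sketch below (plain arithmetic, not an MCNV2 call) shows that pooling
# counts differs from averaging the two per-type percentages.

```r
# Per-type counts copied from the Example 1 output.
by_type <- data.frame(
  CNV_type = c("DEL", "DUP"),
  Total_CNVs = c(5000, 3000),
  Inherited_CNVs = c(4250, 2400)
)

# Pooled MP: sum the counts first, then take the percentage.
pooled_mp <- 100 * sum(by_type$Inherited_CNVs) / sum(by_type$Total_CNVs)

# Unweighted mean of the per-type MP values, shown for contrast.
mean_of_mp <- mean(100 * by_type$Inherited_CNVs / by_type$Total_CNVs)

round(pooled_mp, 1)   # 83.1, as in the global example row
round(mean_of_mp, 1)  # 82.5, not the same quantity
```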

Batch processing example

Process multiple datasets:

library(MCNV2)

# List of datasets
datasets <- c("cohort1", "cohort2", "cohort3")

for (dataset in datasets) {
  message(paste("Processing", dataset))

  # 1. Annotate
  annotate(
    cnvs_file = paste0("data/", dataset, "_cnvs.tsv"),
    prob_regions_file = system.file("resources", "problematic_regions.bed",
                                    package = "MCNV2"),
    output_file = paste0("results/", dataset, "_annotated.tsv"),
    genome_version = 38,
    bedtools_path = "/usr/local/bin/bedtools"
  )

  # 2. Inheritance
  compute_inheritance(
    cnvs_file = paste0("results/", dataset, "_annotated.tsv"),
    pedigree_file = paste0("data/", dataset, "_pedigree.tsv"),
    output_file = paste0("results/", dataset, "_inheritance.tsv"),
    overlap = 0.5
  )

  # 3. MP (multiple strategies)
  # All MP calculations are stratified by type (DEL/DUP) by default

  # 3a. Overall MP
  compute_mp(
    inheritance_file = paste0("results/", dataset, "_inheritance.tsv"),
    output_file = paste0("results/", dataset, "_mp_all.tsv"),
    transmission_type = "cnv"
  )

  # 3b. Filtered MP
  compute_mp(
    inheritance_file = paste0("results/", dataset, "_inheritance.tsv"),
    output_file = paste0("results/", dataset, "_mp_filtered.tsv"),
    transmission_type = "cnv",
    min_size = 30000,
    max_prob_regions = 0.5
  )

  # 3c. Technical MP
  compute_mp(
    inheritance_file = paste0("results/", dataset, "_inheritance.tsv"),
    output_file = paste0("results/", dataset, "_mp_technical.tsv"),
    transmission_type = "gene",
    min_loeuf = 0.6
  )
}

# Combine results (each file has DEL and DUP rows)
all_mp <- do.call(rbind, lapply(datasets, function(d) {
  mp <- read.table(paste0("results/", d, "_mp_all.tsv"),
                   header = TRUE, sep = "\t")
  mp$Dataset <- d
  return(mp)
}))

# Result has DEL and DUP rows for each dataset
write.table(all_mp, "results/combined_mp.tsv",
            sep = "\t", row.names = FALSE, quote = FALSE)

Pipeline integration

Nextflow workflow

// main.nf
process annotate {
    input:
    path cnv
    path prob_regions

    output:
    path "${cnv.baseName}_annotated.tsv"

    script:
    """
    Rscript -e '
    MCNV2::annotate(
        cnvs_file = "${cnv}",
        prob_regions_file = "${prob_regions}",
        output_file = "${cnv.baseName}_annotated.tsv",
        genome_version = 38,
        bedtools_path = "/usr/local/bin/bedtools"
    )'
    """
}

process inheritance {
    input:
    path annotated
    path pedigree

    output:
    path "${annotated.baseName}_inheritance.tsv"

    script:
    """
    Rscript -e '
    MCNV2::compute_inheritance(
        cnvs_file = "${annotated}",
        pedigree_file = "${pedigree}",
        output_file = "${annotated.baseName}_inheritance.tsv",
        overlap = 0.5
    )'
    """
}

process mp {
    input:
    path inheritance

    output:
    path "${inheritance.baseName}_mp.tsv"

    script:
    """
    Rscript -e '
    MCNV2::compute_mp(
        inheritance_file = "${inheritance}",
        output_file = "${inheritance.baseName}_mp.tsv",
        transmission_type = "cnv",
        min_size = 30000
    )'
    """
}

Error handling

Check return codes:

library(MCNV2)

# Annotate with error checking
status <- annotate(
  cnvs_file = "data/cnvs.tsv",
  prob_regions_file = "data/problematic_regions.bed",
  output_file = "results/annotated.tsv",
  genome_version = 38,
  bedtools_path = "/usr/local/bin/bedtools"
)

if (status != 0) {
  stop("Annotation failed. Check input files and bedtools path.")
}

# Verify output exists and is not empty
if (!file.exists("results/annotated.tsv")) {
  stop("Output file was not created")
}

if (file.size("results/annotated.tsv") == 0) {
  stop("Output file is empty")
}

message("Annotation completed successfully")
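
The same three checks apply to every step, so they can be gathered into one small function. The sketch below uses a hypothetical helper, check_step(), which is not part of MCNV2; it only assumes the documented convention that each function returns 0 on success and 1 otherwise. A temporary file stands in for a real output so the sketch runs without the package.

```r
# Hypothetical helper (not part of MCNV2): stop with a clear message when a
# step reports a non-zero status or leaves a missing or empty output file.
check_step <- function(status, output_file, step) {
  if (!identical(as.integer(status), 0L)) {
    stop(step, " failed with status ", status)
  }
  if (!file.exists(output_file)) {
    stop(step, ": output file was not created: ", output_file)
  }
  if (file.size(output_file) == 0) {
    stop(step, ": output file is empty: ", output_file)
  }
  invisible(output_file)
}

# Demonstration with a stand-in for a successful step.
out <- tempfile(fileext = ".tsv")
writeLines("CHR\tSTART\tSTOP", out)
check_step(0, out, "annotate")

# A failing status is turned into an R error, caught here for display.
result <- tryCatch(check_step(1, out, "annotate"),
                   error = function(e) conditionMessage(e))
print(result)
```

With MCNV2 loaded, the status returned by annotate(), compute_inheritance(), or compute_mp() would be passed as the first argument together with that step's output_file. Because an uncaught R error makes Rscript exit with a non-zero status, the same pattern also lets a batch script or pipeline runner detect a failed step.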

Performance considerations

Memory requirements:

Annotation: ~4 GB for 100,000 CNVs
Inheritance: ~8 GB for 100,000 CNVs in 1,000 trios
MP calculation: ~2 GB

Runtime (approximate):

Annotation: ~5 min for 100,000 CNVs
Inheritance: ~10 min for 100,000 CNVs in 1,000 trios
MP calculation: ~1 min

Recommendations:

For large datasets (>500,000 CNVs), consider splitting by chromosome
Use parallel processing for batch analysis of multiple cohorts
Ensure sufficient disk space for intermediate files
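
Splitting by chromosome can be done in base R before the annotation step. The sketch below writes one file per value of the CHR column from a toy data frame; the output file naming is hypothetical, and it assumes each chromosome can be processed independently and the per-chromosome results combined afterwards, which should be verified for your data.

```r
# Toy CNV table using the documented required columns.
cnvs <- data.frame(
  CHR = c("chr1", "chr1", "chr2"),
  START = c(100000L, 500000L, 200000L),
  STOP = c(150000L, 560000L, 260000L),
  TYPE = c("DEL", "DUP", "DEL"),
  SAMPLE_ID = c("child1", "child1", "child2")
)

# Temporary output directory so the sketch does not touch real results.
out_dir <- tempfile("cnvs_by_chr_")
dir.create(out_dir)

# One data frame per chromosome, each written to its own TSV file.
chunks <- split(cnvs, cnvs$CHR)
chunk_files <- vapply(names(chunks), function(chr) {
  path <- file.path(out_dir, paste0("cnvs_", chr, ".tsv"))
  write.table(chunks[[chr]], path, sep = "\t",
              row.names = FALSE, quote = FALSE)
  path
}, character(1))

print(basename(chunk_files))
```

Each resulting file could then be passed as cnvs_file in a loop like the one in the batch processing example.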
