Package 'medrobust' reference manual

Title:	Robust Causal Mediation Analysis Under Differential Misclassification
Description:	Provides tools for conducting sensitivity analysis for causal mediation effects when the exposure or mediator is measured with differential misclassification (e.g., recall bias, outcome-dependent measurement error). Unlike existing measurement error correction methods that assume non-differential error or require validation data, 'medrobust' derives partial identification bounds that remain valid without gold-standard measurements. The package implements methods developed in Tofighi (2025) for partial identification bounds for Natural Direct Effects (NDE) and Natural Indirect Effects (NIE), data-driven falsification via testable implications, sensitivity analysis over user-specified ranges of misclassification parameters, diagnostic tools and publication-quality visualizations, bootstrap inference for confidence intervals (percentile and BCa methods), and synthetic data generation for power analysis and methods research. The package handles both mediator misclassification and exposure misclassification within a unified framework.
Authors:	Davood Tofighi [aut, cre] (ORCID: <https://orcid.org/0000-0001-8523-7776>)
Maintainer:	Davood Tofighi <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.0.9000
Built:	2026-06-12 11:48:16 UTC
Source:	https://github.com/Data-Wise/medrobust

Create sensitivity_region object from list

Description

Create sensitivity_region object from list

Usage

as_sensitivity_region(region_list)
as_sensitivity_region(region_list)

Arguments

region_list

List with sn0_range, sp0_range, psi_sn_range, psi_sp_range

Value

sensitivity_region S7 object

Coerce to data frame (S3 - Legacy)

Description

Extract bounds as a data frame for further analysis or export. NOTE: This is a legacy S3 method. The package now uses S7 methods (see s7-methods.R).

Usage

## S3 method for class 'medrobust_bounds'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
## S3 method for class 'medrobust_bounds'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

Arguments

x

An object of class medrobust_bounds.

row.names

Optional row names (not used).

optional

Logical (not used).

...

Additional arguments (not used).

Value

A data frame with one row containing the bounds.

Convert sensitivity_region S7 object to list

Description

Convert sensitivity_region S7 object to list

Usage

## S3 method for class 'sensitivity_region'
as.list(x, ...)
## S3 method for class 'sensitivity_region'
as.list(x, ...)

Arguments

x

A sensitivity_region S7 object

...

Additional arguments (ignored)

Value

A list with sn0_range, sp0_range, psi_sn_range, psi_sp_range

Bootstrap Results Class

Description

Stores bootstrap inference results for partial identification bounds.

Usage

bootstrap_results(
  method = character(0),
  n_reps = integer(0),
  n_failed = 0L,
  confidence_level = integer(0),
  nie_lower_ci = integer(0),
  nie_upper_ci = integer(0),
  nde_lower_ci = integer(0),
  nde_upper_ci = integer(0),
  boot_nie_lower = integer(0),
  boot_nie_upper = integer(0),
  boot_nde_lower = integer(0),
  boot_nde_upper = integer(0),
  z0 = NULL,
  acceleration = NULL
)
bootstrap_results(
  method = character(0),
  n_reps = integer(0),
  n_failed = 0L,
  confidence_level = integer(0),
  nie_lower_ci = integer(0),
  nie_upper_ci = integer(0),
  nde_lower_ci = integer(0),
  nde_upper_ci = integer(0),
  boot_nie_lower = integer(0),
  boot_nie_upper = integer(0),
  boot_nde_lower = integer(0),
  boot_nde_upper = integer(0),
  z0 = NULL,
  acceleration = NULL
)

Arguments

method

Character: "percentile" or "bca"

n_reps

Number of bootstrap replications

n_failed

Number of failed bootstrap samples

confidence_level

Confidence level (e.g., 0.95)

nie_lower_ci

Confidence interval for NIE lower bound

nie_upper_ci

Confidence interval for NIE upper bound

nde_lower_ci

Confidence interval for NDE lower bound

nde_upper_ci

Confidence interval for NDE upper bound

boot_nie_lower

Bootstrap samples for NIE lower bound

boot_nie_upper

Bootstrap samples for NIE upper bound

boot_nde_lower

Bootstrap samples for NDE lower bound

boot_nde_upper

Bootstrap samples for NDE upper bound

z0

BCa bias-correction parameter

acceleration

BCa acceleration parameter

Compute Width of Bootstrap Distribution

Description

Compute the mean, median, and range of bound widths from bootstrap samples.

Usage

bootstrap_width_summary(bootstrap_results, effect = "NIE")
bootstrap_width_summary(bootstrap_results, effect = "NIE")

Arguments

bootstrap_results

List returned by compute_bootstrap_ci

effect

Character string: "NIE" or "NDE"

Value

List with width statistics

Confidence intervals for partial-identification bounds (Imbens-Manski)

Description

Computes a confidence interval for the partial-identification set returned by [bound_ne()]. The raw estimated bound $[\hat L, \hat U]$ is consistent but is *not* a confidence set: when the identified set is narrow relative to the sampling uncertainty of its endpoints, it under-covers the true effect at small samples. 'bound_ci()' widens the endpoints by their standard errors using the Imbens & Manski (2004) construction, restoring approximately nominal coverage of the true effect.

Usage

bound_ci(
  bounds,
  data,
  exposure,
  mediator,
  outcome,
  confounders,
  misclassified_variable = c("exposure", "mediator"),
  n_boot = 200L,
  level = 0.95,
  seed = NULL
)
bound_ci(
  bounds,
  data,
  exposure,
  mediator,
  outcome,
  confounders,
  misclassified_variable = c("exposure", "mediator"),
  n_boot = 200L,
  level = 0.95,
  seed = NULL
)

Arguments

bounds

A fitted 'medrobust_bounds' object from [bound_ne()].

data

The data frame passed to [bound_ne()].

exposure, mediator, outcome, confounders

Column names, as in [bound_ne()].

misclassified_variable

Either '"exposure"' or '"mediator"'; selects the recovery used to evaluate the effect at a single sensitivity parameter.

n_boot

Number of resamples for the endpoint standard errors (default 200).

level

Confidence level (default 0.95).

seed

Optional integer seed for reproducibility.

Details

Endpoint standard errors are obtained by re-evaluating the effect at the fixed minimizing/maximizing sensitivity parameter on resampled data (one evaluation per resample, with no grid search), which is far cheaper than a full bootstrap of the whole grid.

Value

A named list with elements 'NIE' and 'NDE', each a numeric vector with 'lower', 'upper' (the point bounds), 'se_lower', 'se_upper' (endpoint SEs), and 'ci_lower', 'ci_upper' (the Imbens-Manski confidence interval).

References

Imbens, G. W. and Manski, C. F. (2004). Confidence Intervals for Partially Identified Parameters. *Econometrica*, 72(6), 1845-1857.

Partial Identification Bounds for Natural Effects Under Differential Misclassification

Description

Computes partial identification bounds for the Natural Direct Effect (NDE) and Natural Indirect Effect (NIE) when either the exposure or mediator is subject to differential misclassification. The method does not require validation data.

Usage

bound_ne(
  data,
  exposure,
  mediator,
  outcome,
  confounders = NULL,
  misclassified_variable = c("exposure", "mediator"),
  sensitivity_region = NULL,
  n_grid = 50,
  effect_scale = c("OR", "RR", "RD"),
  confidence_level = 0.95,
  ci_method = c("none", "analytic"),
  ci_n_boot = 200L,
  bootstrap = FALSE,
  bootstrap_reps = 1000,
  bootstrap_method = c("percentile", "bca"),
  parallel = FALSE,
  n_cores = NULL,
  cache = FALSE,
  cache_dir = NULL,
  verbose = TRUE,
  stratify_by = NULL,
  use_adaptive_grid = TRUE,
  grid_method = c("lhs", "auto", "regular", "adaptive", "sobol", "binary")
)
bound_ne(
  data,
  exposure,
  mediator,
  outcome,
  confounders = NULL,
  misclassified_variable = c("exposure", "mediator"),
  sensitivity_region = NULL,
  n_grid = 50,
  effect_scale = c("OR", "RR", "RD"),
  confidence_level = 0.95,
  ci_method = c("none", "analytic"),
  ci_n_boot = 200L,
  bootstrap = FALSE,
  bootstrap_reps = 1000,
  bootstrap_method = c("percentile", "bca"),
  parallel = FALSE,
  n_cores = NULL,
  cache = FALSE,
  cache_dir = NULL,
  verbose = TRUE,
  stratify_by = NULL,
  use_adaptive_grid = TRUE,
  grid_method = c("lhs", "auto", "regular", "adaptive", "sobol", "binary")
)

Arguments

data

A data frame containing the observed variables.

exposure

Character string. Name of the exposure variable (A or A*).

mediator

Character string. Name of the mediator variable (M or M*).

outcome

Character string. Name of the outcome variable (Y).

confounders

Character vector. Names of confounding variables.

misclassified_variable

Character string. Either "exposure" or "mediator" to indicate which variable is subject to misclassification.

sensitivity_region

A named list defining the sensitivity region Theta_Psi. Should contain: sn0_range, sp0_range, psi_sn_range, psi_sp_range. Each element should be a numeric vector of length 2 giving the minimum and maximum values. If NULL, default ranges are used.

n_grid

Integer. Number of grid points per parameter dimension. Default is 50.

effect_scale

Character string. Scale for reporting effects: "OR" (odds ratio), "RR" (risk ratio), or "RD" (risk difference). Default is "OR".

confidence_level

Numeric. Confidence level for confidence intervals. Default is 0.95.

ci_method

Character. '"none"' (default) or '"analytic"'. If '"analytic"', attaches Imbens-Manski confidence intervals (see [bound_ci()]) to the result's '@analytic_ci' slot.

ci_n_boot

Integer. Resamples for the analytic CI endpoint SEs. Default 200.

bootstrap

Logical. Whether to compute bootstrap confidence intervals. Default is FALSE.

bootstrap_reps

Integer. Number of bootstrap replicates if bootstrap=TRUE. Default is 1000.

bootstrap_method

Character string. Bootstrap CI method: "percentile" (default) or "bca" (bias-corrected and accelerated). The percentile method is faster and adequate for most applications. BCa provides second-order accurate intervals but requires jackknife estimation, which is computationally intensive for large datasets.

parallel

Logical. Whether to use parallel processing. Default is FALSE.

n_cores

Integer. Number of cores for parallel processing. If NULL, uses detectCores() - 1. Default is NULL.

cache

Logical. Whether to cache intermediate results. Default is FALSE.

cache_dir

Character string. Directory for cache files. If NULL, uses temp directory.

verbose

Logical. Whether to print progress messages. Default is TRUE.

stratify_by

Character vector. Additional variables to stratify by (advanced use).

use_adaptive_grid

Logical. Whether to use adaptive grid refinement for large grids (n_grid >= 20). This dramatically reduces computation time by focusing on compatible regions. Default is TRUE.

grid_method

Character string specifying grid search algorithm:

"lhs" (default): Latin Hypercube Sampling - space-filling design that reduces evaluations by 99% while maintaining broad coverage. Best for most applications (McKay et al., 1979).
"auto": Automatically selects best method based on a 16-point probe of the parameter space.
"regular": Exhaustive regular grid (n_grid^4 evaluations). Use for exact bounds when computational budget allows.
"sobol": Sobol low-discrepancy sequences (Sobol, 1967). Similar to LHS but better for high-dimensional problems.
"adaptive": Two-stage coarse-to-fine refinement. Effective when falsification rate is high.
"binary": Binary search on parameter boundaries. Efficient when compatibility is monotonic in parameters.

Details

This function implements the partial identification approach described in [Author] (2025). The method derives bounds on causal mediation effects by specifying a plausible range for misclassification parameters and using testable implications to rule out empirically inconsistent values.

The sensitivity region Theta_Psi is defined by four parameters:

sn0: Baseline sensitivity (probability of correct classification when Y=0)
sp0: Baseline specificity
psi_sn: Differential sensitivity parameter (odds ratio)
psi_sp: Differential specificity parameter (odds ratio)

Non-differential misclassification corresponds to psi_sn = psi_sp = 1.

Value

An object of class medrobust_bounds containing:

NIE_lower

Lower bound for Natural Indirect Effect

NIE_upper

Upper bound for Natural Indirect Effect

NDE_lower

Lower bound for Natural Direct Effect

NDE_upper

Upper bound for Natural Direct Effect

compatible_sets

Data frame of compatible parameter sets

falsified_proportion

Proportion of sensitivity region falsified

effect_scale

Scale used for reporting

n_evaluated

Number of parameter sets evaluated

n_compatible

Number of compatible parameter sets

computation_time

Time taken for computation

call

Original function call

data_summary

Summary statistics of the data

bootstrap_results

Bootstrap results if bootstrap=TRUE

References

[Author] (2025). Partial Identification of Causal Mediation Effects Under Differential Misclassification. Biostatistics.

McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2), 239-245.

Sobol', I. M. (1967). On the distribution of points in a cube and the approximate evaluation of integrals. USSR Computational Mathematics and Mathematical Physics, 7(4), 86-112.

Examples

## Not run: 
# Load example data
data("heals_data")

# Define sensitivity region
sens_region <- list(
  sn0_range = c(0.55, 0.65),
  sp0_range = c(0.85, 0.95),
  psi_sn_range = c(1.0, 2.0),
  psi_sp_range = c(0.5, 1.0)
)

# Compute bounds for exposure misclassification
bounds <- bound_ne(
  data = heals_data,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("age", "male", "smoking", "bmi"),
  misclassified_variable = "exposure",
  sensitivity_region = sens_region,
  n_grid = 50
)

# View results
print(bounds)
summary(bounds)

# Visualize
sensitivity_plot(bounds, param = "psi_sn")

## End(Not run)

## Not run: 
# Load example data
data("heals_data")

# Define sensitivity region
sens_region <- list(
  sn0_range = c(0.55, 0.65),
  sp0_range = c(0.85, 0.95),
  psi_sn_range = c(1.0, 2.0),
  psi_sp_range = c(0.5, 1.0)
)

# Compute bounds for exposure misclassification
bounds <- bound_ne(
  data = heals_data,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("age", "male", "smoking", "bmi"),
  misclassified_variable = "exposure",
  sensitivity_region = sens_region,
  n_grid = 50
)

# View results
print(bounds)
summary(bounds)

# Visualize
sensitivity_plot(bounds, param = "psi_sn")

## End(Not run)

Check Compatibility of Misclassification Parameters

Description

Tests whether a specific set of misclassification parameters is compatible with the observed data by checking testable implications. Returns detailed diagnostics about which constraints are satisfied or violated.

Usage

check_compatibility(
  data,
  exposure,
  mediator,
  outcome,
  confounders = NULL,
  psi,
  misclassified_variable = c("exposure", "mediator"),
  return_details = TRUE,
  tolerance = 1e-06
)
check_compatibility(
  data,
  exposure,
  mediator,
  outcome,
  confounders = NULL,
  psi,
  misclassified_variable = c("exposure", "mediator"),
  return_details = TRUE,
  tolerance = 1e-06
)

Arguments

data

Data frame containing the observed variables

exposure

Character string. Name of exposure variable

mediator

Character string. Name of mediator variable

outcome

Character string. Name of outcome variable

confounders

Character vector. Names of confounding variables

psi

Named list containing misclassification parameters:

sn0: Baseline sensitivity
sp0: Baseline specificity
psi_sn: Differential sensitivity (odds ratio)
psi_sp: Differential specificity (odds ratio)

misclassified_variable

Character string. Either "exposure" or "mediator"

return_details

Logical. If TRUE, returns detailed stratum-level diagnostics. Default is TRUE.

tolerance

Numeric. Tolerance for numerical precision when checking constraints. Default is 1e-6.

Details

This function implements the testable implications derived in Propositions 4.1 and 5.1 of the paper. For mediator misclassification, it checks whether the observed data can be explained by any true causal parameters given the specified misclassification mechanism. For exposure misclassification, it checks the likelihood ratio constraints on observed probabilities.

The function is useful for:

Testing specific hypotheses about misclassification parameters
Understanding which constraints are most informative
Debugging sensitivity analyses
Generating diagnostic plots

Value

A list with class compatibility_test containing:

compatible

Logical. TRUE if parameters are compatible

psi

The tested parameter set

n_constraints_total

Total number of testable constraints

n_constraints_satisfied

Number of satisfied constraints

n_constraints_violated

Number of violated constraints

violated_constraints

Data frame of violated constraints (if any)

implied_probabilities

Solved true probabilities (if compatible)

stratum_details

Stratum-level diagnostics (if return_details=TRUE)

Examples

## Not run: 
data("heals_data")

# Test non-differential misclassification with high accuracy
test_ndm <- check_compatibility(
  data = heals_data,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("age", "male", "smoking", "bmi"),
  psi = list(sn0 = 0.90, sp0 = 0.90, psi_sn = 1.0, psi_sp = 1.0),
  misclassified_variable = "exposure"
)

print(test_ndm)

# Test strong differential misclassification
test_strong_dm <- check_compatibility(
  data = heals_data,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("age", "male", "smoking", "bmi"),
  psi = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 3.0, psi_sp = 1.0),
  misclassified_variable = "exposure"
)

print(test_strong_dm)

## End(Not run)

## Not run: 
data("heals_data")

# Test non-differential misclassification with high accuracy
test_ndm <- check_compatibility(
  data = heals_data,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("age", "male", "smoking", "bmi"),
  psi = list(sn0 = 0.90, sp0 = 0.90, psi_sn = 1.0, psi_sp = 1.0),
  misclassified_variable = "exposure"
)

print(test_ndm)

# Test strong differential misclassification
test_strong_dm <- check_compatibility(
  data = heals_data,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("age", "male", "smoking", "bmi"),
  psi = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 3.0, psi_sp = 1.0),
  misclassified_variable = "exposure"
)

print(test_strong_dm)

## End(Not run)

Compare Bounds Across Multiple Analyses

Description

Compare partial identification bounds from multiple analyses (e.g., different sensitivity regions, different datasets, different assumptions).

Usage

compare_bounds(bounds_list, labels = NULL)
compare_bounds(bounds_list, labels = NULL)

Arguments

bounds_list

A named list of medrobust_bounds objects.

labels

Optional character vector of labels for each analysis. If NULL, uses names from bounds_list.

Value

A data frame comparing the bounds, and optionally a plot.

Examples

## Not run: 
# Compare bounds under different sensitivity assumptions
bounds1 <- bound_ne(data, ..., sensitivity_region = region1)
bounds2 <- bound_ne(data, ..., sensitivity_region = region2)

comparison <- compare_bounds(
  list(conservative = bounds1, liberal = bounds2)
)
print(comparison)

## End(Not run)

## Not run: 
# Compare bounds under different sensitivity assumptions
bounds1 <- bound_ne(data, ..., sensitivity_region = region1)
bounds2 <- bound_ne(data, ..., sensitivity_region = region2)

comparison <- compare_bounds(
  list(conservative = bounds1, liberal = bounds2)
)
print(comparison)

## End(Not run)

Compatibility Test Class

Description

S7 class for storing results of compatibility tests for specific misclassification parameter values.

Arguments

compatible

Logical indicating if parameters are compatible with data

psi

List of misclassification parameters tested

sn1

Sensitivity when Y=1 (implied from psi)

sp1

Specificity when Y=1 (implied from psi)

n_constraints_total

Total number of testable constraints

n_constraints_satisfied

Number of constraints satisfied

n_constraints_violated

Number of constraints violated

violated_constraints

Data frame with details of violated constraints

implied_probabilities

List of implied probability distributions

stratum_details

List with stratum-specific details

misclassified_variable

Character: "exposure" or "mediator"

reason

Character describing reason for incompatibility (if any)

Compute Standard Errors for Bounds

Description

Estimate standard errors for the lower and upper bounds using the bootstrap distribution.

Usage

compute_bound_se(bootstrap_results)
compute_bound_se(bootstrap_results)

Arguments

bootstrap_results

List returned by compute_bootstrap_ci

Value

Named vector of standard errors

Extract Compatible Parameter Sets

Description

Extract the compatible parameter sets from a bounds analysis.

Usage

extract_bounds(bounds_object)
extract_bounds(bounds_object)

Arguments

bounds_object

An object of class medrobust_bounds

Value

A data frame containing compatible parameter sets

Extract Falsified Region

Description

Extract the subset of the sensitivity region that is empirically falsified.

Usage

extract_falsified_region(bounds_object)
extract_falsified_region(bounds_object)

Arguments

bounds_object

An object of class medrobust_bounds

Value

A data frame containing parameter sets that were falsified

Summarize Falsification Results

Description

Provides a detailed summary of which regions of the sensitivity parameter space are empirically falsified by the data. Helps understand which assumptions about misclassification are most constrained by the observed data.

Usage

falsification_summary(
  bounds_object,
  by_parameter = TRUE,
  n_bins = 10,
  plot = TRUE
)
falsification_summary(
  bounds_object,
  by_parameter = TRUE,
  n_bins = 10,
  plot = TRUE
)

Arguments

bounds_object

An object of class medrobust_bounds returned by bound_ne

by_parameter

Logical. If TRUE, breaks down falsification by each parameter dimension. Default is TRUE.

n_bins

Integer. Number of bins for discretizing parameters when computing falsification rates. Default is 10.

plot

Logical. If TRUE, generates visualization of falsification patterns. Default is TRUE.

Details

This function analyzes the compatible parameter sets to understand which regions of the sensitivity space are ruled out by the testable implications. High falsification rates indicate that the data are informative about that particular parameter.

The falsification analysis is useful for:

Understanding which misclassification assumptions are most constrained
Identifying whether bounds are wide due to weak data vs. wide sensitivity region
Guiding choice of sensitivity parameters for future studies
Assessing whether additional data would meaningfully narrow bounds

Value

A list of class falsification_summary containing:

overall

Overall falsification rate

by_parameter

Parameter-specific falsification rates (if by_parameter=TRUE)

joint_falsification

2D falsification patterns

most_constrained

Parameters that are most falsified

least_constrained

Parameters that are least falsified

plot

ggplot2 object (if plot=TRUE)

Examples

## Not run: 
bounds <- bound_ne(...)

falsif <- falsification_summary(bounds)
print(falsif)

# View falsification plot
falsif$plot

## End(Not run)

## Not run: 
bounds <- bound_ne(...)

falsif <- falsification_summary(bounds)
print(falsif)

# View falsification plot
falsif$plot

## End(Not run)

Format Effect Estimate for Reporting

Description

Format an effect estimate as a string for tables/reports.

Usage

format_effect(estimate, effect_scale = "OR", digits = 2, ci = NULL)
format_effect(estimate, effect_scale = "OR", digits = 2, ci = NULL)

Arguments

estimate

Numeric effect estimate (or bounds)

effect_scale

Character string: "OR", "RR", or "RD"

digits

Integer: number of decimal places

ci

Optional: confidence interval (length 2 vector)

Value

Character string

Synthetic HEALS Data with Differential Measurement Error

Description

A synthetic dataset based on the Health Effects of Arsenic Longitudinal Study (HEALS) that demonstrates differential measurement error in exposure assessment. The dataset includes both observed (misclassified) and true (ground truth) arsenic exposure, allowing for validation of sensitivity analysis methods.

Usage

heals_data
heals_data

Format

A data frame with 450 observations and 9 variables:

id: Integer. Unique subject identifier (1 to 450).
Y: Binary. Cardiovascular disease status (0 = Absent, 1 = Present).
M: Binary. Elevated inflammation marker (sVCAM-1) (0 = Low, 1 = High).
A_star: Binary. Self-reported arsenic exposure (0 = No, 1 = Yes). Subject to severe differential recall bias.
A_true: Binary. True arsenic exposure status (0 = No, 1 = Yes). Included for validation; not available in real studies.
age: Numeric. Age in years (range: 18-75, mean ~42.8).
male: Binary. Sex (0 = Female, 1 = Male).
smoking: Binary. Ever smoker status (0 = Never, 1 = Ever).
bmi: Numeric. Body mass index in kg/m² (mean ~19.7).

Details

## Data Generation The data were generated to mimic the HEALS cohort using published effect sizes and demographic profiles from:

Ahsan et al. (2006) - Baseline demographics
Chen et al. (2007) - Arsenic-inflammation relationship
Argos et al. (2010) - Arsenic-CVD relationship (HR ~ 1.92)

## Measurement Error Model Differential misclassification was introduced with:

Cases (Y=1): Sensitivity = 0.90, Specificity = 0.55 (over-reporting)
Controls (Y=0): Sensitivity = 0.60, Specificity = 0.90

This creates severe recall bias where CVD cases over-report arsenic exposure, inflating the naive effect estimate by approximately 70

## True vs. Naive Effects

True Natural Direct Effect: OR ~ 1.68
Naive NDE (using A_star): OR ~ 2.85
Bias amplification: ~70

Source

Synthetic data generated using methods described in the package vignette "Synthetic HEALS Data: Ground Truth with Differential Measurement Error". See vignette("heals-synthetic-data", package = "medrobust") for full details.

Examples

# Load the data
data("heals_data")
str(heals_data)

# Examine prevalence
table(heals_data$A_star, heals_data$Y)

# Compare naive vs. true effect
naive_mod <- glm(Y ~ A_star + M + age + male + smoking + bmi,
                 data = heals_data, family = binomial)
true_mod <- glm(Y ~ A_true + M + age + male + smoking + bmi,
                data = heals_data, family = binomial)

cat("Naive NDE:", round(exp(coef(naive_mod)["A_star"]), 2), "\n")
cat("True NDE:", round(exp(coef(true_mod)["A_true"]), 2), "\n")

## Not run: 
# Use with medrobust bounds
bounds <- bound_ne(
  data = heals_data,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("age", "male", "smoking", "bmi"),
  misclassified_variable = "exposure",
  sensitivity_region = list(
    sn0_range = c(0.55, 0.65),
    sp0_range = c(0.85, 0.95),
    psi_sn_range = c(1.0, 2.0),
    psi_sp_range = c(0.5, 1.0)
  )
)

## End(Not run)

# Load the data
data("heals_data")
str(heals_data)

# Examine prevalence
table(heals_data$A_star, heals_data$Y)

# Compare naive vs. true effect
naive_mod <- glm(Y ~ A_star + M + age + male + smoking + bmi,
                 data = heals_data, family = binomial)
true_mod <- glm(Y ~ A_true + M + age + male + smoking + bmi,
                data = heals_data, family = binomial)

cat("Naive NDE:", round(exp(coef(naive_mod)["A_star"]), 2), "\n")
cat("True NDE:", round(exp(coef(true_mod)["A_true"]), 2), "\n")

## Not run: 
# Use with medrobust bounds
bounds <- bound_ne(
  data = heals_data,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("age", "male", "smoking", "bmi"),
  misclassified_variable = "exposure",
  sensitivity_region = list(
    sn0_range = c(0.55, 0.65),
    sp0_range = c(0.85, 0.95),
    psi_sn_range = c(1.0, 2.0),
    psi_sp_range = c(0.5, 1.0)
  )
)

## End(Not run)

Medrobust Bounds Class

Description

S7 class for storing partial identification bounds for natural direct and indirect effects under differential misclassification.

Arguments

NIE_lower

Lower bound for Natural Indirect Effect

NIE_upper

Upper bound for Natural Indirect Effect

NDE_lower

Lower bound for Natural Direct Effect

NDE_upper

Upper bound for Natural Direct Effect

compatible_sets

Data frame of parameter sets compatible with data

n_compatible

Number of compatible parameter sets

n_evaluated

Total number of parameter sets evaluated

falsified_proportion

Proportion of parameter space falsified

effect_scale

Character: "OR", "RR", or "RD"

misclassified_variable

Character: "exposure" or "mediator"

sensitivity_region

Sensitivity region specification

naive_estimates

List of naive effect estimates

bootstrap_results

Bootstrap inference results (if computed)

data_summary

Summary statistics from the data

call

The function call that created this object

Create Falsification Summary Object

Description

Low-level constructor for falsification_summary S7 objects. Most users should use the falsification_summary() function which analyzes bounds.

Usage

new_falsification_summary(
  overall,
  n_evaluated,
  n_compatible,
  n_falsified,
  by_parameter = NULL,
  joint_falsification = NULL,
  most_constrained = character(0),
  least_constrained = character(0),
  plot = NULL
)
new_falsification_summary(
  overall,
  n_evaluated,
  n_compatible,
  n_falsified,
  by_parameter = NULL,
  joint_falsification = NULL,
  most_constrained = character(0),
  least_constrained = character(0),
  plot = NULL
)

Arguments

overall

Overall falsification rate

n_evaluated

Number of parameter sets evaluated

n_compatible

Number of compatible parameter sets

n_falsified

Number of falsified parameter sets

by_parameter

Parameter-specific falsification (optional)

joint_falsification

Joint falsification patterns (optional)

most_constrained

Most constrained parameters (optional)

least_constrained

Least constrained parameters (optional)

plot

ggplot2 object (optional)

Value

A falsification_summary S7 object

Plot Bootstrap Distribution

Description

Visualize the bootstrap distribution of bounds.

Usage

plot_bootstrap_distribution(
  bootstrap_results,
  effect = "NIE",
  original_bounds = NULL
)
plot_bootstrap_distribution(
  bootstrap_results,
  effect = "NIE",
  original_bounds = NULL
)

Arguments

bootstrap_results

List returned by compute_bootstrap_ci

effect

Character string: "NIE" or "NDE"

original_bounds

Original bound estimates (optional)

Value

ggplot2 object

Power Analysis for Partial Identification Bounds

Description

Conducts simulation-based power analysis to determine the sample size needed to achieve a target bound width or to rule out the null hypothesis with high probability.

Usage

power_analysis(
  true_params,
  dm_params,
  sensitivity_region,
  misclass_type = c("exposure", "mediator"),
  sample_sizes = seq(100, 1000, by = 100),
  target_width = NULL,
  target_power = 0.8,
  alpha = 0.05,
  effect = c("NIE", "NDE"),
  n_sim = 100,
  n_grid = 30,
  confounders = 1,
  parallel = TRUE,
  n_cores = NULL,
  verbose = TRUE,
  seed = 12345
)
power_analysis(
  true_params,
  dm_params,
  sensitivity_region,
  misclass_type = c("exposure", "mediator"),
  sample_sizes = seq(100, 1000, by = 100),
  target_width = NULL,
  target_power = 0.8,
  alpha = 0.05,
  effect = c("NIE", "NDE"),
  n_sim = 100,
  n_grid = 30,
  confounders = 1,
  parallel = TRUE,
  n_cores = NULL,
  verbose = TRUE,
  seed = 12345
)

Arguments

true_params

Named list of true causal parameters (see simulate_dm_data)

dm_params

Named list of misclassification parameters

sensitivity_region

Named list defining Theta_Psi for bound_ne

misclass_type

Character string. "exposure" or "mediator"

sample_sizes

Integer vector. Sample sizes to evaluate. Default is seq(100, 1000, by = 100).

target_width

Numeric. Target bound width. If specified, finds minimum sample size to achieve this width with high probability. Default is NULL.

target_power

Numeric. Target power for rejecting null. Default is 0.80.

alpha

Numeric. Significance level. Default is 0.05.

effect

Character string. Which effect to power for: "NIE" or "NDE". Default is "NIE".

n_sim

Integer. Number of simulation replicates per sample size. Default is 100.

n_grid

Integer. Grid resolution for bound_ne. Default is 30.

confounders

Integer. Number of confounders. Default is 1.

parallel

Logical. Use parallel processing? Default is TRUE.

n_cores

Integer. Number of cores. Default is NULL (auto-detect).

verbose

Logical. Print progress? Default is TRUE.

seed

Integer. Random seed. Default is 12345.

Value

An S7 object of class power_analysis_result containing:

power_curve

Data frame with power by sample size

true_effect

True effect value

target_power

Target power level

target_width

Target bound width (if specified)

recommended_n_power

Recommended sample size for target power

recommended_n_width

Recommended sample size for target width

simulation_params

Parameters used for simulation

Examples

## Not run: 
# Power analysis for exposure DM with moderate effects
power_result <- power_analysis(
  true_params = list(beta_AM = 0.405, theta_AY = 0.405, theta_MY = 0.405),
  dm_params = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 1.5, psi_sp = 1.0),
  sensitivity_region = list(
    sn0_range = c(0.80, 0.90),
    sp0_range = c(0.80, 0.90),
    psi_sn_range = c(1.0, 2.0),
    psi_sp_range = c(1.0, 1.0)
  ),
  misclass_type = "exposure",
  sample_sizes = c(200, 400, 600, 800, 1000),
  target_width = 0.3,
  target_power = 0.80,
  n_sim = 100
)

print(power_result)
plot(power_result)

## End(Not run)

## Not run: 
# Power analysis for exposure DM with moderate effects
power_result <- power_analysis(
  true_params = list(beta_AM = 0.405, theta_AY = 0.405, theta_MY = 0.405),
  dm_params = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 1.5, psi_sp = 1.0),
  sensitivity_region = list(
    sn0_range = c(0.80, 0.90),
    sp0_range = c(0.80, 0.90),
    psi_sn_range = c(1.0, 2.0),
    psi_sp_range = c(1.0, 1.0)
  ),
  misclass_type = "exposure",
  sample_sizes = c(200, 400, 600, 800, 1000),
  target_width = 0.3,
  target_power = 0.80,
  n_sim = 100
)

print(power_result)
plot(power_result)

## End(Not run)

Power Analysis Result Class

Description

S7 class for storing power analysis results for partial identification bounds.

Arguments

power_curve

Data frame with power and width by sample size

true_effect

True effect value used in simulations

target_power

Target power level for sample size recommendations

target_width

Target bound width for sample size recommendations

recommended_n_power

Recommended sample size to achieve target power

recommended_n_width

Recommended sample size to achieve target width

simulation_params

List of simulation parameters used

Print Method for compatibility_test

Description

Print Method for compatibility_test

Usage

## S3 method for class 'compatibility_test'
print(x, ...)
## S3 method for class 'compatibility_test'
print(x, ...)

Arguments

x

An object of class compatibility_test

...

Additional arguments (currently unused)

Value

Invisibly returns the input object

Create Sensitivity Analysis Plots

Description

Generate publication-quality visualizations of partial identification bounds as a function of sensitivity parameters.

Usage

sensitivity_plot(
  bounds_object,
  param = "psi_sn",
  effect = c("both", "NIE", "NDE"),
  plot_type = c("bounds", "heatmap", "contour"),
  show_naive = TRUE,
  show_null = TRUE,
  color_scheme = "default",
  theme = "bw",
  ...
)
sensitivity_plot(
  bounds_object,
  param = "psi_sn",
  effect = c("both", "NIE", "NDE"),
  plot_type = c("bounds", "heatmap", "contour"),
  show_naive = TRUE,
  show_null = TRUE,
  color_scheme = "default",
  theme = "bw",
  ...
)

Arguments

bounds_object

An object of class medrobust_bounds returned by bound_ne.

param

Character vector specifying which parameter(s) to plot on the x-axis. Options: "psi_sn", "psi_sp", "sn0", "sp0". Can specify multiple for faceted plots.

effect

Character string: "NIE", "NDE", or "both" (default).

plot_type

Character string specifying plot type:

"bounds" (default): Line plot showing upper and lower bounds
"heatmap": 2D heatmap of bound width (requires two parameters)
"contour": Contour plot of bounds (requires two parameters)

show_naive

Logical indicating whether to overlay the naive estimate (assuming no misclassification). Default is TRUE.

show_null

Logical indicating whether to show horizontal line at null value (effect = 1 for OR/RR, effect = 0 for RD). Default is TRUE.

color_scheme

Character string specifying color palette: "default", "viridis", "colorblind", "grayscale".

theme

Character string for ggplot2 theme: "bw" (default), "minimal", "classic", "void".

...

Additional arguments passed to ggplot2 functions.

Details

The function creates different plot types to visualize sensitivity analysis:

Bounds plot: Shows how the lower and upper bounds vary as a function of one sensitivity parameter, with other parameters held fixed or averaged.

Heatmap: Shows the width of the identification region (upper - lower) as a function of two sensitivity parameters simultaneously.

Contour plot: Shows contour lines of the bounds in 2D parameter space.

Value

A ggplot2 object that can be further customized or saved.

Examples

## Not run: 
# Basic bounds plot
sensitivity_plot(bounds, param = "psi_sn", effect = "NIE")

# Show both effects
sensitivity_plot(bounds, param = "psi_sn", effect = "both")

# Heatmap of bound width
sensitivity_plot(bounds,
                param = c("psi_sn", "sn0"),
                plot_type = "heatmap")

# Customize and save
p <- sensitivity_plot(bounds, param = "psi_sn") +
  labs(title = "My Custom Title") +
  theme(legend.position = "bottom")
ggsave("sensitivity_plot.pdf", p, width = 8, height = 6)

## End(Not run)

## Not run: 
# Basic bounds plot
sensitivity_plot(bounds, param = "psi_sn", effect = "NIE")

# Show both effects
sensitivity_plot(bounds, param = "psi_sn", effect = "both")

# Heatmap of bound width
sensitivity_plot(bounds,
                param = c("psi_sn", "sn0"),
                plot_type = "heatmap")

# Customize and save
p <- sensitivity_plot(bounds, param = "psi_sn") +
  labs(title = "My Custom Title") +
  theme(legend.position = "bottom")
ggsave("sensitivity_plot.pdf", p, width = 8, height = 6)

## End(Not run)

Create Sensitivity Region

Description

Constructor for sensitivity_region S7 objects. Issues a warning if the region may be non-informative (Sn + Sp <= 1).

Usage

sensitivity_region(sn0_range, sp0_range, psi_sn_range, psi_sp_range)
sensitivity_region(sn0_range, sp0_range, psi_sn_range, psi_sp_range)

Arguments

sn0_range

Numeric vector of length 2: [min, max] for baseline sensitivity

sp0_range

Numeric vector of length 2: [min, max] for baseline specificity

psi_sn_range

Numeric vector of length 2: [min, max] for sensitivity OR

psi_sp_range

Numeric vector of length 2: [min, max] for specificity OR

Value

A sensitivity_region S7 object

Simulate Data with Differential Misclassification

Description

Generates synthetic data with known differential misclassification for power analysis, methods validation, and simulation studies. Allows control over true causal parameters and misclassification mechanisms.

Usage

simulate_dm_data(
  n = 500,
  true_params = list(beta_AM = 0.405, theta_AY = 0.405, theta_MY = 0.405),
  dm_params = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 1.5, psi_sp = 1),
  misclass_type = c("exposure", "mediator"),
  confounders = 1,
  confounder_params = list(type = "binary", effect_on_A = 0.3, effect_on_M = 0.3,
    effect_on_Y = 0.3),
  effect_modification = FALSE,
  interaction_coef = 0,
  seed = NULL,
  return_truth = TRUE,
  return_params = TRUE
)
simulate_dm_data(
  n = 500,
  true_params = list(beta_AM = 0.405, theta_AY = 0.405, theta_MY = 0.405),
  dm_params = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 1.5, psi_sp = 1),
  misclass_type = c("exposure", "mediator"),
  confounders = 1,
  confounder_params = list(type = "binary", effect_on_A = 0.3, effect_on_M = 0.3,
    effect_on_Y = 0.3),
  effect_modification = FALSE,
  interaction_coef = 0,
  seed = NULL,
  return_truth = TRUE,
  return_params = TRUE
)

Arguments

n

Integer. Sample size. Default is 500.

true_params

Named list of true causal parameters:

beta_AM: Effect of A on M (log-odds scale)
theta_AY: Direct effect of A on Y (log-odds scale)
theta_MY: Effect of M on Y (log-odds scale)
baseline_M: Baseline probability of M when A=0 (optional)
baseline_Y: Baseline probability of Y when A=0, M=0 (optional)

dm_params

Named list of misclassification parameters:

sn0: Baseline sensitivity (Y=0)
sp0: Baseline specificity (Y=0)
psi_sn: Differential sensitivity (odds ratio)
psi_sp: Differential specificity (odds ratio)

misclass_type

Character string. Either "exposure" or "mediator" to indicate which variable is misclassified. Default is "exposure".

confounders

Integer. Number of confounding variables to include. Default is 1. Set to 0 for no confounders.

confounder_params

Named list controlling confounder generation:

type: "binary" or "continuous". Default is "binary".
effect_on_A: Effect size on exposure (log-odds). Default is 0.3.
effect_on_M: Effect size on mediator (log-odds). Default is 0.3.
effect_on_Y: Effect size on outcome (log-odds). Default is 0.3.

effect_modification

Logical. Should there be effect modification (interaction between A and M on Y)? Default is FALSE.

interaction_coef

Numeric. Interaction coefficient if effect_modification=TRUE. Default is 0.

seed

Integer. Random seed for reproducibility. If NULL, no seed is set.

return_truth

Logical. Should true (unobserved) values be returned? Default is TRUE.

return_params

Logical. Should true causal effects be calculated and returned? Default is TRUE.

Value

An S7 object of class simulated_dm_data containing:

observed

Data frame with observed (potentially misclassified) variables

truth

Data frame with true (unobserved) values (if return_truth=TRUE)

true_effects

List of true causal effects (if return_params=TRUE)

generation_params

Parameters used for data generation

misclassification_applied

Summary of applied misclassification

Examples

## Not run: 
# Basic simulation with moderate effects and mild DM
sim_data <- simulate_dm_data(
  n = 500,
  true_params = list(
    beta_AM = 0.405,   # OR = 1.5
    theta_AY = 0.405,  # OR = 1.5
    theta_MY = 0.405   # OR = 1.5
  ),
  dm_params = list(
    sn0 = 0.85,
    sp0 = 0.85,
    psi_sn = 1.5,
    psi_sp = 1.0
  ),
  misclass_type = "exposure",
  seed = 12345
)

# Use in analysis
bounds <- bound_ne(
  data = sim_data@observed,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = "C1",
  misclassified_variable = "exposure",
  sensitivity_region = list(
    sn0_range = c(0.80, 0.90),
    sp0_range = c(0.80, 0.90),
    psi_sn_range = c(1.0, 2.0),
    psi_sp_range = c(1.0, 1.0)
  )
)

## End(Not run)

## Not run: 
# Basic simulation with moderate effects and mild DM
sim_data <- simulate_dm_data(
  n = 500,
  true_params = list(
    beta_AM = 0.405,   # OR = 1.5
    theta_AY = 0.405,  # OR = 1.5
    theta_MY = 0.405   # OR = 1.5
  ),
  dm_params = list(
    sn0 = 0.85,
    sp0 = 0.85,
    psi_sn = 1.5,
    psi_sp = 1.0
  ),
  misclass_type = "exposure",
  seed = 12345
)

# Use in analysis
bounds <- bound_ne(
  data = sim_data@observed,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = "C1",
  misclassified_variable = "exposure",
  sensitivity_region = list(
    sn0_range = c(0.80, 0.90),
    sp0_range = c(0.80, 0.90),
    psi_sn_range = c(1.0, 2.0),
    psi_sp_range = c(1.0, 1.0)
  )
)

## End(Not run)

Simulated Data with Differential Misclassification Class

Description

S7 class for storing simulated data with known differential misclassification, used for power analysis and methods validation.

Arguments

observed

Data frame with observed (potentially misclassified) variables

truth

Data frame with true (unobserved) values

true_effects

List of true causal effects

generation_params

List of parameters used to generate the data

misclassification_applied

List of misclassification parameters applied

Summary Method for compatibility_test

Description

Summary Method for compatibility_test

Usage

## S3 method for class 'compatibility_test'
summary(object, ...)
## S3 method for class 'compatibility_test'
summary(object, ...)

Arguments

object

An object of class compatibility_test

...

Additional arguments (currently unused)

Value

Invisibly returns the input object

Test Multiple Hypotheses

Description

Test multiple hypotheses about misclassification parameters simultaneously.

Usage

test_multiple_hypotheses(
  data,
  exposure,
  mediator,
  outcome,
  confounders,
  psi_list,
  misclassified_variable
)
test_multiple_hypotheses(
  data,
  exposure,
  mediator,
  outcome,
  confounders,
  psi_list,
  misclassified_variable
)

Arguments

data

Data frame

exposure

Character string

mediator

Character string

outcome

Character string

confounders

Character vector

psi_list

List of parameter sets to test

misclassified_variable

Character string

Value

Data frame with test results for each hypothesis

Package 'medrobust'

Help Index

Create sensitivity_region object from list

Description

Usage

Arguments

Value

Coerce to data frame (S3 - Legacy)

Description

Usage

Arguments

Value

Convert sensitivity_region S7 object to list

Description

Usage

Arguments

Value

Bootstrap Results Class

Description

Usage

Arguments

Compute Width of Bootstrap Distribution

Description

Usage

Arguments

Value

Confidence intervals for partial-identification bounds (Imbens-Manski)

Description

Usage

Arguments

Details

Value

References

See Also

Partial Identification Bounds for Natural Effects Under Differential Misclassification

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Check Compatibility of Misclassification Parameters

Description

Usage

Arguments

Details

Value

See Also

Examples

Compare Bounds Across Multiple Analyses

Description

Usage

Arguments

Value

Examples

Compatibility Test Class

Description

Arguments

Compute Standard Errors for Bounds

Description

Usage

Arguments

Value

Extract Compatible Parameter Sets

Description

Usage

Arguments

Value

Extract Falsified Region

Description

Usage

Arguments

Value

Summarize Falsification Results

Description

Usage

Arguments

Details