| Title: | Robust Causal Mediation Analysis Under Differential Misclassification |
|---|---|
| Description: | Provides tools for conducting sensitivity analysis for causal mediation effects when the exposure or mediator is measured with differential misclassification (e.g., recall bias, outcome-dependent measurement error). Unlike existing measurement error correction methods that assume non-differential error or require validation data, 'medrobust' derives partial identification bounds that remain valid without gold-standard measurements. The package implements methods developed in Tofighi (2025) for partial identification bounds for Natural Direct Effects (NDE) and Natural Indirect Effects (NIE), data-driven falsification via testable implications, sensitivity analysis over user-specified ranges of misclassification parameters, diagnostic tools and publication-quality visualizations, bootstrap inference for confidence intervals (percentile and BCa methods), and synthetic data generation for power analysis and methods research. The package handles both mediator misclassification and exposure misclassification within a unified framework. |
| Authors: | Davood Tofighi [aut, cre] (ORCID: <https://orcid.org/0000-0001-8523-7776>) |
| Maintainer: | Davood Tofighi <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0.9000 |
| Built: | 2026-06-12 11:48:16 UTC |
| Source: | https://github.com/Data-Wise/medrobust |
Create sensitivity_region object from list
as_sensitivity_region(region_list)as_sensitivity_region(region_list)
region_list |
List with sn0_range, sp0_range, psi_sn_range, psi_sp_range |
sensitivity_region S7 object
Extract bounds as a data frame for further analysis or export. NOTE: This is a legacy S3 method. The package now uses S7 methods (see s7-methods.R).
## S3 method for class 'medrobust_bounds' as.data.frame(x, row.names = NULL, optional = FALSE, ...)## S3 method for class 'medrobust_bounds' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
An object of class |
row.names |
Optional row names (not used). |
optional |
Logical (not used). |
... |
Additional arguments (not used). |
A data frame with one row containing the bounds.
Convert sensitivity_region S7 object to list
## S3 method for class 'sensitivity_region' as.list(x, ...)## S3 method for class 'sensitivity_region' as.list(x, ...)
x |
A sensitivity_region S7 object |
... |
Additional arguments (ignored) |
A list with sn0_range, sp0_range, psi_sn_range, psi_sp_range
Stores bootstrap inference results for partial identification bounds.
bootstrap_results( method = character(0), n_reps = integer(0), n_failed = 0L, confidence_level = integer(0), nie_lower_ci = integer(0), nie_upper_ci = integer(0), nde_lower_ci = integer(0), nde_upper_ci = integer(0), boot_nie_lower = integer(0), boot_nie_upper = integer(0), boot_nde_lower = integer(0), boot_nde_upper = integer(0), z0 = NULL, acceleration = NULL )bootstrap_results( method = character(0), n_reps = integer(0), n_failed = 0L, confidence_level = integer(0), nie_lower_ci = integer(0), nie_upper_ci = integer(0), nde_lower_ci = integer(0), nde_upper_ci = integer(0), boot_nie_lower = integer(0), boot_nie_upper = integer(0), boot_nde_lower = integer(0), boot_nde_upper = integer(0), z0 = NULL, acceleration = NULL )
method |
Character: "percentile" or "bca" |
n_reps |
Number of bootstrap replications |
n_failed |
Number of failed bootstrap samples |
confidence_level |
Confidence level (e.g., 0.95) |
nie_lower_ci |
Confidence interval for NIE lower bound |
nie_upper_ci |
Confidence interval for NIE upper bound |
nde_lower_ci |
Confidence interval for NDE lower bound |
nde_upper_ci |
Confidence interval for NDE upper bound |
boot_nie_lower |
Bootstrap samples for NIE lower bound |
boot_nie_upper |
Bootstrap samples for NIE upper bound |
boot_nde_lower |
Bootstrap samples for NDE lower bound |
boot_nde_upper |
Bootstrap samples for NDE upper bound |
z0 |
BCa bias-correction parameter |
acceleration |
BCa acceleration parameter |
Compute the mean, median, and range of bound widths from bootstrap samples.
bootstrap_width_summary(bootstrap_results, effect = "NIE")bootstrap_width_summary(bootstrap_results, effect = "NIE")
bootstrap_results |
List returned by compute_bootstrap_ci |
effect |
Character string: "NIE" or "NDE" |
List with width statistics
Computes a confidence interval for the partial-identification set returned by
[bound_ne()]. The raw estimated bound is consistent but
is *not* a confidence set: when the identified set is narrow relative to the
sampling uncertainty of its endpoints, it under-covers the true effect at small
samples. 'bound_ci()' widens the endpoints by their standard errors using the
Imbens & Manski (2004) construction, restoring approximately nominal coverage of
the true effect.
bound_ci( bounds, data, exposure, mediator, outcome, confounders, misclassified_variable = c("exposure", "mediator"), n_boot = 200L, level = 0.95, seed = NULL )bound_ci( bounds, data, exposure, mediator, outcome, confounders, misclassified_variable = c("exposure", "mediator"), n_boot = 200L, level = 0.95, seed = NULL )
bounds |
A fitted 'medrobust_bounds' object from [bound_ne()]. |
data |
The data frame passed to [bound_ne()]. |
exposure, mediator, outcome, confounders
|
Column names, as in [bound_ne()]. |
misclassified_variable |
Either '"exposure"' or '"mediator"'; selects the recovery used to evaluate the effect at a single sensitivity parameter. |
n_boot |
Number of resamples for the endpoint standard errors (default 200). |
level |
Confidence level (default 0.95). |
seed |
Optional integer seed for reproducibility. |
Endpoint standard errors are obtained by re-evaluating the effect at the fixed minimizing/maximizing sensitivity parameter on resampled data (one evaluation per resample, with no grid search), which is far cheaper than a full bootstrap of the whole grid.
A named list with elements 'NIE' and 'NDE', each a numeric vector with 'lower', 'upper' (the point bounds), 'se_lower', 'se_upper' (endpoint SEs), and 'ci_lower', 'ci_upper' (the Imbens-Manski confidence interval).
Imbens, G. W. and Manski, C. F. (2004). Confidence Intervals for Partially Identified Parameters. *Econometrica*, 72(6), 1845-1857.
[bound_ne()]
Computes partial identification bounds for the Natural Direct Effect (NDE) and Natural Indirect Effect (NIE) when either the exposure or mediator is subject to differential misclassification. The method does not require validation data.
bound_ne( data, exposure, mediator, outcome, confounders = NULL, misclassified_variable = c("exposure", "mediator"), sensitivity_region = NULL, n_grid = 50, effect_scale = c("OR", "RR", "RD"), confidence_level = 0.95, ci_method = c("none", "analytic"), ci_n_boot = 200L, bootstrap = FALSE, bootstrap_reps = 1000, bootstrap_method = c("percentile", "bca"), parallel = FALSE, n_cores = NULL, cache = FALSE, cache_dir = NULL, verbose = TRUE, stratify_by = NULL, use_adaptive_grid = TRUE, grid_method = c("lhs", "auto", "regular", "adaptive", "sobol", "binary") )bound_ne( data, exposure, mediator, outcome, confounders = NULL, misclassified_variable = c("exposure", "mediator"), sensitivity_region = NULL, n_grid = 50, effect_scale = c("OR", "RR", "RD"), confidence_level = 0.95, ci_method = c("none", "analytic"), ci_n_boot = 200L, bootstrap = FALSE, bootstrap_reps = 1000, bootstrap_method = c("percentile", "bca"), parallel = FALSE, n_cores = NULL, cache = FALSE, cache_dir = NULL, verbose = TRUE, stratify_by = NULL, use_adaptive_grid = TRUE, grid_method = c("lhs", "auto", "regular", "adaptive", "sobol", "binary") )
data |
A data frame containing the observed variables. |
exposure |
Character string. Name of the exposure variable (A or A*). |
mediator |
Character string. Name of the mediator variable (M or M*). |
outcome |
Character string. Name of the outcome variable (Y). |
confounders |
Character vector. Names of confounding variables. |
misclassified_variable |
Character string. Either "exposure" or "mediator" to indicate which variable is subject to misclassification. |
sensitivity_region |
A named list defining the sensitivity region Theta_Psi.
Should contain: |
n_grid |
Integer. Number of grid points per parameter dimension. Default is 50. |
effect_scale |
Character string. Scale for reporting effects: "OR" (odds ratio), "RR" (risk ratio), or "RD" (risk difference). Default is "OR". |
confidence_level |
Numeric. Confidence level for confidence intervals. Default is 0.95. |
ci_method |
Character. '"none"' (default) or '"analytic"'. If '"analytic"', attaches Imbens-Manski confidence intervals (see [bound_ci()]) to the result's '@analytic_ci' slot. |
ci_n_boot |
Integer. Resamples for the analytic CI endpoint SEs. Default 200. |
bootstrap |
Logical. Whether to compute bootstrap confidence intervals. Default is FALSE. |
bootstrap_reps |
Integer. Number of bootstrap replicates if bootstrap=TRUE. Default is 1000. |
bootstrap_method |
Character string. Bootstrap CI method: "percentile" (default) or "bca" (bias-corrected and accelerated). The percentile method is faster and adequate for most applications. BCa provides second-order accurate intervals but requires jackknife estimation, which is computationally intensive for large datasets. |
parallel |
Logical. Whether to use parallel processing. Default is FALSE. |
n_cores |
Integer. Number of cores for parallel processing. If NULL, uses detectCores() - 1. Default is NULL. |
cache |
Logical. Whether to cache intermediate results. Default is FALSE. |
cache_dir |
Character string. Directory for cache files. If NULL, uses temp directory. |
verbose |
Logical. Whether to print progress messages. Default is TRUE. |
stratify_by |
Character vector. Additional variables to stratify by (advanced use). |
use_adaptive_grid |
Logical. Whether to use adaptive grid refinement for large grids (n_grid >= 20). This dramatically reduces computation time by focusing on compatible regions. Default is TRUE. |
grid_method |
Character string specifying grid search algorithm:
|
This function implements the partial identification approach described in [Author] (2025). The method derives bounds on causal mediation effects by specifying a plausible range for misclassification parameters and using testable implications to rule out empirically inconsistent values.
The sensitivity region Theta_Psi is defined by four parameters:
sn0: Baseline sensitivity (probability of correct classification
when Y=0)
sp0: Baseline specificity
psi_sn: Differential sensitivity parameter (odds ratio)
psi_sp: Differential specificity parameter (odds ratio)
Non-differential misclassification corresponds to psi_sn = psi_sp = 1.
An object of class medrobust_bounds containing:
NIE_lower |
Lower bound for Natural Indirect Effect |
NIE_upper |
Upper bound for Natural Indirect Effect |
NDE_lower |
Lower bound for Natural Direct Effect |
NDE_upper |
Upper bound for Natural Direct Effect |
compatible_sets |
Data frame of compatible parameter sets |
falsified_proportion |
Proportion of sensitivity region falsified |
effect_scale |
Scale used for reporting |
n_evaluated |
Number of parameter sets evaluated |
n_compatible |
Number of compatible parameter sets |
computation_time |
Time taken for computation |
call |
Original function call |
data_summary |
Summary statistics of the data |
bootstrap_results |
Bootstrap results if bootstrap=TRUE |
[Author] (2025). Partial Identification of Causal Mediation Effects Under Differential Misclassification. Biostatistics.
McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2), 239-245.
Sobol', I. M. (1967). On the distribution of points in a cube and the approximate evaluation of integrals. USSR Computational Mathematics and Mathematical Physics, 7(4), 86-112.
check_compatibility, sensitivity_plot,
falsification_summary
## Not run: # Load example data data("heals_data") # Define sensitivity region sens_region <- list( sn0_range = c(0.55, 0.65), sp0_range = c(0.85, 0.95), psi_sn_range = c(1.0, 2.0), psi_sp_range = c(0.5, 1.0) ) # Compute bounds for exposure misclassification bounds <- bound_ne( data = heals_data, exposure = "A_star", mediator = "M", outcome = "Y", confounders = c("age", "male", "smoking", "bmi"), misclassified_variable = "exposure", sensitivity_region = sens_region, n_grid = 50 ) # View results print(bounds) summary(bounds) # Visualize sensitivity_plot(bounds, param = "psi_sn") ## End(Not run)## Not run: # Load example data data("heals_data") # Define sensitivity region sens_region <- list( sn0_range = c(0.55, 0.65), sp0_range = c(0.85, 0.95), psi_sn_range = c(1.0, 2.0), psi_sp_range = c(0.5, 1.0) ) # Compute bounds for exposure misclassification bounds <- bound_ne( data = heals_data, exposure = "A_star", mediator = "M", outcome = "Y", confounders = c("age", "male", "smoking", "bmi"), misclassified_variable = "exposure", sensitivity_region = sens_region, n_grid = 50 ) # View results print(bounds) summary(bounds) # Visualize sensitivity_plot(bounds, param = "psi_sn") ## End(Not run)
Tests whether a specific set of misclassification parameters is compatible with the observed data by checking testable implications. Returns detailed diagnostics about which constraints are satisfied or violated.
check_compatibility( data, exposure, mediator, outcome, confounders = NULL, psi, misclassified_variable = c("exposure", "mediator"), return_details = TRUE, tolerance = 1e-06 )check_compatibility( data, exposure, mediator, outcome, confounders = NULL, psi, misclassified_variable = c("exposure", "mediator"), return_details = TRUE, tolerance = 1e-06 )
data |
Data frame containing the observed variables |
exposure |
Character string. Name of exposure variable |
mediator |
Character string. Name of mediator variable |
outcome |
Character string. Name of outcome variable |
confounders |
Character vector. Names of confounding variables |
psi |
Named list containing misclassification parameters:
|
misclassified_variable |
Character string. Either "exposure" or "mediator" |
return_details |
Logical. If TRUE, returns detailed stratum-level diagnostics. Default is TRUE. |
tolerance |
Numeric. Tolerance for numerical precision when checking constraints. Default is 1e-6. |
This function implements the testable implications derived in Propositions 4.1 and 5.1 of the paper. For mediator misclassification, it checks whether the observed data can be explained by any true causal parameters given the specified misclassification mechanism. For exposure misclassification, it checks the likelihood ratio constraints on observed probabilities.
The function is useful for:
Testing specific hypotheses about misclassification parameters
Understanding which constraints are most informative
Debugging sensitivity analyses
Generating diagnostic plots
A list with class compatibility_test containing:
compatible |
Logical. TRUE if parameters are compatible |
psi |
The tested parameter set |
n_constraints_total |
Total number of testable constraints |
n_constraints_satisfied |
Number of satisfied constraints |
n_constraints_violated |
Number of violated constraints |
violated_constraints |
Data frame of violated constraints (if any) |
implied_probabilities |
Solved true probabilities (if compatible) |
stratum_details |
Stratum-level diagnostics (if return_details=TRUE) |
bound_ne, falsification_summary
## Not run: data("heals_data") # Test non-differential misclassification with high accuracy test_ndm <- check_compatibility( data = heals_data, exposure = "A_star", mediator = "M", outcome = "Y", confounders = c("age", "male", "smoking", "bmi"), psi = list(sn0 = 0.90, sp0 = 0.90, psi_sn = 1.0, psi_sp = 1.0), misclassified_variable = "exposure" ) print(test_ndm) # Test strong differential misclassification test_strong_dm <- check_compatibility( data = heals_data, exposure = "A_star", mediator = "M", outcome = "Y", confounders = c("age", "male", "smoking", "bmi"), psi = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 3.0, psi_sp = 1.0), misclassified_variable = "exposure" ) print(test_strong_dm) ## End(Not run)## Not run: data("heals_data") # Test non-differential misclassification with high accuracy test_ndm <- check_compatibility( data = heals_data, exposure = "A_star", mediator = "M", outcome = "Y", confounders = c("age", "male", "smoking", "bmi"), psi = list(sn0 = 0.90, sp0 = 0.90, psi_sn = 1.0, psi_sp = 1.0), misclassified_variable = "exposure" ) print(test_ndm) # Test strong differential misclassification test_strong_dm <- check_compatibility( data = heals_data, exposure = "A_star", mediator = "M", outcome = "Y", confounders = c("age", "male", "smoking", "bmi"), psi = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 3.0, psi_sp = 1.0), misclassified_variable = "exposure" ) print(test_strong_dm) ## End(Not run)
Compare partial identification bounds from multiple analyses (e.g., different sensitivity regions, different datasets, different assumptions).
compare_bounds(bounds_list, labels = NULL)compare_bounds(bounds_list, labels = NULL)
bounds_list |
A named list of |
labels |
Optional character vector of labels for each analysis. If NULL, uses names from bounds_list. |
A data frame comparing the bounds, and optionally a plot.
## Not run: # Compare bounds under different sensitivity assumptions bounds1 <- bound_ne(data, ..., sensitivity_region = region1) bounds2 <- bound_ne(data, ..., sensitivity_region = region2) comparison <- compare_bounds( list(conservative = bounds1, liberal = bounds2) ) print(comparison) ## End(Not run)## Not run: # Compare bounds under different sensitivity assumptions bounds1 <- bound_ne(data, ..., sensitivity_region = region1) bounds2 <- bound_ne(data, ..., sensitivity_region = region2) comparison <- compare_bounds( list(conservative = bounds1, liberal = bounds2) ) print(comparison) ## End(Not run)
S7 class for storing results of compatibility tests for specific misclassification parameter values.
compatible |
Logical indicating if parameters are compatible with data |
psi |
List of misclassification parameters tested |
sn1 |
Sensitivity when Y=1 (implied from psi) |
sp1 |
Specificity when Y=1 (implied from psi) |
n_constraints_total |
Total number of testable constraints |
n_constraints_satisfied |
Number of constraints satisfied |
n_constraints_violated |
Number of constraints violated |
violated_constraints |
Data frame with details of violated constraints |
implied_probabilities |
List of implied probability distributions |
stratum_details |
List with stratum-specific details |
misclassified_variable |
Character: "exposure" or "mediator" |
reason |
Character describing reason for incompatibility (if any) |
Estimate standard errors for the lower and upper bounds using the bootstrap distribution.
compute_bound_se(bootstrap_results)compute_bound_se(bootstrap_results)
bootstrap_results |
List returned by compute_bootstrap_ci |
Named vector of standard errors
Extract the compatible parameter sets from a bounds analysis.
extract_bounds(bounds_object)extract_bounds(bounds_object)
bounds_object |
An object of class |
A data frame containing compatible parameter sets
Extract the subset of the sensitivity region that is empirically falsified.
extract_falsified_region(bounds_object)extract_falsified_region(bounds_object)
bounds_object |
An object of class |
A data frame containing parameter sets that were falsified
Provides a detailed summary of which regions of the sensitivity parameter space are empirically falsified by the data. Helps understand which assumptions about misclassification are most constrained by the observed data.
falsification_summary( bounds_object, by_parameter = TRUE, n_bins = 10, plot = TRUE )falsification_summary( bounds_object, by_parameter = TRUE, n_bins = 10, plot = TRUE )
bounds_object |
An object of class |
by_parameter |
Logical. If TRUE, breaks down falsification by each parameter dimension. Default is TRUE. |
n_bins |
Integer. Number of bins for discretizing parameters when computing falsification rates. Default is 10. |
plot |
Logical. If TRUE, generates visualization of falsification patterns. Default is TRUE. |
This function analyzes the compatible parameter sets to understand which regions of the sensitivity space are ruled out by the testable implications. High falsification rates indicate that the data are informative about that particular parameter.
The falsification analysis is useful for:
Understanding which misclassification assumptions are most constrained
Identifying whether bounds are wide due to weak data vs. wide sensitivity region
Guiding choice of sensitivity parameters for future studies
Assessing whether additional data would meaningfully narrow bounds
A list of class falsification_summary containing:
overall |
Overall falsification rate |
by_parameter |
Parameter-specific falsification rates (if by_parameter=TRUE) |
joint_falsification |
2D falsification patterns |
most_constrained |
Parameters that are most falsified |
least_constrained |
Parameters that are least falsified |
plot |
ggplot2 object (if plot=TRUE) |
## Not run: bounds <- bound_ne(...) falsif <- falsification_summary(bounds) print(falsif) # View falsification plot falsif$plot ## End(Not run)## Not run: bounds <- bound_ne(...) falsif <- falsification_summary(bounds) print(falsif) # View falsification plot falsif$plot ## End(Not run)
Format an effect estimate as a string for tables/reports.
format_effect(estimate, effect_scale = "OR", digits = 2, ci = NULL)format_effect(estimate, effect_scale = "OR", digits = 2, ci = NULL)
estimate |
Numeric effect estimate (or bounds) |
effect_scale |
Character string: "OR", "RR", or "RD" |
digits |
Integer: number of decimal places |
ci |
Optional: confidence interval (length 2 vector) |
Character string
A synthetic dataset based on the Health Effects of Arsenic Longitudinal Study (HEALS) that demonstrates differential measurement error in exposure assessment. The dataset includes both observed (misclassified) and true (ground truth) arsenic exposure, allowing for validation of sensitivity analysis methods.
heals_dataheals_data
A data frame with 450 observations and 9 variables:
Integer. Unique subject identifier (1 to 450).
Binary. Cardiovascular disease status (0 = Absent, 1 = Present).
Binary. Elevated inflammation marker (sVCAM-1) (0 = Low, 1 = High).
Binary. Self-reported arsenic exposure (0 = No, 1 = Yes). Subject to severe differential recall bias.
Binary. True arsenic exposure status (0 = No, 1 = Yes). Included for validation; not available in real studies.
Numeric. Age in years (range: 18-75, mean ~42.8).
Binary. Sex (0 = Female, 1 = Male).
Binary. Ever smoker status (0 = Never, 1 = Ever).
Numeric. Body mass index in kg/m² (mean ~19.7).
## Data Generation The data were generated to mimic the HEALS cohort using published effect sizes and demographic profiles from:
Ahsan et al. (2006) - Baseline demographics
Chen et al. (2007) - Arsenic-inflammation relationship
Argos et al. (2010) - Arsenic-CVD relationship (HR ~ 1.92)
## Measurement Error Model Differential misclassification was introduced with:
Cases (Y=1): Sensitivity = 0.90, Specificity = 0.55 (over-reporting)
Controls (Y=0): Sensitivity = 0.60, Specificity = 0.90
This creates severe recall bias where CVD cases over-report arsenic exposure, inflating the naive effect estimate by approximately 70
## True vs. Naive Effects
True Natural Direct Effect: OR ~ 1.68
Naive NDE (using A_star): OR ~ 2.85
Bias amplification: ~70
Synthetic data generated using methods described in the package vignette
"Synthetic HEALS Data: Ground Truth with Differential Measurement Error".
See vignette("heals-synthetic-data", package = "medrobust") for full details.
# Load the data data("heals_data") str(heals_data) # Examine prevalence table(heals_data$A_star, heals_data$Y) # Compare naive vs. true effect naive_mod <- glm(Y ~ A_star + M + age + male + smoking + bmi, data = heals_data, family = binomial) true_mod <- glm(Y ~ A_true + M + age + male + smoking + bmi, data = heals_data, family = binomial) cat("Naive NDE:", round(exp(coef(naive_mod)["A_star"]), 2), "\n") cat("True NDE:", round(exp(coef(true_mod)["A_true"]), 2), "\n") ## Not run: # Use with medrobust bounds bounds <- bound_ne( data = heals_data, exposure = "A_star", mediator = "M", outcome = "Y", confounders = c("age", "male", "smoking", "bmi"), misclassified_variable = "exposure", sensitivity_region = list( sn0_range = c(0.55, 0.65), sp0_range = c(0.85, 0.95), psi_sn_range = c(1.0, 2.0), psi_sp_range = c(0.5, 1.0) ) ) ## End(Not run)# Load the data data("heals_data") str(heals_data) # Examine prevalence table(heals_data$A_star, heals_data$Y) # Compare naive vs. true effect naive_mod <- glm(Y ~ A_star + M + age + male + smoking + bmi, data = heals_data, family = binomial) true_mod <- glm(Y ~ A_true + M + age + male + smoking + bmi, data = heals_data, family = binomial) cat("Naive NDE:", round(exp(coef(naive_mod)["A_star"]), 2), "\n") cat("True NDE:", round(exp(coef(true_mod)["A_true"]), 2), "\n") ## Not run: # Use with medrobust bounds bounds <- bound_ne( data = heals_data, exposure = "A_star", mediator = "M", outcome = "Y", confounders = c("age", "male", "smoking", "bmi"), misclassified_variable = "exposure", sensitivity_region = list( sn0_range = c(0.55, 0.65), sp0_range = c(0.85, 0.95), psi_sn_range = c(1.0, 2.0), psi_sp_range = c(0.5, 1.0) ) ) ## End(Not run)
S7 class for storing partial identification bounds for natural direct and indirect effects under differential misclassification.
NIE_lower |
Lower bound for Natural Indirect Effect |
NIE_upper |
Upper bound for Natural Indirect Effect |
NDE_lower |
Lower bound for Natural Direct Effect |
NDE_upper |
Upper bound for Natural Direct Effect |
compatible_sets |
Data frame of parameter sets compatible with data |
n_compatible |
Number of compatible parameter sets |
n_evaluated |
Total number of parameter sets evaluated |
falsified_proportion |
Proportion of parameter space falsified |
effect_scale |
Character: "OR", "RR", or "RD" |
misclassified_variable |
Character: "exposure" or "mediator" |
sensitivity_region |
Sensitivity region specification |
naive_estimates |
List of naive effect estimates |
bootstrap_results |
Bootstrap inference results (if computed) |
data_summary |
Summary statistics from the data |
call |
The function call that created this object |
Low-level constructor for falsification_summary S7 objects. Most users should use the falsification_summary() function which analyzes bounds.
new_falsification_summary( overall, n_evaluated, n_compatible, n_falsified, by_parameter = NULL, joint_falsification = NULL, most_constrained = character(0), least_constrained = character(0), plot = NULL )new_falsification_summary( overall, n_evaluated, n_compatible, n_falsified, by_parameter = NULL, joint_falsification = NULL, most_constrained = character(0), least_constrained = character(0), plot = NULL )
overall |
Overall falsification rate |
n_evaluated |
Number of parameter sets evaluated |
n_compatible |
Number of compatible parameter sets |
n_falsified |
Number of falsified parameter sets |
by_parameter |
Parameter-specific falsification (optional) |
joint_falsification |
Joint falsification patterns (optional) |
most_constrained |
Most constrained parameters (optional) |
least_constrained |
Least constrained parameters (optional) |
plot |
ggplot2 object (optional) |
A falsification_summary S7 object
Visualize the bootstrap distribution of bounds.
plot_bootstrap_distribution( bootstrap_results, effect = "NIE", original_bounds = NULL )plot_bootstrap_distribution( bootstrap_results, effect = "NIE", original_bounds = NULL )
bootstrap_results |
List returned by compute_bootstrap_ci |
effect |
Character string: "NIE" or "NDE" |
original_bounds |
Original bound estimates (optional) |
ggplot2 object
Conducts simulation-based power analysis to determine the sample size needed to achieve a target bound width or to rule out the null hypothesis with high probability.
power_analysis( true_params, dm_params, sensitivity_region, misclass_type = c("exposure", "mediator"), sample_sizes = seq(100, 1000, by = 100), target_width = NULL, target_power = 0.8, alpha = 0.05, effect = c("NIE", "NDE"), n_sim = 100, n_grid = 30, confounders = 1, parallel = TRUE, n_cores = NULL, verbose = TRUE, seed = 12345 )power_analysis( true_params, dm_params, sensitivity_region, misclass_type = c("exposure", "mediator"), sample_sizes = seq(100, 1000, by = 100), target_width = NULL, target_power = 0.8, alpha = 0.05, effect = c("NIE", "NDE"), n_sim = 100, n_grid = 30, confounders = 1, parallel = TRUE, n_cores = NULL, verbose = TRUE, seed = 12345 )
true_params |
Named list of true causal parameters (see |
dm_params |
Named list of misclassification parameters |
sensitivity_region |
Named list defining Theta_Psi for bound_ne |
misclass_type |
Character string. "exposure" or "mediator" |
sample_sizes |
Integer vector. Sample sizes to evaluate. Default is seq(100, 1000, by = 100). |
target_width |
Numeric. Target bound width. If specified, finds minimum sample size to achieve this width with high probability. Default is NULL. |
target_power |
Numeric. Target power for rejecting null. Default is 0.80. |
alpha |
Numeric. Significance level. Default is 0.05. |
effect |
Character string. Which effect to power for: "NIE" or "NDE". Default is "NIE". |
n_sim |
Integer. Number of simulation replicates per sample size. Default is 100. |
n_grid |
Integer. Grid resolution for bound_ne. Default is 30. |
confounders |
Integer. Number of confounders. Default is 1. |
parallel |
Logical. Use parallel processing? Default is TRUE. |
n_cores |
Integer. Number of cores. Default is NULL (auto-detect). |
verbose |
Logical. Print progress? Default is TRUE. |
seed |
Integer. Random seed. Default is 12345. |
An S7 object of class power_analysis_result containing:
power_curve |
Data frame with power by sample size |
true_effect |
True effect value |
target_power |
Target power level |
target_width |
Target bound width (if specified) |
recommended_n_power |
Recommended sample size for target power |
recommended_n_width |
Recommended sample size for target width |
simulation_params |
Parameters used for simulation |
## Not run: # Power analysis for exposure DM with moderate effects power_result <- power_analysis( true_params = list(beta_AM = 0.405, theta_AY = 0.405, theta_MY = 0.405), dm_params = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 1.5, psi_sp = 1.0), sensitivity_region = list( sn0_range = c(0.80, 0.90), sp0_range = c(0.80, 0.90), psi_sn_range = c(1.0, 2.0), psi_sp_range = c(1.0, 1.0) ), misclass_type = "exposure", sample_sizes = c(200, 400, 600, 800, 1000), target_width = 0.3, target_power = 0.80, n_sim = 100 ) print(power_result) plot(power_result) ## End(Not run)## Not run: # Power analysis for exposure DM with moderate effects power_result <- power_analysis( true_params = list(beta_AM = 0.405, theta_AY = 0.405, theta_MY = 0.405), dm_params = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 1.5, psi_sp = 1.0), sensitivity_region = list( sn0_range = c(0.80, 0.90), sp0_range = c(0.80, 0.90), psi_sn_range = c(1.0, 2.0), psi_sp_range = c(1.0, 1.0) ), misclass_type = "exposure", sample_sizes = c(200, 400, 600, 800, 1000), target_width = 0.3, target_power = 0.80, n_sim = 100 ) print(power_result) plot(power_result) ## End(Not run)
S7 class for storing power analysis results for partial identification bounds.
power_curve |
Data frame with power and width by sample size |
true_effect |
True effect value used in simulations |
target_power |
Target power level for sample size recommendations |
target_width |
Target bound width for sample size recommendations |
recommended_n_power |
Recommended sample size to achieve target power |
recommended_n_width |
Recommended sample size to achieve target width |
simulation_params |
List of simulation parameters used |
Print Method for compatibility_test
## S3 method for class 'compatibility_test' print(x, ...)## S3 method for class 'compatibility_test' print(x, ...)
x |
An object of class |
... |
Additional arguments (currently unused) |
Invisibly returns the input object
Generate publication-quality visualizations of partial identification bounds as a function of sensitivity parameters.
sensitivity_plot( bounds_object, param = "psi_sn", effect = c("both", "NIE", "NDE"), plot_type = c("bounds", "heatmap", "contour"), show_naive = TRUE, show_null = TRUE, color_scheme = "default", theme = "bw", ... )sensitivity_plot( bounds_object, param = "psi_sn", effect = c("both", "NIE", "NDE"), plot_type = c("bounds", "heatmap", "contour"), show_naive = TRUE, show_null = TRUE, color_scheme = "default", theme = "bw", ... )
bounds_object |
An object of class |
param |
Character vector specifying which parameter(s) to plot on the x-axis. Options: "psi_sn", "psi_sp", "sn0", "sp0". Can specify multiple for faceted plots. |
effect |
Character string: "NIE", "NDE", or "both" (default). |
plot_type |
Character string specifying plot type:
|
show_naive |
Logical indicating whether to overlay the naive estimate (assuming no misclassification). Default is TRUE. |
show_null |
Logical indicating whether to show horizontal line at null value (effect = 1 for OR/RR, effect = 0 for RD). Default is TRUE. |
color_scheme |
Character string specifying color palette: "default", "viridis", "colorblind", "grayscale". |
theme |
Character string for ggplot2 theme: "bw" (default), "minimal", "classic", "void". |
... |
Additional arguments passed to ggplot2 functions. |
The function creates different plot types to visualize sensitivity analysis:
Bounds plot: Shows how the lower and upper bounds vary as a function of one sensitivity parameter, with other parameters held fixed or averaged.
Heatmap: Shows the width of the identification region (upper - lower) as a function of two sensitivity parameters simultaneously.
Contour plot: Shows contour lines of the bounds in 2D parameter space.
A ggplot2 object that can be further customized or saved.
## Not run: # Basic bounds plot sensitivity_plot(bounds, param = "psi_sn", effect = "NIE") # Show both effects sensitivity_plot(bounds, param = "psi_sn", effect = "both") # Heatmap of bound width sensitivity_plot(bounds, param = c("psi_sn", "sn0"), plot_type = "heatmap") # Customize and save p <- sensitivity_plot(bounds, param = "psi_sn") + labs(title = "My Custom Title") + theme(legend.position = "bottom") ggsave("sensitivity_plot.pdf", p, width = 8, height = 6) ## End(Not run)## Not run: # Basic bounds plot sensitivity_plot(bounds, param = "psi_sn", effect = "NIE") # Show both effects sensitivity_plot(bounds, param = "psi_sn", effect = "both") # Heatmap of bound width sensitivity_plot(bounds, param = c("psi_sn", "sn0"), plot_type = "heatmap") # Customize and save p <- sensitivity_plot(bounds, param = "psi_sn") + labs(title = "My Custom Title") + theme(legend.position = "bottom") ggsave("sensitivity_plot.pdf", p, width = 8, height = 6) ## End(Not run)
Constructor for sensitivity_region S7 objects. Issues a warning if the region may be non-informative (Sn + Sp <= 1).
sensitivity_region(sn0_range, sp0_range, psi_sn_range, psi_sp_range)sensitivity_region(sn0_range, sp0_range, psi_sn_range, psi_sp_range)
sn0_range |
Numeric vector of length 2: [min, max] for baseline sensitivity |
sp0_range |
Numeric vector of length 2: [min, max] for baseline specificity |
psi_sn_range |
Numeric vector of length 2: [min, max] for sensitivity OR |
psi_sp_range |
Numeric vector of length 2: [min, max] for specificity OR |
A sensitivity_region S7 object
Generates synthetic data with known differential misclassification for power analysis, methods validation, and simulation studies. Allows control over true causal parameters and misclassification mechanisms.
simulate_dm_data( n = 500, true_params = list(beta_AM = 0.405, theta_AY = 0.405, theta_MY = 0.405), dm_params = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 1.5, psi_sp = 1), misclass_type = c("exposure", "mediator"), confounders = 1, confounder_params = list(type = "binary", effect_on_A = 0.3, effect_on_M = 0.3, effect_on_Y = 0.3), effect_modification = FALSE, interaction_coef = 0, seed = NULL, return_truth = TRUE, return_params = TRUE )simulate_dm_data( n = 500, true_params = list(beta_AM = 0.405, theta_AY = 0.405, theta_MY = 0.405), dm_params = list(sn0 = 0.85, sp0 = 0.85, psi_sn = 1.5, psi_sp = 1), misclass_type = c("exposure", "mediator"), confounders = 1, confounder_params = list(type = "binary", effect_on_A = 0.3, effect_on_M = 0.3, effect_on_Y = 0.3), effect_modification = FALSE, interaction_coef = 0, seed = NULL, return_truth = TRUE, return_params = TRUE )
n |
Integer. Sample size. Default is 500. |
true_params |
Named list of true causal parameters:
|
dm_params |
Named list of misclassification parameters:
|
misclass_type |
Character string. Either "exposure" or "mediator" to indicate which variable is misclassified. Default is "exposure". |
confounders |
Integer. Number of confounding variables to include. Default is 1. Set to 0 for no confounders. |
confounder_params |
Named list controlling confounder generation:
|
effect_modification |
Logical. Should there be effect modification (interaction between A and M on Y)? Default is FALSE. |
interaction_coef |
Numeric. Interaction coefficient if effect_modification=TRUE. Default is 0. |
seed |
Integer. Random seed for reproducibility. If NULL, no seed is set. |
return_truth |
Logical. Should true (unobserved) values be returned? Default is TRUE. |
return_params |
Logical. Should true causal effects be calculated and returned? Default is TRUE. |
An S7 object of class simulated_dm_data containing:
observed |
Data frame with observed (potentially misclassified) variables |
truth |
Data frame with true (unobserved) values (if return_truth=TRUE) |
true_effects |
List of true causal effects (if return_params=TRUE) |
generation_params |
Parameters used for data generation |
misclassification_applied |
Summary of applied misclassification |
## Not run: # Basic simulation with moderate effects and mild DM sim_data <- simulate_dm_data( n = 500, true_params = list( beta_AM = 0.405, # OR = 1.5 theta_AY = 0.405, # OR = 1.5 theta_MY = 0.405 # OR = 1.5 ), dm_params = list( sn0 = 0.85, sp0 = 0.85, psi_sn = 1.5, psi_sp = 1.0 ), misclass_type = "exposure", seed = 12345 ) # Use in analysis bounds <- bound_ne( data = sim_data@observed, exposure = "A_star", mediator = "M", outcome = "Y", confounders = "C1", misclassified_variable = "exposure", sensitivity_region = list( sn0_range = c(0.80, 0.90), sp0_range = c(0.80, 0.90), psi_sn_range = c(1.0, 2.0), psi_sp_range = c(1.0, 1.0) ) ) ## End(Not run)## Not run: # Basic simulation with moderate effects and mild DM sim_data <- simulate_dm_data( n = 500, true_params = list( beta_AM = 0.405, # OR = 1.5 theta_AY = 0.405, # OR = 1.5 theta_MY = 0.405 # OR = 1.5 ), dm_params = list( sn0 = 0.85, sp0 = 0.85, psi_sn = 1.5, psi_sp = 1.0 ), misclass_type = "exposure", seed = 12345 ) # Use in analysis bounds <- bound_ne( data = sim_data@observed, exposure = "A_star", mediator = "M", outcome = "Y", confounders = "C1", misclassified_variable = "exposure", sensitivity_region = list( sn0_range = c(0.80, 0.90), sp0_range = c(0.80, 0.90), psi_sn_range = c(1.0, 2.0), psi_sp_range = c(1.0, 1.0) ) ) ## End(Not run)
S7 class for storing simulated data with known differential misclassification, used for power analysis and methods validation.
observed |
Data frame with observed (potentially misclassified) variables |
truth |
Data frame with true (unobserved) values |
true_effects |
List of true causal effects |
generation_params |
List of parameters used to generate the data |
misclassification_applied |
List of misclassification parameters applied |
Summary Method for compatibility_test
## S3 method for class 'compatibility_test' summary(object, ...)## S3 method for class 'compatibility_test' summary(object, ...)
object |
An object of class |
... |
Additional arguments (currently unused) |
Invisibly returns the input object
Test multiple hypotheses about misclassification parameters simultaneously.
test_multiple_hypotheses( data, exposure, mediator, outcome, confounders, psi_list, misclassified_variable )test_multiple_hypotheses( data, exposure, mediator, outcome, confounders, psi_list, misclassified_variable )
data |
Data frame |
exposure |
Character string |
mediator |
Character string |
outcome |
Character string |
confounders |
Character vector |
psi_list |
List of parameter sets to test |
misclassified_variable |
Character string |
Data frame with test results for each hypothesis