---
title: "Getting Started with medrobust"
author: "Davood Tofighi, Ph.D."
date: today
format:
  html:
    toc: true
    toc-depth: 3
    code-fold: false
    code-tools: true
    theme: cosmo
    highlight-style: github
    df-print: paged
execute:
  eval: true
  echo: true
vignette: >
  %\VignetteIndexEntry{Getting Started with medrobust}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

## Introduction

The `medrobust` package provides tools for conducting sensitivity analysis for causal mediation effects when the exposure or mediator is measured with **differential misclassification**. This is particularly important when:

- The outcome may influence recall or reporting of the exposure/mediator (recall bias)
- Measurement error depends on other variables in the causal system
- Traditional measurement error correction methods requiring validation data are infeasible
- Gold-standard measurements are unavailable or too costly to obtain

### What is Differential Misclassification?

**Differential misclassification** occurs when the probability of mismeasurement depends on other variables. For example:

- **Recall bias**: Participants with the outcome (e.g., disease) may remember exposures differently than those without
- **Outcome-dependent measurement error**: The outcome affects how the mediator or exposure is measured
- **Non-random measurement error**: Sensitivity and specificity vary across strata

This contrasts with **non-differential misclassification**, where measurement error is independent of other variables.
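
To make the distinction concrete, the following base-R sketch simulates a binary exposure whose reporting sensitivity depends on the outcome. The prevalences, the sensitivities of 0.80 and 0.95, the specificity of 0.90, and all variable names are illustrative assumptions, not package defaults.

```{r}
#| label: sketch-differential-misclassification
# Illustrative simulation (hypothetical values): sensitivity depends on Y
set.seed(2025)
n_demo <- 20000
y_demo <- rbinom(n_demo, 1, 0.3)            # outcome
a_demo <- rbinom(n_demo, 1, 0.4)            # true exposure
sn_demo <- ifelse(y_demo == 1, 0.95, 0.80)  # cases recall exposure better
sp_demo <- 0.90                             # specificity is the same for everyone

# Probability of *reporting* exposure given the truth
p_report <- ifelse(a_demo == 1, sn_demo, 1 - sp_demo)
a_star_demo <- rbinom(n_demo, 1, p_report)

# Empirical sensitivity among the truly exposed, by outcome status
sens_by_y <- tapply(a_star_demo[a_demo == 1], y_demo[a_demo == 1], mean)
round(sens_by_y, 3)
```

Under non-differential misclassification the two empirical sensitivities would agree up to sampling noise; here they differ because reporting depends on `y_demo`.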

### The Problem

Standard mediation analysis methods assume perfect measurement. When differential misclassification is present:

- Point estimates of mediation effects are **biased**
- Confidence intervals have **incorrect coverage**
- Causal conclusions may be **invalid**

Traditional measurement error correction requires:

- Validation data with gold-standard measurements
- Strong parametric assumptions about error structure

Validation data are often unavailable in practice, and the parametric assumptions are often hard to justify.

### The Solution: Partial Identification

Instead of point estimation under strong assumptions, `medrobust` uses **partial identification** to:

1. Derive **bounds** on causal effects that remain valid under differential misclassification
2. Test whether observed data are **compatible** with hypothesized misclassification parameters
3. **Falsify** implausible parameter combinations using testable implications
4. Provide **honest uncertainty quantification** via bootstrap confidence intervals
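
The idea of bounding a quantity over a set of plausible error rates can be sketched with a much simpler problem than mediation. The toy example below bounds a true prevalence when sensitivity and specificity are only known to lie in ranges. It is an analogue for intuition, not the algorithm used by `medrobust`, and the observed prevalence and ranges are hypothetical.

```{r}
#| label: sketch-partial-identification
# Toy analogue (not the package's algorithm): bound a true prevalence when the
# sensitivity and specificity of a binary measure are only known up to ranges.
p_obs_toy <- 0.30  # hypothetical observed prevalence

toy_grid <- expand.grid(
  sn = seq(0.70, 0.95, length.out = 6),
  sp = seq(0.80, 0.95, length.out = 4)
)

# Standard misclassification identity: p_obs = sn * p + (1 - sp) * (1 - p)
toy_grid$p_true <- (p_obs_toy + toy_grid$sp - 1) / (toy_grid$sn + toy_grid$sp - 1)

# Keep only candidates that imply a valid probability, then take the range
toy_ok <- toy_grid[toy_grid$p_true >= 0 & toy_grid$p_true <= 1, ]
toy_bounds <- range(toy_ok$p_true)
round(toy_bounds, 3)
```

The resulting interval is the set of true prevalences consistent with the observed data and the assumed ranges; no single point estimate is singled out.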

## Installation

```{r}
#| label: install-chunk
#| eval: false
# Install from GitHub
devtools::install_github("data-wise/medrobust")
```

```{r}
#| label: load-package
library(medrobust)
library(parallel)
n_cores <- max(1L, detectCores() - 2L, na.rm = TRUE) # Leave two cores free, but always keep at least one
```

## Key Concepts

### Natural Direct and Indirect Effects

In causal mediation analysis, we decompose the total effect of exposure $A$ on outcome $Y$ into:

- **Natural Direct Effect (NDE)**: Effect of $A$ on $Y$ not mediated through $M$
- **Natural Indirect Effect (NIE)**: Effect of $A$ on $Y$ mediated through $M$

On the odds ratio scale:

- $\text{Total Effect} = \text{NDE} \times \text{NIE}$
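
As a quick arithmetic check of the decomposition, the sketch below uses hypothetical effect sizes (NDE = 1.3, NIE = 1.1) and shows that the product on the odds ratio scale corresponds to a sum on the log odds ratio scale.

```{r}
#| label: sketch-effect-decomposition
# Hypothetical effect sizes on the odds ratio scale
nde_demo <- 1.3
nie_demo <- 1.1

te_demo <- nde_demo * nie_demo                # multiplicative on the OR scale
log_te_demo <- log(nde_demo) + log(nie_demo)  # additive on the log-OR scale

c(total_effect = te_demo, check = exp(log_te_demo))
```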

### Misclassification Parameters

The package uses four key parameters to characterize differential misclassification:

**Baseline parameters** (when outcome $Y = 0$):

- `sn0`: Sensitivity (probability of correctly classifying a true positive)
- `sp0`: Specificity (probability of correctly classifying a true negative)

**Differential parameters** (how they change when $Y = 1$):

- `psi_sn`: Sensitivity odds ratio comparing $Y=1$ to $Y=0$
- `psi_sp`: Specificity odds ratio comparing $Y=1$ to $Y=0$

**Special cases:**

- **Non-differential misclassification**: `psi_sn = 1.0` and `psi_sp = 1.0`
- **Perfect measurement**: `sn0 = 1.0` and `sp0 = 1.0`
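
To see what a differential parameter means in probability terms, the sketch below converts a baseline probability and an odds ratio into the corresponding probability when $Y = 1$. It assumes that `psi_sn` and `psi_sp` are ordinary odds ratios of correct classification, as described above; `shift_by_or()` is a hypothetical helper written for this illustration, not a package function, and the package documentation should be consulted for the exact parameterization.

```{r}
#| label: sketch-psi-to-sensitivity
# Hypothetical helper: shift a probability by an odds ratio on the logit scale
shift_by_or <- function(p0, psi) plogis(qlogis(p0) + log(psi))

sn1_demo <- shift_by_or(p0 = 0.85, psi = 1.5)  # sensitivity when Y = 1
sp1_demo <- shift_by_or(p0 = 0.90, psi = 1.0)  # specificity when Y = 1 (unchanged)

round(c(sn1 = sn1_demo, sp1 = sp1_demo), 3)
```

With `sn0 = 0.85` the odds of correct classification are 0.85 / 0.15; multiplying by 1.5 gives odds of 8.5, that is, a sensitivity of 8.5 / 9.5 among those with the outcome.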

## Basic Workflow

The typical analysis follows these steps:

1. **Simulate or load data**
2. **Define sensitivity region** (plausible range of misclassification parameters)
3. **Compute bounds** for NDE and NIE
4. **Visualize results** using `plot()`
5. **Test compatibility** with hypothesized parameters
6. **Perform inference** via bootstrap
7. **Use generic methods** (`summary()`, `print()`, `as.data.frame()`, `as.list()`)
8. **Create sensitivity plots** using `sensitivity_plot()`
9. **Power analysis** (optional: plan future studies)

## Example 1: Exposure Misclassification

Let's analyze a scenario where the exposure is differentially misclassified.

### Step 1: Generate Synthetic Data

```{r}
#| label: data-generation
#| tbl-cap: "Simulated data with differential exposure misclassification"
#| echo: true
#| message: false
# Set parameters for data generation
set.seed(123)

# True causal parameters
true_params <- list(
  beta_AM = log(1.5), # Effect of A on M (OR = 1.5)
  theta_AY = log(1.3), # Direct effect of A on Y (OR = 1.3)
  theta_MY = log(1.4), # Effect of M on Y (OR = 1.4)
  p_A = 0.4 # Marginal probability of A
)

# Misclassification parameters (differential)
dm_params <- list(
  sn0 = 0.85, # Sensitivity when Y=0
  sp0 = 0.90, # Specificity when Y=0
  psi_sn = 1.5, # Sensitivity increases when Y=1 (recall bias)
  psi_sp = 1.0 # Specificity unchanged
)

# Generate data with exposure misclassification
sim_data <- simulate_dm_data(
  n = 1000,
  true_params = true_params,
  dm_params = dm_params,
  misclass_type = "exposure",
  confounders = 2,
  seed = 123
)

# View structure
print(sim_data)

# Check the class
cat("\nClass of sim_data:", class(sim_data), "\n")
```

The simulated data object is an S7 object that contains:

- `observed`: The data we actually observe (with misclassified exposure)
- `truth`: The true underlying data (for validation purposes)
- `true_effects`: The true causal effects we're trying to estimate
- `generation_params`: Parameters used to generate the data

```{r}
# Access the observed data using @ for S7 properties
head(sim_data@observed)
```

### Step 2: Define Sensitivity Region

We specify plausible ranges for the misclassification parameters:

```{r}
#| label: tbl-sensitivity-region
#| tbl-cap: "Defined sensitivity region for exposure misclassification"
#| echo: true
#| message: false

# Define sensitivity region
sens_region <- sensitivity_region(
  sn0_range = c(0.70, 0.95), # Sensitivity ranges from 70% to 95%
  sp0_range = c(0.80, 0.95), # Specificity ranges from 80% to 95%
  psi_sn_range = c(1.0, 2.0), # Sensitivity OR from 1.0 to 2.0
  psi_sp_range = c(1.0, 1.0) # No differential specificity
)

print(sens_region)
```
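
Before computing bounds, it can help to check what the region implies for sensitivity among those with the outcome. The sketch below evaluates the four corners of the `sn0` and `psi_sn` ranges, assuming `psi_sn` acts as an odds ratio on the logit scale; `shift_by_or()` is a hypothetical helper, not part of the package.

```{r}
#| label: sketch-implied-sn1-range
# Hypothetical helper: apply an odds ratio to a probability on the logit scale
shift_by_or <- function(p0, psi) plogis(qlogis(p0) + log(psi))

# Corners of the sn0 and psi_sn ranges defined above
corner_grid <- expand.grid(sn0 = c(0.70, 0.95), psi_sn = c(1.0, 2.0))
corner_grid$sn1 <- shift_by_or(corner_grid$sn0, corner_grid$psi_sn)
corner_grid

sn1_range <- range(corner_grid$sn1)
round(sn1_range, 3)
```

If the implied upper end looks implausibly close to perfect recall for the application at hand, the `psi_sn` range may be worth narrowing.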

### Step 3: Compute Bounds

```{r}
#| label: tbl-compute-bounds
#| tbl-cap: "Computed bounds on NIE and NDE under differential exposure misclassification"
#| echo: true
#| message: false

# Compute bounds over the sensitivity region
#
# PERFORMANCE NOTE:
# - n_grid = 10 creates 10^4 = 10,000 parameter combinations to evaluate
# - grid_method = "lhs" (Latin Hypercube Sampling) reduces evaluations significantly
# - For faster computation, enable parallel processing (see below)
# - For production analyses, use n_grid = 50 or higher for better resolution
#
# For CRAN vignette, we disable parallel processing
bounds <- bound_ne(
  data = sim_data@observed,
  exposure = "A_star", # Misclassified exposure
  mediator = "M",
  outcome = "Y",
  confounders = c("C1", "C2"),
  misclassified_variable = "exposure",
  sensitivity_region = sens_region,
  n_grid = 10, # Grid resolution (use 50+ for production)
  effect_scale = "OR",
  parallel = FALSE, # Set to TRUE with n_cores for faster processing in production
  verbose = FALSE,
  grid_method = "lhs" # (default) Latin Hypercube Sampling for efficiency
)

# View results
print(bounds)
```

The output shows:

- **Bounds on NIE and NDE**: The range of plausible causal effect estimates
- **Width of bounds**: How much uncertainty remains
- **Sensitivity analysis summary**: How many parameter sets are compatible vs. falsified

```{r}
#| label: tbl-bounds-summary
#| tbl-cap: "Detailed summary of bounds and falsification results"
#| message: false

# Get more detailed summary
summary(bounds)
```

### Step 4: Visualize Results

```{r}
#| label: fig-bounds-plot
#| fig-cap: "Partial identification bounds for NIE and NDE"
#| fig-width: 7
#| fig-height: 5
#| message: false

# Use the plot() method to visualize bounds
plot(bounds)
```

The `plot()` method displays the bounds for NIE and NDE as error bars. The plot shows:

- **Error bars**: The range of bounds for NIE and NDE
- **Points**: Lower and upper bounds for each effect
- **Dashed red line**: Null hypothesis value (OR = 1)
- **Subtitle**: Number of compatible vs. evaluated parameter sets

### Step 5: Test Specific Hypotheses

We can test whether specific misclassification parameters are compatible with the data:

```{r}
#| label: tbl-compatibility-test
#| tbl-cap: "Compatibility test results for specified misclassification parameters"
#| message: false

# Test specific misclassification parameters
# psi must contain all four parameters: sn0, sp0, psi_sn, psi_sp
compatibility <- check_compatibility(
  data = sim_data@observed,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("C1", "C2"),
  misclassified_variable = "exposure",
  psi = list(
    sn0 = 0.85, # Baseline sensitivity
    sp0 = 0.90, # Baseline specificity
    psi_sn = 1.5, # Differential sensitivity (OR)
    psi_sp = 1.0 # Non-differential specificity
  )
)

print(compatibility)
```

### Step 6: Bootstrap Inference

For inference, we can compute bootstrap confidence intervals:

```{r}
#| label: bootstrap-bounds
#| tbl-cap: "Computed bounds with bootstrap confidence intervals"
#| message: false

# This takes longer, so we use fewer bootstrap replications for the vignette
bounds_with_ci <- bound_ne(
  data = sim_data@observed,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("C1", "C2"),
  misclassified_variable = "exposure",
  sensitivity_region = sens_region,
  n_grid = 10,
  bootstrap = TRUE,
  bootstrap_reps = 100, # Use 1000+ for production
  parallel = FALSE, # Set to FALSE for CRAN vignette check
  confidence_level = 0.95,
  verbose = TRUE,
  grid_method = "lhs" # (default) Latin Hypercube Sampling for efficiency
)

print(bounds_with_ci)
```

### Step 7: Using Generic Methods

The package provides methods for the standard generics (`summary()`, `print()`, `as.data.frame()`, `as.list()`, and `plot()`) for all of its S7 result objects:

```{r}
#| label: generic-methods
#| message: false

# Summary method provides detailed statistics
summary(bounds)

# Convert to data frame for further analysis
bounds_df <- as.data.frame(bounds)
head(bounds_df)

# Convert to list for programmatic access
bounds_list <- as.list(bounds)
names(bounds_list)

# Plot method for visualization
plot(bounds)
```

### Step 8: Sensitivity Plots

Create customized sensitivity plots to visualize how bounds vary across different misclassification parameters:

```{r}
#| label: fig-sensitivity-params
#| fig-cap: "Bounds vs. sensitivity parameter (psi_sn)"
#| fig-width: 8
#| fig-height: 5
#| message: false

# Plot bounds as a function of sensitivity odds ratio
sensitivity_plot(
  bounds,
  param = "psi_sn",
  effect = "both",
  show_naive = TRUE,
  show_null = TRUE
)
```

```{r}
#| label: fig-sensitivity-baseline
#| fig-cap: "Bounds vs. baseline sensitivity (sn0)"
#| fig-width: 8
#| fig-height: 5
#| message: false

# Plot bounds as a function of baseline sensitivity
sensitivity_plot(bounds, param = "sn0", effect = "NIE", theme = "minimal")
```

These plots show:

- **Ribbons**: Range of bounds across all compatible parameter sets for each parameter value
- **Dashed lines**: Upper and lower bounds
- **Horizontal lines**: Naive estimates (assuming no misclassification) and null values
- **How bounds vary**: As misclassification parameters change

## Power Analysis

Power analysis helps determine the sample size needed to detect mediation effects despite measurement error.

### Planning a Study

```{r}
#| label: power-analysis-basic
#| tbl-cap: "Power analysis results for different sample sizes"
#| message: false

# Conduct power analysis
power_result <- power_analysis(
  true_params = list(
    beta_AM = log(1.5), # A → M effect
    theta_AY = log(1.3), # A → Y direct effect
    theta_MY = log(1.4) # M → Y effect
  ),
  dm_params = list(
    sn0 = 0.85,
    sp0 = 0.90,
    psi_sn = 1.5,
    psi_sp = 1.0
  ),
  sensitivity_region = sens_region,
  misclass_type = "exposure",
  sample_sizes = c(500, 1000, 2000),
  n_sim = 50, # Use 500+ for production
  n_grid = 10,
  parallel = FALSE,
  verbose = FALSE
)

# View results
print(power_result)
```

### Understanding Power Results

```{r}
#| label: power-summary
#| message: false

# Detailed summary
summary(power_result)

# Convert to data frame for custom analysis
power_df <- as.data.frame(power_result)
print(power_df)
```

### Visualizing Power Curves

```{r}
#| label: fig-power-curve
#| fig-cap: "Statistical power as a function of sample size"
#| fig-width: 8
#| fig-height: 6
#| message: false

# Plot power curves
plot(power_result)
```

The power plot shows:

- **Power curves**: Probability of detecting effects at different sample sizes
- **Target power line**: Common threshold at 0.80 (80% power)
- **Separate curves**: For NIE and NDE effects
- **Planning insight**: Sample size needed to achieve desired power

### Interpreting Power Analysis Results

The `power_analysis()` function computes:

- **Power**: Probability that confidence intervals exclude the null value
- **Coverage**: Actual coverage probability of confidence intervals
- **Bias**: Average difference between estimates and true values
- **MSE**: Mean squared error of estimates

**Example interpretation:**

If power = 0.85 for NIE at n = 1000:

- With 1000 participants, the estimated probability of detecting the mediation effect is 85%
- This assumes the specified effect sizes and misclassification parameters are correct
- Under those assumptions, the design exceeds the conventional 0.80 power target
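
A simulated power estimate carries Monte Carlo error of its own, which matters when `n_sim` is small. The sketch below uses the usual binomial standard error, assuming independent simulation replicates and power estimated as a simple proportion; `mc_se()` is a hypothetical helper, not a package function.

```{r}
#| label: sketch-power-mc-error
# Binomial Monte Carlo standard error of a simulated power estimate
mc_se <- function(power, n_sim) sqrt(power * (1 - power) / n_sim)

mc_table <- data.frame(n_sim = c(50, 500, 2000))
mc_table$se <- mc_se(0.85, mc_table$n_sim)
mc_table$lower <- 0.85 - 1.96 * mc_table$se  # approximate 95% interval
mc_table$upper <- 0.85 + 1.96 * mc_table$se
round(mc_table, 3)
```

With only 50 replicates, an estimated power of 0.85 is still compatible with a true power below 0.80, which is one reason to use 500 or more replicates for planning decisions.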

## Example 2: Mediator Misclassification

Now let's consider mediator misclassification instead:

```{r}
#| label: tbl-mediator-misclass
#| tbl-cap: "Computed bounds on NIE and NDE under differential mediator misclassification"
#| message: false

# Generate data with mediator misclassification
sim_data_med <- simulate_dm_data(
  n = 1000,
  true_params = true_params,
  dm_params = dm_params,
  misclass_type = "mediator", # Mediator is misclassified
  confounders = 1,
  seed = 456
)

# Define sensitivity region for mediator misclassification
sens_region_med <- sensitivity_region(
  sn0_range = c(0.75, 0.90),
  sp0_range = c(0.75, 0.90),
  psi_sn_range = c(1.0, 1.5),
  psi_sp_range = c(1.0, 1.5)
)

# Compute bounds
bounds_med <- bound_ne(
  data = sim_data_med@observed,
  exposure = "A",
  mediator = "M_star", # Misclassified mediator
  outcome = "Y",
  confounders = "C1",
  misclassified_variable = "mediator",
  sensitivity_region = sens_region_med,
  n_grid = 10,
  verbose = FALSE,
  grid_method = "lhs" # (default) Latin Hypercube Sampling for efficiency
)

print(bounds_med)
```

## Example 3: Non-Differential Misclassification

As a special case, we can handle non-differential misclassification by setting `psi_sn = 1.0` and `psi_sp = 1.0`:

```{r}
#| label: tbl-nondiff-misclass
#| tbl-cap: "Computed bounds on NIE and NDE under non-differential exposure misclassification"
#| echo: true
#| message: false

# Non-differential misclassification: error doesn't depend on Y
sens_region_nondiff <- sensitivity_region(
  sn0_range = c(0.80, 0.90),
  sp0_range = c(0.80, 0.90),
  psi_sn_range = c(1.0, 1.0), # No differential sensitivity
  psi_sp_range = c(1.0, 1.0) # No differential specificity
)

bounds_nondiff <- bound_ne(
  data = sim_data@observed,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("C1", "C2"),
  misclassified_variable = "exposure",
  sensitivity_region = sens_region_nondiff,
  n_grid = 10,
  verbose = FALSE
)

print(bounds_nondiff)
```

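Bounds are typically **tighter** under non-differential misclassification than under differential misclassification, because fixing `psi_sn = psi_sp = 1` is a stronger assumption. In this example the `sn0` and `sp0` ranges are also narrower than in Example 1, which tightens the bounds further.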

## Understanding the Output

### Bound Interpretation

The bounds tell us:

- **NIE bounds**: The range of plausible natural indirect effects (mediated effect)
- **NDE bounds**: The range of plausible natural direct effects
- **Width**: The amount of uncertainty remaining

**Example interpretation:**

If NIE is bounded in `[1.2, 1.8]` on the odds ratio scale, and the true misclassification parameters lie within the sensitivity region:

- The true mediated effect is at least a 20% increase in odds (OR ≥ 1.2)
- The true mediated effect is at most an 80% increase in odds (OR ≤ 1.8)
- We cannot pin down the effect more precisely without stronger assumptions
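
Two quick summaries of an interval of odds ratios are its width on the log scale and whether it excludes the null value of 1. The helper below is a hypothetical convenience function written for this vignette, not part of the package.

```{r}
#| label: sketch-bounds-vs-null
# Hypothetical helper: summarise an interval of odds ratios
describe_bounds <- function(lower, upper, null = 1) {
  data.frame(
    lower = lower,
    upper = upper,
    log_width = log(upper) - log(lower),  # width on the log-OR scale
    excludes_null = lower > null | upper < null
  )
}

rbind(
  describe_bounds(1.2, 1.8),  # the interval discussed above
  describe_bounds(0.8, 2.5)   # a wider, hypothetical interval
)
```

An interval that contains 1 leaves the direction of the effect undetermined, however informative its endpoints may otherwise be.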

### Compatibility Testing

The `check_compatibility()` function tests whether hypothesized misclassification parameters are **consistent with the observed data** using testable implications.

- **Compatible**: The parameters could have generated the observed data
- **Falsified**: The parameters are inconsistent with the data (they violate testable constraints)

### Falsification Analysis

The sensitivity analysis automatically performs falsification:

- **Compatible sets**: Parameter combinations that satisfy all testable implications
- **Falsified proportion**: What fraction of the sensitivity region is ruled out

A high falsification rate (e.g., 95%) means the data strongly constrain plausible scenarios.
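
The logic of falsification can be illustrated with the simplest testable implication of a misclassification model: the error rates must imply a true prevalence between 0 and 1. The sketch below applies that check to a small set of candidate values. It is a toy analogue with hypothetical numbers and names, not the set of constraints that `check_compatibility()` evaluates.

```{r}
#| label: sketch-falsification-toy
# Toy testable implication (not the package's actual test):
# p_obs = sn * p + (1 - sp) * (1 - p)  =>  p = (p_obs + sp - 1) / (sn + sp - 1)
implied_prev <- function(p_obs, sn, sp) (p_obs + sp - 1) / (sn + sp - 1)

p_obs_demo <- 0.15  # hypothetical observed prevalence
cand_demo <- expand.grid(sn = c(0.70, 0.85, 0.95), sp = c(0.80, 0.90, 0.95))
cand_demo$p_true <- implied_prev(p_obs_demo, cand_demo$sn, cand_demo$sp)

# A candidate is falsified if it implies an impossible true prevalence
cand_demo$compatible <- cand_demo$p_true >= 0 & cand_demo$p_true <= 1
cand_demo

falsified_prop <- mean(!cand_demo$compatible)
falsified_prop
```

Every candidate with `sp = 0.80` is falsified here, because a 20% false-positive rate alone would produce more reported positives than the 15% that were observed.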

## Advanced Features

### Extracting Results

You can extract various components from the results using built-in methods:

```{r}
#| label: extract-results
#| eval: false
#| message: false

# Convert to data frame for custom analysis
bounds_df <- as.data.frame(bounds)
print(bounds_df)

# Convert to list for programmatic access
bounds_list <- as.list(bounds)
names(bounds_list)

# Access compatible parameter sets directly
compatible_sets <- bounds@compatible_sets
print(head(compatible_sets))
```

### Comparing Different Scenarios

```{r}
#| label: compare-bounds
#| message: false
# Compare bounds under different sensitivity assumptions
comparison <- compare_bounds(
  bounds_list = list(
    "Differential" = bounds,
    "Non-differential" = bounds_nondiff
  )
)
print(comparison)
```

### Falsification Summary

```{r}
#| label: tbl-falsification-summary
#| tbl-cap: "Falsification summary for the sensitivity analysis"
#| message: false

# Get detailed falsification summary
falsif_summary <- falsification_summary(bounds)
print(falsif_summary)
```

## Practical Recommendations

### Choosing the Sensitivity Region

1. **Literature review**: What misclassification rates have been reported in similar studies?
2. **Pilot studies**: Can you conduct a small validation study?
3. **Expert judgment**: Consult domain experts on plausible ranges
4. **Conservative approach**: Use wide ranges initially, then refine based on falsification

### Grid Resolution

- Start with `n_grid = 10` for exploration
- Use `n_grid` between 20 and 50 for publication
- Higher values increase computational time but improve precision

### Bootstrap Replications

- Use `bootstrap_reps = 1000` or more for final results
- Percentile method is fast and simple
- BCa method provides better coverage but is slower
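
For readers unfamiliar with the percentile method, here is a generic base-R sketch applied to a plain odds ratio. It illustrates the resampling idea only: the data are simulated, all names are hypothetical, and it does not reproduce how `bound_ne()` bootstraps the bounds.

```{r}
#| label: sketch-percentile-bootstrap
# Generic percentile bootstrap for an odds ratio (illustration only)
set.seed(2025)
n_bs <- 500
x_bs <- rbinom(n_bs, 1, 0.4)
y_bs <- rbinom(n_bs, 1, plogis(-1 + log(1.5) * x_bs))

odds_ratio <- function(x, y) {
  # 2x2 table with a 0.5 continuity correction to avoid division by zero
  tab <- table(factor(x, levels = 0:1), factor(y, levels = 0:1)) + 0.5
  (tab[2, 2] * tab[1, 1]) / (tab[1, 2] * tab[2, 1])
}

boot_or <- replicate(200, {
  idx <- sample.int(n_bs, replace = TRUE)  # resample rows with replacement
  odds_ratio(x_bs[idx], y_bs[idx])
})

# Percentile interval: empirical 2.5% and 97.5% quantiles of the replicates
ci_bs <- quantile(boot_or, probs = c(0.025, 0.975))
ci_bs
```

The tail quantiles are estimated from the replicates themselves, which is why a few hundred replicates give unstable interval endpoints and 1000 or more are recommended for final results.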

### Parallel Processing

For large datasets or fine grids, enable parallelization:

```{r}
#| label: parallel-bounds
#| eval: false
bounds_parallel <- bound_ne(
  data = sim_data@observed,
  exposure = "A_star",
  mediator = "M",
  outcome = "Y",
  confounders = c("C1", "C2"),
  misclassified_variable = "exposure",
  sensitivity_region = sens_region,
  n_grid = 50,
  parallel = TRUE,
  n_cores = 4
)
```

## Interpreting Results

### When Bounds are Tight

If bounds are narrow (e.g., NIE in [1.3, 1.4]):

- Strong evidence for mediation despite measurement error
- Conclusions are robust to misclassification assumptions
- Sensitivity region may be well-chosen

### When Bounds are Wide

If bounds are wide (e.g., NIE in [0.8, 2.5]):

- High uncertainty about mediation effects
- Need stronger assumptions or better measurements
- Consider collecting validation data

### When Most Parameters Are Falsified

If 90%+ of sensitivity region is falsified:

- Data strongly constrain plausible scenarios
- Observed patterns rule out many error mechanisms
- Tighter bounds may be achievable with refined sensitivity region

## Common Use Cases

### 1. Recall Bias in Epidemiology

When participants with disease may recall exposures differently:

```{r}
#| label: recall-bias
# Disease may improve recall of past exposure
sens_region_recall <- sensitivity_region(
  sn0_range = c(0.60, 0.80), # Lower baseline sensitivity
  sp0_range = c(0.85, 0.95), # High specificity
  psi_sn_range = c(1.2, 2.5), # Cases recall better
  psi_sp_range = c(0.9, 1.1) # Specificity stable
)
```

### 2. Social Desirability Bias

When participants may underreport stigmatized behaviors:

```{r}
#| label: social-desirability-bias
# Underreporting of risk behaviors
sens_region_social <- sensitivity_region(
  sn0_range = c(0.50, 0.70), # Low sensitivity (underreporting)
  sp0_range = c(0.90, 0.98), # High specificity
  psi_sn_range = c(1.0, 1.0), # Non-differential
  psi_sp_range = c(1.0, 1.0)
)
```

### 3. Instrument Quality

When measurement instruments have known error rates:

```{r}
#| label: instrument-quality
# Based on validation study
sens_region_validated <- sensitivity_region(
  sn0_range = c(0.82, 0.88), # Narrow range from validation
  sp0_range = c(0.87, 0.93),
  psi_sn_range = c(1.0, 1.3), # Slight differential
  psi_sp_range = c(1.0, 1.0)
)
```

## Computational Performance

The `bound_ne()` function evaluates many parameter combinations, which can be time-consuming. Here are strategies to optimize performance:

### Grid Resolution Trade-offs

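The `n_grid` parameter controls the number of points evaluated per dimension. With an exhaustive (`"regular"`) grid over the four misclassification parameters, the number of combinations is `n_grid^4`: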

```{r}
#| label: grid-resolution
#| eval: false
#| message: false

# `...` stands for the data and variable arguments shown in Step 3

# Quick exploration (5^4 = 625 combinations, ~10-30 seconds)
quick_bounds <- bound_ne(..., n_grid = 5)

# Standard analysis (10^4 = 10,000 combinations, 2-5 minutes)
standard_bounds <- bound_ne(..., n_grid = 10)

# High resolution (50^4 = 6.25 million combinations, 30-60 minutes)
detailed_bounds <- bound_ne(..., n_grid = 50)
```

**Recommendation**: Start with `n_grid = 5` for exploratory analysis, then increase `n_grid` to between 10 and 20 for final results.
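
The growth in the number of combinations is easy to tabulate. The sketch below assumes an exhaustive grid with `n_grid` points in each of the four misclassification parameters; the variable names are illustrative.

```{r}
#| label: sketch-grid-size
# Number of combinations in an exhaustive grid over four parameters
n_grid_values <- c(5, 10, 20, 50)
n_combinations <- n_grid_values^4
data.frame(n_grid = n_grid_values, combinations = n_combinations)
```

Doubling `n_grid` multiplies the number of evaluations by 16, which is why sampling-based grid methods become attractive at higher resolutions.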

### Parallel Processing

Enable parallel processing to dramatically reduce computation time:

```{r}
#| label: parallel-processing
#| eval: false
#| message: false

# Detect available cores
library(parallel)
n_cores <- max(1L, detectCores() - 2L, na.rm = TRUE) # Leave two cores free

# Enable parallelization
bounds <- bound_ne(
  ...,
  parallel = TRUE,
  n_cores = n_cores # Use all but two of the available cores
)
```

**Performance gain**: With 8 cores, expect 5-7x speedup compared to single-core execution.

### Grid Search Algorithms

The `grid_method` parameter controls which algorithm is used to search the parameter space. The default is Latin Hypercube Sampling (LHS), which evaluates far fewer parameter combinations than an exhaustive grid:

```{r}
#| label: grid-methods
#| eval: false
#| message: false
# Latin Hypercube Sampling (default) - 99% fewer evaluations
bounds_lhs <- bound_ne(
  ...,
  n_grid = 10,
  grid_method = "lhs" # Default - fastest for most cases
)

# Regular exhaustive grid - exact but slow
bounds_regular <- bound_ne(
  ...,
  n_grid = 10,
  grid_method = "regular" # 10^4 = 10,000 evaluations
)

# Auto-select best method based on data characteristics
bounds_auto <- bound_ne(
  ...,
  n_grid = 10,
  grid_method = "auto" # Probes parameter space first
)
```

**Available methods**:

- `"lhs"` (default): Latin Hypercube Sampling - space-filling design that reduces evaluations by 99% while maintaining broad coverage (McKay et al., 1979)
- `"auto"`: Automatically selects best method based on problem characteristics
- `"regular"`: Exhaustive grid search (use for exact bounds when time permits)
- `"sobol"`: Sobol low-discrepancy sequences (Sobol, 1967) - similar to LHS
- `"adaptive"`: Two-stage coarse-to-fine refinement
- `"binary"`: Binary search on parameter boundaries (efficient when bounds are monotonic)

**Performance comparison** (n_grid = 10):

| Method  | Evaluations | Time      | Speedup |
| ------- | ----------- | --------- | ------- |
| Regular | 10,000      | 45 sec    | 1x      |
| LHS     | 100         | 0.7 sec   | 67x     |
| Sobol   | 100         | 0.7 sec   | 64x     |
| Auto    | 100-500     | 0.4-2 sec | 25-100x |

**Recommendation**: Use the default `"lhs"` for most analyses. Use `"regular"` only when exact bounds are required and computational budget allows.
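
For intuition about what a Latin Hypercube design does, the following base-R sketch draws 100 points in four dimensions so that each dimension has exactly one point in each of 100 equal-width strata. It is a minimal illustration under that definition, not the implementation used by `bound_ne()`; `lhs_unit()` and the other names are hypothetical.

```{r}
#| label: sketch-latin-hypercube
# Minimal Latin Hypercube sketch in base R (not the package's implementation)
lhs_unit <- function(n, d) {
  # Each column: one uniform draw inside each of n equal-width strata,
  # with the strata visited in random order
  sapply(seq_len(d), function(j) (sample.int(n) - runif(n)) / n)
}

set.seed(2025)
u_lhs <- lhs_unit(n = 100, d = 4)

# Rescale the unit cube to the parameter ranges used in Step 2
lhs_lower <- c(sn0 = 0.70, sp0 = 0.80, psi_sn = 1.0, psi_sp = 1.0)
lhs_upper <- c(sn0 = 0.95, sp0 = 0.95, psi_sn = 2.0, psi_sp = 1.0)
lhs_points <- sweep(sweep(u_lhs, 2, lhs_upper - lhs_lower, `*`), 2, lhs_lower, `+`)
colnames(lhs_points) <- names(lhs_lower)

dim(lhs_points)                   # 100 points instead of 10^4
table(ceiling(u_lhs[, 1] * 10))   # points per tenth of the first dimension
```

Because every one-dimensional margin is evenly covered, a modest number of points explores each parameter's full range, at the cost of bounds that are approximate rather than exhaustive.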

### Caching Results

For repeated analyses with the same data:

```{r}
#| label: caching-results
#| eval: false
#| message: false
# Enable caching to reuse intermediate results
bounds <- bound_ne(
  ...,
  cache = TRUE,
  cache_dir = "cache/" # Optional: specify cache location
)
```

### Computational Complexity

Computation time scales as:

- **Grid size**: O(n_grid^4) with an exhaustive grid - polynomial (quartic) in grid resolution, exponential in the number of misclassification parameters
- **Sample size**: O(n) - linear in data size
- **Bootstrap**: O(bootstrap_reps) - linear in number of replicates

**Example timings** (approximate, on a modern laptop):

- n_grid = 5, no bootstrap: 10-30 seconds
- n_grid = 10, no bootstrap: 2-5 minutes
- n_grid = 10, parallel (4 cores): 30-60 seconds
- n_grid = 50, parallel (8 cores): 10-20 minutes

## Limitations and Assumptions

The methods assume:

1. **No unmeasured confounding** of A-M, M-Y, and A-Y relationships
2. **Binary variables**: A, M, and Y are all binary
3. **Conditional exchangeability**: Standard causal identification assumptions hold
4. **Monotonicity**: Misclassification probabilities follow specified parametric form

The bounds are:

- **Partial identification**: Not point identification (intervals, not points)
- **Sensitive to sensitivity region**: Wide regions → wide bounds
- **Computationally intensive**: Grid search over parameter space

## Next Steps

After completing this introduction:

1. **Explore your own data**: Apply the methods to your research questions
2. **Read the methodology vignette**: Understand the theoretical foundations
3. **Check advanced examples**: See complex scenarios and diagnostics
4. **Consult the reference manual**: Detailed documentation of all functions

## Getting Help

- **Package documentation**: `?bound_ne`, `?simulate_dm_data`, etc.
- **GitHub issues**: Report bugs at <https://github.com/data-wise/medrobust/issues>
- **Vignettes**: Browse other vignettes for specific topics

## References

Tofighi, D. (2025). Partial identification bounds for causal mediation effects under differential misclassification. *Manuscript in preparation*.

VanderWeele, T. J. (2015). *Explanation in Causal Inference: Methods for Mediation and Interaction*. Oxford University Press.

McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. *Technometrics*, 21(2), 239-245.

Sobol', I. M. (1967). On the distribution of points in a cube and the approximate evaluation of integrals. *USSR Computational Mathematics and Mathematical Physics*, 7(4), 86-112.

## Session Information

```{r}
sessionInfo()
```