Overview
The bound_ne() function computes partial identification bounds by evaluating compatibility across a 4-dimensional parameter space (sn0, sp0, ψ_sn, ψ_sp). This vignette describes 6 advanced grid search algorithms that dramatically reduce computation time while maintaining accuracy.
Key insight: These methods reduce evaluations by 90-99% compared to exhaustive grid search, enabling practical analysis with large grids.
The Computational Challenge
Exponential Complexity
With a regular grid approach:
- 4 dimensions: sn0, sp0, ψ_sn, ψ_sp
- Grid resolution:
n_grid points per dimension
- Total evaluations: n_grid4
Examples:
This exponential growth makes fine-grained grids (n_grid ≥ 50) impractical with exhaustive search.
The Solution: Smart Sampling
Instead of evaluating all n_grid4 points, advanced algorithms sample strategically:
- Space-filling designs (LHS, Sobol): Ensure coverage with fewer points
- Adaptive refinement: Focus on compatible regions
- Boundary search: Find edges efficiently
- Auto-selection: Choose best method automatically
Available Grid Methods
1. Latin Hypercube Sampling ("lhs") ⭐ DEFAULT
What it is: A space-filling experimental design that ensures uniform coverage across all dimensions.
How it works: 1. Divide each parameter range into N equal intervals 2. Sample once from each interval (guaranteed coverage) 3. Randomly permute assignments across dimensions 4. Result: N well-distributed points in 4D space
Performance:
When to use: - Default choice for most analyses - Exploratory data analysis - Large grids (n_grid ≥ 20) - Need broad parameter space coverage
Example:
# Default: LHS is used automatically
bounds <- bound_ne(
data = data,
exposure = "A_star",
mediator = "M",
outcome = "Y",
confounders = c("C1", "C2"),
misclassified_variable = "exposure",
sensitivity_region = sens_region,
n_grid = 50 # LHS evaluates ~2,500 points instead of 6.25M
)
2. Regular Grid ("regular")
What it is: Exhaustive evaluation of all n_grid4 combinations.
Performance:
- Evaluations: n_grid4 (always)
- Speed: Baseline (1x)
- Accuracy: Exact min/max over grid
When to use:
- Publication-quality results requiring exact bounds
- Small grids (n_grid ≤ 10) where speed is acceptable
- When computational budget allows
- Critical policy decisions
Example:
# Exact bounds (slow for large grids)
bounds_exact <- bound_ne(
...,
n_grid = 10,
grid_method = "regular" # 10,000 evaluations
)
3. Sobol Sequences ("sobol")
What it is: Low-discrepancy quasi-random sequences ensuring even coverage.
How it differs from LHS: - Deterministic (no random seed needed) - Better uniformity in high dimensions - Slightly better coverage for >4 dimensions
Performance: - Evaluations: sqrt(n_grid4) (same as LHS) - Speed: 50-100x faster than regular - Accuracy: Similar to LHS
When to use: - High-dimensional extensions (future features) - Need reproducibility without setting seed - Sparse compatible regions
Example:
bounds_sobol <- bound_ne(
...,
n_grid = 50,
grid_method = "sobol"
)
4. Adaptive Grid ("adaptive")
What it is: Two-stage coarse-to-fine refinement.
How it works: 1. Coarse stage: Evaluate small grid (e.g., 3×3×3×3) 2. Identify compatible regions: Find where constraints are satisfied 3. Fine stage: Refine only around compatible regions 4. Result: Focus computational effort where it matters
Performance: - Evaluations: 1-20% of full grid (when falsification is high) - Speed: 5-10x faster - Accuracy: Good for identifying compatible regions
When to use:
- High falsification rate expected (>80% incompatible)
- Sparse compatible regions
- Medium-sized grids (10 ≤ n_grid ≤ 30)
Example:
bounds_adaptive <- bound_ne(
...,
n_grid = 20,
grid_method = "adaptive"
)
5. Binary Search on Bounds ("binary")
What it is: Find exact boundaries between compatible/incompatible regions.
How it works: 1. Test corner points of parameter space 2. For each parameter, binary search to find boundaries 3. Sample densely near boundaries (Beta distribution) 4. Focus on finding min/max effect values
Performance: - Evaluations: O(log n) per dimension when monotonic - Speed: 10-50x faster - Accuracy: Excellent for boundaries
When to use: - Compatibility is monotonic in parameters - Need precise boundary estimates - Computational constraints
Limitations: - Assumes monotonicity (not always satisfied) - May miss complex interior structures - Falls back to LHS if assumptions fail
Example:
bounds_binary <- bound_ne(
...,
n_grid = 50,
grid_method = "binary"
)
6. Auto-Selection ("auto")
What it is: Automatically selects best method based on problem characteristics.
How it works: 1. Probe 16 corner points of parameter space 2. Estimate compatibility rate 3. Select method based on heuristics: - 100% compatible → Regular grid (all points useful) - 0% compatible → LHS (need dense search) - <25% compatible → Sobol (sparse search) - >75% compatible → Binary search (find edges) - 25-75% compatible → LHS (balanced)
Performance: - Evaluations: 100-500 typically - Speed: Optimal for problem - Overhead: Minimal (16 evaluations)
When to use: - Unsure which method is best - Problem characteristics unknown - Exploratory analysis
Example:
bounds_auto <- bound_ne(
...,
n_grid = 50,
grid_method = "auto" # Adapts to your data
)
Test Case: Exposure Misclassification
Results (n=1000, n_grid=10, 2 confounders):
Key observations: - LHS is 67x faster than regular grid - Bounds are similar across methods (slight variations due to sampling) - All methods identify the same general bound ranges
Recommended Workflow
Three-Stage Analysis
Stage 1: Quick Exploration
Use LHS with moderate grid for initial insights:
# Fast exploration (~30 seconds)
bounds_quick <- bound_ne(
data = data,
exposure = "A_star",
mediator = "M",
outcome = "Y",
confounders = c("C1", "C2"),
misclassified_variable = "exposure",
sensitivity_region = sens_region,
n_grid = 20,
grid_method = "lhs", # Fast exploration
verbose = TRUE
)
print(bounds_quick)
Stage 2: Refinement
Increase resolution or use auto-selection:
# Refined analysis (~2 minutes)
bounds_refined <- bound_ne(
data = data,
exposure = "A_star",
mediator = "M",
outcome = "Y",
confounders = c("C1", "C2"),
misclassified_variable = "exposure",
sensitivity_region = sens_region,
n_grid = 50,
grid_method = "auto", # Adapts to data characteristics
verbose = TRUE
)
Stage 3: Publication-Quality Results
Use regular grid with parallel processing:
# Exact bounds with parallel processing (~10-20 minutes)
library(parallel)
n_cores <- detectCores() - 1
bounds_exact <- bound_ne(
data = data,
exposure = "A_star",
mediator = "M",
outcome = "Y",
confounders = c("C1", "C2"),
misclassified_variable = "exposure",
sensitivity_region = sens_region,
n_grid = 100, # High resolution
grid_method = "regular", # Exact bounds
parallel = TRUE,
n_cores = n_cores,
verbose = TRUE
)
Accuracy vs. Speed Trade-offs
Bound Width Differences
Sampling methods (LHS, Sobol) produce approximate bounds:
When is approximation acceptable?
✅ Use sampling methods (LHS, Sobol) for: - Exploratory data analysis - Sensitivity checking - Interactive workflows - Preliminary results - Computational constraints
❌ Use exact methods (regular grid) for: - Final publication results - Critical policy decisions - Small grids (n_grid ≤ 10, cost is low) - When exact bounds are required
Combining with Parallel Processing
All grid methods benefit from parallelization:
# LHS + Parallel = Fastest combination
bounds_fast <- bound_ne(
data = data,
exposure = "A_star",
mediator = "M",
outcome = "Y",
confounders = c("C1", "C2"),
misclassified_variable = "exposure",
sensitivity_region = sens_region,
n_grid = 50,
grid_method = "lhs", # 99% fewer evaluations
parallel = TRUE, # Additional speedup
n_cores = 8,
verbose = TRUE
)
# Expected time: ~10-20 seconds
# vs. ~30 hours with sequential regular grid
Performance multipliers: - LHS alone: 50-100x faster - Parallel (8 cores): 5-7x faster - Combined: 250-700x faster!
Selection Guide
Algorithm Implementation Details
Reproducibility
Sampling methods (LHS, Sobol):
# Set seed for reproducibility
set.seed(42)
bounds1 <- bound_ne(..., grid_method = "lhs")
set.seed(42)
bounds2 <- bound_ne(..., grid_method = "lhs")
# bounds1 and bounds2 will be identical
Sobol sequences: Deterministic, no seed needed for reproducibility.
Memory Efficiency
All methods use: - Pre-computed probabilities: Calculate observed distributions once - Vectorized operations: Process combinations in batches - Early termination: Stop evaluation at first constraint violation - Streaming results: Don’t store incompatible parameter sets
Computational Complexity
Where: - n_grid: grid resolution - n_coarse: coarse grid size (typically 3-5) - k: proportion of compatible coarse regions - n_boundary: points near boundaries - n_compatible: number of compatible parameter sets (stored)
Practical Examples
Example 1: Quick sensitivity check
# 30-second sensitivity check
bounds <- bound_ne(
data = mydata,
exposure = "treatment",
mediator = "mediator_measure",
outcome = "outcome",
confounders = c("age", "sex", "baseline"),
misclassified_variable = "mediator",
sensitivity_region = list(
sn0_range = c(0.7, 0.9),
sp0_range = c(0.7, 0.9),
psi_sn_range = c(0.8, 1.5),
psi_sp_range = c(0.8, 1.5)
),
n_grid = 20,
grid_method = "lhs" # Fast
)
Example 2: Publication-ready analysis
# High-resolution exact bounds
bounds_pub <- bound_ne(
data = mydata,
exposure = "treatment",
mediator = "mediator_measure",
outcome = "outcome",
confounders = c("age", "sex", "baseline"),
misclassified_variable = "mediator",
sensitivity_region = list(
sn0_range = c(0.7, 0.9),
sp0_range = c(0.7, 0.9),
psi_sn_range = c(0.8, 1.5),
psi_sp_range = c(0.8, 1.5)
),
n_grid = 100, # High resolution
grid_method = "regular", # Exact
parallel = TRUE,
n_cores = 8,
bootstrap = TRUE, # Add CIs
bootstrap_reps = 1000
)
Example 3: Comparing methods
# Compare different grid methods
methods <- c("lhs", "sobol", "binary", "regular")
results <- list()
for (method in methods) {
cat("\nTesting method:", method, "\n")
results[[method]] <- system.time({
bounds <- bound_ne(
data = mydata,
exposure = "A_star",
mediator = "M",
outcome = "Y",
confounders = c("C1", "C2"),
misclassified_variable = "exposure",
sensitivity_region = sens_region,
n_grid = 10,
grid_method = method,
verbose = FALSE
)
})
cat(" Time:", results[[method]]["elapsed"], "seconds\n")
cat(" NIE bounds:", bounds@NIE_lower, "-", bounds@NIE_upper, "\n")
cat(" NDE bounds:", bounds@NDE_lower, "-", bounds@NDE_upper, "\n")
}
Future Enhancements
Potential algorithm additions:
- Bayesian Optimization: Learn from evaluations to guide search toward optimal regions
- Particle Swarm: Use swarm intelligence for global optimization
- Gradient-based methods: When bounds are differentiable
- Hybrid approaches: Combine methods (e.g., Sobol initialization + adaptive refinement)
- Uncertainty-guided sampling: Add points where bound uncertainty is highest
References
McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2), 239-245. https://doi.org/10.2307/1268522
Sobol’, I. M. (1967). On the distribution of points in a cube and the approximate evaluation of integrals. USSR Computational Mathematics and Mathematical Physics, 7(4), 86-112. https://doi.org/10.1016/0041-5553(67)90144-9