# Multiverse Analysis Engine
An xarray-based multiverse engine for testing vagueness → funding/success hypotheses across a specification space.
## Overview
This engine implements a comprehensive multiverse analysis framework for testing strategic ambiguity (vagueness) hypotheses in venture funding. It systematically tests hypotheses across a multidimensional specification grid, providing robust evidence for:
- **H1 (Early Stage)**: Vagueness → Early Funding (Expected: negative, clarity premium)
- **H2 (Later Stage)**: Vagueness → Growth (Expected: positive, flexibility value)
- **Moderation**: Effects amplified by option exercisability and software industry
## Installation
### Dependencies
```bash
pip install -r requirements.txt
```
Required packages:
- pandas >= 2.0
- numpy >= 1.24
- xarray >= 2023.1
- statsmodels >= 0.14
- patsy >= 0.5
- scipy >= 1.10
- matplotlib >= 3.7
- seaborn >= 0.12
## Quick Start
### 1. Run Multiverse Analysis
```bash
python run_multiverse.py --input /path/to/your/data.csv --outdir results/
```
### 2. View Results
Results will be saved to `results/`:
- `multiverse_results.nc` - xarray Dataset (NetCDF format)
- `spec_table.csv` - Full specification table
- `summary_stats.txt` - Summary statistics
- `*.png` - Visualization heatmaps and curves
## Specification Grid
The multiverse spans **384 specifications**:
```python
{
"stage": ["E", "L1", "L2"], # 3 stages
"window": [3 time windows], # 3 windows
"scaling": ["zscore", "winsor99_z"], # 2 methods
"moderator": ["option", "software"], # 2 moderators
"ctrl_employee": [0, 1], # 2 toggles
"ctrl_region": [0, 1], # 2 toggles
"ctrl_founder": [0, 1], # 2 toggles
"ctrl_earlyfund": [0, 1] # 2 toggles
}
# full cross: 3 × 3 × 2 × 2 × 2 × 2 × 2 × 2 = 576 cells; the reported 384
# presumably excludes cells that do not apply (e.g. the moderator plays no
# role in the Stage E model)
```
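The full Cartesian product of the grid above can be enumerated with `itertools.product`. This is a sketch, not the engine's actual code; the `"window"` labels are placeholders, and the engine may skip cells that do not apply to a given stage.

```python
from itertools import product

# Mirror of the grid dictionary above; "w1".."w3" are placeholder window labels
GRID = {
    "stage": ["E", "L1", "L2"],
    "window": ["w1", "w2", "w3"],
    "scaling": ["zscore", "winsor99_z"],
    "moderator": ["option", "software"],
    "ctrl_employee": [0, 1],
    "ctrl_region": [0, 1],
    "ctrl_founder": [0, 1],
    "ctrl_earlyfund": [0, 1],
}

# One dict per specification, in a stable, reproducible order
specs = [dict(zip(GRID, vals)) for vals in product(*GRID.values())]
```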
## Architecture
### Data Pipeline
1. **Window Filtering**: Filter by founding year; `nobs` is taken from the `patsy.dmatrix()` design matrix, so it reflects the rows actually entering each model
2. **Moderator Creation**: `isSoftware = 1 - is_hardware`
3. **Scaling**: Z-score transformation of ALL continuous variables
   - `vagueness` → `z_vagueness` (aliased as `V`)
   - `employees_log` → `z_employees_log`
   - `early_funding_musd` → `z_early_funding_musd`
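The scaling step might look like the following sketch. The column names follow this README; the exact definition of `winsor99_z` (clipping the upper tail at the 99th percentile before standardizing) is an assumption.

```python
import pandas as pd

def zscore(s: pd.Series) -> pd.Series:
    """Standardize to mean 0, sd 1 (sample sd, ddof=1)."""
    return (s - s.mean()) / s.std()

def scale(df: pd.DataFrame, method: str = "zscore") -> pd.DataFrame:
    """Z-score all continuous variables; optionally winsorize first."""
    out = df.copy()
    for col in ["vagueness", "employees_log", "early_funding_musd"]:
        s = out[col]
        if method == "winsor99_z":
            # assumed definition: clip the upper tail at the 99th percentile
            s = s.clip(upper=s.quantile(0.99))
        out[f"z_{col}"] = zscore(s)
    out["V"] = out["z_vagueness"]  # alias used in the model formulas
    return out
```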
### Model Fitting
**Stage E (Early - OLS):**
```
z_early_funding_musd ~ V + controls
```
**Stages L1/L2 (Later - Logit):**
```
growth ~ V * moderator + z_early_funding_musd + controls
```
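A minimal sketch of the two model families using statsmodels' formula API, fit on synthetic stand-in data. Column names follow this README; in the engine, the control terms would be assembled from the `ctrl_*` toggles by the formula builder.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; column names follow the README's conventions
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "V": rng.normal(size=n),                     # z-scored vagueness
    "option": rng.integers(0, 2, size=n),        # moderator
    "z_early_funding_musd": rng.normal(size=n),
    "z_employees_log": rng.normal(size=n),       # a control
})
df["growth"] = (rng.random(n) < 0.5).astype(int)  # binary DV for L1/L2

# Stage E: OLS of (scaled) early funding on vagueness plus controls
ols = smf.ols("z_early_funding_musd ~ V + z_employees_log", data=df).fit()

# Stages L1/L2: logit of growth on V x moderator, early funding, and controls
logit = smf.logit(
    "growth ~ V * option + z_early_funding_musd + z_employees_log", data=df
).fit(disp=0)
```

The `V * option` term expands to `V + option + V:option`, so the interaction coefficient is available as `logit.params["V:option"]`.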
**3-Stage Fallback for Logit:**
1. MLE (maximum likelihood)
2. L1 regularization (α=0.1)
3. L1 regularization (α=0.5)
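The fallback chain could be implemented as in this sketch (`fit_with_fallback` is a hypothetical helper name, not the engine's actual function); the α values follow this README.

```python
import statsmodels.formula.api as smf

def fit_with_fallback(formula: str, data):
    """Try plain MLE first; fall back to L1 fits with increasing alpha."""
    try:
        res = smf.logit(formula, data=data).fit(disp=0)
        if res.mle_retvals.get("converged", False):
            return res, "mle"
    except Exception:
        pass  # e.g. PerfectSeparationError or numerical failure
    for alpha in (0.1, 0.5):
        try:
            res = smf.logit(formula, data=data).fit_regularized(
                method="l1", alpha=alpha, disp=0
            )
            return res, f"l1_{alpha}"
        except Exception:
            continue
    return None, "failed"
```

Recording the second return value per specification matches the `estimation_method` field noted in the validation checklist below.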
### Evidence Metrics
For each coefficient:
```python
import numpy as np

# Signed evidence strength: sign of the coefficient, magnitude from the p-value
evidence_score = np.sign(coef) * -np.log10(p)
is_consistent = (np.sign(coef) == expected) & (p < 0.05)
is_surprise = (np.sign(coef) != expected) & (p < 0.05)
```
## Expected Signs
| Effect | Stage E | Stage L1 | Stage L2 | Rationale |
|--------|---------|----------|----------|-----------|
| vag_main | **-1** | **+1** | **+1** | Clarity premium → Flexibility value |
| vagXoption | **+1** | **+1** | **+1** | Flexible architecture amplifies |
| vagXsoftware | **+1** | **+1** | **+1** | Software industry amplifies |
## File Structure
```
empirics_ent_strat_ops/
├── multiverse_engine.py                 # Core analysis engine
├── run_multiverse.py                    # CLI orchestrator
├── requirements.txt                     # Package dependencies
└── README_MULTIVERSE.md                 # This file

results/                                 # Output directory
├── multiverse_results.nc                # xarray Dataset
├── spec_table.csv                       # Full results table
├── summary_stats.txt                    # Summary statistics
├── multiverse_h1_heatmap.png            # Early stage evidence
├── spec_curve_h1.png                    # H1 specification curve
├── multiverse_h2_option_heatmap.png     # Option interaction
└── multiverse_h2_software_heatmap.png   # Software interaction
```
## Usage Examples
### Basic Analysis
```bash
python run_multiverse.py \
--input data/startups.csv \
--outdir results/
```
### Quiet Mode
```bash
python run_multiverse.py \
--input data/startups.csv \
--outdir results/ \
--quiet
```
### Load Results in Python
```python
import xarray as xr
import pandas as pd
# Load xarray Dataset
ds = xr.open_dataset('results/multiverse_results.nc')
# Slice by specification
early_zscore = ds.sel(stage='E', scaling='zscore')
print(early_zscore['evidence_score_vag_main'].values)
# Load as DataFrame
df = pd.read_csv('results/spec_table.csv')
consistent = df[df['is_consistent_vag_main'] == 1]
print(f"Consistent specifications: {len(consistent)}/{len(df)}")
```
## Visualization
The engine generates direction-aware heatmaps where:
- **Green** indicates effects in expected direction
- **Red** indicates effects opposite to expected
- **Color intensity** reflects evidence strength
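A minimal sketch of a direction-aware heatmap, assuming a 2-D array of evidence scores already oriented so that positive means "expected direction" (the toy values, file name, and symmetric color limits are illustrative choices, not the engine's actual plotting code):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs in scripts/CI
import matplotlib.pyplot as plt

# Toy evidence scores: rows/columns would index specification dimensions
scores = np.array([[2.1, -0.4, 1.3],
                   [0.2, 3.0, -1.8]])

# Symmetric limits center the diverging colormap at zero,
# so green = expected direction, red = opposite, intensity = strength
lim = np.abs(scores).max()
fig, ax = plt.subplots()
im = ax.imshow(scores, cmap="RdYlGn", vmin=-lim, vmax=lim)
fig.colorbar(im, ax=ax, label="evidence score")
fig.savefig("heatmap_demo.png")
```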
## Validation Checklist
- [x] Window filtering uses `patsy.dmatrix()` for TRUE nobs
- [x] `STAGE_PATTERNS` defined as module constants
- [x] `isSoftware = 1 - is_hardware` created
- [x] ALL continuous vars z-scored (vagueness, employees_log, early_funding_musd)
- [x] Formula builder includes all ctrl_* toggles
- [x] 3-stage fallback: MLE → L1(α=0.1) → L1(α=0.5)
- [x] `estimation_method` recorded in results
- [x] Returns `nobs` from design matrix
- [x] H1 uses z_early_funding_musd as DV
- [x] H2 uses z_early_funding_musd when ctrl_earlyfund=1
- [x] Expected signs: E vag_main=-1, L1/L2 vag_main=+1
- [x] Evidence metrics computed correctly
- [x] xarray structure with all 18 data_vars
- [x] Outputs: .nc, .csv, .png files
- [x] Type hints on public functions
- [x] Numpy-style docstrings
- [x] Clean code (<500 lines)
- [x] Random seed for reproducibility
## Input Data Format
Required columns in CSV:
```
vagueness # Core IV (0-100)
growth # Binary DV for L1/L2 (0/1)
early_funding_musd # Continuous DV for E (USD millions)
year # Founding year (YYYY)
option_exercisability_level # Moderator (1-5 scale)
is_hardware # Industry (0=software, 1=hardware)
employees_log # Log employees
founder_credibility # Founder score (0-10)
region # Geographic region
founding_cohort # Cohort identifier
down_round_flag # Down round indicator (0/1)
```
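A small, hypothetical pre-flight check against this schema (`validate_columns` is an illustrative helper, not part of the engine):

```python
import pandas as pd

# Required columns, mirroring the schema listed above
REQUIRED_COLUMNS = [
    "vagueness", "growth", "early_funding_musd", "year",
    "option_exercisability_level", "is_hardware", "employees_log",
    "founder_credibility", "region", "founding_cohort", "down_round_flag",
]

def validate_columns(df: pd.DataFrame) -> list:
    """Return the required columns missing from df (empty list = OK)."""
    return [c for c in REQUIRED_COLUMNS if c not in df.columns]
```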
## Troubleshooting
### Convergence failures
The 3-stage fallback automatically handles convergence issues by progressively adding regularization.
### Missing data
Missing values are handled by `patsy.dmatrix()` which drops incomplete observations.
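For example, `patsy` drops incomplete rows by default, so the design matrix's row count is the true number of observations entering the model:

```python
import pandas as pd
from patsy import dmatrix

# One row has a missing value in a formula variable
df = pd.DataFrame({"x": [1.0, None, 3.0], "z": [4.0, 5.0, 6.0]})

# Default NA action is "drop": the incomplete row is removed
X = dmatrix("x + z", df)
print(X.shape[0])  # number of rows actually used
```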
### Memory issues
For very large datasets, consider reducing the specification grid or processing in batches.
## References
- Steegen et al. (2016). "Increasing Transparency Through a Multiverse Analysis"
- Simonsohn et al. (2020). "Specification curve analysis"
- xarray documentation: https://docs.xarray.dev/
## Citation
If you use this engine in your research, please cite:
```bibtex
@software{multiverse_engine_2025,
title = {Multiverse Analysis Engine for Strategic Ambiguity Research},
year = {2025},
note = {xarray-based specification curve analysis}
}
```
---
*"If you are desperate, you will live"*