# Experiments Directory
Tools for testing and improving the vagueness scoring pipeline.
## π¨ Problem Identified
The current pipeline uses **V1 vagueness scorer** which produces:
```
25%: 50.0000
50%: 50.0000
75%: 50.0000 β 75% of companies have EXACTLY this score!
```
This **severely limits statistical power** for H1/H2 models.
## π¬ Experiment Scripts
### 1. `test_vagueness_variants.py`
Tests 6 different scoring approaches to find which has best variance:
**Variants tested:**
1. **V1 (Current)** - Old 3-component scorer β Too concentrated at 50.0
2. **V2 Standard** - 2-component with IDF weighting β Better variance
3. **V2 No IDF** - Same but without IDF β Simpler, still better than V1
4. **V2 MinMax** - Uses min-max normalization β Full 0-100 range
5. **Concrete Features** - Counts numbers/dates/metrics β Good spread
6. **Hybrid** - 50% V2 + 50% Concrete β Best of both worlds
**Metrics evaluated:**
- **IQR** (Interquartile Range) - Higher = more spread
- **CV** (Coefficient of Variation) - Higher = more relative variance
- **Spike @50%** - Lower = less concentration at midpoint
**Usage:**
```bash
python3 experiments/test_vagueness_variants.py
```
**Outputs:**
- `experiments/outputs/vagueness_distributions_comparison.png` - Visual comparison
- `experiments/outputs/vagueness_variants_sample.csv` - Score samples for each variant
- `experiments/outputs/vagueness_variants_stats.csv` - Summary statistics
### 2. `switch_to_v2_scorer.py`
Automatically updates `src/features.py` to use V2 scorer instead of V1.
**What it does:**
1. Adds `from vagueness_v2 import StrategicVaguenessScorerV2` import
2. Replaces `_SCORER = StrategicVaguenessScorer()` with V2 instance
3. Updates `compute_vagueness_vectorized()` to use V2 API
4. Creates backup of original file
**Usage:**
```bash
python3 experiments/switch_to_v2_scorer.py
```
**After switching:**
```bash
# Re-run feature engineering with V2
rm data/processed/features_all.parquet
python3 pipeline/02_engineer_features.py
# Continue pipeline
python3 pipeline/03_filter_datasets.py
python3 pipeline/04_run_models.py all
```
**To restore V1:**
```bash
mv src/features.py.backup src/features.py
```
## π Expected Results
**V1 (Current):**
```
Mean: 48.72, Std: 6.12, IQR: 0.00, Spike@50: 75%
```
**V2 Standard (Expected):**
```
Mean: 45-55, Std: 15-25, IQR: 20-35, Spike@50: <10%
```
**Why This Matters:**
With V1's concentration at 50.0:
- β Most companies look identical (no differentiation)
- β Weak correlations with outcomes (low statistical power)
- β Non-significant p-values even if real effect exists
With better variance:
- β
Clear differentiation between vague vs. concrete companies
- β
Stronger correlations detectable
- β
Significant effects can emerge
## π― Recommended Action
1. **Test variants first:**
```bash
python3 experiments/test_vagueness_variants.py
```
2. **Review the comparison table** - Pick variant with:
- Highest IQR (spread)
- Highest CV (relative variance)
- Lowest Spike@50 (less concentration)
3. **If V2 is best, switch to it:**
```bash
python3 experiments/switch_to_v2_scorer.py
```
4. **Re-run pipeline:**
```bash
./run_all.sh
```
5. **Compare H1/H2 results** - Check if:
- p-values improve
- Effect sizes become clearer
- Model fit improves (RΒ², Pseudo-RΒ²)
## π Root Cause
The issue is in `src/features.py` line 1034:
```python
_SCORER = StrategicVaguenessScorer() # β OLD V1 SCORER
```
Should be:
```python
_SCORER_V2 = StrategicVaguenessScorerV2(use_idf=True, random_state=42) # β NEW V2
```
The V2 implementation exists in `src/vagueness_v2.py` but was never integrated into the pipeline!
## π Next Steps
After switching to best variant:
1. **Re-run full pipeline** for all 3 datasets
2. **Check vagueness distribution** in Step 2 logs:
- Should see IQR > 20 (not 0)
- Should see Std > 15 (not 6)
- Should see <10% at median (not 75%)
3. **Compare H1/H2 results**:
- Check if p-values drop below 0.05
- Check if coefficients are larger
- Check if interaction effects are clearer
4. **Document which variant was chosen** and why