# Tech Spec: Placeholder Variables & Future Implementation
**Date**: 2025-10-27
**Status**: Pipeline using placeholders for single-snapshot Company data
---
## 1. Problem Statement
The current analysis pipeline depends on **proxy variables** because we lack multi-snapshot Deal data.
Key variables such as `survival`, `series_a_funding`, and `series_b_funding` are currently placeholders, which limits the accuracy of the Hypothesis 2 (H2) tests.
---
## 2. Proposed Solution
Load and merge multiple **Deal snapshot files** (`20211201`, `20220101`, `20230501`) to create proper measures for:
- **Survival** (18-month persistence)
- **Series A/B funding**
- **Down rounds**
- Optional variables like **founder_credibility**
Once the data is in place, re-run the analyses with the true dependent variables (DVs) and compare the results to the proxy-based outputs.
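
As a rough illustration of the merge step, the sketch below stacks deal rows from several snapshots and records which snapshots each company appears in. It uses small in-memory lists in place of the real files, since the file format is not described here; the `CompanyID` key and both function names are assumptions made for the example, not part of the existing pipeline.

```python
from collections import defaultdict

# Hypothetical stand-ins for the three Deal snapshot files.
# The real files would need to be parsed first; `CompanyID` is an assumed key.
snapshots = {
    "20211201": [{"CompanyID": "c1"}, {"CompanyID": "c2"}],
    "20220101": [{"CompanyID": "c1"}, {"CompanyID": "c2"}],
    "20230501": [{"CompanyID": "c1"}],
}


def merge_snapshots(snapshots):
    """Stack deal rows from every snapshot, tagging each row with its snapshot label."""
    merged = []
    for label in sorted(snapshots):
        for row in snapshots[label]:
            merged.append({**row, "snapshot": label})
    return merged


def presence_by_company(merged):
    """Map each company to the sorted list of snapshots it appears in."""
    seen = defaultdict(set)
    for row in merged:
        seen[row["CompanyID"]].add(row["snapshot"])
    return {cid: sorted(labels) for cid, labels in seen.items()}


merged = merge_snapshots(snapshots)
presence = presence_by_company(merged)
```

With the toy data above, `presence["c2"]` lists only the two earlier snapshots, which is the kind of signal the true survival measure would build on.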
---
## 3. Implementation Plan
### Phase 1: True Survival (High Priority)
1. Load Deal data from 3 snapshots.
2. Implement:
```python
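from datetime import date
def survival(company, ids_20230501, last_financing_date):  # argument names are illustrative
    # 1 if the company is in the 20230501 snapshot and LastFinancingDate >= 2021-11-01
    return int(company in ids_20230501 and last_financing_date >= date(2021, 11, 1))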
```
3. Merge with `Company20230501.dat`
4. Re-run H2 Main test (Survival ~ Vagueness × Integration)
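
The Phase 1 steps could be wired together roughly as follows. This is a minimal sketch under several assumptions: the baseline population is taken to be the set of company IDs from the earliest snapshot, rows are plain dictionaries, and `CompanyID`, `compute_survival`, and `attach_survival` are hypothetical names. Only `LastFinancingDate` and the 2021-11-01 cutoff come from the rule above.

```python
from datetime import date

# Cutoff from the survival rule: 18 months before the 2023-05-01 snapshot.
SURVIVAL_CUTOFF = date(2021, 11, 1)


def compute_survival(baseline_ids, deals_20230501):
    """Return {CompanyID: 0 or 1} for every company in the baseline set."""
    last_financing = {}
    for row in deals_20230501:
        cid, financed = row["CompanyID"], row["LastFinancingDate"]
        # Keep the most recent financing date seen for each company.
        if cid not in last_financing or financed > last_financing[cid]:
            last_financing[cid] = financed
    return {
        cid: int(cid in last_financing and last_financing[cid] >= SURVIVAL_CUTOFF)
        for cid in baseline_ids
    }


def attach_survival(companies, survival_flags):
    """Left-join the flag onto company records (None if the company is not in the baseline)."""
    return [{**c, "survival": survival_flags.get(c["CompanyID"])} for c in companies]


# Toy data: c1 was financed recently, c2 was not, c3 is absent from the 2023 snapshot.
deals_20230501 = [
    {"CompanyID": "c1", "LastFinancingDate": date(2022, 6, 15)},
    {"CompanyID": "c2", "LastFinancingDate": date(2020, 3, 1)},
]
baseline_ids = {"c1", "c2", "c3"}
companies = [{"CompanyID": cid} for cid in sorted(baseline_ids)]

survival_flags = compute_survival(baseline_ids, deals_20230501)
companies_with_dv = attach_survival(companies, survival_flags)
```

Companies missing from the later snapshot get `survival = 0` here rather than a missing value; whether that is the right treatment depends on how the snapshots handle defunct companies.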
### Phase 2: Enable Robustness Test (Medium Priority)
1. Extract Series A and Series B funding from Deal data (filter `VCRound` for `'Series A'` and `'Series B'` separately)
2. Detect down rounds (compare PostValuations)
3. Run H2 Robustness: `series_b_funding ~ vagueness × integration + controls`
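
One way the first two Phase 2 steps might look is sketched below. `VCRound` and the post-valuation comparison come from the plan above; the `DealID`, `CompanyID`, `DealDate`, `DealSize`, and `PostValuation` keys, the round labels `'Series A'` and `'Series B'`, and both function names are assumptions for illustration. A down round is treated here simply as a deal whose post-valuation is below the company's previous known post-valuation.

```python
from datetime import date


def funding_by_round(deals, round_name):
    """Sum DealSize per company over deals whose VCRound equals round_name."""
    totals = {}
    for d in deals:
        if d["VCRound"] == round_name:
            totals[d["CompanyID"]] = totals.get(d["CompanyID"], 0.0) + d["DealSize"]
    return totals


def flag_down_rounds(deals):
    """Return {DealID: 0 or 1}; 1 when PostValuation falls below the prior known value."""
    flags = {}
    last_valuation = {}
    # Walk each company's deals in chronological order.
    for d in sorted(deals, key=lambda d: (d["CompanyID"], d["DealDate"])):
        prev = last_valuation.get(d["CompanyID"])
        val = d["PostValuation"]
        flags[d["DealID"]] = int(prev is not None and val is not None and val < prev)
        if val is not None:
            last_valuation[d["CompanyID"]] = val
    return flags


# Toy deals: c1 raises a Series B at a lower valuation than its Series A.
deals = [
    {"DealID": "d1", "CompanyID": "c1", "DealDate": date(2021, 1, 10),
     "VCRound": "Series A", "DealSize": 5.0, "PostValuation": 20.0},
    {"DealID": "d2", "CompanyID": "c1", "DealDate": date(2022, 3, 5),
     "VCRound": "Series B", "DealSize": 8.0, "PostValuation": 15.0},
    {"DealID": "d3", "CompanyID": "c2", "DealDate": date(2021, 7, 1),
     "VCRound": "Series A", "DealSize": 3.0, "PostValuation": None},
]

series_a_funding = funding_by_round(deals, "Series A")
series_b_funding = funding_by_round(deals, "Series B")
is_down_round = flag_down_rounds(deals)
```

Deals with a missing valuation are never flagged and do not reset the comparison baseline, which keeps sparse valuation data from producing spurious down rounds.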
### Phase 3: Refine Variables (Low Priority)
- Implement `founder_credibility` if founder data available
- Improve `sector_fe` taxonomy
- Optionally upgrade `vagueness` using LLM scoring
---
## 4. Variable Summary
|Variable|Current|Needed|Impact|
|---|---|---|---|
|`survival`|Proxy (`later_success`)|True 18-month tracking|🔴 High|
|`series_a_funding`|`FirstFinancingSize`|Extract by VCRound|🟡 Medium|
|`series_b_funding`|Missing|Extract by VCRound|🟡 Medium|
|`is_down_round`|0|Compare PostValuations|🟢 Low|
|`founder_credibility`|0|Use founder metadata|🟢 Low|
|`sector_fe`|Keyword-based|Refine taxonomy|🟢 Low|
|`vagueness`|Keyword count|Add LLM scoring|🟢 Low|
---
## 5. Next Steps
- ✅ Keep current pipeline running for main models
- 🔴 Add Deal data and implement true survival (Phase 1)
- 🟡 Enable robustness analysis once Series A/B funding is extracted
- 🟢 Refine optional controls later
---
**Summary:**
Pipeline works for now but relies on proxies.
Next step is to integrate Deal data and rebuild the survival and funding variables for accurate hypothesis testing.
---