# Tech Spec: Placeholder Variables & Future Implementation

**Date**: 2025-10-27

**Status**: Pipeline runs on placeholder variables derived from single-snapshot Company data

---

## 1. Problem Statement

The current analysis pipeline depends on **proxy variables** because we lack multi-snapshot Deal data. Key variables such as `survival`, `series_a_funding`, and `series_b_funding` are placeholders, which limits the accuracy of the Hypothesis 2 (H2) tests.

---

## 2. Proposed Solution

Load and merge multiple **Deal snapshot files** (`20211201`, `20220101`, `20230501`) to construct proper measures for:

- **Survival** (18-month persistence)
- **Series A/B funding**
- **Down rounds**
- Optional variables such as **founder_credibility**

Once the data are in place, re-run the analyses with the true dependent variables (DVs) and compare the results with the proxy-based outputs.

---

## 3. Implementation Plan

### Phase 1: True Survival (High Priority)

1. Load Deal data from the three snapshots.
2. Implement the survival rule: a company survives if it appears in the `20230501` snapshot and its `LastFinancingDate` is on or after 2021-11-01, i.e. 18 months before that snapshot (`company_id`, `ids_in_20230501`, and `last_financing_date` below are illustrative names):

   ```python
   from datetime import date
   survival = int(company_id in ids_in_20230501 and last_financing_date >= date(2021, 11, 1))
   ```

3. Merge the result with `Company20230501.dat`.
4. Re-run the H2 main test (Survival ~ Vagueness × Integration).

### Phase 2: Enable Robustness Test (Medium Priority)

1. Extract Series A/B funding from the Deal data by filtering `VCRound` for Series A and Series B rounds.
2. Detect down rounds by comparing PostValuations across a company's consecutive rounds.
3. Run the H2 robustness test: `series_b_funding ~ vagueness × integration + controls`.

### Phase 3: Refine Variables (Low Priority)

- Implement `founder_credibility` if founder data are available.
- Refine the `sector_fe` taxonomy.
- Optionally upgrade `vagueness` using LLM scoring.

---

## 4. Variable Summary

| Variable | Current | Needed | Impact |
|---|---|---|---|
| `survival` | Proxy (`later_success`) | True 18-month tracking | 🔴 High |
| `series_a_funding` | `FirstFinancingSize` | Extract by `VCRound` | 🟡 Medium |
| `series_b_funding` | Missing | Extract by `VCRound` | 🟡 Medium |
| `is_down_round` | Set to 0 | Compare PostValuations | 🟢 Low |
| `founder_credibility` | Set to 0 | Use founder metadata | 🟢 Low |
| `sector_fe` | Keyword-based | Refine taxonomy | 🟢 Low |
| `vagueness` | Keyword count | Add LLM scoring | 🟢 Low |

---

## 5. Next Steps

- ✅ Keep the current pipeline running for the main models
- 🔴 Add Deal data and implement true survival (Phase 1)
- 🟡 Enable the robustness analysis once Series A/B funding is extracted
- 🟢 Refine optional controls later

---

**Summary:** The pipeline works for now but relies on proxies. The next step is to integrate the Deal data and rebuild the survival and funding variables so that the hypotheses can be tested accurately.

---
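A minimal pandas sketch of the Phase 1 survival rule follows. It assumes hypothetical column names (`CompanyID` in both frames, `LastFinancingDate` in the `20230501` Deal snapshot) and allows the snapshot to hold several rows per company, keeping the latest financing date. A company scores 1 only if it appears in the `20230501` snapshot and last financed on or after 2021-11-01.

```python
import pandas as pd

# 18 months before the 2023-05-01 snapshot
SURVIVAL_CUTOFF = pd.Timestamp("2021-11-01")


def build_survival(companies: pd.DataFrame, snapshot_20230501: pd.DataFrame) -> pd.DataFrame:
    """Add a 0/1 `survival` column to `companies`.

    Hypothetical schema: `CompanyID` in both frames and `LastFinancingDate`
    in the 2023-05-01 Deal snapshot.
    """
    latest = snapshot_20230501[["CompanyID", "LastFinancingDate"]].copy()
    latest["LastFinancingDate"] = pd.to_datetime(latest["LastFinancingDate"])
    # Collapse to one row per company, keeping the most recent financing date
    latest = latest.groupby("CompanyID", as_index=False)["LastFinancingDate"].max()

    # Left merge keeps every company; `_merge` records whether it was found in the snapshot
    merged = companies.merge(latest, on="CompanyID", how="left", indicator=True)
    in_snapshot = merged["_merge"] == "both"
    # Missing dates (NaT) compare as False, so absent companies never count as recent
    recent = merged["LastFinancingDate"] >= SURVIVAL_CUTOFF
    merged["survival"] = (in_snapshot & recent).astype(int)
    return merged.drop(columns="_merge")
```

Here `companies` would come from `Company20230501.dat`; the spec does not describe that file's layout, so the loader is omitted.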
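For Phase 2 step 1, a hedged sketch of extracting Series A/B funding per company. The column names `CompanyID`, `VCRound`, and `DealSize`, and the literal round labels `"Series A"` and `"Series B"`, are assumptions about the Deal schema. It also assumes the three snapshots have already been concatenated and de-duplicated so that each deal appears once.

```python
import pandas as pd

# Hypothetical VCRound labels mapped to output column prefixes
ROUNDS = {"Series A": "series_a", "Series B": "series_b"}


def extract_series_funding(deals: pd.DataFrame) -> pd.DataFrame:
    """One row per company with a 0/1 `<prefix>_funding` flag and a summed `<prefix>_amount`."""
    out = pd.DataFrame({"CompanyID": deals["CompanyID"].unique()})
    for label, prefix in ROUNDS.items():
        rounds = deals[deals["VCRound"] == label]
        amounts = (
            rounds.groupby("CompanyID", as_index=False)["DealSize"]
            .sum()
            .rename(columns={"DealSize": f"{prefix}_amount"})
        )
        out = out.merge(amounts, on="CompanyID", how="left")
        # A company has the round if at least one matching deal exists
        out[f"{prefix}_funding"] = out[f"{prefix}_amount"].notna().astype(int)
        out[f"{prefix}_amount"] = out[f"{prefix}_amount"].fillna(0.0)
    return out
```

The `series_a_amount` column is the closest analogue to the current `FirstFinancingSize` proxy, which may make a side-by-side comparison easier.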
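Phase 2 step 2 compares PostValuations. One reading of that rule, sketched below, flags a round as down when its post-money valuation is lower than the same company's previous round. `DealDate` and `PostValuation` are assumed column names. A stricter and common alternative compares the new round's pre-money valuation with the previous round's post-money valuation.

```python
import pandas as pd


def flag_down_rounds(deals: pd.DataFrame) -> pd.DataFrame:
    """Company-level `is_down_round`: 1 if any round fell below the prior round's PostValuation."""
    d = deals.copy()
    d["DealDate"] = pd.to_datetime(d["DealDate"])
    d = d.sort_values(["CompanyID", "DealDate"])
    # Previous round's post-money valuation within the same company (NaN for the first round)
    prev = d.groupby("CompanyID")["PostValuation"].shift(1)
    # Comparisons against NaN are False, so first rounds and missing valuations count as 0
    d["is_down_round"] = (d["PostValuation"] < prev).astype(int)
    return d.groupby("CompanyID", as_index=False)["is_down_round"].max()
```

Sorting by date inside the function keeps the comparison correct even when the merged snapshots list deals out of order.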
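The H2 models can be written as formulas in which `vagueness * integration` expands to both main effects plus their interaction (`vagueness:integration`), matching the × notation in the implementation plan. The sketch below assumes statsmodels and a logit link because `survival` and `series_b_funding` are 0/1 indicators. The spec does not fix the estimator, and the control list passed in (for example `C(sector_fe)`) is a placeholder for whatever controls the main models use.

```python
import pandas as pd
import statsmodels.formula.api as smf


def h2_formula(dv: str, controls: list[str]) -> str:
    """Build 'dv ~ vagueness * integration + controls'; `*` adds both main effects and the interaction."""
    rhs = " + ".join(["vagueness * integration", *controls])
    return f"{dv} ~ {rhs}"


def fit_h2(df: pd.DataFrame, dv: str, controls: list[str]):
    """Fit a logit model for a binary H2 outcome and return the statsmodels results object."""
    return smf.logit(h2_formula(dv, controls), data=df).fit(disp=0)
```

Comparing `params` and `pvalues` from the true-DV fit with the same attributes from the proxy-based run is one way to carry out the comparison described in Section 2.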