# W2-Empirical: Dataset Complete

**Send date**: Tuesday, November 5, 2024 (evening, 6-8pm)
**Subject**: [Empirical] Week 2: Dataset Complete

---

Dear Charlie and Scott,

**Progress This Week:**

✅ **Dataset Construction Complete**

- Final sample: [N=75] firms × 2 observations (Series A + Series B) = 150 total observations
- Time-stamped promissory texts (company descriptions from 2021-22) merged with funding outcomes (through 2025)
- Panel structure validated: each firm is observed at both the Series A and Series B stages
- Clean data ready for analysis

✅ **Descriptive Statistics Generated**

*Sample characteristics:*

- Vagueness distribution: Mean = 58.3 (SD = 18.7) on a 0-100 scale
- Series A funding success rate: 73% (55/75 firms)
- Series B funding success rate: 44% (33/75 firms)
- Hardware vs. software split: 35% hardware/chip/robotics, 65% software/API

*Initial patterns observed:*

- Vagueness is negatively correlated with Series A success (r = -0.23)
- But it is positively correlated with Series B success among Series A winners (r = +0.31)
- → Reversal pattern visible in the raw data, consistent with the hypothesis

✅ **Model Specification Finalized**

*Model 1 (Reversal hypothesis):*

```
logit(Funding_Success_it) = β₀ + β₁·Vagueness_i + β₂·SeriesB_t + β₃·(Vagueness_i × SeriesB_t) + β₄·log(TeamSize) + β₅·PriorExit
```

Expected: β₁ < 0 (vague firms struggle at A), β₃ > 0 (reversal at B)

*Model 2 (Integration cost moderator):*

```
Model 1 + β₇·(Vagueness × SeriesB × High_Integration_Cost)
```

Expected: β₇ > 0 (hardware firms benefit more from vagueness during shakeout)

⏳ **In Progress**

- Running logistic regressions in Stata
- Preliminary coefficient estimates obtained
- Checking for multicollinearity (VIF < 3 for all predictors)
- Assessing influential cases (Cook's D diagnostics)

---

**Next Week Target:** Complete Model 1-2 estimation with significance tests, begin robustness checks.

Best,
Angie

---

## Writing Guide (for when you fill this in)

**Parts to replace with actual numbers:**

1. **Sample size**: "[N=75]" → the number of firms actually obtained
2. **Descriptive stats**: replace every mean, standard deviation, and proportion with values from the actual data
3. **Correlation coefficients**: r = -0.23, +0.31 → the actual computed values
4. **If pattern doesn't match hypothesis**: report it honestly

   ```
   *Initial patterns observed:*
   - Vagueness shows [unexpected pattern]
   - Will explore alternative specifications
   ```

**If the dataset is not yet complete:**

```
⏳ **Dataset Construction 95% Complete**
- Sample: [N~70-80] firms identified, final cleaning in progress
- Missing data handling: [imputation strategy or exclusion criteria]
- Expected completion: This week

✅ **Descriptive Statistics Drafted**
- Preliminary stats based on [N=60] clean cases
- Will update with final sample next week
```

**Core principles:**

- Never lie
- Report progress honestly
- If there is a problem, mention it together with a proposed fix
- Use "⏳ In Progress" strategically

**Tone:**

- Factual, no drama
- Speak in numbers (the more specific, the more credible)
- "Initial patterns observed" = not yet an interpretation, just an observation
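
**Computing the numbers (sketch):**

The placeholders named in the guide (sample size, vagueness mean and SD, success rates, and the two correlations) could be computed along the following lines before the email is filled in. This is a minimal sketch using only the Python standard library, not the project's actual pipeline. The record layout (`vagueness`, `series_a`, `series_b`), the function names, and the six toy firms are all hypothetical and exist only to make the example runnable.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def describe(firms):
    """Summarize firm records for the status email.

    Each record is a dict with hypothetical keys:
      vagueness (0-100 score), series_a (0/1), series_b (0/1).
    """
    vag = [f["vagueness"] for f in firms]
    a = [f["series_a"] for f in firms]
    b = [f["series_b"] for f in firms]
    # The Series B correlation is restricted to firms that raised a Series A,
    # mirroring the "among Series A winners" wording in the email.
    winners = [f for f in firms if f["series_a"] == 1]
    return {
        "n_firms": len(firms),
        "vagueness_mean": mean(vag),
        "vagueness_sd": stdev(vag),  # sample standard deviation
        "series_a_rate": mean(a),
        "series_b_rate": mean(b),
        "r_vagueness_a": pearson(vag, a),
        "r_vagueness_b_given_a": pearson(
            [f["vagueness"] for f in winners],
            [f["series_b"] for f in winners],
        ),
    }


# Toy records for illustration only; these are NOT project data.
toy_firms = [
    {"vagueness": 20, "series_a": 1, "series_b": 0},
    {"vagueness": 40, "series_a": 1, "series_b": 0},
    {"vagueness": 60, "series_a": 1, "series_b": 1},
    {"vagueness": 80, "series_a": 1, "series_b": 1},
    {"vagueness": 70, "series_a": 0, "series_b": 0},
    {"vagueness": 90, "series_a": 0, "series_b": 0},
]

for name, value in describe(toy_firms).items():
    print(f"{name}: {value}")
```

Keeping every reported figure in one returned dictionary makes it easier to paste consistent values into the email, and to report honestly when a sign does not match the hypothesis.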