# W2-Empirical: Dataset Complete
**Send date**: Tuesday, November 5, 2024 (evening, 6-8pm)
**Subject**: [Empirical] Week 2: Dataset Complete
---
Dear Charlie and Scott,
**Progress This Week:**
✅ **Dataset Construction Complete**
- Final sample: [N=75] firms × 2 observations (Series A + Series B) = 150 total observations
- Time-stamped promissory texts (company descriptions from 2021-22) merged with funding outcomes (through 2025)
- Panel structure validated: Each firm observed at both Series A and Series B stages
- Clean data ready for analysis
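
The panel check above can be sketched as follows. This is a minimal Python illustration, assuming a long-format table with one row per firm-stage; the row layout, the `validate_panel` helper, and the firm IDs are hypothetical, not the actual dataset.

```python
from collections import defaultdict

# Hypothetical long-format rows: (firm_id, stage, funded). Illustrative values only.
rows = [
    ("firm_001", "A", 1),
    ("firm_001", "B", 0),
    ("firm_002", "A", 1),
    ("firm_002", "B", 1),
    ("firm_003", "A", 0),
    ("firm_003", "B", 0),
]

def validate_panel(rows, expected_stages=("A", "B")):
    """Return the firm_ids whose observed stages differ from expected_stages."""
    stages_by_firm = defaultdict(list)
    for firm_id, stage, _funded in rows:
        stages_by_firm[firm_id].append(stage)
    return sorted(
        firm_id
        for firm_id, stages in stages_by_firm.items()
        if sorted(stages) != sorted(expected_stages)
    )

bad_firms = validate_panel(rows)
n_firms = len({firm_id for firm_id, _, _ in rows})
print(f"{n_firms} firms, {len(rows)} observations, {len(bad_firms)} incomplete")
```

An empty `bad_firms` list means every firm has exactly one Series A row and one Series B row, so the observation count is twice the firm count.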
✅ **Descriptive Statistics Generated**
*Sample characteristics:*
- Vagueness distribution: Mean = 58.3 (SD = 18.7) on 0-100 scale
- Series A funding success rate: 73% (55/75 firms)
- Series B funding success rate: 44% (33/75 firms)
- Hardware vs. software split: 35% hardware/chip/robotics, 65% software/API
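
A minimal Python sketch of how the sample characteristics above could be computed. The firm records are hypothetical placeholders, and the sketch assumes that only Series A winners can be funded at Series B. It makes the denominator choice explicit: a Series B rate over all firms differs from the rate among Series A winners.

```python
from statistics import mean, stdev

# Hypothetical records: (vagueness 0-100, funded_A, funded_B). Illustrative only.
firms = [
    (30, 1, 0), (45, 1, 0), (60, 1, 1), (80, 1, 1),
    (70, 0, 0), (85, 0, 0), (55, 1, 1), (40, 1, 0),
]

vagueness = [v for v, _, _ in firms]
n = len(firms)
n_a = sum(a for _, a, _ in firms)  # firms funded at Series A
n_b = sum(b for _, _, b in firms)  # firms funded at Series B

summary = {
    "vagueness_mean": mean(vagueness),
    "vagueness_sd": stdev(vagueness),    # sample SD (n - 1 denominator)
    "series_a_rate": n_a / n,
    "series_b_rate_all": n_b / n,        # denominator: all firms
    "series_b_rate_winners": n_b / n_a,  # denominator: Series A winners only
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```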
*Initial patterns observed:*
- Vagueness negatively correlated with Series A success (r = -0.23)
- But positively correlated with Series B success among Series A winners (r = +0.31)
- → Reversal pattern visible in raw data, consistent with hypothesis
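
The two correlations above use different samples: all firms for Series A, and only Series A winners for Series B. A minimal Python sketch of that computation, using hypothetical firm records rather than the actual data (with a 0/1 outcome, Pearson's r is the point-biserial correlation):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation; with a 0/1 outcome this equals the point-biserial r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical records: (vagueness, funded_A, funded_B). Illustrative only.
firms = [
    (30, 1, 0), (45, 1, 0), (60, 1, 1), (80, 1, 1),
    (70, 0, 0), (85, 0, 0), (55, 1, 1), (40, 1, 0),
]

# Series A: correlation over all firms.
r_a = pearson_r([v for v, a, b in firms], [a for v, a, b in firms])

# Series B: correlation conditional on Series A success.
winners = [(v, b) for v, a, b in firms if a == 1]
r_b = pearson_r([v for v, b in winners], [b for v, b in winners])

print(f"r(A) = {r_a:+.2f} on {len(firms)} firms")
print(f"r(B | A funded) = {r_b:+.2f} on {len(winners)} firms")
```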
✅ **Model Specification Finalized**
*Model 1 (Reversal hypothesis):*
```
logit(Funding_Success_it) = β₀ + β₁·Vagueness_i + β₂·SeriesB_t
+ β₃·(Vagueness × SeriesB)
+ β₄·log(TeamSize) + β₅·PriorExit
```
Expected: β₁ < 0 (vague firms struggle at A), β₃ > 0 (reversal at B)
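
Because SeriesB is a 0/1 indicator, Model 1 implies a log-odds slope for vagueness of β₁ at Series A and β₁ + β₃ at Series B, so a full sign reversal needs β₁ + β₃ > 0 in addition to β₃ > 0. A minimal Python sketch of that arithmetic follows. The coefficient values are hypothetical, chosen only to match the expected signs, and are not estimates.

```python
from math import exp

# Hypothetical coefficients, chosen only to match the expected signs.
# They are NOT estimates from the dataset.
B0, B1, B2, B3, B4, B5 = 1.5, -0.02, -2.5, 0.04, 0.3, 0.5

def inv_logit(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + exp(-z))

def predicted_prob(vagueness, series_b, log_team_size=0.0, prior_exit=0):
    """Predicted funding probability under Model 1 for one firm-stage."""
    z = (B0 + B1 * vagueness + B2 * series_b + B3 * vagueness * series_b
         + B4 * log_team_size + B5 * prior_exit)
    return inv_logit(z)

def vagueness_slope(series_b):
    """Log-odds slope of vagueness: B1 at Series A, B1 + B3 at Series B."""
    return B1 + B3 * series_b

for stage, flag in (("Series A", 0), ("Series B", 1)):
    print(stage,
          "slope:", round(vagueness_slope(flag), 3),
          "P(v=20):", round(predicted_prob(20, flag), 3),
          "P(v=80):", round(predicted_prob(80, flag), 3))
```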
*Model 2 (Integration cost moderator):*
```
Model 1 + β₇·(Vagueness × SeriesB × High_Integration_Cost)
```
Expected: β₇ > 0 (hardware firms benefit more from vagueness during shakeout)
⏳ **In Progress**
- Running logistic regressions in Stata
- Preliminary coefficient estimates obtained
- Checking for multicollinearity (VIF < 3 for all predictors)
- Assessing influential cases (Cook's D diagnostics)
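
For reference, the VIF for predictor j is 1 / (1 - R²_j), where R²_j comes from regressing that predictor on the others. A minimal standard-library Python sketch of the calculation is below. The predictor values are hypothetical, and the actual diagnostics are run in Stata.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def r_squared(y, predictors):
    """R^2 from an OLS regression of y on the predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(design[i][p] * design[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(design[i][p] * y[i] for i in range(n)) for p in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """Map each predictor name to its variance inflation factor."""
    result = {}
    for name, col in columns.items():
        others = [c for other, c in columns.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(col, others))
    return result

# Hypothetical predictor columns. Illustrative values only.
predictors = {
    "vagueness": [30, 45, 60, 80, 70, 85, 55, 40],
    "log_team_size": [1.1, 2.3, 1.6, 2.0, 2.7, 1.4, 3.0, 1.9],
    "prior_exit": [0, 1, 0, 0, 1, 0, 1, 1],
}

for name, value in vif(predictors).items():
    print(f"{name}: VIF = {value:.2f}")
```

A VIF of 1 means the predictor is uncorrelated with the others; interaction terms built from uncentered variables tend to push VIFs well above that.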
---
**Next Week Target:**
Complete Model 1-2 estimation with significance tests, begin robustness checks.
Best,
Angie
---
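## Writing Guide (for when you fill this in)
**Parts to replace with real numbers:**
1. **Sample size**: "[N=75]" → the number of firms actually secured
2. **Descriptive stats**: replace every mean, standard deviation, and proportion with values from the actual data
3. **Correlation coefficients**: r = -0.23, +0.31 → the values actually computed
4. **If pattern doesn't match hypothesis**: report it honestly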
```
*Initial patterns observed:*
- Vagueness shows [unexpected pattern]
- Will explore alternative specifications
```
**If the dataset is not yet complete:**
```
⏳ **Dataset Construction 95% Complete**
- Sample: [N~70-80] firms identified, final cleaning in progress
- Missing data handling: [imputation strategy or exclusion criteria]
- Expected completion: This week
✅ **Descriptive Statistics Drafted**
- Preliminary stats based on [N=60] clean cases
- Will update with final sample next week
```
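**Core principles:**
- Never lie
- Report progress honestly
- If there is a problem, mention it together with a proposed fix
- Use "⏳ In Progress" strategically
**Tone:**
- Factual, no drama
- Speak in numbers (the more specific, the more credible)
- "Initial patterns observed" = not yet an interpretation, just an observation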