# 🚀 Quick Start Guide - Local Execution
## 📋 Table of Contents
1. [Environment Setup](#1-environment-setup)
2. [Quick Test (5 minutes)](#2-quick-test-5-minutes)
3. [Full Pipeline (30 minutes to 2 hours)](#3-full-pipeline-30-minutes-to-2-hours)
4. [Generated Outputs](#4-generated-outputs)
5. [Troubleshooting](#5-troubleshooting)
---
## 1. Environment Setup
### 1.1 Fetching the Branch
```bash
# If you are currently working on the master branch
cd "/path/to/tolzul/Front/On/love(cs)/strategic ambiguity/empirics"
# Back up your work (optional)
cp -r . ~/backup_$(date +%Y%m%d_%H%M%S)
# Fetch the latest changes
git fetch origin claude/moderator-bakeoff-analysis-011CUbKc3dAVgd5eU3SXn7k8
git merge origin/claude/moderator-bakeoff-analysis-011CUbKc3dAVgd5eU3SXn7k8
```
**Or fetch only specific files:**
```bash
git fetch origin claude/moderator-bakeoff-analysis-011CUbKc3dAVgd5eU3SXn7k8
git checkout origin/claude/moderator-bakeoff-analysis-011CUbKc3dAVgd5eU3SXn7k8 -- \
modules/models.py \
modules/plots.py \
modules/features.py \
run_analysis.py \
test_one_touch.py
```
### 1.2 Python Environment
**Check the required packages:**
```bash
pip list | grep -E "pandas|numpy|matplotlib|statsmodels|scikit-learn|seaborn|scipy"
```
**Install any that are missing:**
```bash
pip install pandas numpy matplotlib statsmodels scikit-learn seaborn scipy
```
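The `pip list | grep` check depends on the shell. As a cross-platform alternative, here is a minimal sketch that asks Python itself whether each package can be imported. The `missing_packages` helper and the `REQUIRED` mapping are illustrative names, not part of the repository. The only subtlety is that `scikit-learn` is imported as `sklearn`.

```python
from importlib.util import find_spec

# Maps pip distribution names (from the install command) to import names.
# Only scikit-learn differs: it is imported as `sklearn`.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "statsmodels": "statsmodels",
    "scikit-learn": "sklearn",
    "seaborn": "seaborn",
    "scipy": "scipy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing; install with: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the same interpreter you will use for `run_analysis.py`, since each virtual environment has its own set of installed packages.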
---
## 2. Quick Test (5 minutes) ⚡ **Recommended!**
Run a quick test against the existing dataset (`outputs/h2_analysis_dataset.csv`).
### 2.1 Run the Test
```bash
cd "/path/to/tolzul/Front/On/love(cs)/strategic ambiguity/empirics"
python test_one_touch.py
```
### 2.2 Expected Output
```
================================================================================
ONE-TOUCH EXECUTION TEST
================================================================================
✓ Loading dataset: outputs/h2_analysis_dataset.csv
Rows: 5,000
H1: EARLY FUNDING ────────────────
✓ H1 fitted: R² = 0.002
H2: GROWTH × ARCHITECTURE ───────
✓ H2 fitted: Pseudo R² = 0.019
H3: EARLY FUNDING × FOUNDER ─────
✓ H3 fitted: R² = 0.003
✓ Saved: outputs/h3_coefficients.csv
H4: GROWTH × FOUNDER ─────────────
✓ H4 fitted: Pseudo R² = 0.016
✓ Saved: outputs/h4_coefficients.csv
GENERATING FIGURES ───────────────
✓ Saved: outputs/figures/Figure_1_Reversal.png
✓ Saved: outputs/figures/Figure_2a_H3.png
✓ Saved: outputs/figures/Figure_2b_H4.png
TEST COMPLETE
```
### 2.3 Check the Generated Files
```bash
ls -lh outputs/h*.csv
ls -lh outputs/figures/*.png
```
**Expected files:**
- `outputs/h1_coefficients.csv`
- `outputs/h3_coefficients.csv` (NEW)
- `outputs/h4_coefficients.csv` (NEW)
- `outputs/figures/Figure_1_Reversal.png` (NEW)
- `outputs/figures/Figure_2a_H3.png` (NEW)
- `outputs/figures/Figure_2b_H4.png` (NEW)
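Scanning `ls` output by eye makes it easy to miss an absent file. The sketch below checks the six files listed above and also flags empty ones. The `missing_outputs` helper is a hypothetical name introduced here and is not part of the repository.

```python
from pathlib import Path

# Files this guide lists as outputs of test_one_touch.py,
# relative to the outputs/ directory.
EXPECTED = [
    "h1_coefficients.csv",
    "h3_coefficients.csv",
    "h4_coefficients.csv",
    "figures/Figure_1_Reversal.png",
    "figures/Figure_2a_H3.png",
    "figures/Figure_2b_H4.png",
]


def missing_outputs(output_dir="outputs", expected=EXPECTED):
    """Return the expected files that are absent or empty under output_dir."""
    root = Path(output_dir)
    return [name for name in expected
            if not (root / name).is_file()
            or (root / name).stat().st_size == 0]


if __name__ == "__main__":
    problems = missing_outputs()
    print("All expected files present." if not problems
          else "Missing or empty: " + ", ".join(problems))
```

An empty list means every file exists and has content. It does not confirm that the contents are correct.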
---
## 3. Full Pipeline (30 minutes to 2 hours)
Run the analysis end to end, starting from the real PitchBook data.
### 3.1 Data Preparation
**Check the data location:**
```bash
ls -lh data/raw/Company*.dat
```
**Required files:**
- `data/raw/Company20211201.dat` (baseline, t0)
- `data/raw/Company20220101.dat` (mid1, tm1)
- `data/raw/Company20220501.dat` (mid2, tm2)
- `data/raw/Company20230501.dat` (endpoint, t1)
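The four snapshot names encode their dates, so one script can confirm that the files are in place and report the observation window between baseline and endpoint. This is a sketch under one assumption: the trailing eight digits of each file name are a `YYYYMMDD` snapshot date. The helper names (`snapshot_date`, `check_snapshots`) are hypothetical.

```python
from datetime import date, datetime
from pathlib import Path

# Snapshot roles as listed in this guide.
SNAPSHOTS = {
    "t0": "Company20211201.dat",   # baseline
    "tm1": "Company20220101.dat",  # mid1
    "tm2": "Company20220501.dat",  # mid2
    "t1": "Company20230501.dat",   # endpoint
}


def snapshot_date(filename: str) -> date:
    """Parse the YYYYMMDD stamp from a name like Company20211201.dat."""
    stem = Path(filename).stem  # e.g. "Company20211201"
    return datetime.strptime(stem[-8:], "%Y%m%d").date()


def check_snapshots(raw_dir="data/raw", snapshots=SNAPSHOTS):
    """Return (missing_files, days between the t0 and t1 snapshots)."""
    missing = [name for name in snapshots.values()
               if not (Path(raw_dir) / name).is_file()]
    span = snapshot_date(snapshots["t1"]) - snapshot_date(snapshots["t0"])
    return missing, span.days


if __name__ == "__main__":
    missing, days = check_snapshots()
    print(f"Baseline-to-endpoint window: {days} days")
    if missing:
        print("Missing snapshots: " + ", ".join(missing))
```

With the file names above, the baseline-to-endpoint window is 516 days (2021-12-01 to 2023-05-01).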
**If the data is missing:**
```bash
# Generate synthetic data (for testing)
python generate_synthetic_data.py
```
### 3.2 Run the Full Pipeline
```bash
python run_analysis.py
```
**Run time:**
- Synthetic data (5K rows): ~5 minutes
- Real data (50K+ rows): ~30 minutes to 2 hours
### 3.3 Expected Output
```
W1 HYPOTHESIS TESTING (CLEAN)
────────────────────────────────
Loading 4 snapshots...
✓ Baseline: 45,234 companies
✓ Mid1: 46,891 companies
✓ Mid2: 48,123 companies
✓ Endpoint: 51,456 companies
Feature engineering...
ℹ️ Early funding filtered to Series A / Early Stage VC: 23,456 of 45,234
DV creation (Series B+ progression)...
📅 Applying as-of date capping...
🎯 At-risk cohort: 23,456 companies
📊 Base rate: 13.8%
H1: EARLY FUNDING ────────────────
✓ Saved: outputs/h1_coefficients.csv
H2: GROWTH × ARCHITECTURE ───────
✓ Saved: outputs/h2_main_coefficients.csv
H3: EARLY FUNDING × FOUNDER ─────
✓ Saved: outputs/h3_coefficients.csv
H4: GROWTH × FOUNDER ─────────────
✓ Saved: outputs/h4_coefficients.csv
BAKE-OFF: Architecture vs Credibility
✓ Saved: outputs/h2_model_architecture.csv
✓ Saved: outputs/h2_model_founder.csv
GENERATING FIGURES ───────────────
✓ Saved: outputs/figures/Figure_1_Reversal.png
✓ Saved: outputs/figures/Figure_2a_H3.png
✓ Saved: outputs/figures/Figure_2b_H4.png
ONE-TOUCH EXECUTION COMPLETE
```
---
## 4. Generated Outputs
### 4.1 Coefficient Tables (CSV)
```bash
outputs/
├── h1_coefficients.csv               # H1: Early Funding ~ Vagueness
├── h2_main_coefficients.csv          # H2: Growth ~ Vagueness × Architecture
├── h3_coefficients.csv               # H3: Early Funding ~ Vagueness × Founder
├── h4_coefficients.csv               # H4: Growth ~ Vagueness × Founder
├── h2_model_architecture.csv         # Bake-off: Architecture moderator
├── h2_model_architecture_ame.csv
├── h2_model_architecture_metrics.csv
├── h2_model_founder.csv              # Bake-off: Founder moderator
├── h2_model_founder_ame.csv
└── h2_model_founder_metrics.csv
```
### 4.2 Visualizations (PNG)
```bash
outputs/figures/
├── Figure_1_Reversal.png   # H1 + H2 dual-axis plot
├── Figure_2a_H3.png        # Early Funding × Founder (scatter + OLS)
└── Figure_2b_H4.png        # Growth × Founder (scatter + logistic)
```
### 4.3 Dataset
```bash
outputs/
└── h2_analysis_dataset.csv   # Final dataset for analysis
```
---
## 5. Troubleshooting
### Problem 1: ModuleNotFoundError
**Symptom:**
```
ModuleNotFoundError: No module named 'pandas'
```
**Solution:**
```bash
pip install pandas numpy matplotlib statsmodels scikit-learn seaborn scipy
```
### Problem 2: Data Files Missing
**Symptom:**
```
FileNotFoundError: data/raw/Company20211201.dat not found
```
**Solution options:**
**A. Use synthetic data (fast):**
```bash
python generate_synthetic_data.py
python test_one_touch.py # Quick test
```
**B. Check the path to the real data:**
```bash
find . -name "Company*.dat" -type f
# Move the files to data/raw/
```
### Problem 3: Out of Memory (Large Datasets)
**Symptom:**
```
MemoryError: Unable to allocate array
```
**Solution:**
```bash
# Run on a sample
python run_analysis.py --sample 0.1 # 10% sample
```
**Or edit run_analysis.py:**
```python
# Near line 58: cap the number of rows read by adding nrows=50000
df = pd.read_csv(path, sep='|', encoding=encoding, low_memory=False, nrows=50000)
```
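Note that `nrows` keeps only the first rows of the file, which may not be representative. As an alternative, a smaller random sample of a raw file can be written out before the pipeline runs. The sketch below streams the file line by line, so memory use stays flat. It assumes one record per line (no line breaks inside fields) and a header on the first line. The `sample_lines` helper is a hypothetical name, and the default encoding is a guess that may need to match your files.

```python
import random


def sample_lines(src, dst, fraction=0.1, seed=42, encoding="utf-8"):
    """Copy the header plus a random ~fraction of data lines from src to dst.

    Lines are streamed one at a time, so the whole file is never held
    in memory. Returns the number of data rows written.
    """
    rng = random.Random(seed)  # fixed seed gives a reproducible sample
    kept = 0
    # newline="" preserves the original line endings on read and write
    with open(src, encoding=encoding, newline="") as fin, \
         open(dst, "w", encoding=encoding, newline="") as fout:
        fout.write(fin.readline())  # always keep the header row
        for line in fin:
            if rng.random() < fraction:
                fout.write(line)
                kept += 1
    return kept
```

Sampling each snapshot independently keeps a different set of companies in each file. For an analysis that links companies across snapshots, sampling by company identifier would be the safer choice.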
### Problem 4: Figures Are Not Generated
**Symptom:**
```
WARNING: Could not plot H1 predictions
```
**Check:**
```bash
# Check that the required columns are present
python -c "
import pandas as pd
df = pd.read_csv('outputs/h2_analysis_dataset.csv')
print(df.columns.tolist())
"
```
**Required columns:**
- `z_vagueness`, `z_employees_log`, `founding_cohort`
- `early_funding_musd`, `growth`
- `founder_serial` (or `founder_credibility`)
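Comparing a printed column list against the required names by eye is error-prone. The sketch below reads only the header row and reports what is missing, treating `founder_serial` and `founder_credibility` as alternatives as the list above does. The `missing_columns` helper is a hypothetical name, not a function from the repository.

```python
import csv

# Column names taken from the list above.
REQUIRED_COLUMNS = [
    "z_vagueness", "z_employees_log", "founding_cohort",
    "early_funding_musd", "growth",
]
# Either one of these satisfies the founder requirement.
FOUNDER_ALTERNATIVES = ("founder_serial", "founder_credibility")


def missing_columns(csv_path):
    """Read only the header row and list the required columns that are absent."""
    # utf-8-sig reads files both with and without a byte-order mark
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])
    present = set(header)
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    if not any(c in present for c in FOUNDER_ALTERNATIVES):
        missing.append(" or ".join(FOUNDER_ALTERNATIVES))
    return missing
```

For example, `missing_columns('outputs/h2_analysis_dataset.csv')` returns an empty list when every required column is present.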
### Problem 5: Convergence Failure (Logit)
**Symptom:**
```
PerfectSeparationError: Perfect separation detected
```
**Solution:** The code handles this automatically.
```python
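# models.py Line 85, 173: fall back to a regularized fit when the plain fit fails
# Note: statsmodels documents only 'l1' and 'l1_cvxopt_cp' as values of `method`
# for fit_regularized, so verify the 'l2' call against your installed version.
try:
    model = smf.logit(formula, data=d).fit(disp=False)
except Exception:
    model = smf.logit(formula, data=d).fit_regularized(method='l2', alpha=0.01)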
```
---
## 6. Additional Analysis Scripts
### 6.1 Follow-up Period Analysis
```bash
python test_followup_period.py
```
**Output:**
- Base rate analysis
- Impact of right censoring
- Statistical power assessment
### 6.2 Series A Filter Verification
```bash
python test_series_a_filter.py
```
**Output:**
- Distribution of FirstFinancingDealType
- Number of "Early Stage VC" matches
- Before/after filtering comparison
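To make the three reported quantities concrete, here is a minimal sketch of the same kind of summary on a plain list of deal-type strings. The matching rule (a case-insensitive exact match on "Series A" or "Early Stage VC") is an assumption made for illustration, and the rule in `test_series_a_filter.py` may differ. The function names are hypothetical.

```python
from collections import Counter

# Assumed matching rule: the guide only states that early funding is
# filtered to "Series A / Early Stage VC"; the script's rule may differ.
EARLY_STAGE_LABELS = {"series a", "early stage vc"}


def is_early_stage(deal_type):
    """True when the deal type matches one of the early-stage labels."""
    return (deal_type or "").strip().lower() in EARLY_STAGE_LABELS


def filter_summary(deal_types):
    """Return (distribution, rows_before, rows_after) for a list of deal types."""
    distribution = Counter((d or "").strip() or "<missing>" for d in deal_types)
    kept = sum(1 for d in deal_types if is_early_stage(d))
    return distribution, len(deal_types), kept


if __name__ == "__main__":
    # Illustrative values only, not real PitchBook data.
    sample = ["Early Stage VC", "Seed Round", "Series A", None]
    dist, before, after = filter_summary(sample)
    print(dist.most_common())
    print(f"Rows before: {before}, after: {after}")
```

Printing the full distribution first helps reveal label variants that an exact-match rule would silently drop.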
---
## 7. Quick Checklist ✅
Before running:
- [ ] Python 3.7+ installed
- [ ] Required packages installed (pandas, statsmodels, etc.)
- [ ] Branch updated or files copied
- [ ] Changed into the working directory

Quick test (recommended):
- [ ] Run `python test_one_touch.py`
- [ ] Check the outputs/ folder
- [ ] Open the figure files

Full pipeline:
- [ ] Data files prepared (data/raw/)
- [ ] Run `python run_analysis.py`
- [ ] Check the generated CSV/PNG files
- [ ] Compare the coefficient tables against the figures
---
## 8. Next Steps
### Review the Analysis Results:
```bash
# View the coefficient tables
head outputs/h3_coefficients.csv
head outputs/h4_coefficients.csv
# Check the interaction p-values
grep "vagueness.*founder_serial" outputs/h3_coefficients.csv
grep "vagueness.*founder_serial" outputs/h4_coefficients.csv
# Open the figures (macOS)
open outputs/figures/Figure_1_Reversal.png
open outputs/figures/Figure_2a_H3.png
open outputs/figures/Figure_2b_H4.png
# Open the figures (Linux)
xdg-open outputs/figures/Figure_1_Reversal.png
```
### Load the Results in Python:
```python
import pandas as pd
# Read the coefficient tables
h3 = pd.read_csv('outputs/h3_coefficients.csv')
h4 = pd.read_csv('outputs/h4_coefficients.csv')
# Keep only the interaction terms
h3_interaction = h3[h3['variable'].str.contains('vagueness.*founder_serial', regex=True)]
h4_interaction = h4[h4['variable'].str.contains('vagueness.*founder_serial', regex=True)]
print("H3 Interaction:")
print(h3_interaction[['variable', 'coefficient', 'p_value']])
print("\nH4 Interaction:")
print(h4_interaction[['variable', 'coefficient', 'p_value']])
```
---
## 📞 Help
**If problems persist:**
1. Check the log files
2. Check the Python version: `python --version`
3. Check the package versions: `pip list`
4. Open a GitHub issue or ask for help
**If the run succeeds:**
- Use the generated figures in the paper
- Run statistical tests based on the coefficient tables
- Run the robustness checks (see robustness_followup.md)