# 🚀 Quick Start Guide - Local Execution

## 📋 Table of Contents

1. [Environment Setup](#1-environment-setup)
2. [Quick Test (5 min)](#2-quick-test-5-min)
3. [Full Pipeline (30 min-2 hours)](#3-full-pipeline-30-min-2-hours)
4. [Generated Outputs](#4-generated-outputs)
5. [Troubleshooting](#5-troubleshooting)

---

## 1. Environment Setup

### 1.1 Fetch the Branch

```bash
# If you are currently working on the master branch
cd "/path/to/tolzul/Front/On/love(cs)/strategic ambiguity/empirics"

# Back up your work (optional)
cp -r . ~/backup_$(date +%Y%m%d_%H%M%S)

# Fetch and merge the latest changes
git fetch origin claude/moderator-bakeoff-analysis-011CUbKc3dAVgd5eU3SXn7k8
git merge origin/claude/moderator-bakeoff-analysis-011CUbKc3dAVgd5eU3SXn7k8
```

**Or fetch only specific files:**

```bash
git fetch origin claude/moderator-bakeoff-analysis-011CUbKc3dAVgd5eU3SXn7k8
git checkout origin/claude/moderator-bakeoff-analysis-011CUbKc3dAVgd5eU3SXn7k8 -- \
  modules/models.py \
  modules/plots.py \
  modules/features.py \
  run_analysis.py \
  test_one_touch.py
```

### 1.2 Python Environment

**Check the required packages:**

```bash
pip list | grep -E "pandas|numpy|matplotlib|statsmodels|scikit-learn|seaborn|scipy"
```

**Install any that are missing:**

```bash
pip install pandas numpy matplotlib statsmodels scikit-learn seaborn scipy
```

---

## 2. Quick Test (5 min) ⚡

**Recommended!** Runs a quick test against the existing dataset (`outputs/h2_analysis_dataset.csv`).

### 2.1 Run the Test

```bash
cd "/path/to/tolzul/Front/On/love(cs)/strategic ambiguity/empirics"
python test_one_touch.py
```

### 2.2 Expected Output

```
================================================================================
ONE-TOUCH EXECUTION TEST
================================================================================

✓ Loading dataset: outputs/h2_analysis_dataset.csv
Rows: 5,000

H1: EARLY FUNDING
────────────────
✓ H1 fitted: R² = 0.002

H2: GROWTH × ARCHITECTURE
───────
✓ H2 fitted: Pseudo R² = 0.019

H3: EARLY FUNDING × FOUNDER
─────
✓ H3 fitted: R² = 0.003
✓ Saved: outputs/h3_coefficients.csv

H4: GROWTH × FOUNDER
─────────────
✓ H4 fitted: Pseudo R² = 0.016
✓ Saved: outputs/h4_coefficients.csv

GENERATING FIGURES
───────────────
✓ Saved: outputs/figures/Figure_1_Reversal.png
✓ Saved: outputs/figures/Figure_2a_H3.png
✓ Saved: outputs/figures/Figure_2b_H4.png

TEST COMPLETE
```

### 2.3 Check the Generated Files

```bash
ls -lh outputs/h*.csv
ls -lh outputs/figures/*.png
```

**Expected files:**

- `outputs/h1_coefficients.csv`
- `outputs/h3_coefficients.csv` ← NEW
- `outputs/h4_coefficients.csv` ← NEW
- `outputs/figures/Figure_1_Reversal.png` ← NEW
- `outputs/figures/Figure_2a_H3.png` ← NEW
- `outputs/figures/Figure_2b_H4.png` ← NEW

---

## 3. Full Pipeline (30 min-2 hours)

Runs the analysis end to end, starting from the real PitchBook data.

### 3.1 Data Preparation

**Check the data location:**

```bash
ls -lh data/raw/Company*.dat
```

**Required files:**

- `data/raw/Company20211201.dat` (baseline, t0)
- `data/raw/Company20220101.dat` (mid1, tm1)
- `data/raw/Company20220501.dat` (mid2, tm2)
- `data/raw/Company20230501.dat` (endpoint, t1)

**If you do not have the data:**

```bash
# Generate synthetic data (for testing)
python generate_synthetic_data.py
```

### 3.2 Run the Full Pipeline

```bash
python run_analysis.py
```

**Run time:**

- Synthetic data (5K rows): ~5 min
- Real data (50K+ rows): ~30 min-2 hours

### 3.3 Expected Output

```
W1 HYPOTHESIS TESTING (CLEAN)
════════════════════════════════

Loading 4 snapshots...
✓ Baseline: 45,234 companies
✓ Mid1: 46,891 companies
✓ Mid2: 48,123 companies
✓ Endpoint: 51,456 companies

Feature engineering...
ℹ️ Early funding filtered to Series A / Early Stage VC: 23,456 of 45,234

DV creation (Series B+ progression)...
📅 Applying as-of date capping...
🎯 At-risk cohort: 23,456 companies
📊 Base rate: 13.8%

H1: EARLY FUNDING
────────────────
✓ Saved: outputs/h1_coefficients.csv

H2: GROWTH × ARCHITECTURE
───────
✓ Saved: outputs/h2_main_coefficients.csv

H3: EARLY FUNDING × FOUNDER
─────
✓ Saved: outputs/h3_coefficients.csv

H4: GROWTH × FOUNDER
─────────────
✓ Saved: outputs/h4_coefficients.csv

BAKE-OFF: Architecture vs Credibility
✓ Saved: outputs/h2_model_architecture.csv
✓ Saved: outputs/h2_model_founder.csv

GENERATING FIGURES
───────────────
✓ Saved: outputs/figures/Figure_1_Reversal.png
✓ Saved: outputs/figures/Figure_2a_H3.png
✓ Saved: outputs/figures/Figure_2b_H4.png

ONE-TOUCH EXECUTION COMPLETE
```

---

## 4. Generated Outputs

### 4.1 Coefficient Tables (CSV)

```bash
outputs/
├── h1_coefficients.csv                # H1: Early Funding ~ Vagueness
├── h2_main_coefficients.csv           # H2: Growth ~ Vagueness × Architecture
├── h3_coefficients.csv                # H3: Early Funding ~ Vagueness × Founder
├── h4_coefficients.csv                # H4: Growth ~ Vagueness × Founder
├── h2_model_architecture.csv          # Bake-off: Architecture moderator
├── h2_model_architecture_ame.csv
├── h2_model_architecture_metrics.csv
├── h2_model_founder.csv               # Bake-off: Founder moderator
├── h2_model_founder_ame.csv
└── h2_model_founder_metrics.csv
```

### 4.2 Visualizations (PNG)

```bash
outputs/figures/
├── Figure_1_Reversal.png   # H1 + H2 dual-axis plot
├── Figure_2a_H3.png        # Early Funding × Founder (scatter + OLS)
└── Figure_2b_H4.png        # Growth × Founder (scatter + logistic)
```

### 4.3 Dataset

```bash
outputs/
└── h2_analysis_dataset.csv   # Final dataset used for analysis
```

---

## 5. Troubleshooting

### Problem 1: ModuleNotFoundError

**Symptom:**

```
ModuleNotFoundError: No module named 'pandas'
```

**Fix:**

```bash
pip install pandas numpy matplotlib statsmodels scikit-learn seaborn scipy
```

### Problem 2: Data File Not Found

**Symptom:**

```
FileNotFoundError: data/raw/Company20211201.dat not found
```

**Options:**

**A. Use synthetic data (fast):**

```bash
python generate_synthetic_data.py
python test_one_touch.py  # quick test
```

**B. Check the path to the real data:**

```bash
find . -name "Company*.dat" -type f
# Move the files into data/raw/
```

### Problem 3: Out of Memory (Large Data)

**Symptom:**

```
MemoryError: Unable to allocate array
```

**Fix:**

```bash
# Run on a sample
python run_analysis.py --sample 0.1  # 10% sample
```

**Or edit run_analysis.py:**

```python
# Near line 58: add nrows to cap the number of rows read
df = pd.read_csv(path, sep='|', encoding=encoding, low_memory=False,
                 nrows=50000)  # <- added argument
```

### Problem 4: Figures Are Not Generated

**Symptom:**

```
WARNING: Could not plot H1 predictions
```

**Check:**

```bash
# Check that the required columns are present
python -c "
import pandas as pd
df = pd.read_csv('outputs/h2_analysis_dataset.csv')
print(df.columns.tolist())
"
```

**Required columns:**

- `z_vagueness`, `z_employees_log`, `founding_cohort`
- `early_funding_musd`, `growth`
- `founder_serial` (or `founder_credibility`)

### Problem 5: Convergence Failure (Logit)

**Symptom:**

```
PerfectSeparationError: Perfect separation detected
```

**Fix:** The code handles this automatically.

```python
# models.py Line 85, 173
try:
    model = smf.logit(formula, data=d).fit(disp=False)
except Exception:
    model = smf.logit(formula, data=d).fit_regularized(method='l2', alpha=0.01)
```

---

## 6. Additional Analysis Scripts

### 6.1 Follow-up Period Analysis

```bash
python test_followup_period.py
```

**Output:**

- Base rate analysis
- Impact of right censoring
- Statistical power assessment

### 6.2 Series A Filter Validation

```bash
python test_series_a_filter.py
```

**Output:**

- Distribution of FirstFinancingDealType
- Number of "Early Stage VC" matches
- Comparison before and after filtering

---

## 7. Quick Checklist ✅

Before running:

- [ ] Python 3.7+ installed
- [ ] Required packages installed (pandas, statsmodels, etc.)
- [ ] Branch updated or files copied
- [ ] Changed into the working directory

Quick test (recommended):

- [ ] Run `python test_one_touch.py`
- [ ] Check the outputs/ folder
- [ ] Open the figure files

Full pipeline:

- [ ] Data files in place (data/raw/)
- [ ] Run `python run_analysis.py`
- [ ] Check the generated CSV/PNG files
- [ ] Compare the coefficient tables against the figures

---

## 8. Next Steps

### Inspect the Analysis Results:

```bash
# Inspect the coefficient tables
head outputs/h3_coefficients.csv
head outputs/h4_coefficients.csv

# Check the interaction p-values
grep "vagueness.*founder_serial" outputs/h3_coefficients.csv
grep "vagueness.*founder_serial" outputs/h4_coefficients.csv

# Open the figures (macOS)
open outputs/figures/Figure_1_Reversal.png
open outputs/figures/Figure_2a_H3.png
open outputs/figures/Figure_2b_H4.png

# Open the figures (Linux)
xdg-open outputs/figures/Figure_1_Reversal.png
```

### Load the Results in Python:

```python
import pandas as pd

# Read the coefficient tables
h3 = pd.read_csv('outputs/h3_coefficients.csv')
h4 = pd.read_csv('outputs/h4_coefficients.csv')

# Keep only the interaction terms
h3_interaction = h3[h3['variable'].str.contains('vagueness.*founder_serial', regex=True)]
h4_interaction = h4[h4['variable'].str.contains('vagueness.*founder_serial', regex=True)]

print("H3 Interaction:")
print(h3_interaction[['variable', 'coefficient', 'p_value']])
print("\nH4 Interaction:")
print(h4_interaction[['variable', 'coefficient', 'p_value']])
```

---

## 📞 Help

**If problems persist:**

1. Check the log files
2. Check the Python version: `python --version`
3. Check the package versions: `pip list`
4. Open a GitHub issue or get in touch

**If everything runs successfully:**

- Use the generated figures in the paper
- Run statistical tests based on the coefficient tables
- Run the robustness checks (see robustness_followup.md)
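
**Optional: script the output check**

The manual `ls` check in section 2.3 and the "check the generated files" items in section 7 can also be done with a small script. The following is a minimal sketch, assuming it is run from the `empirics` working directory after `test_one_touch.py`. The names `EXPECTED_OUTPUTS` and `find_missing_outputs` are hypothetical helpers for illustration and are not part of the repository; the file list is copied from section 2.3.

```python
from pathlib import Path

# Expected artifacts, copied from section 2.3 of this guide.
# (Hypothetical constant; adjust if your run produces a different set.)
EXPECTED_OUTPUTS = [
    "outputs/h1_coefficients.csv",
    "outputs/h3_coefficients.csv",
    "outputs/h4_coefficients.csv",
    "outputs/figures/Figure_1_Reversal.png",
    "outputs/figures/Figure_2a_H3.png",
    "outputs/figures/Figure_2b_H4.png",
]


def find_missing_outputs(root=".", expected=EXPECTED_OUTPUTS):
    """Return the expected relative paths that are missing or empty under root."""
    root = Path(root)
    missing = []
    for rel in expected:
        path = root / rel
        # Treat a zero-byte file as missing: it usually means a failed write.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = find_missing_outputs()
    if missing:
        print("Missing or empty outputs:")
        for rel in missing:
            print(f"  - {rel}")
    else:
        print("All expected outputs are present.")
```

If the script reports missing figures, the column check under Problem 4 is a reasonable next step.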