# Paper-Code Integration Strategy
## 🎯 Goal
Fully integrate all 32 paper modules with the codebase to achieve:
1. **Reproducibility**: every table and figure in the paper is reproduced by code
2. **Automated verification**: code changes are checked automatically so they do not break the paper's results
3. **Documentation**: a clear record of which code each module maps to
---
## 4-Phase Approach
### **Phase 1: Results-First (most important) - 2 weeks**
Integrate the paper's core results with the code first → fastest ROI
#### Week 1: Main Results (Module #23-25)
```bash
# 1. Define the paper's table constants.
#    Enter the actual paper values in test/integration/test_paper_results.py:
#
#    class PaperConstants:
#        TABLE1_VAGUENESS_COEF = -0.234  # copied from Table 1 of the paper
#        TABLE1_VAGUENESS_SE = 0.089
#        # ...

# 2. Run the tests against the real data
pytest test/integration/test_paper_results.py::TestTable1_H1_EarlyFunding -v

# 3. On a mismatch → fix either the code or the paper
```
**Checklist:**
- [ ] Reproduce Table 1 (H1) coefficients within ±1%
- [ ] Reproduce Table 2 (H2) coefficients within ±1%
- [ ] Confirm significance of the interaction term (V×F)
- [ ] Confirm that sample sizes match
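
Before running the full pipeline, it can help to confirm that the values copied into `PaperConstants` are internally consistent, for example that a coefficient and its standard error imply the significance the paper reports. The sketch below is only a rough check: it assumes a normal approximation to the two-sided p-value (the paper's regressions may use a t distribution or robust standard errors), and `implied_p_value` is a hypothetical helper, not part of the codebase.

```python
import math


def implied_p_value(coef: float, se: float) -> float:
    """Two-sided p-value implied by coef/se under a normal approximation."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = coef / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Placeholder values used in this document for Table 1
TABLE1_VAGUENESS_COEF = -0.234
TABLE1_VAGUENESS_SE = 0.089

z_stat = TABLE1_VAGUENESS_COEF / TABLE1_VAGUENESS_SE
p = implied_p_value(TABLE1_VAGUENESS_COEF, TABLE1_VAGUENESS_SE)
print(f"z = {z_stat:.2f}, p = {p:.4f}")
```

If the implied p-value disagrees with the significance stars in the paper, the most likely cause is a transcription error in the constants rather than a bug in the code.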
#### Week 2: Figures (Module #23-25)
```python
# Add the paper's figure-generation functions to plotting.py
def generate_figure2_evf(df, output_path='outputs/fig2_evf.pdf'):
    """Generate Figure 2: E-V-F relationship"""
    # ... plotting code
    return output_path


def generate_figure3_lvf(df, h2_result, output_path='outputs/fig3_lvf.pdf'):
    """Generate Figure 3: L-V-F interaction"""
    # ... interaction plot
    return output_path


# Verify automatic generation from the tests (run in a shell):
#   pytest test/integration/test_paper_results.py::TestFigureReproduction -v
```
**Checklist:**
- [ ] Auto-generate Figure 2 (E-V-F)
- [ ] Auto-generate Figure 3 (L-V-F interaction)
- [ ] Auto-generate Figure 4 (S-T-V trajectory)
- [ ] Output both PDF and PNG formats
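
The "both PDF and PNG" item is easy to check mechanically. The sketch below lists which expected figure files are missing from an output directory; `missing_figure_outputs` is a hypothetical helper, and the file stems are taken from the figure names used in this document.

```python
import tempfile
from pathlib import Path


def missing_figure_outputs(stems, out_dir, formats=("pdf", "png")):
    """Return the expected figure files that do not exist in out_dir."""
    out_dir = Path(out_dir)
    expected = [out_dir / f"{stem}.{ext}" for stem in stems for ext in formats]
    return [path for path in expected if not path.is_file()]


# Demo in a throwaway directory: fig3_lvf.png is deliberately not created.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    for name in ("fig2_evf.pdf", "fig2_evf.png", "fig3_lvf.pdf"):
        (demo_dir / name).touch()
    missing = missing_figure_outputs(["fig2_evf", "fig3_lvf"], demo_dir)
    missing_names = [path.name for path in missing]

print(missing_names)
```

A figure test could then assert that this list is empty after the generation functions have run.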
---
### **Phase 2: Methodology Validation (medium) - 1 week**
Verify that the statistical methodology matches the paper
#### Module #17-22: Measurements & Specifications
```python
# test/unit/test_measurements.py
# StrategicVaguenessScorerV2, load_test_data and test_h1_early_funding are
# project code and must be imported from the project's own modules.
# Note: pytest also collects imported functions whose names start with
# "test_", so import test_h1_early_funding under an alias or via its module.

def test_vagueness_measurement_procedure():
    """Verify vagueness scoring matches paper description (Module #17)"""
    # Test with the three example companies given in the paper
    examples = [
        ("AI-powered medical imaging, 50 hospitals, FDA approved", 25.3),
        ("Next-gen innovation platform", 78.9),
        ("Hardware sensors for aerospace", 42.1),
    ]
    scorer = StrategicVaguenessScorerV2()
    for description, expected_score in examples:
        actual = scorer.score(description)
        assert abs(actual - expected_score) < 5.0  # ±5 tolerance


def test_h1_specification_complete():
    """Verify H1 includes all controls mentioned in paper (Module #20)"""
    df = load_test_data()
    result = test_h1_early_funding(df)
    # Check the control variables stated in the paper
    required_controls = ['z_employees_log', 'founder_serial',
                         'is_hardware', 'z_firm_age',
                         'sector_fe', 'founding_cohort']
    formula = str(result.model.formula)
    for control in required_controls:
        assert control in formula, f"Missing control: {control}"
```
**Checklist:**
- [ ] Document and verify the vagueness measurement procedure
- [ ] Verify the Flexibility (F) classification logic
- [ ] Confirm the H1 specification is complete
- [ ] Confirm the H2 specification is complete
- [ ] Confirm control variables match
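
A plain substring check such as `control in str(formula)` can pass by accident, for instance when `z_firm_age` is absent but `z_firm_age_sq` is present. A stricter check, sketched below, tokenizes the right-hand side of the formula and compares whole names. The formula string is a made-up example and `missing_controls` is a hypothetical helper.

```python
import re


def formula_terms(formula: str) -> set[str]:
    """Extract identifier-like tokens from the right-hand side of a formula."""
    rhs = formula.split("~", 1)[-1]
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", rhs))


def missing_controls(formula: str, required: list[str]) -> list[str]:
    """Return the required controls that do not appear as whole names."""
    terms = formula_terms(formula)
    return [name for name in required if name not in terms]


# Hypothetical formula: z_firm_age_sq is present, z_firm_age itself is not.
example_formula = (
    "early_funding ~ z_vagueness + z_employees_log + founder_serial"
    " + is_hardware + z_firm_age_sq + C(sector_fe) + C(founding_cohort)"
)
required_controls = ["z_employees_log", "founder_serial", "is_hardware",
                     "z_firm_age", "sector_fe", "founding_cohort"]

missing = missing_controls(example_formula, required_controls)
print(missing)
```

The tokenizer also picks up function names such as `C`, which is harmless here because only the required names are looked up.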
---
### **Phase 3: Data Pipeline (foundation) - 1 week**
Verify that the data preparation process matches the paper
#### Module #14-16: Data Overview
```python
# test/integration/test_sample_construction.py
def test_sample_size_matches_paper():
    """Module #15: Verify sample construction"""
    # Sample sizes stated in the paper's Table X
    PAPER_REPORTED_N = 450
    PAPER_QUANTUM_N = 450
    PAPER_TRANSPORTATION_N = 320

    df = consolidate_company_snapshots('data/raw')
    df = engineer_features(df)

    # Before filters are applied
    assert len(df) >= PAPER_REPORTED_N

    # Quantum sector only
    df_quantum = df[df.sector_fe == 'quantum']
    assert abs(len(df_quantum) - PAPER_QUANTUM_N) < 10  # ±10 tolerance


def test_descriptive_statistics_table():
    """Module #16: Generate Table X (Descriptive Statistics)"""
    df = load_analysis_data()

    # Reproduce the summary statistics in the paper's Table X
    stats = df[['E', 'L', 'V', 'F', 'z_vagueness']].describe()

    # Compare means (paper value vs. code value)
    PAPER_MEAN_V = 45.2
    assert abs(stats.loc['mean', 'V'] - PAPER_MEAN_V) < 1.0
```
**Checklist:**
- [ ] Sample sizes match (within ±10)
- [ ] Summary statistics match (mean ±1%, standard deviation ±5%)
- [ ] Sector distribution matches
- [ ] Cohort distribution matches
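
The checklist tolerances (±10 observations, mean ±1%, standard deviation ±5%) are relative checks, whereas the test above compares the mean with an absolute `< 1.0`. The sketch below shows one way to encode the checklist thresholds directly, using only the standard library. The data, the paper standard deviation, and the helper names are hypothetical.

```python
import statistics


def within_pct(actual: float, expected: float, pct: float) -> bool:
    """True if actual is within pct percent of expected."""
    return abs(actual - expected) <= (pct / 100.0) * abs(expected)


def check_descriptives(values, paper_mean, paper_sd, paper_n, n_tol=10):
    """Return pass/fail flags for sample size, mean (±1%) and SD (±5%)."""
    return {
        "n": abs(len(values) - paper_n) <= n_tol,
        "mean": within_pct(statistics.fmean(values), paper_mean, 1.0),
        "sd": within_pct(statistics.stdev(values), paper_sd, 5.0),
    }


# Toy data whose mean is 45.2 and whose sample SD is about 3.56
toy_values = [40.0, 45.0, 50.0, 46.0, 45.0]

ok = check_descriptives(toy_values, paper_mean=45.2, paper_sd=3.5, paper_n=5)
bad_sd = check_descriptives(toy_values, paper_mean=45.2, paper_sd=3.0, paper_n=5)
print(ok)
print(bad_sd)
```

Returning a flag per statistic, rather than asserting on the first failure, makes it easier to see at a glance which part of the table drifted.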
---
### **Phase 4: Robustness & Extensions (advanced) - 2 weeks**
Robustness checks and additional analyses
#### Week 3: Mechanisms (Module #26)
```python
# Add to src/models.py
import statsmodels.formula.api as smf


def test_mechanism_pivot_frequency(df, formula="pivot_count ~ z_vagueness + controls"):
    """
    Module #26: Test mechanism - pivot frequency
    H_mechanism: Companies with higher vagueness pivot more frequently

    Note: `controls` in the default formula is a placeholder; replace it with
    the actual control variables before fitting.
    """
    # Detect pivots from description changes over time
    df['pivot_count'] = detect_pivot_events(df)
    model = smf.ols(formula, data=df).fit()
    return model


def test_mechanism_learning_speed(df, formula="time_to_productmarket ~ z_vagueness * F_flexibility + controls"):
    """
    Module #26: Test mechanism - learning speed
    H_mechanism: Vague+flexible companies learn faster
    """
    model = smf.ols(formula, data=df).fit()
    return model
```
#### Week 4: Robustness (Module #27)
```python
# test/integration/test_robustness.py
def test_specification_curve_h2():
    """Module #27: Run 100+ specifications for H2"""
    from multiverse import run_specification_curve

    # Load the analysis dataset (same loader as the descriptive-statistics test)
    df = load_analysis_data()

    # Define specification space
    # (this grid yields 3 x 3 x 3 = 27 combinations; extend it to reach 100+)
    specs = {
        'controls': [
            ['z_employees_log'],
            ['z_employees_log', 'founder_serial'],
            ['z_employees_log', 'founder_serial', 'z_firm_age'],
        ],
        'fixed_effects': [
            [],
            ['sector_fe'],
            ['sector_fe', 'founding_cohort'],
        ],
        'sample': [
            'all',
            'quantum_only',
            'post_2015',
        ],
    }
    results = run_specification_curve(df, specs)

    # Significant positive coefficient in at least 80% of specs
    significant_positive = sum(
        (r.params['z_vagueness'] > 0) and (r.pvalues['z_vagueness'] < 0.05)
        for r in results
    )
    assert significant_positive / len(results) >= 0.80
```
**Checklist:**
- [ ] Implement and test the pivot mechanism
- [ ] Implement and test the learning mechanism
- [ ] Run the specification curve (100+ specs)
- [ ] Test alternative measurements
- [ ] Verify subsample robustness
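
To see how far the specification space above is from the "100+ specs" target, the grid can be expanded explicitly. The sketch below assumes the space is a full Cartesian product of the three dimensions; how `run_specification_curve` actually expands it is not shown in this document, and `expand_specs` is a hypothetical helper.

```python
from itertools import product

specs = {
    "controls": [
        ["z_employees_log"],
        ["z_employees_log", "founder_serial"],
        ["z_employees_log", "founder_serial", "z_firm_age"],
    ],
    "fixed_effects": [
        [],
        ["sector_fe"],
        ["sector_fe", "founding_cohort"],
    ],
    "sample": ["all", "quantum_only", "post_2015"],
}


def expand_specs(space):
    """Return one dict per combination of choices across all dimensions."""
    keys = list(space)
    return [dict(zip(keys, combo))
            for combo in product(*(space[key] for key in keys))]


all_specs = expand_specs(specs)
print(len(all_specs))  # 3 * 3 * 3 = 27
```

Under this assumption the grid has 27 combinations, so reaching 100+ would need at least one more dimension (for example, alternative vagueness measures) or more options per dimension.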
---
## 🛠️ Practical Workflow
### **Daily Workflow**
```bash
# 1. Before working on the paper: check the current state
pytest test/integration/test_paper_results.py -v

# 2. Edit code (e.g., models.py)
# ... edit code ...

# 3. Run the tests: did the paper results break?
pytest test/integration/test_paper_results.py::TestTable1_H1_EarlyFunding -v

# 4. If they fail → fix the code or update the paper
# 5. If they pass → git commit
git add .
git commit -m "Update H1 specification - all paper tests pass"
```
### **Pre-Submission Checklist**
```bash
# 1. Reproduce all tables
# (-k matches by name, so it also selects classes such as TestTable1_H1_EarlyFunding)
pytest test/integration/test_paper_results.py -k TestTable1 -v
pytest test/integration/test_paper_results.py -k TestTable2 -v

# 2. Regenerate all figures
python -m src.cli generate-all-figures --output outputs/

# 3. Auto-generate the LaTeX tables
python scripts/generate_paper_tables.py

# 4. Final verification (--cov requires the pytest-cov plugin)
pytest test/integration/ -v --cov=src
```
---
## Progress Tracking
### **Current Status (2024-01-20)**
| Phase | Module | Status | Priority |
|-------|--------|--------|----------|
| 1 | #23 (H1) | 🟡 Code complete, test template exists | HIGH |
| 1 | #24 (H2) | 🟡 Code complete, test template exists | HIGH |
| 1 | #25 (V×F) | 🟡 Code complete, test template exists | HIGH |
| 2 | #17 (Measurements) | 🟢 Code + tests complete | MEDIUM |
| 2 | #20 (Specifications) | 🟡 Code complete, needs verification | MEDIUM |
| 3 | #15 (Sample) | 🟡 Code complete, needs verification | MEDIUM |
| 3 | #16 (Variables) | 🟡 Code complete, needs verification | MEDIUM |
| 4 | #26 (Mechanisms) | 🔴 Code needed | LOW |
| 4 | #27 (Robustness) | 🟡 Code exists, tests needed | LOW |

Legend:
- 🟢 = Complete
- 🟡 = In progress
- 🔴 = Not started
### **Next 3 Actions (by priority)**
1. **Enter paper values** (30 min):
   - Copy the actual paper table values into the `PaperConstants` class in `test/integration/test_paper_results.py`
2. **Table 1 reproduction test** (1 hour):
   ```bash
   # Run H1 on the real data
   pytest test/integration/test_paper_results.py::TestTable1_H1_EarlyFunding -v

   # On a mismatch → find the cause
   # - Data filtering problem?
   # - Missing control variables?
   # - Typo in the paper?
   ```
3. **Figure 2 generation script** (2 hours):
   ```python
   # Add to src/plotting.py
   def generate_figure2_evf(df):
       """Generate Figure 2 for paper"""
       # ... plotting code
   ```
---
## 💡 Best Practices
### **1. Keep paper values in a separate file**
```python
# test/fixtures/paper_values.py
class PaperTable1:
    """Values from Table 1 in published paper"""
    VAGUENESS_COEF = -0.234
    VAGUENESS_SE = 0.089
    N_OBS = 450


class PaperTable2:
    """Values from Table 2 in published paper"""
    VAGUENESS_COEF = 0.456
    INTERACTION_COEF = -0.321
```
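### **2. Make tests tolerant (Tolerance)**
```python
# Bad: requires an exact match (impossible)
assert result.params['z_vagueness'] == -0.234

# Good: allow a small tolerance (realistic)
# Note: 0.01 here is an absolute tolerance; a true ±1% check would compare
# against 0.01 * abs(-0.234) instead.
assert abs(result.params['z_vagueness'] - (-0.234)) < 0.01
```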
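
The difference between an absolute tolerance of 0.01 and a relative tolerance of 1% matters for small coefficients: for a paper value of -0.234, an absolute 0.01 allows roughly a 4% deviation. The sketch below contrasts the two checks; the helper names and the estimate of -0.240 are hypothetical.

```python
def within_abs(actual: float, expected: float, tol: float = 0.01) -> bool:
    """Absolute check: |actual - expected| <= tol."""
    return abs(actual - expected) <= tol


def within_rel(actual: float, expected: float, rel: float = 0.01) -> bool:
    """Relative check: |actual - expected| <= rel * |expected|."""
    return abs(actual - expected) <= rel * abs(expected)


paper_value = -0.234
estimate = -0.240  # hypothetical reproduced coefficient, about 2.6% off

abs_ok = within_abs(estimate, paper_value)
rel_ok = within_rel(estimate, paper_value)
print(abs_ok, rel_ok)
```

Here the absolute check passes while the 1% relative check fails. In pytest, the relative form can also be written as `assert estimate == pytest.approx(paper_value, rel=0.01)`.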
### **3. Useful error messages on failure**
```python
# Bad
assert coef == paper_coef

# Good
assert abs(coef - paper_coef) < 0.01, \
    f"Coefficient mismatch: code={coef:.3f}, paper={paper_coef:.3f}, " \
    f"diff={coef-paper_coef:.3f} ({(coef-paper_coef)/paper_coef*100:.1f}%)"
```
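
The inline message above divides by `paper_coef`, which raises `ZeroDivisionError` if a paper value is exactly 0, and its percentage changes sign when the paper value is negative. A small formatter, sketched below, avoids both issues; `describe_mismatch` is a hypothetical helper.

```python
def describe_mismatch(name: str, coef: float, paper_coef: float) -> str:
    """Build a readable mismatch message, safe when the paper value is 0."""
    diff = coef - paper_coef
    if paper_coef != 0:
        # Divide by the magnitude so the percentage has the same sign as diff.
        pct = f"{diff / abs(paper_coef) * 100:+.1f}%"
    else:
        pct = "n/a (paper value is 0)"
    return (f"{name} mismatch: code={coef:.3f}, paper={paper_coef:.3f}, "
            f"diff={diff:+.3f} ({pct})")


message = describe_mismatch("z_vagueness", -0.240, -0.234)
zero_message = describe_mismatch("interaction", 0.012, 0.0)
print(message)
print(zero_message)
```

It would be used as the assertion message, for example `assert abs(coef - paper_coef) < 0.01, describe_mismatch("z_vagueness", coef, paper_coef)`.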
### **4. Keep paper figures in a separate directory**
```
outputs/
├── paper_figures/          # Final figures for the paper
│   ├── fig2_evf.pdf
│   ├── fig2_evf.png
│   ├── fig3_lvf.pdf
│   └── fig4_stv.pdf
├── paper_tables/           # LaTeX tables
│   ├── table1_h1.tex
│   ├── table2_h2.tex
│   └── table_descriptive.tex
└── diagnostics/            # Temporary diagnostic figures
    └── ...
```
---
## Quick Start (start right now)
### **Run your first test in 10 minutes**
```bash
# 1. Copy just one coefficient from the paper's Table 1
# Open test/integration/test_paper_results.py
# PaperConstants.TABLE1_VAGUENESS_COEF = -0.234  ← enter the actual value

# 2. Run a single test
pytest test/integration/test_paper_results.py::TestTable1_H1_EarlyFunding::test_table1_vagueness_coefficient -v

# 3. Check the result
# PASSED → the code is correct ✅
# FAILED → the cause of the mismatch needs investigating
```
---
## References
- **Paper Mapping**: `docs/PAPER_CODE_MAPPING.md`
- **Test Code**: `test/integration/test_paper_results.py`
- **Hypothesis Tests**: `src/models.py`
- **CI/CD**: `.github/workflows/test.yml`
---
## ❓ FAQ
**Q: What if the paper's values do not reproduce exactly?**
A: There are three possibilities:
1. Differences in data filtering (most common)
2. Differences in control variables
3. A typo in the paper (rare, but it happens)

→ A deviation within ±1-2% is fine. Anything larger needs investigation.

**Q: Do all modules need to be integrated?**
A: No! Fully integrating **Results (Module #23-27)** is enough. Introduction/Discussion do not need code.

**Q: Are figures updated automatically?**
A:
```bash
# Regenerate the figures
python -m src.cli generate-all-figures

# This can be automated with a Git hook:
# add the command to .git/hooks/pre-commit
```
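
As a rough sketch of the Git-hook idea, a pre-commit hook can be any executable script, including a Python one. The version below runs the figure command from this document and returns its exit code; `run_hook` and the injectable `runner` argument are hypothetical names, the latter chosen so the logic can be tested without actually regenerating figures.

```python
import subprocess
import sys

# Same command as in this document, run with the current interpreter
FIGURE_COMMAND = [sys.executable, "-m", "src.cli", "generate-all-figures"]


def run_hook(runner=subprocess.run) -> int:
    """Regenerate figures and return an exit code (0 lets the commit proceed)."""
    completed = runner(FIGURE_COMMAND)
    if completed.returncode != 0:
        print("Figure regeneration failed; aborting commit.", file=sys.stderr)
    return completed.returncode
```

To use it, the script would be saved as an executable `.git/hooks/pre-commit` starting with `#!/usr/bin/env python3` and ending with `sys.exit(run_hook())`. Git aborts the commit whenever the pre-commit hook exits with a non-zero status.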
**Q: Do the tests need to be run every time the paper is edited?**
A: Only when the Results section is edited. Edits to the Introduction/Discussion do not require running the tests.