# Paper-Code Integration Strategy

## 🎯 Goal

Fully integrate the 32 paper modules with the codebase in order to:

1. **Guarantee reproducibility**: reproduce every table and figure in the paper from code
2. **Verify automatically**: check automatically that code changes do not break the paper's results
3. **Document**: record clearly which code each module maps to

---

## 📋 4-Phase Approach

### **Phase 1: Results-First (most important) - 2 weeks**

Link the paper's core results to code first → fastest ROI

#### Week 1: Main Results (Module #23-25)

```bash
# 1. Define the paper table constants.
#    Enter the actual paper values in test/integration/test_paper_results.py:
#
#    class PaperConstants:
#        TABLE1_VAGUENESS_COEF = -0.234  # ← copied from Table 1 of the paper
#        TABLE1_VAGUENESS_SE = 0.089
#        # ...

# 2. Run the tests on the real data
pytest test/integration/test_paper_results.py::TestTable1_H1_EarlyFunding -v

# 3. Mismatch found → the code or the paper needs fixing
```

**Checklist:**
- [ ] Reproduce Table 1 (H1) coefficients within ±1%
- [ ] Reproduce Table 2 (H2) coefficients within ±1%
- [ ] Confirm significance of the interaction term (V×F)
- [ ] Confirm the sample size matches

#### Week 2: Figures (Module #23-25)

```python
# Add the paper figure generation functions to plotting.py
def generate_figure2_evf(df, output_path='outputs/fig2_evf.pdf'):
    """Generate Figure 2: E-V-F relationship"""
    # ... plotting code
    return output_path

def generate_figure3_lvf(df, h2_result, output_path='outputs/fig3_lvf.pdf'):
    """Generate Figure 3: L-V-F interaction"""
    # ... interaction plot
    return output_path

# Verify automatic generation from the tests (run in a shell):
#   pytest test/integration/test_paper_results.py::TestFigureReproduction -v
```

**Checklist:**
- [ ] Auto-generate Figure 2 (E-V-F)
- [ ] Auto-generate Figure 3 (L-V-F interaction)
- [ ] Auto-generate Figure 4 (S-T-V trajectory)
- [ ] Output both PDF and PNG formats

---

### **Phase 2: Methodology Validation (intermediate) - 1 week**

Verify that the statistical methodology matches the paper

#### Module #17-22: Measurements & Specifications

```python
# test/unit/test_measurements.py
def test_vagueness_measurement_procedure():
    """Verify vagueness scoring matches paper description (Module #17)"""
    # Test with the three example companies given in the paper
    examples = [
        ("AI-powered medical imaging, 50 hospitals, FDA approved", 25.3),
        ("Next-gen innovation platform", 78.9),
        ("Hardware sensors for aerospace", 42.1),
    ]

    scorer = StrategicVaguenessScorerV2()
    for description, expected_score in examples:
        actual = scorer.score(description)
        assert abs(actual - expected_score) < 5.0  # ±5 tolerance (absolute)

def test_h1_specification_complete():
    """Verify H1 includes all controls mentioned in paper (Module #20)"""
    df = load_test_data()
    result = test_h1_early_funding(df)

    # Check the control variables specified in the paper
    required_controls = ['z_employees_log', 'founder_serial', 'is_hardware',
                         'z_firm_age', 'sector_fe', 'founding_cohort']
    for control in required_controls:
        assert control in str(result.model.formula)
```

**Checklist:**
- [ ] Document and verify the vagueness measurement method
- [ ] Verify the Flexibility (F) classification logic
- [ ] Confirm the H1 specification is complete
- [ ] Confirm the H2 specification is complete
- [ ] Confirm the control variables match

---

### **Phase 3: Data Pipeline (foundation) - 1 week**

Verify that the data preparation process matches the paper

#### Module #14-16: Data Overview

```python
# test/integration/test_sample_construction.py
def test_sample_size_matches_paper():
    """Module #15: Verify sample construction"""
    # Sample sizes stated in Table X of the paper
    PAPER_REPORTED_N = 450
    PAPER_QUANTUM_N = 450
    PAPER_TRANSPORTATION_N = 320

    df = consolidate_company_snapshots('data/raw')
    df = engineer_features(df)

    # Before filters are applied
    assert len(df) >= PAPER_REPORTED_N

    # Quantum sector only
    df_quantum = df[df.sector_fe == 'quantum']
    assert abs(len(df_quantum) - PAPER_QUANTUM_N) < 10  # ±10 tolerance

def test_descriptive_statistics_table():
    """Module #16: Generate Table X (Descriptive Statistics)"""
    df = load_analysis_data()

    # Reproduce the summary statistics of Table X in the paper
    stats = df[['E', 'L', 'V', 'F', 'z_vagueness']].describe()

    # Compare means (paper value vs. code value); absolute tolerance of 1.0
    PAPER_MEAN_V = 45.2
    assert abs(stats.loc['mean', 'V'] - PAPER_MEAN_V) < 1.0
```

**Checklist:**
- [ ] Sample size matches (within ±10)
- [ ] Summary statistics match (mean ±1%, standard deviation ±5%)
- [ ] Sector distribution matches
- [ ] Cohort distribution matches

---

### **Phase 4: Robustness & Extensions (advanced) - 2 weeks**

Robustness checks and additional analyses

#### Week 3: Mechanisms (Module #26)

```python
# Add to src/models.py
import statsmodels.formula.api as smf

def test_mechanism_pivot_frequency(df, formula="pivot_count ~ z_vagueness + controls"):
    """
    Module #26: Test mechanism - pivot frequency

    H_mechanism: Companies with higher vagueness pivot more frequently
    """
    # Detect pivots from description changes over time
    df['pivot_count'] = detect_pivot_events(df)

    model = smf.ols(formula, data=df).fit()
    return model

def test_mechanism_learning_speed(df, formula="time_to_productmarket ~ z_vagueness * F_flexibility + controls"):
    """
    Module #26: Test mechanism - learning speed

    H_mechanism: Vague+flexible companies learn faster
    """
    model = smf.ols(formula, data=df).fit()
    return model
```

#### Week 4: Robustness (Module #27)

```python
# test/integration/test_robustness.py
def test_specification_curve_h2():
    """Module #27: Run 100+ specifications for H2"""
    from multiverse import run_specification_curve

    df = load_analysis_data()  # same loader as in the Phase 3 tests

    # Define specification space
    # (this illustrative grid yields 3 x 3 x 3 = 27 specs; extend it to reach 100+)
    specs = {
        'controls': [
            ['z_employees_log'],
            ['z_employees_log', 'founder_serial'],
            ['z_employees_log', 'founder_serial', 'z_firm_age'],
        ],
        'fixed_effects': [
            [],
            ['sector_fe'],
            ['sector_fe', 'founding_cohort'],
        ],
        'sample': [
            'all',
            'quantum_only',
            'post_2015',
        ],
    }

    results = run_specification_curve(df, specs)

    # Significant positive coefficient in more than 80% of specs
    significant_positive = sum(
        (r.params['z_vagueness'] > 0) and (r.pvalues['z_vagueness'] < 0.05)
        for r in results
    )
    assert significant_positive / len(results) > 0.80
```

**Checklist:**
- [ ] Implement and test the pivot mechanism
- [ ] Implement and test the learning mechanism
- [ ] Run the specification curve (100+ specs)
- [ ] Test alternative measurements
- [ ] Verify subsample robustness

---

## 🛠️ Practical Workflow

### **Daily Workflow**

```bash
# 1. Before working on the paper: check the current state
pytest test/integration/test_paper_results.py -v

# 2. Edit code (e.g., models.py)
# ... edit code ...

# 3. Run the tests: did the paper results break?
pytest test/integration/test_paper_results.py::TestTable1_H1_EarlyFunding -v

# 4. If it fails → fix the code or update the paper

# 5. If it passes → git commit
git add .
git commit -m "Update H1 specification - all paper tests pass"
```

### **Pre-Submission Checklist**

```bash
# 1. Reproduce all tables (-k selects test classes whose names contain the pattern)
pytest test/integration/test_paper_results.py -k TestTable1 -v
pytest test/integration/test_paper_results.py -k TestTable2 -v

# 2. Regenerate all figures
python -m src.cli generate-all-figures --output outputs/

# 3. Auto-generate the LaTeX tables
python scripts/generate_paper_tables.py

# 4. Final verification (--cov requires the pytest-cov plugin)
pytest test/integration/ -v --cov=src
```

---

## 📊 Progress Tracking

### **Current Status (2024-01-20)**

| Phase | Module | Status | Priority |
|-------|--------|--------|----------|
| 1 | #23 (H1) | 🟡 Code complete, test template exists | HIGH |
| 1 | #24 (H2) | 🟡 Code complete, test template exists | HIGH |
| 1 | #25 (V×F) | 🟡 Code complete, test template exists | HIGH |
| 2 | #17 (Measurements) | 🟢 Code + tests complete | MEDIUM |
| 2 | #20 (Specifications) | 🟡 Code complete, needs verification | MEDIUM |
| 3 | #15 (Sample) | 🟡 Code complete, needs verification | MEDIUM |
| 3 | #16 (Variables) | 🟡 Code complete, needs verification | MEDIUM |
| 4 | #26 (Mechanisms) | 🔴 Code needed | LOW |
| 4 | #27 (Robustness) | 🟡 Code exists, tests needed | LOW |

Legend:
- 🟢 = Done
- 🟡 = In progress
- 🔴 = Not started

### **Next 3 Actions (by priority)**

1. **Enter the paper values** (30 min):
   - Copy the actual paper table values into the `PaperConstants` class in `test/integration/test_paper_results.py`

2. **Table 1 reproduction test** (1 hour):
   ```bash
   # Run H1 on the real data
   pytest test/integration/test_paper_results.py::TestTable1_H1_EarlyFunding -v

   # Mismatch found → identify the cause
   # - Data filtering problem?
   # - Missing control variables?
   # - Typo in the paper?
   ```

3. **Figure 2 generation script** (2 hours):
   ```python
   # Add to src/plotting.py
   def generate_figure2_evf(df):
       """Generate Figure 2 for paper"""
       # ... plotting code
   ```

---

## 💡 Best Practices

### **1. Keep paper values in a separate file**

```python
# test/fixtures/paper_values.py
class PaperTable1:
    """Values from Table 1 in published paper"""
    VAGUENESS_COEF = -0.234
    VAGUENESS_SE = 0.089
    N_OBS = 450

class PaperTable2:
    """Values from Table 2 in published paper"""
    VAGUENESS_COEF = 0.456
    INTERACTION_COEF = -0.321
```

### **2. Be lenient in tests (Tolerance)**

```python
# Bad: requires an exact match (impossible with floating-point estimates)
assert result.params['z_vagueness'] == -0.234

# Good: ±1% relative tolerance (realistic)
assert abs(result.params['z_vagueness'] - (-0.234)) < 0.01 * abs(-0.234)
```

### **3. Useful error messages on failure**

```python
# Bad
assert coef == paper_coef

# Good
assert abs(coef - paper_coef) < 0.01 * abs(paper_coef), \
    f"Coefficient mismatch: code={coef:.3f}, paper={paper_coef:.3f}, " \
    f"diff={coef-paper_coef:.3f} ({(coef-paper_coef)/paper_coef*100:.1f}%)"
```

### **4. Keep paper figures in a separate directory**

```
outputs/
├── paper_figures/          # Final figures that go into the paper
│   ├── fig2_evf.pdf
│   ├── fig2_evf.png
│   ├── fig3_lvf.pdf
│   └── fig4_stv.pdf
├── paper_tables/           # LaTeX tables
│   ├── table1_h1.tex
│   ├── table2_h2.tex
│   └── table_descriptive.tex
└── diagnostics/            # Temporary diagnostic figures
    └── ...
```

---

## 🚀 Quick Start (start right now)

### **Run your first test within 10 minutes**

```bash
# 1. Copy just one coefficient from Table 1 of the paper
#    Open test/integration/test_paper_results.py
#    PaperConstants.TABLE1_VAGUENESS_COEF = -0.234  ← enter the actual value

# 2. Run a single test
pytest test/integration/test_paper_results.py::TestTable1_H1_EarlyFunding::test_table1_vagueness_coefficient -v

# 3. Check the result
#    PASSED → the code is correct ✓
#    FAILED → the cause of the mismatch must be identified
```

---

## 📚 References

- **Paper Mapping**: `docs/PAPER_CODE_MAPPING.md`
- **Test Code**: `test/integration/test_paper_results.py`
- **Hypothesis Tests**: `src/models.py`
- **CI/CD**: `.github/workflows/test.yml`

---

## ❓ FAQ

**Q: What if the paper values are not reproduced exactly?**

A: Three possibilities:
1. Differences in data filtering (most common)
2. Differences in control variables
3. A typo in the paper (rare, but it happens)

→ Within ±1-2% is fine. Anything larger needs its cause identified.

**Q: Do all modules have to be integrated?**

A: No! It is enough to **integrate only Results (Module #23-27) 100%**. Introduction/Discussion do not need code.

**Q: Are the figures updated automatically?**

A:
```bash
# Regenerate the figures automatically
python -m src.cli generate-all-figures

# Can be automated with a Git hook
# Add the command to .git/hooks/pre-commit
```

**Q: Do the tests have to be run every time the paper is edited?**

A: Only when the Results section is edited. Edits to Introduction/Discussion need no tests.
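
The ±1-2% guideline from the first FAQ answer can be enforced mechanically rather than judged by eye. The following is a minimal sketch, not part of the codebase described here: the helper name `assert_matches_paper` and its parameters are hypothetical, and the coefficient is the same placeholder value used elsewhere in this document.

```python
def assert_matches_paper(name, actual, expected, rel_tol=0.01):
    """Raise AssertionError if `actual` is not within `rel_tol` of the paper value.

    Hypothetical helper: `rel_tol=0.01` means a ±1% relative tolerance.
    Returns the relative difference so callers can log it.
    """
    if expected == 0:
        # A relative difference cannot be computed against zero.
        raise ValueError(f"{name}: relative tolerance is undefined for a paper value of 0")

    rel_diff = abs(actual - expected) / abs(expected)
    if rel_diff > rel_tol:
        raise AssertionError(
            f"{name}: code={actual:.4f}, paper={expected:.4f}, "
            f"relative diff={rel_diff:.2%} exceeds tolerance {rel_tol:.2%}"
        )
    return rel_diff


# Usage with the placeholder coefficient; passes because the values differ by about 0.2%
assert_matches_paper("Table 1 z_vagueness", actual=-0.2345, expected=-0.234)
```

Raising `AssertionError` explicitly, instead of using a bare `assert`, keeps the check active even when Python runs with `-O`. Within pytest, `pytest.approx(expected, rel=0.01)` is a built-in alternative for the comparison itself.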