# ๋
ผ๋ฌธ ์๋ ์์ฑ ํ
์คํธ ๊ฐ์ด๋
# Paper Auto-Generation Testing Guide
## 1. Essential Tests
A **three-tier testing strategy** for ensuring the reliability of the paper auto-generation pipeline:
### Tier 1: Core Statistical Tests (Critical - Must Pass)
**Purpose**: Verify that every number that appears in the paper is correct.
#### 1.1 Model Coefficient Tests (H1/H2 Coefficients)
```python
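# test/integration/test_paper_results.py
# Assumption: the model functions live in src/models.py; adjust the import to the
# real module. Importing the module (not the functions) keeps pytest from
# collecting the test_-prefixed model functions as tests of their own.
from src import models


def test_h1_vagueness_coefficient_sign(df):
    """H1: the Vagueness coefficient is negative (information-cost hypothesis).

    `df` is assumed to be a shared fixture defined in test/conftest.py.
    """
    result = models.test_h1_early_funding(df)
    coef = result.params['z_vagueness']
    assert coef < 0, "H1: Vagueness should reduce early funding"


def test_h2_interaction_exists(df):
    """H2: the V×F (vagueness × hardware) interaction term is in the model."""
    result = models.test_h2_main_growth(df)
    interaction_terms = [p for p in result.params.index
                         if 'vagueness' in p and 'hardware' in p]
    assert len(interaction_terms) > 0, "H2: V×F interaction must exist"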
```
**Why this matters**:
- Confirms that the paper's core claims actually hold in the data
- If a coefficient's sign flips, the paper's entire narrative changes
- Reviewers must get the same results when they reproduce the analysis
#### 1.2 Table Value Consistency Tests (Table Validation)
```python
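import re

import numpy as np

from scripts.generate_paper_tables import generate_table1_h1
from src import models  # assumed module path; adjust to where the model functions live


def test_table1_matches_h1_model(df, tmp_path):
    """Table 1's Vagueness coefficient matches the H1 model result."""
    result = models.test_h1_early_funding(df)

    # Generate the table into pytest's per-test temporary directory
    latex_table = generate_table1_h1(df, output_path=str(tmp_path / 'table1.tex'))

    # Extract the first number after "Vagueness" (fixed-point or scientific notation)
    coef_match = re.search(r'Vagueness.*?(-?\d+\.\d+(?:[eE][+-]?\d+)?)', latex_table)
    assert coef_match is not None, "Vagueness row not found in Table 1"
    table_coef = float(coef_match.group(1))
    model_coef = result.params['z_vagueness']

    # The table prints rounded values, so an absolute tolerance of 1e-10 is far
    # stricter than its precision; compare at 3 decimals (numpy passes if |diff| < 1.5e-3)
    np.testing.assert_almost_equal(table_coef, model_coef, decimal=3)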
```
**Why this matters**:
- Building LaTeX tables by hand invites transcription errors
- Confirms that the auto-generation script extracts values correctly
- Confirms that tables update automatically when the data changes
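The regular expression in the table test takes the first number that follows "Vagueness" anywhere in the LaTeX source. Once the table has several rows and model columns, reading one row at a time is easier to reason about. Below is a minimal sketch, assuming `generate_table1_h1` emits a standard `tabular` with `&`-separated cells, `\\` row endings, and the variable label in the first cell; the helper name `row_numbers` is hypothetical.

```python
import re

# Fixed-point or scientific-notation numbers that contain a decimal point
NUMBER = re.compile(r"-?\d+\.\d+(?:[eE][+-]?\d+)?")


def row_numbers(latex_table: str, label: str) -> list:
    r"""Numbers in the first tabular row whose label cell contains `label`.

    Rows are split on the LaTeX row terminator \\ and cells on '&'. Only the
    last line of the first cell counts as the label, so a caption or \hline
    that precedes the row is ignored. Stars, '$' signs and braces around the
    numbers do not matter because only the numeric pattern is kept.
    """
    for row in latex_table.split("\\\\"):
        cells = row.split("&")
        label_text = cells[0].strip()
        if len(cells) > 1 and label_text and label in label_text.splitlines()[-1]:
            return [float(x) for x in NUMBER.findall("&".join(cells[1:]))]
    raise ValueError(f"no table row labelled {label!r}")
```

In the table test, `row_numbers(latex_table, 'Vagueness')[0]` would then be the coefficient from the first model column, and a missing row fails loudly instead of silently matching a number elsewhere in the file.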
#### 1.3 Figure Generation Tests
```python
import argparse
from pathlib import Path

from src.cli import cmd_generate_plots


def test_figure2_file_created():
    """Figure 2 (Early Funding vs Vagueness) is created and non-empty."""
    fig_path = Path('paper/figures/fig2_early_funding.pdf')
    # Remove any stale copy so the test cannot pass on a file from an earlier run
    fig_path.unlink(missing_ok=True)

    # Call the CLI handler with the same kind of object argparse would produce
    args = argparse.Namespace(dataset='all')
    cmd_generate_plots(args)

    assert fig_path.exists(), "Figure 2 PDF must be created"
    assert fig_path.stat().st_size > 1000, "Figure 2 must not be empty"
```
**Why this matters**:
- LaTeX compilation fails if a figure file is missing
- An empty figure file leaves a blank space in the paper
- Confirms that the automation script runs to completion
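A size above 1000 bytes rules out an empty file but not a truncated or mislabeled one. A slightly stronger check, sketched below with a hypothetical helper `looks_like_pdf`, also looks for the `%PDF-` header that every PDF starts with and the `%%EOF` marker that ends a complete file. It does not parse the PDF, so it is a cheap guard rather than a full validation.

```python
from pathlib import Path


def looks_like_pdf(path, min_size: int = 1000) -> bool:
    """Cheap sanity check: file exists, is larger than min_size bytes,
    starts with the PDF header, and has an end-of-file marker.

    Catches empty, truncated, or mislabeled files (e.g. a PNG saved with a
    .pdf extension); it does not verify that the PDF renders correctly.
    """
    path = Path(path)
    if not path.is_file() or path.stat().st_size <= min_size:
        return False
    data = path.read_bytes()
    # "%%EOF" may be followed by trailing whitespace, so search the last 1 KB
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]
```

In the figure test, the existence and size assertions could then be replaced by `assert looks_like_pdf(fig_path), "Figure 2 is not a complete PDF"`.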
---
### Tier 2: Data Quality Tests (Important - Should Pass)
**Purpose**: Verify that the input data is fit for analysis.
#### 2.1 Sample Size Test
```python
def test_sample_size_sufficient():
    """Check the minimum sample size (statistical power)."""
    # load_dataframe is the project's NetCDF loader (its import is not shown here)
    df = load_dataframe('data/processed/features_engineered.nc')
    # H1 requires at least 30 observations (rule of thumb)
    assert len(df) >= 30, f"Sample too small: {len(df)} < 30"
    # H2 needs enough observations in every growth class (at least 10 in the smallest)
    growth_counts = df['growth'].value_counts()
    minority_class = growth_counts.min()
    assert minority_class >= 10, f"Minority class too small: {minority_class}"
```
#### 2.2 Missing Value Test
```python
def test_no_missing_values_in_key_vars():
    """Each key variable has less than 5% missing values."""
    df = load_dataframe('data/processed/features_engineered.nc')
    key_vars = ['vagueness', 'early_funding_musd', 'is_hardware', 'growth']
    for var in key_vars:
        missing_pct = df[var].isna().sum() / len(df) * 100
        assert missing_pct < 5, f"{var} has {missing_pct:.1f}% missing"
```
#### 2.3 Outlier Test
```python
def test_vagueness_range():
    """Vagueness scores fall within the valid range (0-100)."""
    df = load_dataframe('data/processed/features_engineered.nc')
    assert df['vagueness'].min() >= 0, "Vagueness cannot be negative"
    assert df['vagueness'].max() <= 100, "Vagueness cannot exceed 100"
    # Flag an implausible pile-up of extreme scores (1% or more above 95)
    extreme_high = (df['vagueness'] > 95).sum()
    assert extreme_high < len(df) * 0.01, "Too many extreme vagueness scores"
```
---
### Tier 3: Pipeline Integration Tests (Good to Have)
**Purpose**: Verify that the full pipeline runs from start to finish.
#### 3.1 End-to-End Test
```python
import subprocess
from pathlib import Path


def test_full_pipeline_runs():
    """Run the full pipeline (data → analysis → paper)."""
    # Clean previous outputs so stale files cannot satisfy the checks below
    subprocess.run(['make', 'clean-all'], check=True)
    # Run the full pipeline; on failure, show make's stderr in the report
    result = subprocess.run(['make', 'all'], capture_output=True, text=True)
    assert result.returncode == 0, f"make all failed:\n{result.stderr}"
    # Check that all expected outputs exist
    for output in ['data/processed/features_engineered.nc',
                   'paper/results_auto.tex',
                   'paper/tables/table1_h1.tex',
                   'paper/figures/fig2_early_funding.pdf']:
        assert Path(output).exists(), f"Missing output: {output}"
```
#### 3.2 Reproducibility Test
```python
import numpy as np

from src import models  # assumed module path; adjust to where the model functions live


def test_results_are_reproducible():
    """Running H1 twice on the same data gives the same coefficients."""
    df = load_dataframe('data/processed/features_engineered.nc')
    # Run H1 twice
    result1 = models.test_h1_early_funding(df)
    result2 = models.test_h1_early_funding(df)
    # Coefficients must agree to 10 decimal places
    np.testing.assert_array_almost_equal(
        result1.params.values,
        result2.params.values,
        decimal=10,
        err_msg="H1 results not reproducible"
    )
```
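Running the model twice in one process catches nondeterminism, but not drift between code or library versions. The test tree lists `test/fixtures/expected_outputs/h1_expected_coef.json`; a frozen-coefficient comparison against such a file could look like the sketch below. The JSON layout (a flat `{parameter name: value}` object), the tolerances, and the helper names `load_expected` and `coef_drift` are assumptions.

```python
import json
import math
from pathlib import Path


def load_expected(path) -> dict:
    """Load frozen coefficients stored as a flat JSON object {name: value}."""
    return json.loads(Path(path).read_text())


def coef_drift(current: dict, expected: dict,
               rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> list:
    """Describe every difference between current and frozen coefficients.

    An empty list means each expected coefficient is present and matches
    within tolerance, and the model has no parameters the file lacks.
    """
    problems = []
    for name, exp_value in expected.items():
        if name not in current:
            problems.append(f"{name}: missing from current model")
        elif not math.isclose(current[name], exp_value,
                              rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(f"{name}: expected {exp_value!r}, got {current[name]!r}")
    for name in sorted(current.keys() - expected.keys()):
        problems.append(f"{name}: not in expected outputs")
    return problems
```

In a test, `result.params.to_dict()` gives the `current` mapping, and `assert not problems, "\n".join(problems)` reports every drifted coefficient at once. After an intentional model change, regenerate the JSON file and commit it together with the code change.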
---
## 2. Test File Structure (Test Organization)
```
test/
├── unit/                         # Unit tests
│   ├── test_models.py            # H1/H2/H3/H4 model function tests (53 tests)
│   ├── test_features.py          # Vagueness scorer tests (25 tests)
│   └── test_data_io.py           # NetCDF I/O tests (NEW)
│
├── integration/                  # Integration tests
│   ├── test_paper_results.py     # Tier 1: paper result checks (Table/Figure consistency)
│   ├── test_data_quality.py      # Tier 2: data quality checks (NEW)
│   └── test_pipeline.py          # Tier 3: full pipeline run (NEW)
│
├── fixtures/                     # Test data
│   ├── sample_data.nc            # Sample data (50 companies)
│   └── expected_outputs/         # Expected outputs
│       ├── table1_expected.tex
│       └── h1_expected_coef.json
│
└── conftest.py                   # Shared fixtures (pytest)
```
---
## 3. How to Run Tests
### Quick Tests (~1 min)
```bash
# Core model tests only (are the coefficients right?)
pytest test/unit/test_models.py::TestH1EarlyFunding -v --no-cov
# Paper result checks (do the tables match?)
pytest test/integration/test_paper_results.py -v --no-cov
```
### Full Test Suite (~5 min)
```bash
# Run all tests and generate a coverage report
make test
# or
pytest test/ -v --cov=src --cov-report=html
```
### Pre-Submission Check (~10 min)
```bash
# 1. Rerun the full pipeline from a clean state
make clean-all
make all
# 2. Run all tests
make test
# 3. Validate the values reported in the paper
make validate
# 4. Compile the PDF
make paper
```
---
## 4. Local Testing Example
### 4.1 Installation
```bash
# 1. Clone repository
git clone https://github.com/user/empirics_ent_strat_ops.git
cd empirics_ent_strat_ops
# 2. Install dependencies (NO pyarrow needed!)
pip install -r requirements.txt
# 3. Verify installation
python -c "import xarray; import pandas; import statsmodels; print('✓ All dependencies OK')"
```
### 4.2 Convert Existing Parquet Files to NetCDF
```bash
# If you have existing .parquet files:
python scripts/convert_to_netcdf.py --directory data/processed
# Expected output:
# Converting features_engineered.parquet...
#   features_engineered.parquet (2.3 MB)
#   → features_engineered.nc (1.8 MB)
# Ratio: 0.78x
```
### 4.3 Run the Full Pipeline
```bash
# Step-by-step (recommended for first time)
make data       # → data/processed/features_engineered.nc
make analysis   # → paper/results_auto.tex
make tables     # → paper/tables/*.tex
make figures    # → paper/figures/*.pdf
make paper      # → paper/output/main.pdf
# Or all at once:
make all
```
### 4.4 Run Tests
```bash
# Quick test (core only)
pytest test/unit/test_models.py -v --no-cov
# Example output (test counts and timings depend on the current suite):
# test_h1_negative_vagueness_effect PASSED
# test_h2_interaction_term_exists PASSED
# ...
# ======================== 53 passed in 2.34s ========================
# Full test (everything)
make test
# Example output:
# test/unit/test_models.py .................... [ 68%]
# test/unit/test_features.py ............. [ 84%]
# test/integration/test_paper_results.py .... [100%]
# ======================== 78 passed in 4.12s ========================
```
---
## 5. Troubleshooting Test Failures
### Case 1: H1 coefficient sign flipped
```
FAILED test_h1_negative_vagueness_effect
AssertionError: H1: Vagueness should reduce early funding
```
**Causes**:
- The data changed
- The model specification changed (variables added or removed)
- A coding error
**What to do**:
1. Check the data: `df['vagueness'].describe()`. Does the distribution look unusual?
2. Check the model: `test_h1_early_funding(df).summary()`. Which variable is causing the problem?
3. Revisit the theory: the H1 hypothesis itself may be wrong.
### Case 2: Table values do not match
```
FAILED test_table1_matches_h1_model
AssertionError: Table coefficient -0.234 != Model coefficient -0.235
```
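**Causes**:
- Rounding differences in the LaTeX generation script
- The table was generated from different data than the model
**What to do**:
1. Inspect `scripts/generate_paper_tables.py`
2. Check how many decimal places `format_coef_se()` prints
3. Adjust the test's tolerance (`decimal=3` → `decimal=2`), but only after confirming the gap comes from rounding and not from different input data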
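Before loosening the tolerance, check whether the printed value is simply the model value rounded to the precision shown. A number printed with d decimals is a correct rounding when it lies within half a unit in the last place, 0.5 × 10^-d, of the unrounded coefficient. For comparison, `np.testing.assert_almost_equal(..., decimal=3)` passes whenever the difference is below 1.5 × 10^-3, which is already looser than that bound. The helpers below are a hypothetical sketch, not part of `scripts/generate_paper_tables.py`, and assume the table prints plain fixed-point numbers.

```python
def printed_decimals(text: str) -> int:
    """Digits after the decimal point in a fixed-point string such as '-0.234'."""
    _, _, fraction = text.strip().partition(".")
    return len(fraction)


def is_correct_rounding(table_text: str, model_value: float) -> bool:
    """True if the printed value equals `model_value` rounded to the shown precision.

    A value printed with d decimals is a correct rounding when it is within
    half a unit in the last place, 0.5 * 10**-d, of the unrounded value. The
    1e-12 slack keeps exact ties from failing due to binary floating point.
    """
    decimals = printed_decimals(table_text)
    half_unit = 0.5 * 10 ** (-decimals)
    return abs(float(table_text) - model_value) <= half_unit + 1e-12
```

For example, if the unrounded model value were -0.2346, then `-0.235` is a correct rounding but `-0.234` is a truncation, so `is_correct_rounding("-0.234", -0.2346)` returns `False` and points at the formatting code rather than at the test tolerance.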
### Case 3: Figure generation failed
```
FileNotFoundError: paper/figures/fig2_early_funding.pdf not found
```
**Causes**:
- An error in the plotting script
- A typo in the path
- Insufficient data (empty figure)
**What to do**:
1. Run it directly: `python -m src.cli generate-plots --dataset all`
2. Check the logs: which step failed?
3. Check the sample size: is there enough data to plot?
---
## 6. CI/CD Integration (GitHub Actions)
### Automated Tests (Every Push)
```yaml
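# .github/workflows/test.yml
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'  # assumption: match the project's Python version
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Run unit tests
        run: pytest test/unit/ -v
      # test/integration/ includes the end-to-end test, which runs `make all`
      - name: Run integration tests
        run: pytest test/integration/ -v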
```
### Automatic Paper Build (On Main Branch)
```yaml
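# .github/workflows/paper.yml
name: Build Paper
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'  # assumption: match the project's Python version
      - name: Install dependencies
        run: pip install -r requirements.txt
      # `make all` is expected to produce the PDF, so a LaTeX toolchain must
      # also be installed here (which packages depends on the Makefile)
      - name: Run pipeline
        run: make all
      - name: Upload PDF
        uses: actions/upload-artifact@v4
        with:
          name: paper
          path: paper/output/main.pdf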
```
---
## 7. Checklist
### Before Submission
- [ ] All tests pass (`make test`)
- [ ] Full pipeline run completed (`make all`)
- [ ] PDF compiles successfully (`paper/output/main.pdf` exists)
- [ ] Values in Tables 1-2 match the model results
- [ ] Figure 2-3 files are generated
- [ ] Results section is auto-generated (`paper/results_auto.tex`)
- [ ] All changes are included in a Git commit
- [ ] README documents how to reproduce the results
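The Git commit item can be checked mechanically: `git status --porcelain` prints one line per modified, staged, or untracked file and prints nothing when the working tree is clean (files matched by `.gitignore` are not listed). A small sketch that a pre-submission script could call; the helper names `uncommitted_paths` and `working_tree_is_clean` are hypothetical.

```python
import subprocess


def uncommitted_paths(porcelain: str) -> list:
    """Paths reported by `git status --porcelain` (format v1).

    Each line is two status characters, a space, then the path; renamed
    entries appear as 'old -> new' and are returned unchanged.
    """
    return [line[3:] for line in porcelain.splitlines() if line.strip()]


def working_tree_is_clean(repo_dir: str = ".") -> bool:
    """True if git lists no modified, staged, or untracked files in repo_dir."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return not uncommitted_paths(out)
```

Printing `uncommitted_paths(...)` when the check fails shows exactly which generated tables, figures, or source files still need to be committed.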
### After Review
- [ ] If the data changed, rerun `make clean-all && make all`
- [ ] If the model specification changed, update the tests
- [ ] If new hypotheses were added, add tests for them
- [ ] Re-run all tests and confirm they pass
---
## Summary
**Three core tests**:
1. **Model coefficient tests**: Do the paper's core claims hold in the data?
2. **Table validation tests**: Do the auto-generated tables match the model results?
3. **Pipeline E2E tests**: Does the pipeline run from start to finish without errors?
**Test run order**:
```bash
# 1. Quick check (~1 min)
pytest test/unit/test_models.py -v --no-cov
# 2. Full pipeline (~5 min)
make all
# 3. All tests (~5 min)
make test
# 4. Inspect the paper manually (`open` is macOS; use xdg-open on Linux)
open paper/output/main.pdf
```
**Success criteria**:
- All tests pass (78/78 passed)
- PDF generated successfully
- Tables and figures generated automatically
- All changes committed to Git
From now on, whenever the data changes, running `make all` is all it takes to update the paper automatically!