- data description by imdb: [[imdb22]]
- goal:
- compare old and new
- track person-wise movie
- idata plan
```
bottomup_format = dict(
"iter0_otitle": ["titleId"],
"iter0_ntitle": ["titleId"],
"iter1_title": ["titleId"],
"iter2_title": ["titleId"], # usregion, english title,
"iter0_operson": ["personId"],
"iter0_nperson": ["personId"],
"iter1_person": ["personId"],
"iter2_person": ["personId"],
)
indexed by (title, year). Interestingly this reminds me of stacking chain and draw into `prior_draw`.
```
out of 8199,165 movies, 4,179,580 are unique meaning most of them have two titles
dataprevelance which sorts the data quality
- 100k out of 8000k ~ 1/80 display differences (relatively good enough data)
- old movie's aka title
- year difference of +-1
[[spandrel/data/others/frankfrut_datab/imdb17]]
| - | title exact | a.k.a. title match |
| ---- | ----------- | ------------------ |
| year | 1 |2 |
| year +-1 | 3 | 4 |
- us film중에는 품질1은 old에만 있는 것 없음
- 품질2는 이미 반은 해결. newtitle + newtitleaka합친것과 old를 비교중 (old+oldaka)
- 품질3 위해, title별로 정렬 후 year1년 이내인것 count하기
![[vaccine_misinf_value.png]]