1m immigrates to a 10m country;
```
###### ITER perparing COMPLETE, size : 1048575
###### ITER perparing COMPLETE, size : 10301370
```
After identifying the gene composition: [[common_mutant_nativeex.png]]
- how to translate?

1001 gece (1001 Nights) protocol: after reading [[_ref/spandrel/nopublish_spandrel/johan_dominance💫🔭/johan_angie/def(dominance)]], write a summary of the day.

[[Pasted image 20221228074709.png]] shows that NaN in an outer join can come from either the mutant's absence or the native's absence.
- [cinemagoer's 2006 log](https://github.com/cinemagoer/cinemagoer/blob/f23449c1ac5ee9ab728874c3fa18ba05a285ed9b/CHANGELOG.txt#L1178) introduced the `isSame()` method for both the Movie and Person classes; it is useful for comparing objects by movieID/personID and accessSystem.

12.25
- the old data's title aka is not usable (it has no id to match with the original title)
- using `old_id` as the (title, year) of an old movie confuses the analysis, since most operations can be done with a (title, year) groupby instead of an `old_id` groupby
- after filtering to year ±1, the movies remaining in old but not in new would number between 85k and 92k (~10k decrease)

12.13
- finding

12.5
- rename `old_movie.pkl` to `old_title.pkl` (the initial dataset that iter0 starts from)

12.4
#jcq list of verification checks? Currently there are three checkpoints (after iter1-weak_sel, after the join, after iter2-strong_sel).
list of validation checks?
(are we building the right product)

12.3
- table:
	- old_movie_lang: `old_id`, `lang` (`old_id` is (language, production year))
- tracing movies (in the movie marketing literature, [Lane04 traced *2001: A Space Odyssey* and *Saving Private Ryan*](marginnote3app://note/EB711E20-6073-4F57-83F9-41C4B0CACBA5); Alkowatly19 traced *The Day After Tomorrow*)
- datafication: a legitimate means to access, understand, and observe people's behavior

12.1
- for the years 1980, 1990, and 2000 (weed out movies that exist only in old)
```
# 1980, 1990, 2000 (remove old_only, i.e. old\new; excluding filtering: old-id - new-id - genre - filtering)
```
- join the first-processed old movie series with new movie + genre, and check whether the 300 movies that were in old but not in new are
- the following 11 files are genres (IMDb `titleType` values):
```
short movie tvSeries tvShort tvMovie tvEpisode tvMiniSeries tvSpecial video videoGame tvPilot
```

11.19
- new: use `startYear` as `productionYear`
> `new_movie = pd.merge(new_movie_en, new_movie_basic[['titleId', 'primaryTitle', 'title', 'startYear']], on='titleId', how='left')`
> `print(new_movie_basic.shape[0], new_movie_en.shape[0], new_movie.shape[0])`
> 9m, 400k, 400k
- drop `NA` values from old_movie year. Drop the rows first: assigning `old_movie['year'].dropna().astype(int)` back to the column realigns on the index, so the dropped rows come back as NaN and the column stays float.
> `old_movie = old_movie.dropna(subset=['year']).astype({'year': int})`

11.18
- remake the tsv files containing (title, year) with language = 'en'
- upload to the Engaging db (oldtitle.tsv, newtitle.tsv)
- join on the (title, year) index
- what if one of oldtitle's titles (2m?) is Spanish :)
- actor

11.13
- used cinemagoer; it is easy for checking
### oldtitle
- `dropna` on oldtitle.title (two NaN rows were preventing the US-film filter)
- filter out film titles containing `#`, `-0`, or `!`: 2.5m -> 1.8m
### newtitle
- filter `region` == US: 33m -> 1m
- filter language == en: 33m -> 400k (values: `{'\\N', 'cr', 'en', 'es', 'fr', 'haw', 'hi', 'myv', 'yi'}`)
- filter type == imdbDisplay: 33m -> 3m (values: `{'\\N','alternative','dvd','dvd\x02video','festival','festival\x02working','imdbDisplay','original','tv', 'tv\x02video','tv\x02working','video','video\x02working','working'}`)

#jcq agree the region filter is better
- study: trade can worsen income inequality
- <img width="406" alt="image" src="https://user-images.githubusercontent.com/30194633/194927543-634a13d8-84f8-48d6-bc19-cf47ac44cad8.png">
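Below is a minimal pandas sketch of the newtitle filters from the 11.13 entry. It assumes the new data follows IMDb's title.akas layout: `region`, `language` and `types` columns, with `\N` marking missing values. The toy rows and the names `filter_counts` and `distinct_types` are hypothetical and not part of the original pipeline. Each mask is applied to the full table separately, as in the "33m -> N" counts. The last step splits multi-valued types on `\x02`, which appears to be the separator in values like `'dvd\x02video'`.

```python
import pandas as pd

# Hypothetical toy rows shaped like IMDb's title.akas table (assumed columns).
# None stands in for IMDb's \N missing-value marker.
akas = pd.DataFrame(
    {
        "titleId": ["tt01", "tt01", "tt02", "tt03", "tt03"],
        "title": ["A", "A (US)", "B", "C", "C (DVD)"],
        "region": ["FR", "US", "US", None, "US"],
        "language": [None, "en", None, "en", None],
        "types": ["original", "imdbDisplay", "imdbDisplay", None, "dvd\x02video"],
    }
)

# For a real file, loading might look like this (path is hypothetical):
# akas = pd.read_csv("title.akas.tsv", sep="\t", na_values=["\\N"],
#                    keep_default_na=False, dtype=str)


def filter_counts(df: pd.DataFrame) -> dict:
    """Apply each newtitle filter separately to the full table and count surviving rows."""
    masks = {
        "region == US": df["region"].eq("US"),
        "language == en": df["language"].eq("en"),
        "type == imdbDisplay": df["types"].eq("imdbDisplay"),
    }
    return {name: int(mask.sum()) for name, mask in masks.items()}


# Exact matching treats "dvd\x02video" as one value; splitting on \x02
# lists the individual types instead.
distinct_types = sorted(set(akas["types"].dropna().str.split("\x02").explode()))

print(len(akas), filter_counts(akas))
print(distinct_types)
```

Counting each mask against the same base keeps the numbers comparable across filters. Chaining the masks with `&` would instead give the size of the combined selection.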