Maria Cracium, Youness Kaddar, Matt Bowers, Wendy Sun, Francisco Marchi, Sid Cox, Yudi Xie, James Sum
🎯: learning probabilistic program structure from data; DSLs with fast, exact inference; automated modeling of multivariate data and databases
- Both encouraging and a labor of love: teaching material, invited to the Slack channel (see comment)
- ![[Pasted image 20230824131210.png]]
- How to scale up: we can have a PPL that captures both symbolic and differentiable models, including the key math
- A single particle runs a little bit of the generative program and a little bit of the (user-supplied) inference program
- ![[Pasted image 20230824131435.png]]
- It is less clear how to write such programs by hand: start from a DSL, then move on to less constrained languages
- Inference is fast and exact if your domain knowledge falls within a certain region (the DSL)
- Build meta-programs that cooperatively infer from structured data: time series, panel, cross-sectional, relational. Panel data alone comes in many kinds: individual longitudinal data (e.g. Raj Chetty's tax-return work), macroeconomics, core financial reports of SEC companies, lab values from clinical trials
- Flexibility of human thought vs. widespread practice (economics and biostatistics are flexible); oncology, economic policy: the prospect of grounded models is good
- Scaling up to common sense: if we could go from data to probabilistic programs, we could analyze them symbolically (how to quantify, e.g. calculate rare events)
- ![[Pasted image 20230824131552.png]]
- Scaling the method: M3 database, about 10x faster (temporal fusion); self-tuned ARIMA models; statistical models are hard to beat (seven years ago at Google Health; logistic regression is in the supplement); "illusion of progress" / random forests; the empirical support comes from accuracy
- ![[Pasted image 20230824132058.png]]
- 3D: shapes are arbitrary for a box-sized model, so they don't need to be hand-specified; probabilistic + nonparametric (1. a model for shape on a voxel grid, 2. a mixture model)
- VM prefers to call it an empirical model (it memorizes the data) rather than semi-parametric
- Run time and accuracy (whiskers are confidence intervals)
- SMC converges smoothly; gradient descent had problems with local minima; SGD converged better (its gradient is noisy because it only sees part of the data)
- Analogy: MCMC is to SMC as GD is to SGD, in how much of the likelihood/data each step touches (see the sketch below)
- Pierre Jacob has done some work on short parallel chains
- What more: speed and the mind; an adaptive, biological perspective should matter (e.g. selection)
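A minimal numpy sketch of the MCMC:SMC :: GD:SGD analogy above (my illustration, not the speaker's code): each SMC step reweights particles by the likelihood of only the newly arrived observation, the way an SGD step touches only a minibatch, while the rejuvenation move re-scores all data seen so far, like a full-data GD or MCMC step. The Gaussian-mean model, proposal scale, and all numbers are assumptions made for the example.

```python
import numpy as np

# Assumed toy model: unknown mean mu of a unit-variance Gaussian, mu ~ Normal(0, 10).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=200)
n_particles = 500

particles = rng.normal(0.0, 10.0, size=n_particles)   # samples from the prior
log_w = np.zeros(n_particles)

def loglik(mu, y):
    return -0.5 * (y - mu) ** 2 - 0.5 * np.log(2 * np.pi)

def logprior(mu):
    return -0.5 * (mu / 10.0) ** 2

for t, y in enumerate(data):
    # SGD-like: the incremental weight touches only the new observation.
    log_w += loglik(particles, y)

    # Resample when the effective sample size collapses.
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    if 1.0 / np.sum(w ** 2) < n_particles / 2:
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles, log_w = particles[idx], np.zeros(n_particles)

        # GD/MCMC-like: the rejuvenation move re-scores *all* data seen so far.
        seen = data[: t + 1]
        prop = particles + rng.normal(0.0, 0.2, size=n_particles)
        log_accept = (logprior(prop) + loglik(prop[:, None], seen).sum(axis=1)
                      - logprior(particles) - loglik(particles[:, None], seen).sum(axis=1))
        accept = np.log(rng.uniform(size=n_particles)) < log_accept
        particles = np.where(accept, prop, particles)

w = np.exp(log_w - log_w.max())
print("posterior mean estimate:", np.average(particles, weights=w))
```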
- DSL (domain-specific language): e.g. Linear(1.0); a covariance matrix built from sigma, a noise level, and a series of points
- ![[Pasted image 20230824133028.png]]
- ![[Pasted image 20230824133734.png]]
- ![[Pasted image 20230824133802.png]]
- Noise level and indices 0 to N
- ![[Pasted image 20230824133814.png]]
- ![[Pasted image 20230824133947.png]]
- Variation: periodic covariance vs. high-covariance / length-scale covariance
- ![[Pasted image 20230824134001.png]]
- The periodic kernel uses sin squared (a general length scale controls where covariance is low); distances are warped by the period, so the covariance is itself periodic and points far apart can still covary strongly
- Kernels combine symbolically: e.g. a linear trend overlaid with a component that memorizes data within the structure. A whole field of such structure learning can be mapped into Gen; moving from this DSL to the next changes the expressiveness (see the kernel-grammar sketch after this list)
- ![[Pasted image 20230824134512.png]]
- Sketching: filling in holes of a deterministic program
- ![[Pasted image 20230824134538.png]]
- Schedule (enough compute): learn and get better at anything, though it takes work to remain flexible
- Programming languages as a field studies expressiveness vs. specialization; the tradeoff is essential (compilers show you can express a formalism, automate the specialization, and be sufficiently good without penalty)
- For intelligence, the core question is the PPL question plus new domain-specific languages, learned collaboratively
- The arc of this field: first, principled reasons (a type of theory for how the tradeoff plays out)
- Didn't expect to have to learn GPU programming: you need to understand systems and architecture; PPL has one more area, "co-design of DSLs"
- Changepoints and robustness
- ![[Pasted image 20230824135309.png]]
- ![[Pasted image 20230824135328.png]]
- ![[Pasted image 20230824135336.png]]
- ![[Pasted image 20230824135345.png]]
- Some changepoint ![[Pasted image 20230824135412.png]]
- Bayesian structural time series works where the neural variant failed ![[Pasted image 20230824135426.png]]
- Why ML can fail ![[Pasted image 20230824135459.png]]
- Flexibility in the face of new data (as in gambling: hedge your bets); betting on a single program is totally crazy; a DSL program doesn't include unknown unknowns
- ![[Pasted image 20230824135828.png]]
- Google DeepMind ![[Pasted image 20230824140351.png]]
- ![[Pasted image 20230824140635.png]] (samples from the prior): when you're not interpolating, the posterior doesn't capture the truth; posterior samples are far apart, as is the space around the actual mode
- ![[Pasted image 20230824141521.png]]
- A model that spreads mass over too many datasets assigns low likelihood to all of them; a too-simple model assigns too much likelihood to a narrow set; the uniform model (falsifiability bias). A more falsifiable model gains more evidence when its predictions hold, compared to a model that can't be falsified (optimization has no such luck)
- The Electric Monk analogy: optimization is like an Electric Monk, tricked into believing something and left with no flexibility
- ![[Pasted image 20230824141936.png]]
- ![[Pasted image 20230824142017.png]]
- ![[Pasted image 20230824142027.png]]
- ![[Pasted image 20230824142034.png]]
- ![[Pasted image 20230824142044.png]]
- Probabilistic programming made it too tempting not to try
- ![[Pasted image 20230824142204.png]]
- Choose the type of code, parameterize it, and fill in the DSL
- ![[Pasted image 20230824142239.png]]
- An infinitely branching tree of time-series models with short symbolic descriptions (Q: why thought is even possible)
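The kernel DSL and the "infinitely branching tree" prior over expressions can be pictured with a small sketch. This is my own plain-Python reconstruction using the usual textbook linear and periodic (sin-squared) covariances, not the DSL from the talk; the class names, parameterizations, and stopping probability are assumptions.

```python
import numpy as np

class Linear:
    def __init__(self, offset=1.0): self.offset = offset
    def __call__(self, x1, x2): return (x1 - self.offset) * (x2 - self.offset)
    def __repr__(self): return f"Linear({self.offset:.2f})"

class Periodic:
    # sin^2 form: covariance is high whenever inputs are an integer number of
    # periods apart, even if they are far apart in raw distance.
    def __init__(self, length=1.0, period=1.0):
        self.length, self.period = length, period
    def __call__(self, x1, x2):
        return np.exp(-2 * np.sin(np.pi * np.abs(x1 - x2) / self.period) ** 2
                      / self.length ** 2)
    def __repr__(self): return f"Periodic(l={self.length:.2f}, p={self.period:.2f})"

class Sum:
    def __init__(self, k1, k2): self.k1, self.k2 = k1, k2
    def __call__(self, x1, x2): return self.k1(x1, x2) + self.k2(x1, x2)
    def __repr__(self): return f"({self.k1} + {self.k2})"

class Product:
    def __init__(self, k1, k2): self.k1, self.k2 = k1, k2
    def __call__(self, x1, x2): return self.k1(x1, x2) * self.k2(x1, x2)
    def __repr__(self): return f"({self.k1} * {self.k2})"

def sample_kernel(rng, p_stop=0.6):
    """Sample from the infinitely branching tree of kernel expressions:
    stop at a leaf kernel, or recurse into a Sum/Product of subtrees."""
    if rng.uniform() < p_stop:
        if rng.uniform() < 0.5:
            return Linear(offset=rng.normal())
        return Periodic(length=rng.gamma(2.0), period=rng.gamma(2.0))
    combinator = [Sum, Product][rng.integers(2)]
    return combinator(sample_kernel(rng, p_stop), sample_kernel(rng, p_stop))

rng = np.random.default_rng(1)
k = sample_kernel(rng)           # a short symbolic description of a time-series model
print(k)                         # e.g. (Linear(...) + Periodic(...))
xs = np.linspace(0, 10, 50)
K = k(xs[:, None], xs[None, :])  # covariance matrix over the grid
K += 0.1 * np.eye(len(xs))       # plus a noise level on the diagonal
```

Because expressions compose with Sum and Product, each sampled model has a short symbolic description (its printed form), and swapping in a different grammar changes the expressiveness of the DSL.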
- What the DSL captures (the inference process is much faster)
- ![[Pasted image 20230824142244.png]]
- Not a simplistic prior but ignorance (otherwise run time would grow too much)
- A foundation for induction under computational constraints (eliia): inductive bias = computational bias; learning over circuits that evaluate quickly; not a norm but smoothness
- What is the compute cost of the prior and the likelihood? The probability mass doesn't need to concentrate
- ![[Pasted image 20230824142341.png]]
- ![[Pasted image 20230824142531.png]]
- ![[Pasted image 20230824142601.png]]
- Resample-move SMC
- ![[Pasted image 20230824142651.png]]
- Gen's Radon-Nikodym derivatives (the Jacobian comes from ordinary AD and is treated as part of the density ratio)
- ![[Pasted image 20230824142713.png]]
- Incremental computation: re-execute only local parts of the trace (see the sketch after this list)
- ![[Pasted image 20230824142750.png]]
- Needs too many iterations otherwise; SMC resamples ![[Pasted image 20230824142819.png]]
- ![[Pasted image 20230824142833.png]]
- Drops much faster ![[Pasted image 20230824142845.png]]
- Resample-move SMC with HMC for rejuvenation, multi-threaded ![[Pasted image 20230824142900.png]]
- How to make it fast: a modern CPU is more powerful than a GPU (especially for probabilistic program synthesis); GenJAX covers the other kind of parallelism
- ![[Pasted image 20230824143011.png]]
- On the right, everything is underfit: you want a robust model plus a fast-to-converge inference algorithm ![[Pasted image 20230824143020.png]]
- How this might scale to brain-level
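A toy sketch of the incremental-computation point above (my illustration, not Gen's trace implementation): the trace caches the log-density contribution of each segment of a two-segment changepoint model, so a move that edits one local choice re-executes and re-scores only the segment it touches. The model, names, and numbers are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])

def seg_loglik(y, mu):
    # Gaussian log-likelihood of one segment under its mean.
    return float(np.sum(-0.5 * (y - mu) ** 2 - 0.5 * np.log(2 * np.pi)))

class Trace:
    """Two-segment changepoint model with per-segment cached scores."""
    def __init__(self, cp, mus, data):
        self.cp, self.mus, self.data = cp, list(mus), data
        self.scores = [seg_loglik(data[:cp], mus[0]),
                       seg_loglik(data[cp:], mus[1])]

    def total(self):
        return sum(self.scores)

    def update_mu(self, i, new_mu):
        """Edit one segment mean: only that segment's likelihood is re-executed."""
        seg = self.data[: self.cp] if i == 0 else self.data[self.cp :]
        delta = seg_loglik(seg, new_mu) - self.scores[i]
        self.mus[i], self.scores[i] = new_mu, self.scores[i] + delta
        return delta   # log-weight / acceptance-ratio increment for the move

tr = Trace(cp=120, mus=[0.5, 2.5], data=data)
before = tr.total()
delta = tr.update_mu(1, 3.0)       # touches only segment 1; segment 0 is never re-scored
assert np.isclose(tr.total(), before + delta)
```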