Scaling AI that understands the world like we do, using prob. prog. Build AI systems we understand vs. AI systems that work the way we understand the world.
![[Pasted image 20230821130950.png]]
Team: Marco (Gen), McCoy (GenJAX), Alex (language models), George (spiking neurons), Feras (learning structured programs).
![[Pasted image 20230821131211.png]]
Alternative route? Two prob. prog.; structure online; Thursday (details of structure).
![[Pasted image 20230821131322.png]]
![[Pasted image 20230821131359.png]]
![[Pasted image 20230821131719.png]]
This time it is robust and faster (stronger than machine learning).
![[Pasted image 20230821131629.png]]
- reverse engineering the structure of the model (encode strong constraints from domain knowledge)
- data of staggering scale: augmentation of web crawls (simulation; sticky human feedback)
![[Pasted image 20230821131840.png]]
Neurosymbolic programs; we need a hybrid approach to AI engineering. Alternatives? End-to-end explainable?
- not only the AI architecture but also the (accelerator) stack has brittleness; no meta-awareness that something has gone wrong
![[Pasted image 20230821132015.png]]
![[Pasted image 20230821132054.png]]
![[Pasted image 20230821132100.png]]
- projects shut down because of how many billions L4/L5 autonomy costs; Argo AI (was improving, but management couldn't know how fast, or whether it would converge in your lifetime)
- we don't question humans' ability to see (attention and judgment; the cost of verifying human perception is much lower)
- modularity between perception and planning
![[Pasted image 20230821132413.png]]
Even in narrow domains (compute is not the bottleneck; humans' strategic decisions are): structure synthesis (game modules).
![[Pasted image 20230821132601.png]]
A parallel stack to deep learning: learn and test not through sticky product loops, but by modular comparison.
![[Pasted image 20230821132800.png]]
Generative and inference programs (specifying models; robustness, speed, parallelism); domain-specific languages rest on the platform/hardware fabric.
![[Pasted image 20230821132944.png]]
![[Pasted image 20230821133314.png]]
Generate hypothesis spaces + do inference in those spaces (inference programs take generative programs as input and output).
![[Pasted image 20230821133414.png]]
Doesn't attempt to fully automate (positive results). Q: compiler? Automate just as needed; some level of automation.
![[Pasted image 20230821133616.png]]
Representing generative code (unifying; COVID and ...).
![[Pasted image 20230821133659.png]]
Auto-diff: take a program that evaluates a function's value and generate code that calculates its derivative (unsung hero: differentiation). Sketch below.
- tweak and optimize; compare likelihoods (ratios of p's and q's; estimates of Radon-Nikodym derivatives)
- symbolic computation ()
![[Pasted image 20230821133745.png]]
JavaScript + C++.
![[Pasted image 20230821133950.png]]
Model-agnostic (no religion about computation), but that doesn't mean we're uniformly interested; neurosymbolic is easier to commercialize.
Goal: learn prob. programs that learn accurately. Ensembles of programs; tighter correspondence (broad uncertainty concentrates; higher-frequency structure is resolved, as is the linear trend); fine-grained.
![[Pasted image 20230821134224.png]]
![[Pasted image 20230821134320.png]]
![[Pasted image 20230821134325.png]]
![[Pasted image 20230821134341.png]]
The compiler is good enough if you know how to use the platform well; can do this better (better vs. worse examples).
![[Pasted image 20230821134533.png]]
Guiding firm data (Gates Foundation).
![[Pasted image 20230821134628.png]]
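A minimal sketch (mine, not from the talk) of the two auto-diff notes above: `jax.grad` turns a program that evaluates a log-density into a program that evaluates its derivative, and importance weights compare likelihoods as ratios of p's and q's. The Gaussian target, proposal, and parameter values are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def log_p(x, mu):
    # target: Gaussian log-density with mean mu, unit variance (assumed model)
    return -0.5 * (x - mu) ** 2 - 0.5 * jnp.log(2.0 * jnp.pi)

def log_q(x):
    # proposal: standard Gaussian log-density
    return -0.5 * x ** 2 - 0.5 * jnp.log(2.0 * jnp.pi)

def log_weight(x, mu):
    # log importance weight: a pointwise estimate of the
    # Radon-Nikodym derivative dP/dQ
    return log_p(x, mu) - log_q(x)

# autodiff: from the program for log_p, derive the program for its gradient
dlogp_dmu = jax.grad(log_p, argnums=1)

key = jax.random.PRNGKey(0)
xs = jax.random.normal(key, (1000,))  # draws from q
w = jnp.exp(jax.vmap(log_weight, in_axes=(0, None))(xs, 1.0))
print(jnp.sum(w * xs) / jnp.sum(w))   # self-normalized IS estimate of E_p[x], near 1.0
print(dlogp_dmu(0.3, 1.0))            # gradient of log_p at (x=0.3, mu=1.0): x - mu = -0.7
```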
Q: cross-checking sensed reality, probabilistically.
![[Pasted image 20230821134743.png]]
Input RGB (reconstruction of depth data; robust six-degree-of-freedom object poses); Google deploys this capability with YouTube.
![[Pasted image 20230821134828.png]]
Symbolic and probabilistic (RGB and depth); probabilistic because it hasn't seen what's inside the cup. Deep learning systems make nonsensical errors here.
![[Pasted image 20230821135050.png]]
Depth estimation algorithm as a generative program (put mass where it seems to be); the shallowest ignorance possible (data efficiency and robustness); 6 degrees of freedom; beach.
![[Pasted image 20230821135241.png]]
![[Pasted image 20230821135251.png]]
Multiscale version of the hypothesis space (GenJAX: MCMC + JAX + CUDA; possible but difficult to get in GenJAX).
![[Pasted image 20230821135315.png]]
Sequential Monte Carlo steering of LLMs using prob. prog. (Xuan and Alex): constrained generation (e.g., all words have fewer than five letters); constraints on predicates. Toy sketch after the questions below.
![[Pasted image 20230821135516.png]]
MCMC has problems here (lots of iterations).
![[Pasted image 20230821135550.png]]
LLaMPPL: uses sequential Monte Carlo steering (transformer context; constraints).
![[Pasted image 20230821135613.png]]
Token.
![[Pasted image 20230821135646.png]]
Solve constraints.
![[Pasted image 20230821135719.png]]
Trustworthy.
![[Pasted image 20230821135733.png]]
Learn structure from data (parallelize; robustly learn open-ended structure); different levels of scaling.
![[Pasted image 20230821135831.png]]
![[Pasted image 20230821135850.png]]
![[Pasted image 20230821135904.png]]
- end of (the green curve would flatten; intelligence; a 32-year roadmap; deep learning would be a contender) Q.
- writing a Bayes sampler (just as slow as when V was in grad school)
- using thousands (5k) of cores; 2010: headroom in the architecture to get sufficient gains; optimistic (5k to be captured; a slightly better compiler)
- sequential MC can scale in ways SGD doesn't; exact Bayesian inference is slower (a hard separation, exponentially so for nonconvex problems); feasibility (when can sampling be faster than optimization?)
![[Pasted image 20230821140121.png]]
![[Pasted image 20230821140237.png]]
- memory efficiency and collaboration ()
- finer scale (reverse engineering of the brain)
![[Pasted image 20230821140321.png]]

> 1. Is there any measure that includes the prior-elicitation stage in speed? I feel different models (e.g., prob. prog. vs. machine learning) would have different ease and depth of prior elicitation.

Low-hanging fruit (hand-writing priors, learning them from data); hybrid (human judges): give synthetic data to the machine and judge.

> 2. Is it correct that each layer of the stack evolves at a different speed (e.g., generative programs evolve faster than inference programs; software (PyTorch) faster than hardware (CPU, GPU))? How do you modularize different versions of the prob. comp. stack for effective orchestration?

Speeds of cultural evolution, biological evolution, and development differ (the processes may operate differently) but may have similar laws underneath.

> 3. What do you mean by an "ignorance" prior? (From ignorance priors over prob. programs.)

> 4. Are you aware of any example or case study where developers' performance jumps after learning how to use the platform, i.e., empirical evidence for worse-before-better dynamics (the time to climb the learning curve of the computing stack)?

More a product perspective than UI/UX (user studies; documentation; marginal gains).
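A toy sketch (mine, not the LLaMPPL API) of the sequential Monte Carlo steering idea noted above: particles extend token sequences under the language model, and the constraint reweights and resamples them. The vocabulary, the uniform stub standing in for a real LM, and the fewer-than-five-letters constraint are all illustrative assumptions.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "elephant", "big", "."]

def lm_probs(prefix):
    # stand-in for a real language model: uniform next-token distribution
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

def constraint_ok(tok):
    # example constraint from the notes: every word has fewer than five letters
    return len(tok) < 5

def smc_steer(n_particles=20, n_steps=6):
    particles = [[] for _ in range(n_particles)]
    for _ in range(n_steps):
        weights = []
        for p in particles:
            probs = lm_probs(p)
            toks = list(probs)
            # propose a next token from the LM
            tok = random.choices(toks, weights=[probs[t] for t in toks])[0]
            p.append(tok)
            # incremental weight: target = LM restricted to the constraint,
            # proposal = LM, so the weight is just the constraint indicator
            weights.append(1.0 if constraint_ok(tok) else 0.0)
        if sum(weights) == 0:
            raise RuntimeError("every particle violated the constraint")
        # multinomial resampling focuses compute on satisfying continuations
        particles = [list(random.choices(particles, weights=weights)[0])
                     for _ in range(n_particles)]
    return particles

print(" ".join(smc_steer()[0]))
```

Unlike MCMC (many iterations over a fixed draft), the steering happens token by token as generation proceeds.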
> Do neural nets do well at perceiving 3D scenes if you steer them with facts about objects? Can you do something like the SMC LLM paper you mentioned?

Parse the images (conditioned on the machine knowing the scene is 3D); a lot of energy is spent (various last-mile problems); intellectual debt (an inflection point will be crossed, and it won't be appealing).

> Where is automatic differentiation usually used in the probabilistic programming framework? Could you give some examples? Thanks!

Steering things (3D, gradient-based); gradient-based methods are local (they need a good surrogate, or they fall flat); (where is backprop in the brain? challenging to implement). See the sketch at the end of these notes.

> I saw that intermediate representations were part of the PPL stack. Are there PPL-specific IRs or optimizations that can exploit the model structure, as opposed to traditional instruction-level optimization?

Green field (the IR is for the higher-level language); one prob.-specialized (Gen elimination lib; generative code (changing the model representation has a larger effect) vs. inference code); a few years ago (autodiff: the IR is for generative code; could design one (knowing what IR the host language compiles to)).

- critical (bets; the school is empowered to challenge the bets); if critical of the environment, the economic play would fold differently
- convergence (both for-profit and nonprofit; governance to be useful in different settings; big tech has reasons to prefer neural networks (price capture); the leading edge of the field was in industry settings; prob. prog. is substantially cheaper (2~3 for the same robustness); number of entities that can participate / scrutinize)
- time series (resampled using seq. MC); selection; infinite time series (the hypothesis space is infinite; sample time series; optimal)
- posterior (ignorance priors are different from greedy search)
- consensus prior (oncology)
- 3D paper: RC car with prob. prog. (drives a lot slower (the policy says to go much slower)); verification effort

> (James Sum) Within the realm of 3D vision perception in urban settings, how do probabilistic approaches address depth-estimation errors introduced by occlusions from moving vehicles and pedestrians, and have there been advancements in combining probabilistic graphical models with convolutional neural networks to refine these depth maps?

> (Lulu Ito) How important is it to stay faithful to brain structure to get better results?

Inspiration for convolution; humans can recognize humans; we will get better designs (); interested in computation bottlenecks (not in neuro-inspired computation per se); what is learnable from the brain.
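For the autodiff question above, a minimal sketch of one common place gradients appear in probabilistic programming: gradient-based inference moves. Here, unadjusted Langevin steps on a toy 2D Gaussian posterior; the target density, step size, and step count are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def log_post(z):
    # toy posterior: zero-mean correlated 2D Gaussian (assumed target)
    prec = jnp.array([[2.0, 0.9], [0.9, 1.0]])
    return -0.5 * z @ prec @ z

# autodiff gives the score function used by Langevin/HMC-style moves
grad_log_post = jax.grad(log_post)

def langevin_step(z, key, eps=0.05):
    # unadjusted Langevin: gradient drift plus Gaussian noise
    noise = jax.random.normal(key, z.shape)
    return z + eps * grad_log_post(z) + jnp.sqrt(2.0 * eps) * noise

z = jnp.zeros(2)
for k in jax.random.split(jax.random.PRNGKey(1), 500):
    z = langevin_step(z, k)
print(z)  # an approximate posterior sample
```

The same gradients drive variational objectives and the gradient-based 3D steering mentioned in the answer; where no good surrogate gradient exists, these methods fall flat.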