1. ask me for the minimal sufficient statistics of supply information needed to generate two implementations (lower variability with higher cost vs higher variability with lower cost)
2. ask me for the minimal sufficient statistics of demand information needed to generate two customer descriptions (xx1 with yy1 vs xx2 with yy2)
mockup (non-functional) vs prototype (functional)
lead user vs general market
for example, when i'm writing a paper in entrepreneurship, there are different axes along which to segment the audience, but we need to increase the flow of quantitatively grounded people into the qualitative audience.
1. two ways to
# summary of the thirteen special topics below
| Module | Topic | Rational AI Principle | How | Contrast | Example |
| ------ | ---------------------------------------------------------------------------------------------------------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | Scaling behavior of intelligence vs machine learning | 📐Shallow to Deep | Define broad model space with structured ignorance priors, allowing models to evolve incrementally as evidence accumulates | Traditional ML fixes model architecture and optimizes parameters; shallow-to-deep allows structure itself to adapt based on evidence | **Time-series forecasting with a dynamic DSL**. We specify a generative model (e.g. linear trend, seasonal/periodic components, change-point jumps) with latent hyperparameters for amplitudes, frequencies, etc. The system samples _which_ model components to include, plus their parameters, instead of committing to a single fixed architecture. Then, using particle-based sampling (e.g. SMC), it _automatically_ refines and prunes these structural choices as new observations arrive. In **airline-traffic demo**, this approach successfully adapts to the sudden COVID drop, while a transformer or LSTM—pre-trained on stationary patterns—fails to track the abrupt change. The code can be implemented in Gen/GenJAX by writing a short “kernel DSL” for time-series structure, assigning a broad prior over kernels, and applying sequential Monte Carlo so that the model _grows in complexity only if/when the data demand it_. |
| 2 | Perception and navigation | 👁️See Flowing Mass | Visualize probability distribution through entire parameter space using particle-based methods with adaptive computation | Deep learning provides point estimates without uncertainty; flowing mass reveals full distribution of possibilities | **2D Robot Localization with noisy sensors**. Gen/GenJAX _generative model_ of a robot’s motion (uncertain rotation & translation) and sensor (noisy distance readings to walls). Instead of a single “best guess,” it **maintains a _distribution_ of possible poses** using Sequential Monte Carlo. The “mass” of particles _flows_ from one area of the map to another when sensor data contradict the current pose. A single-point estimate (like Tesla’s autopilot) can fail catastrophically if, e.g., the map alignment is off by one room; but a particle-based inference method _automatically_ _rejuvenates_ proposals and corrects itself when local mismatches accumulate. The code is straightforward: define a pose+motion generative function, a sensor-likelihood function, then run “resample‐move” SMC so multiple pose hypotheses are re-weighted each step, bridging _uncertainty_ in real time. |
| 3 | Foundations of modeling and inference | 🪒Auto Occam's Razor | Sample from hierarchical models with latent hyperparameters rather than optimize fixed models | Explicit regularization requires manual tuning; sampling naturally favors typical solutions | **Polynomial regression with outlier detection**. Assign a prior over both the degree of the polynomial and outlier/noise parameters; then use posterior _sampling_ to infer model complexity. A naive optimizer might pick a high-degree polynomial that overfits, whereas the Bayesian sampler automatically leans toward simpler polynomials unless evidence strongly demands complexity, thus embodying “Occam’s razor” without manually fiddling with penalty terms. |
| 4 | Why automate math Automatic differentiation of expected values Probabilistic programming with stochastic probabilities | 🧩Compose To Simplify | Implement higher-order operations (gradient, expectation, Radon-Nikodym derivative) as composable program transformations, allowing complex inference algorithms to emerge from simple building blocks | Traditional implementations require specialized derivations for each model; composable transformations enable automatic generation of efficient estimators for any well-formed program | **Stochastic gradient descent with language models.** We represent both the language model and a reward function as probabilistic programs. The objective is to maximize the expected reward of generated text. Traditional approaches require complex, model-specific gradient estimators with manual derivations spanning multiple pages. Using our approach, gradient estimation is automated by applying composable transformations: (1) transform sample-based language model into expectation-operator form, (2) apply automatic differentiation to this form, and (3) generate unbiased gradient estimators with user-selectable variance-cost tradeoffs. This enables rapid prototyping of RLHF pipelines that would otherwise require specialized implementations, and can yield up to 10x faster training compared to handcrafted estimators by automatically exploring the space of possible gradient estimation strategies. |
| 5 | Neural network models of visual perception | | | | |
| 6 | Learning probabilistic programs | | | | |
| 7 | Theory-of-mind via inference | | | | |
| 8 | Language model probabilistic programming | | | | |
| 9 | Neurally mappable implementations | | | | |
| 10 | Research methods | | | | |
| 11-12 | Research frontiers | | | | |
| 13 | Project CHI | | | | |
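The four filled rows above can be made concrete with short code sketches; the first three follow here, and a sketch for the Module 4 row appears after the row discussion further below. For Module 1, the sketch below is plain Python/NumPy rather than the course's Gen/GenJAX kernel DSL: it samples which structural components a time-series model includes, then re-weights those structural hypotheses particle by particle as observations arrive. The component set, priors, and noise level are assumptions chosen for the sketch; a real implementation would also add rejuvenation moves so parameters keep adapting.

```python
# Illustrative sketch only: structure-adaptive time-series model explored with
# a toy particle filter. Component names, priors, and noise are assumptions.
import numpy as np

rng = np.random.default_rng(0)
NOISE = 0.5  # assumed observation noise (std-dev)

def sample_structure():
    """Sample which components the model includes (the structural choice)."""
    return {
        "trend": rng.random() < 0.5,        # linear trend on/off
        "seasonal": rng.random() < 0.5,     # periodic component on/off
        "changepoint": rng.random() < 0.3,  # single level shift on/off
    }

def sample_params(T):
    """Sample latent parameters for every possible component."""
    return {
        "slope": rng.normal(0, 0.1),
        "amp": abs(rng.normal(0, 1.0)),
        "period": rng.integers(4, 24),
        "cp_time": rng.integers(1, T),
        "cp_jump": rng.normal(0, 2.0),
    }

def predict(structure, params, t):
    """Mean of the series at time t under one structural hypothesis."""
    mu = 0.0
    if structure["trend"]:
        mu += params["slope"] * t
    if structure["seasonal"]:
        mu += params["amp"] * np.sin(2 * np.pi * t / params["period"])
    if structure["changepoint"] and t >= params["cp_time"]:
        mu += params["cp_jump"]
    return mu

def filter_structures(y, n_particles=500):
    """Re-weight structural hypotheses as each observation arrives."""
    T = len(y)
    particles = [(sample_structure(), sample_params(T)) for _ in range(n_particles)]
    weights = np.ones(n_particles) / n_particles
    for t, obs in enumerate(y):
        mus = np.array([predict(s, p, t) for s, p in particles])
        loglik = -0.5 * ((obs - mus) / NOISE) ** 2
        weights = weights * np.exp(loglik - loglik.max())
        weights = weights / weights.sum()
        # Resample when the effective sample size collapses.
        if 1.0 / np.sum(weights ** 2) < n_particles / 2:
            idx = rng.choice(n_particles, n_particles, p=weights)
            particles = [particles[i] for i in idx]
            weights = np.ones(n_particles) / n_particles
    return particles, weights

# Toy data with an abrupt drop, a stand-in for the COVID change in the demo.
t = np.arange(60)
y = 0.05 * t + np.sin(2 * np.pi * t / 12) + (t >= 40) * -3.0 + rng.normal(0, 0.3, 60)
particles, weights = filter_structures(y)
p_cp = sum(w for (s, _), w in zip(particles, weights) if s["changepoint"])
print(f"posterior probability of a change-point component: {p_cp:.2f}")
```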
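For the Module 2 example, the sketch below is a bare-bones bootstrap particle filter for 2D localization in an axis-aligned room, again in plain Python/NumPy rather than the course's Gen/GenJAX demo. The room geometry, the four-beam "distance to each wall" sensor, the noise levels, and the omission of heading are simplifying assumptions; the point is only to show the particle mass being re-weighted and resampled toward poses consistent with the sensor data.

```python
# Illustrative sketch only: bootstrap particle filter for 2D localization.
# Room size, sensor model, and noise levels are assumptions for the sketch.
import numpy as np

rng = np.random.default_rng(1)
W, H = 10.0, 6.0       # assumed room dimensions
MOTION_NOISE = 0.15    # std-dev of per-step translation noise
SENSOR_NOISE = 0.3     # std-dev of each distance reading

def sensor_model(pose):
    """Ideal distances from a pose to the left/right/bottom/top walls."""
    x, y = pose
    return np.array([x, W - x, y, H - y])

def step(particles, weights, control, observation):
    """One filter update: propagate, re-weight, resample."""
    # Motion update: apply the commanded displacement plus noise.
    particles = particles + control + rng.normal(0, MOTION_NOISE, particles.shape)
    particles = np.clip(particles, [0.0, 0.0], [W, H])
    # Sensor update: weight each pose hypothesis by the reading's likelihood.
    expected = np.stack([sensor_model(p) for p in particles])
    loglik = -0.5 * np.sum(((observation - expected) / SENSOR_NOISE) ** 2, axis=1)
    weights = weights * np.exp(loglik - loglik.max())
    weights = weights / weights.sum()
    # Resample so the particle "mass" flows toward consistent poses.
    idx = rng.choice(len(particles), len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Simulate a robot walking right while the filter tracks it from a uniform prior.
true_pose = np.array([2.0, 3.0])
particles = rng.uniform([0.0, 0.0], [W, H], size=(1000, 2))
weights = np.full(1000, 1.0 / 1000)
for _ in range(20):
    control = np.array([0.3, 0.0])
    true_pose = np.clip(true_pose + control, [0.0, 0.0], [W, H])
    obs = sensor_model(true_pose) + rng.normal(0, SENSOR_NOISE, 4)
    particles, weights = step(particles, weights, control, obs)
print("true pose:", true_pose, " estimate:", particles.mean(axis=0))
```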
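For the Module 3 example, the sketch below covers only the degree-selection half of the polynomial story (outlier modeling is omitted for brevity). Instead of posterior sampling over the hierarchical model described in the row, it uses the conjugate closed-form marginal likelihood, which exposes the same automatic Occam penalty exactly and keeps the sketch short; the priors and noise level are assumptions.

```python
# Illustrative sketch only: automatic Occam's razor via the exact marginal
# likelihood of conjugate Bayesian polynomial regression. (The lecture example
# samples a hierarchical model; the closed form is a stand-in used here to
# keep the sketch short and exact.) Priors and noise level are assumptions.
import numpy as np

rng = np.random.default_rng(2)
NOISE = 0.5  # assumed observation noise (std-dev)

def log_marginal_likelihood(x, y, degree, prior_var=1.0):
    """log p(y | degree): coefficients integrated out under a N(0, prior_var)
    prior, which is what penalizes needlessly flexible models."""
    phi = np.vander(x, degree + 1)                       # design matrix
    cov = prior_var * phi @ phi.T + NOISE**2 * np.eye(len(x))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (quad + logdet + len(x) * np.log(2 * np.pi))

# Data that is genuinely quadratic plus noise.
x = np.linspace(-2, 2, 30)
y = 1.0 - 0.5 * x + 0.8 * x**2 + rng.normal(0, NOISE, x.shape)

# Uniform prior over degrees 0..5: the posterior over degree is proportional
# to the marginal likelihood, and it typically concentrates on the true
# degree (2) even though higher degrees fit the points at least as well
# at their optimum.
log_post = np.array([log_marginal_likelihood(x, y, d) for d in range(6)])
post = np.exp(log_post - log_post.max())
post = post / post.sum()
for d, p in enumerate(post):
    print(f"degree {d}: posterior probability {p:.3f}")
```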
I'll help you fill out the "Why automate math Automatic differentiation of expected values Probabilistic programming with stochastic probabilities" row from the first attachment, following the cohesive table principles described in the cohesive table document.
Based on the lecture notes and transcript, here's a cohesive completion of that row:
| Module | Topic | Rational AI Principle | How | Contrast | Example |
| ------ | ---------------------------------------------------------------------------------------------------------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 4 | Why automate math Automatic differentiation of expected values Probabilistic programming with stochastic probabilities | 🧮Compose To Simplify | Implement higher-order operations (gradient, expectation, Radon-Nikodym derivative) as composable program transformations, allowing complex inference algorithms to emerge from simple building blocks | Traditional implementations require specialized derivations for each model; composable transformations enable automatic generation of efficient estimators for any well-formed program | **Stochastic gradient descent with language models.** We represent both the language model and a reward function as probabilistic programs. The objective is to maximize the expected reward of generated text. Traditional approaches require complex, model-specific gradient estimators with manual derivations spanning multiple pages. Using our approach, gradient estimation is automated by applying composable transformations: (1) transform sample-based language model into expectation-operator form, (2) apply automatic differentiation to this form, and (3) generate unbiased gradient estimators with user-selectable variance-cost tradeoffs. This enables rapid prototyping of RLHF pipelines that would otherwise require specialized implementations, and can yield up to 10x faster training compared to handcrafted estimators by automatically exploring the space of possible gradient estimation strategies. |
This row maintains cohesiveness with the previous rows in several ways:
1. **Column-wise cohesiveness**: The "Rational AI Principle" (🧮Compose To Simplify) aligns with the previous principles (📐Shallow to Deep, 👁️See Flowing Mass, 🪒Auto Occam's Razor) as a fundamental approach that enables more robust and efficient AI systems.
2. **Row-wise cohesiveness**: The "How" and "Contrast" columns maintain consistent relationships with the other rows, showing the pattern of flexible approaches versus fixed approaches across different domains.
3. **Logical consistency**: The example demonstrates a practical application of the principle in a way that's consistent with the level of detail and structure seen in examples from previous rows.
The completed row focuses on the key insights from the lecture about automatic differentiation of expected values in probabilistic programs, highlighting how composable program transformations allow complex algorithms to emerge from simple building blocks, a principle that fits cohesively with the table's overall structure and message.
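As a concrete companion to that row, the following is a hedged sketch of the core identity behind differentiating an expected value, here the score-function (REINFORCE) estimator, applied to a toy one-token "language model". The categorical policy, reward table, and constant baseline are illustrative assumptions, not the lecture's actual transformation pipeline or an RLHF implementation; the point is that the gradient of E[reward] is itself estimated by sampling, and the baseline is one point on the variance-versus-cost tradeoff mentioned in the example.

```python
# Illustrative sketch only: score-function (REINFORCE) estimator of
#   grad_theta E_{x ~ p_theta}[ r(x) ] = E[ r(x) * grad_theta log p_theta(x) ].
# The one-token categorical "language model" and reward table are assumptions.
import numpy as np

rng = np.random.default_rng(3)
VOCAB = 4
theta = np.zeros(VOCAB)                  # logits of the toy policy
reward = np.array([0.0, 1.0, 0.2, 0.5])  # reward for emitting each token

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_log_prob(theta, token):
    """Gradient of log p_theta(token) for a categorical with logits theta."""
    onehot = np.zeros_like(theta)
    onehot[token] = 1.0
    return onehot - softmax(theta)

def estimate_gradient(theta, n_samples=10_000, baseline=reward.mean()):
    """Unbiased estimate of grad E[r(x)]; the constant baseline reduces
    variance without introducing bias (one point on the variance/cost curve)."""
    tokens = rng.choice(VOCAB, size=n_samples, p=softmax(theta))
    grads = [(reward[t] - baseline) * grad_log_prob(theta, t) for t in tokens]
    return np.mean(grads, axis=0)

# Sanity check against the exact gradient of E[r] = sum_i p_i * r_i,
# which for a softmax policy is p_j * (r_j - E[r]).
probs = softmax(theta)
exact = probs * (reward - probs @ reward)
print("estimated gradient:", np.round(estimate_gradient(theta), 3))
print("exact gradient:    ", np.round(exact, 3))
```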
[[📜gans20_choose(tech)]]