I forget now whether you have a discussant or not. Yeah, Virginia. Oh, Virginia's going to discuss, okay, great. So you're here? Yeah, she's there. Hi Jenny, Virginia I mean, hi. Okay, let me just get set up. Josh, you haven't seen any sessions at all yet, right? I haven't been able to join live, but I watched video from the first two days. Okay, did you watch Bin? I did, yeah. Okay, great. I'm going to refer to it briefly. That's what I wanted. I'm also trying to see if there's some time, I know you said there might be; I can't join the discussion, or there is no discussion today, but I'm trying to see if any of the other panels fits my schedule. That would be great, I can add you to any of them. I'll be in touch about that. Let me just do my solemn duty here.

Welcome, everybody, to the afternoon session, everybody out in distributed land and everybody here. I'd like to introduce Josh Tenenbaum, who is Professor of Computational Cognitive Science in the Department of Brain and Cognitive Sciences at MIT. He's a principal investigator at the Computer Science and Artificial Intelligence Laboratory, CSAIL. Is that the former SAIL? No, SAIL is the Stanford one; this is formerly the AI Lab and LCS, the Laboratory for Computer Science. Okay, great. He's also a thrust leader in the Center for Brains, Minds and Machines, CBMM. His papers are on perception, learning, and common-sense reasoning in humans and machines, with the twin goals of better understanding human intelligence in computational terms and building more human-like intelligence in machines. In other words, he's cut out perfectly for this summer school. I hand it over now to Josh Tenenbaum.

Okay, great. Thank you so much, Stephen, for organizing and for inviting me. I got to watch much of the first two days, and it was really interesting to see a back and forth between people who are extremely impressed with large language models, both their language abilities and maybe some of their general thinking abilities, and other people who are much more skeptical that they have anything to do with intelligence, at least of the human form. The work I'm going to talk about here is, I think, an interesting mix of those two perspectives, and I hope it will be useful and stimulating and will spark some interesting discussion, both now and over the next two weeks.

The heart of the talk, which will really be more like the second half, is based on this paper, which you can find on arXiv: "From Word Models to World Models," on understanding natural language by translating it into a probabilistic language of thought. But I'm going to spend the first half setting some context on how we think about thinking, and then build on that for the relationship between language and thought. This is both an AI talk and a cognitive science talk, although most fundamentally I'm really interested in the computational structure and origin of the human mind, so hear it through that lens. But also, very generally, nobody can fail to be surprised and impressed at what has happened with the most recent machine learning models, and at the same time they're very puzzling and confounding in certain ways, so I hope to resolve some of that, or at least point the way toward some of it.

The work I'm going to talk about is joint with a number of people, and I want to single out two: Lionel Wong and Gabe Grand, who are the joint first authors of that paper. I'm putting Lionel a little bit bigger in part because a lot more of my slides are taken from Lionel. All of the credit for the good stuff, both in the research and in the slides, goes to Lionel and Gabe, and anything that doesn't quite work or make sense is probably just me garbling things. A lot of other people contributed to this work as well.

So again, who cannot be impressed with the advances of AI, whether it's in perception and robots deployed in the real world, like the self-driving systems of Tesla and Waymo, or most recently conversational AI systems like ChatGPT. Clearly something really interesting and important is happening, but at the same time something quite puzzling. There are a number of puzzles that people here have talked about; one I'll point to is what you might call the puzzle of confabulation, which points to ways in which the kinds of quote-unquote intelligence we see in these systems are different in nature and origin from the intelligence in our own minds.

Think about driving, for example. This is a video from a Tesla self-driving system faced with something a little bit out of its training set, a horse-drawn carriage, a buggy. It interprets it as a truck facing one way, then the other, then as an SUV, then as a truck with a person behind it. Now the person has gone away; shortly the person will come back. It can't figure out which way the truck is facing. If a person were having this experience on the road, you would think they should stop driving and get off the road as soon as possible: there is something fundamentally detached about their mind from reality. And when this happens in systems where lives are at stake, it can be deadly, so this is a serious worry technologically and societally. When it comes to computer vision and Tesla's self-driving, a lot of progress has been made, and what I'm showing you here is from a few years ago, but you still see similar kinds of things.

On the language side, I'll show one example of analogous things that really make you scratch your head. You can find these in pure language models, but I'm going to talk about multimodal language and vision models, because this is where machine learning, and especially grounded language, potentially meets up with what goes on in cognition. I'll show a couple of little demos from Google's multimodal language model, Gemini, both earlier versions and the most recent one. You can see similar things with ChatGPT, but it's a little bit funnier with the Google ones, so I'll focus on that. This is an image from my colleague Vikash Mansinghka, whose work will also figure in the talk. Vikash is one of the world's experts in probabilistic programming; he and I are very interested in how our minds and our machines could model uncertainty. Here's an image where, when we look at it, even though it's a little hard to read, kind of messy, it's really clear that there's a word here and that the word is "uncertainty."

But what does a multimodal LLM see? Well, it might see this: "The image you sent me shows a black and white image of a text box. The text box has a dark gray border and rounded corners, and inside is the word 'dream' written in a bold sans-serif font. Is there anything else I can help you with?" I ask how many letters are shown in the text box. "There are five letters shown in the text box. The letters are d, r, e, a, and m." What is going on with this? Now, we could say, all right, at least it recognizes that there's text; it could be worse. But it seems strangely confident about an incorrect answer. You could say at least it's not seeing a person walking down the street. But consider this small change to the input image. Now our LLM says: "This shows a black and white pixelated image of a person. The person is facing forward and appears to be standing still. Here are some details about the person: they have short hair, they're wearing a shirt with a collar, their legs are slightly wider than their torso. Since the image is pixelated, it's difficult to discern other details such as facial features or specific articles of clothing." The thing that should puzzle us here is not just that the system gets this wrong, but that it confidently asserts things so detached from our sense of the reality of what we're seeing, and that such a small change can make such a huge difference.

Just to update things: that was the first version of Google's Gemini. The more recent Gemini 1.5 Pro has gotten better in some sense. Given "uncertainty," it doesn't say "dream," it says "optical illusion," but it's still wrong and still overconfident; it rates itself as fairly confident that the image says "optical illusion." It makes you wonder whether there's some semantic association between dreams, optical illusions, and this particular way of presenting the notion of uncertainty. I don't know. The last time I gave this talk, I tried this on the newest updated version of Gemini 1.5 Pro, just to see if it had improved, and it gave a very different answer: not "optical illusion" but an extended digit number, 612 981 53242, and when I asked it to judge its confidence it said 70 to 80 percent. Just to check the consistency and coherence of this weird interpretation, I asked again ten seconds later, out of context, and it gave the same thing. Ten seconds after that it said "geocache," now 75 percent confident. And one more time: "a distorted text appears to read Northeast blackout; however, it's difficult to be certain," but it's moderately confident the text reads "Northeast blackout," because the first and last words are relatively clear while the middle word is slightly less discernible but still suggestive of "blackout." So what's going on here? That's one of the mysteries. I'm going to gesture at what I think some of the answers might be, but mostly to point the way toward the difference with human intelligence and what we're trying to understand in our work.

I think what's fundamentally going on is that whether you're building a computer vision system, an LLM, or a multimodal language system, you're building a system that takes the inputs and outputs that our brains do: perception of the external world, sense data of some form, producing actions, or something like actions, that can be grounded back in the external world. But the inside of the system doesn't have any notion of a world. It's a function approximator: it learns to approximate the input-output functions that our minds produce, and it learns to do that from various data sources, including objective data sources as well as human reinforcement.

Why does this possibly work? Well, there may be some laws that are physics-like, for example the famous scaling laws of neural language models. Very much following some of the information-theoretic ideas that Richard and others in the group here have been talking about, if I'm just trying to predict the next token from the previous ones, there are certain fundamental power laws of language and of the distributions in language that these systems seem to incorporate and build on, such that if you increase the amount of compute by an order of magnitude, you can, in a predictable way, lower the test loss in predicting the next token. That's on the left. In GPT-4's technical report they suggested that you can see similar kinds of scaling laws for problem solving, not just text prediction, although it's a lot noisier there. I think the fundamental problem is that although power laws are beautiful and predictable in a certain sense, the thing about a power law is that while it approaches its asymptote, or rather approaches zero error, in some important sense it never gets there, compared to, say, an exponential decay, which has a predictable time scale on which it reaches zero. A power law keeps slowing down, and if you have any uncertainty about the power law's coefficient, or about its applicability not just to predicting the data stream but to actually solving a problem, then it's basically impossible to know how much data and compute you're going to need to actually get to the asymptote, which we want to call full adult human intelligence.
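To make that power-law point concrete, here is a minimal numerical sketch. All of the constants (the coefficient, the exponent, the timescale) are invented for illustration, not taken from any published scaling-law fit; the point is only the shape of the curves.

```python
import math

# Illustrative only: a power-law loss curve L(C) = a * C**(-alpha) versus an
# exponential decay L(C) = a * exp(-C / tau). All constants are invented for
# this sketch; published scaling-law fits have their own values.
a, alpha, tau = 10.0, 0.1, 1e6

def power_law(c):
    return a * c ** (-alpha)

def exponential(c):
    return a * math.exp(-c / tau)

def compute_to_reach(loss_fn, target, c=1.0):
    # Crude search: keep multiplying compute by 10 until the target loss is reached.
    while loss_fn(c) > target:
        c *= 10
    return c

for target in (1.0, 0.5, 0.25):
    print(target,
          compute_to_reach(power_law, target),
          compute_to_reach(exponential, target))
# Under the power law, each halving of the target loss multiplies the required
# compute by roughly 2**(1/alpha) (about 1000x here); the exponential reaches
# every target within the same order of magnitude of compute. That is the sense
# in which a power law keeps slowing down and never quite "gets there".
```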
In contrast, humans don't seem to be built this way. Our minds seem to be built as world modelers from the start. From the very beginning, our minds and brains, and this is something shared with other animals and inherited from our evolutionary legacy, seem to be built to model the world and to deal with all kinds of incompleteness and uncertainty: uncertainty about the structure of the world and about what's out there, the current state, as well as about the deeper causal laws of physics or of how agents plan. We're built to engage these kinds of mental representations. As I'll show in a little bit, the way we've modeled this in our group for a long time is what we call the game engine in the head, by analogy to video game engines: our brains and minds seem to come equipped with these kinds of resources for world modeling. And these world models are used, not just by human adults but by young children, to coherently understand the world and deal with situations that are importantly out of distribution. That's really key: the ways in which we perceive and learn about the world are not nearly as tied to the distributions of our experience as a machine learning or function approximation approach would be.

So here are some examples of a different kind of self-driving system. These are four-year-olds for the most part, three-, four-, and five-year-olds, in a genre of YouTube videos; the one on the left is an example. The genre is basically "my four-year-old driving for the first time": parents have put their kids behind the wheels of golf carts or tractors or cars or trucks and just let them go at it, filming from the side or from the passenger seat. You could question the rationality of the adults who put the kids in these situations and put the videos online, but there's a certain basic rationality that the four-year-old has, even though they've never been in the situation before. The systems their mind has built through evolution and the first few years of their experience allow them to handle this totally new kind of perceptual information, the world going by at high speed with them somehow in control. They can generalize from their experience because the nature of their generalization is their mind's models of the world, and those transfer to these new situations. Of course there are new things they have to learn, like exactly how the steering wheel works, but they can learn those very quickly too, because the learning is grounded in their world model.

So fundamentally I think we have a contrast between the scaling thesis of deep learning and today's AI, and what you could call the original scaling route that human intelligence follows: growing up, as opposed to scaling up. I want to contrast three points which I think are really important for understanding our intelligence in general, the contrast between AI and machine learning and humans, and especially the role that language and language models play in this.

Three points are fundamental to the way today's AI works, based on deep learning. First, intelligence is seen as the end result of learning, where learning starts with some simple, very general, and in some ways dumb mechanisms, associative learning and prediction in some form, and the idea is that if you scale that up enough, you get this surprising emergent phenomenon that we now start to call intelligence. Second, the problem, or you could just call it the way things work, is that while there is some remarkable kind of generalization, it still depends on similarity to the training data, and the ability to go beyond the training distribution is weak and unpredictable compared to performance in distribution. This is true whether you're talking about perception or other forms of higher-level cognition, but especially thinking: the things we recognize as forms of reasoning, planning, and problem solving, the sort of thing we've never seen in any computer vision system but are starting to see some interesting approximation to in language models. Third, and crucially, in machine learning thinking derives from language, because it's only the data of language, not the data of pixels, that really conveys information about human thoughts. Which makes sense: humans have always used language as our main medium for expressing our thoughts to others, just like what we're doing right now in giving talks like this. But crucially, on this view thinking requires language data, and even carefully cleaned and curated language data. Our colleagues at the big AI companies could tell us a lot more about this, or rather maybe they can't tell us, but they can tell us that they can't tell us, and those of us who've worked with this know the importance of having the right kinds of language data as well as the right reinforcement data.

Contrast this with human minds, and especially human children. For us, intelligence is not just the end result of all our learning. Babies are, in some important way, not as intelligent as human adults, we would like to say, but human intelligence is built in, in some form, from the start. It's the foundation of learning, not just the end state: it's what's there, together with the learning mechanisms that let you learn so much from so little. We saw some of that in Virginia's talk, and I think the field of human cognitive development is a testament to this; what I was just showing from the four-year-olds is another example. In other work that I'm not going to talk about here, we've done a lot of work trying to model the core knowledge of even ten- and twelve-month-old babies, and there are important ways in which they have a common-sense understanding of the world that is present, in significant ways, even in two- and three-month-olds. It's not all built in, but in significant ways it seems to be there very early.

As a result of our built-in capacity for modeling the world, our generalization is based on the fit of our mental models; it's not about the training data. Data is important: we improve our models, we grow models, we can make new models from data. But what accounts for and drives generalization is not similarity to the data but the fit of our world models and their ability to be flexible and themselves generalized. And crucially, thinking isn't something that arrives at the end as an emergent property of modeling language; rather, thinking is there from the start, and it's the basis for language. It's the basis for why human children construct language so robustly and resiliently. By using that word, resilience, I'm referring to some of the ideas that Virginia talked about in her talk and that Susan Goldin-Meadow has talked about; I urge everyone to watch her Rumelhart Prize talk, where she discusses this as well. As we saw in Susan's work and in Ann Senghas's work on Nicaraguan Sign Language, children who grow up without any language input, and this is not just the poverty of the stimulus in the traditional linguistic sense, but children who grow up deaf without sign language input, create in some form their own personal proto-language, or at least a way of communicating that has hierarchical symbolic structure. Then you bring a few such children together, and within the span of a couple of generations they've created a whole new language from scratch. So it's very clear from data like that that humans are built to think, and we're built with a desire to understand and to be understood, to express our thoughts in some form and share them with others as our social partners. There's nothing more fundamental to understand if you want to understand where intelligence comes from.

What we've been trying to do in our work is to capture the human, growing-up scaling route in computational terms, and it starts with this idea of probabilistic inference and expected-value decision-making on top of world models. Informally I would say, like many others, that thinking, not just in human brains but in the brains of many other animals, is about making good guesses and bets: not about the next thing you're going to see, not about the next token or the next set of pixel values, but about the world. What's going to happen in the world, and how might it depend on your actions or the actions of others, or how might you change the world to change those dependencies, which is causal and counterfactual reasoning. Then, having some sense of what you'd like to see happen and what you'd not like to see happen, what could cost you your life or be hugely valuable, you make good bets about how to act and what to think about next. This is, effectively, the classic idea of rationality, and you could say our minds and brains, constructed through evolution to do this kind of computation, are the original sources of that idea.

Our work, including the work we've been doing with language models, builds on this using the idea of probabilistic programs, a family of mathematical languages and actual programming languages and platforms that embody this idea: they take the conceptual picture of rational world modeling, inference, and decision and turn it into practical engineering terms that can serve as models of human minds as well as more human-like AI. I don't have time to give a whole introduction to probabilistic programs; you can think of it as a catchall phrase, a complicated suitcase phrase, just like "neural network," packaging a number of different things together. But probabilistic programs are formalisms for combining what I think are several of the best ideas about intelligence that have come up over decades in the field. That includes neural networks: modern probabilistic programming languages like Gen, from Vikash Mansinghka's group at MIT, whom I mentioned before, or Pyro, originally developed by a group at Uber AI that Noah Goodman, another collaborator of ours, helped to start along with a number of others, in many ways build on languages like PyTorch or TensorFlow, the languages that support modern deep learning and let you construct really complex but end-to-end differentiable functions for approximation or other purposes. But that's not the most important part. The most important part is the idea of symbolic languages for expressing abstract knowledge for modeling the world; across many areas of science and engineering, consistently the strongest and most powerful toolkit we've had for building coherent models to understand the world is various forms of symbols. And the third idea is the probabilistic one: use those symbolic languages to express probabilistic models where you can be uncertain about everything, in a fully general computational sense. You can be uncertain about the state of the world right now, or more abstractly about how the world works, or about how the different kinds of data you're getting, perceptual or otherwise, are connected to the underlying state of the world, and you can do joint inference over all those sources of uncertainty as the basis for perception, reasoning, planning, learning, and so on. Probabilistic programs bring those things together. I'll say a little bit about how that toolkit works, and then about how, once you learn language, the ability to externalize and internalize thoughts produced by these kinds of probabilistic programs transforms things in fundamental ways.

And here, lest people think that I'm just a deep learning or LLM skeptic: I find neural language models, and the long tradition of distributional statistical learning in language that they build on, to be really important, and in the work we've been doing we've been using large language models, or by today's standards I might even say small language models, to capture exactly this. So I would not say that you should think about human minds as LLMs or Transformers all the way down, nothing like that, but the kinds of things going on in distributional sequence learning could be a way to capture some aspects of how language grounds in these tools for mental modeling, and also enriches and extends them.

Just very briefly, for people who aren't familiar with this idea: for a couple of decades now, I and many colleagues, students, former students, and others interested in what you could call the Bayesian approach to cognition have been using this general toolkit of probabilistic inference over structured symbolic models to capture many aspects of mental models. This is a plug for a forthcoming book edited by Tom Griffiths, Nick Chater, and myself, with many other contributors, coming later this year from MIT Press. It's part textbook, part research monograph, showing how this toolkit can be used both to explain, in a principled and understandable way, and to quantitatively predict and model behavioral data across many different ways in which our minds model the world.

Just to illustrate one, which I'll come back to in the context of language in a minute: an area we've done a lot of work on is intuitive physics as a kind of probabilistic inference, and the particular kinds of intuitive physics we're thinking about are complex scene-understanding cases, which, not coincidentally, are also really interesting and classic settings for studying grounded language. For example, if I show you a scene of a bunch of blocks, think Jenga blocks, stacked up in various ways, some of these images might look very stable, others unstable, and I can ask you: how likely do you think this stack of blocks is to fall under gravity? We can model that with a structured world model consisting of 3D object models plus causal models of how those underlying 3D scenes give rise to images. That's basically a graphics program; computer graphics is a way of writing programs that generate images from underlying 3D world models. But there are also physics programs, and here is where game engines come in: game-engine-style physics simulators capture a lot of real-world, common-sense physics in ways that hack Newtonian, or actual true scientific, physics in all sorts of ways to be efficient, but do a pretty good job of capturing what we expect to happen in the world. That makes sense, because we're the ones playing the video games, and they're designed for us.

By doing probabilistic inference to infer the input to a graphics program given its output, which is the image, you can do a lot of 3D perception, and then by doing probabilistic forward simulation you can imagine what might happen next, possibly conditional on your action. The same toolkit can be applied to a much less familiar sort of judgment. Imagine I have these scenes of red and yellow blocks on a table, and I ask: if I bump the table hard enough to knock some of the blocks onto the floor, will more red blocks or yellow blocks be knocked off? The first judgment, how likely the stack of blocks is to fall, is familiar to anyone who's played Jenga or built things as a kid; if you're a professional builder you have even more intuitions, but it's very familiar. The question on the right is one that, unless you've seen me talk about this, you've probably never thought about. It's not something you have direct experience with, and you can't learn how to answer it from feedback. But I can use language to give you the question, along with the other relevant world knowledge, and then you can reason about it. The models we build capture both the familiar judgments, like how likely the stack is to fall, and these novel judgments about equally well.

These scatter plots are examples of the kind of data and modeling we've done in our lab for a long time. On the y-axis we plot the average human judgments, say on a one-to-seven scale, of how stable or unstable the blocks are, and on the x-axis the average result of a small number of probabilistic simulations, where we imagine running game-style physics forward a few time steps, with uncertainty about exactly where the blocks are and how the physics works, because those are all things our minds don't fully know. The same kind of model can be used to answer the red and yellow questions. To illustrate: we take one of these scenes, reconstruct it in a game-style physics engine, and simulate a bump of the table; there's one simulation on the right, and here's another simulation with a harder bump. Watching them again, you can see that different things happen in the two simulations, but it doesn't really matter which one you ran to answer the question. You look at the scene and it's very clear from the beginning that if I bump the table relatively hard, it's going to be mostly yellow blocks on the floor. How do you do that? In our model, you only need to run one or a small number of these simulations to answer the question at the grain of intuitive physics, and you don't need to run them very long: you could stop now and you'd already know the answer. So a small number of short, incomplete simulations is the basis for these fairly quantitative models, and they've also been used in robotics, for example to get a robot to learn to play Jenga, but to learn from a very small amount of realistic human experience.
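As a toy sketch of that "few short, noisy simulations" idea (this is not the actual model or its game-engine physics; the stack geometry, the noise level, and the toppling rule are all invented for illustration):

```python
import random

# A toy version of "a few short, noisy simulations": jitter the perceived block
# positions, apply a crude stability rule, and average over a handful of runs.
# Stack geometry, noise level, and toppling rule are invented for illustration;
# the actual models use a game-style 3D physics engine.

def sample_stack(noise=0.2):
    observed = [0.0, 0.3, 0.1, 0.5]      # x-centers of blocks, bottom to top
    return [x + random.gauss(0, noise) for x in observed]

def falls(centers, half_width=0.5):
    # The blocks above level i topple if their combined center of mass lies
    # outside the footprint of block i (center +/- half_width).
    for i in range(len(centers) - 1):
        above = centers[i + 1:]
        com = sum(above) / len(above)
        if abs(com - centers[i]) > half_width:
            return True
    return False

def p_fall(n_sims=20):
    # A small number of simulations is enough for a graded stability judgment.
    return sum(falls(sample_stack()) for _ in range(n_sims)) / n_sims

print(p_fall())
```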
The same kind of idea has been very impactful, I think even more impactful, in intuitive psychology. This is work I've done with a number of students and with my colleague at MIT Rebecca Saxe, but I would especially highlight Chris Baker and Julian Jara-Ettinger, who did the quantitative modeling here, going back years with us. Chris is now working on self-driving cars in industry; Julian is now a professor at Yale, where he's extended this kind of approach to many areas of social cognition and communication in really interesting ways, along with many other colleagues. I should also mention, since I forgot to before, that the intuitive physics work started in our group more than ten years ago with Pete Battaglia and Jess Hamrick and others; they are now both at Google DeepMind, working on various interesting deep learning approaches to intuitive physics and to real-world physics that matters, like climate modeling, which is Pete's current focus.

I won't go into the details of the Bayesian theory of mind, but the basic idea is again that there's a program that describes, not necessarily how actual minds and brains work, but our mental models of other minds: the actions we see agents take are interpreted as the output of planning programs that take as input our mind's representations of their beliefs and desires. By seeing how agents act and change the state of the world, and by also modeling their perception process, which leads to belief formation and updating, we can model many aspects of how people understand other minds, especially in these physically grounded, perceptual scenarios, the kinds you can also study in young babies, as in the experiments I'm showing here from Gergely and Csibra, Kiley Hamlin, Paul Bloom, and others.

The intuitive psychology case is especially interesting. In the paper with Brenden Lake, Tomer Ullman, and Sam Gershman that helped set some of the current stage of debates between deep learning and more cognitive approaches to AI, where we put out a challenge back in 2016 and 2017 to the deep learning AI world about the kinds of inductive biases and mental-model building blocks that seem to be built into human minds, and the learning mechanisms that build on them, we highlighted both intuitive physics and intuitive psychology. The famous Sparks of AGI paper from Bubeck and colleagues, showing some of the earliest glimpses of the surprising things GPT-4 seemed able to do, also highlighted intuitive physics and intuitive psychology. And a recent paper from Eric Schulz's group forms a kind of three-part story here, which continues, as Schulze Buschoff, Schulz, and colleagues keep working on this area, as do we: now, in the era of multimodal language and vision models like the ones I showed you at the beginning, we can take all these tasks, whether block-tower intuitive physics or agents moving around in the world interpreted through what Julian Jara-Ettinger called the naive utility calculus of rewards and costs, and give language models or multimodal models the same kinds of stimuli and questions that we give people. As Schulz and colleagues showed, in some of the intuitive physics settings they're kind of okay, not great; in the intuitive theory of mind ones, "they fail altogether," and that's a quote from their paper.

As an example, this is work that Julian Jara-Ettinger did as part of his PhD thesis with myself and Laura Schulz, what we call the astronaut studies. People would see an agent, an astronaut on some planet, who starts at a certain point and has a home base they have to get to, and who follows some path along the surface of the planet. They could go straight to their home base, or take a path that isn't straight. Crucially, there are various objects they can pick up, which could be either positively valuable or aversive to them, and we ask people, based on the path, how likely it is that the agent likes or doesn't like one of those objects. There are also different terrains, which can be more or less costly. By showing people different maps, different configurations of objects and terrain, and different paths, you can get very interesting, rich inferences about what the agent wants: the rewards assigned to the different objects as well as the costs of moving around on the terrain.

Here's an example of some of the stimuli from one experiment, just to show you the kind of variation. In each case we ask people to make three or four judgments, depending on how many kinds of terrain there are, and that's what's shown here: these are actually the predictions of the model, z-scored, for the relative cost of the different kinds of terrain and the relative value of the different kinds of objects. We assume the agent takes a rational, efficient plan, trying to maximize reward minus cost, where there's a small cost for each step but it's especially costly to travel over certain kinds of terrain. So, seeing the path, you can make inferences about the agent's rewards and costs, and when you ask people to make the same judgments, they line up almost perfectly. This is just one of many experiments Julian did showing a really remarkable quantitative match, but by a model that isn't just fit to data. There's a little bit of fitting, but it's mostly based on the core concepts of theory of mind that researchers like Gergely and Csibra, as I mentioned before, have studied even in very young, preverbal infants, infants who can't walk and are barely able to reach for things themselves, yet who still seem to have these intuitions about efficient action and to use them. I should also mention Shari Liu's work; she was a PhD student at Harvard a few years ago with Liz Spelke, and worked with Tomer Ullman and me to show that the same kinds of things hold in babies. Shari is now doing amazing work extending that in her new lab at Johns Hopkins.

So these are cases where this kind of probabilistic program model works really well, but a pure language model is basically at chance, zero correlation with human judgments, although it's quite good at telling you things like the background color of the scene.
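Here is a minimal sketch of the inverse-planning logic behind those astronaut studies, boiled down to a single binary choice. The costs, the prior on the object's reward, and the softmax noise parameter are all assumptions made up for the illustration; the actual model plans full paths over 2D maps with multiple objects and terrains.

```python
import math
import random

# A minimal inverse-planning sketch in the spirit of the astronaut studies:
# the agent walks from its start to home base and may detour to pick up an
# object. Observing the choice, we invert a noisily rational planner to infer
# how much the agent values the object. All numbers here (step cost, detour
# length, prior, softmax temperature) are assumptions for the illustration.

STEP_COST = 1.0
DIRECT_STEPS = 6
DETOUR_EXTRA_STEPS = 4

def utility(take_detour, object_reward):
    base = -DIRECT_STEPS * STEP_COST
    if take_detour:
        return base - DETOUR_EXTRA_STEPS * STEP_COST + object_reward
    return base

def p_detour(object_reward, beta=1.0):
    # Softmax (noisily rational) choice between detouring and going straight.
    u_d = utility(True, object_reward)
    u_s = utility(False, object_reward)
    return math.exp(beta * u_d) / (math.exp(beta * u_d) + math.exp(beta * u_s))

def posterior_mean_reward(saw_detour, n=20000):
    # Importance sampling: weight prior samples of the reward by how likely
    # they make the observed behavior under the planner.
    total_w = weighted_r = 0.0
    for _ in range(n):
        r = random.gauss(0, 3)             # prior over the object's reward
        w = p_detour(r) if saw_detour else 1.0 - p_detour(r)
        total_w += w
        weighted_r += w * r
    return weighted_r / total_w

print(posterior_mean_reward(True))    # detour observed: inferred reward is clearly positive
print(posterior_mean_reward(False))   # straight path: inferred reward is much lower
```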
So, for the last part of the talk, having set the stage for how human thinking seems to work and how we can model it in ways that are explanatory, understandable, and quite quantitatively predictive using these probabilistic programs, let's see where language comes into the picture. I don't think a pure machine learning Transformer approach, like the ones we've been talking about, is on track to give a human-level, or certainly a human-like, account of how this works. But I do think the ideas of sequence modeling and statistical distributional learning that you see having such great success in LLMs, even perhaps on a smaller scale, could say something important about how language comes into the picture. This is the arXiv paper I mentioned at the beginning, the word models to world models paper from Lionel Wong, Gabe Grand, and colleagues, and that's what I'll be talking about here: how we've been bringing these tools together.

A key ingredient is what I would call the modern return of the language of thought hypothesis, made famous by Jerry Fodor but obviously with a history that goes back hundreds if not thousands of years. In a number of recent papers and proposals from various groups, the idea is that some kind of abstract symbolic language, not necessarily a single language, possibly general or possibly constructed, domain-specific languages, is a powerful way to think about human thinking, abstraction, concept learning, and so on. The particular kind of language of thought we've been thinking about is what Noah Goodman, Tobias Gerstenberg, and I called the probabilistic language of thought. This is the idea of using probabilistic programming languages, the technical tool I talked about before, to formalize, in a cognitive setting, a certain kind of language of thought hypothesis, but one that is focused on modeling the world, not just possible worlds but probable worlds, and where the symbolic language can also express ways of conditioning and querying, so that we can ask and answer the kinds of questions that our minds do, and that we might want any kind of general AI system to handle. If you want to learn more about the probabilistic language of thought, check out our chapter in The Conceptual Mind, one of the Margolis and Laurence readers, or the web book Probabilistic Models of Cognition, which has examples of the kinds of models I'm going to be talking about, though not with language models.

The new thing is to take advantage of LLMs, and specifically the fact that most LLMs these days are trained not only on natural language but on programming languages and source code: programming languages like all the ones we're familiar with, which are designed to be read and written by humans and not just machines, and so are written in a very English-like, natural-language-like way. Linguists have long pointed out differences between natural languages and programming languages, but from a certain perspective they're a lot more similar than they are different: hierarchical structure, syntax, and even much of the lexicon. Programming languages are not just commented in natural language; functions, variables, and data structures are named using English for the most part. So that's a very powerful data source, one that allows a statistical sequence-to-sequence model, built for predicting and translating between sequential streams, to effectively learn to translate from English, or any other natural language, into programming languages of thought, and that includes probabilistic programming.

The idea of this paper, what we call rational meaning construction, is a particular thesis on how language is understood, and perhaps also how it might be produced and learned, although our focus here is on language understanding and on the relation between language and thought in that context. What we think of as the core, original notion of thinking is what I've been talking about in the first part of the talk: having a structured probabilistic model of the world, conditioning it on observations, and then drawing samples of underlying latent states and future states. That's thinking, from this standpoint. And understanding language is, effectively, translating from natural language into a probabilistic language of thought that's used to define, condition, and query the probabilistic world model. So we're going to exploit the ability of these LLMs to translate from natural language to code, in a way that might be familiar if you've tried using them to code, but that is different in some key ways. In particular, we're not asking the LLM to write a whole bunch of code, at least not to start; we're focusing on the sentence level. What is meaning at the sentence level? It's something like inferring a line of code in a mental programming language that is your best understanding of the meaning, by which we just mean the thought that the person uttering the sentence is trying to convey. And the LLMs, in this case, can represent that meaning-construction function, in ways that have some very interesting properties.

I'll illustrate this with a classic example from the probabilistic language of thought literature, the so-called Bayesian tug-of-war. To ground this, imagine you're reasoning about various games of tug-of-war, where various people, symbolized here by colored shirts, face off against each other. Let's take one person, a guy named Jack, and ask: how strong do you think Jack is? I haven't shown you any information, so your best guess might just be average; here are a few samples, average, maybe a little less than average, and if we're talking about MIT students you might just pick the MIT average. Now suppose I give you some other information, like: Jack beat Leo in a game of tug-of-war. That might move your arrow up from the MIT average, because people who are stronger are presumably more likely to win than people who are weaker. I could give you more information: Leo had just won ten previous matches. Then you might think Leo is pretty strong, and Jack must be even stronger, so your arrow goes way up. But suppose I told you that Leo sometimes just doesn't pull as hard as he really could; maybe Leo was getting a little lazy when he faced Jack, so your estimate comes down a bit. But then Leo single-handedly beat a team with Jack and Tom on it. Now you think, okay, Leo probably wasn't lazy before, and Jack maybe isn't that strong, since Leo, when he wanted to, could beat both Jack and Tom. The point is that, in a classic example of non-monotonic reasoning, your inference about this one aspect of the world, Jack's strength, goes up and down as you get various pieces of information. We'd like to understand how that thinking process works: how updating your beliefs based on linguistically expressed evidence works. That's the starting point of this paper.

The idea is to use, in this case, the probabilistic programming language Church, which is based on a dialect of Lisp, or Scheme, so there are a lot of parentheses, and I'm not going to be able to unpack all the notation. But hopefully you get the basic idea: we define functions that describe probability distributions on strength, laziness, and so on, and I'll work through this. This is well-attested work; it's basically the general toolkit for doing the kind of probabilistic intuitive mental modeling that I showed you with intuitive physics and intuitive psychology, but here we're doing it for a novel domain. There's no core domain that infants are born with for tug-of-war, but we can write a model like this to capture what adults, in our culture at least, might think. The key new thing is to understand how to ground language in these mental models. This is where we model the translation of a statement in English, like "Jack won against Leo," or a question, like "how strong is Jack?", into lines of code that support probabilistic updating and querying in this language. These meaning functions, implemented by the neural network here, the large or even small language model, capture various interesting ideas that have been proposed in different eras of linguistics and of thinking about meaning and concepts; I don't have time to review that history, but it's a very rich literature that I'd be happy to discuss if we have time during the question period. The hypothesis is that maybe these distributional language-and-code models can implement some human-like, perhaps approximate, version of this meaning function. And again, crucially, we're not trying to learn patterns in the data of the world; we're trying to learn patterns in our thinking and in how thought is expressed in language. It's a more modular problem, which I think is better suited to the way language actually works in the human brain. There's a delimited part of our brain that is a language-processing network, and strokes or other lesions there can really impact your language ability without impacting your general thinking ability. Language is also a relatively late evolutionary addition to a brain structure that shares a lot with other non-human primates, for example. That modularity, I think, is really important here.

To unpack how this works: you have a probabilistic program that describes these concepts, which I can describe to you in language, but which to the model right now are just given in code. The code says, for example, that player strength is drawn from a Gaussian distribution, players can occasionally be lazy, being lazy cuts your effective strength in half, the strength of a team is the sum of the strength its members pull with at any one time, and the stronger-pulling team wins. That's basically what this says. For now we'll just assume that you have that kind of mental model of a tug-of-war game, and we'll talk about how you update it from language. The basic way we do it is to use what I would now call a medium language model; we used OpenAI Codex, which was the first widely used code LLM, pretty small by today's standards, much smaller than GPT-4, kind of like an early version of GPT-3.5, to translate from a sentence in English like this into what Church calls a condition statement. It expresses a constraint on possible worlds: while the underlying define statements define stochastic, probabilistic functions that give you probability distributions on possible worlds, this statement says we're going to restrict our probability distribution, which is like our prior, to a posterior over just those worlds that are consistent with Jack beating Leo in a match. Similarly, a question like "how strong is Jack?" turns into a query statement, which the probabilistic programming language then evaluates: it draws samples of possible worlds that are consistent with the condition and generated from that prior on the left, checks what Jack's strength is, and counts up those distributions. So it's a kind of probabilistic mental simulation, effectively the same as, or a generalization of, what we were doing in the intuitive physics examples.

From this one piece of data, Jack beating Leo, you can see the posterior update from the prior: Jack is stronger than average. If I say Jack also beat Alex, or "proceeded to claim victory over Alex," that turns into another condition, which updates the posterior even more: now Jack is a lot stronger than average. "Even working as a team, Leo and Alex still could not beat Jack": add in that conditioning statement, and now Jack is even stronger. Again, where the LLM comes in is just in adding the statements into the language of thought; then we run inference in our probabilistic programming language to give these answers.

It's worth noting, especially if we want to understand the way forward for thinking about how meaning in language works in humans and where LLMs come into the mix, what's really powerful about neural language models as a way to parameterize a meaning function: they can pick up on statistics, context, pragmatics, metaphor, and semantic associations, all the things that in many ways were most appealing in connectionism, like distributed, content-addressable associative memory and distributed representations of graded semantic associations. For example, the LLM will translate "Jack won against Leo" into this statement, which looks like a fairly transparent semantic parse of the natural language, but it will produce basically the same semantic parse for a sentence which, on its surface and in traditional syntactic analyses, looks rather different: the syntax is more complicated, and the word "win" doesn't even appear, but in context the relevant aspect of meaning for thought is the same, namely a more poetic or dramatic way of saying that Jack won. The model gets that automatically; it doesn't have to be specially prompted or trained for it; it's using its associative memory properties.

It's also distributional. These are probabilistic models, not of worlds but of strings, and in this case probabilistic models of strings in our mental programming language, so they can bring in classic notions of vagueness. If I say Jack is strong, or very strong, I'm not telling you exactly how strong Jack is, but you might interpret that as a distribution over different condition statements, saying that Jack's strength is greater than some threshold, where I don't know exactly what the threshold is, but it's probably pretty big. Remember that in this case the mean is 50 and the standard deviation is 20, so 80 is about one and a half standard deviations above the mean.
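Here is a minimal Python stand-in for the Church tug-of-war model and its condition and query statements, using plain rejection sampling. The laziness probability and the number of samples are assumptions for the sketch, and in the actual pipeline the condition statements are emitted by the code LLM from the English sentences rather than written by hand.

```python
import random

# A Python stand-in for the Church tug-of-war model described above: strengths
# are Gaussian (mean 50, sd 20), a lazy player pulls at half strength, and the
# team pulling harder wins. Conditioning is done by rejection sampling. The
# laziness probability (0.25) is an assumption made for this sketch.

def sample_world():
    strength = {p: random.gauss(50, 20) for p in ("jack", "leo", "alex", "tom")}

    def pulling(player):
        lazy = random.random() < 0.25          # laziness is sampled per match
        return strength[player] / 2 if lazy else strength[player]

    def beats(team1, team2):
        return sum(pulling(p) for p in team1) > sum(pulling(p) for p in team2)

    return strength, beats

def query_jack_strength(conditions, n=50000):
    samples = []
    for _ in range(n):
        strength, beats = sample_world()
        if all(cond(beats) for cond in conditions):    # keep only consistent worlds
            samples.append(strength["jack"])
    return sum(samples) / len(samples)

# Prior: about 50. Each linguistically conveyed observation shifts the posterior.
print(query_jack_strength([]))
print(query_jack_strength([lambda beats: beats(["jack"], ["leo"])]))
print(query_jack_strength([lambda beats: beats(["jack"], ["leo"]),
                           lambda beats: beats(["jack"], ["leo", "alex"])]))
```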
The idea, as with scalar adjectives generally, is that I'm conditioning on Jack being greater than some threshold, while being uncertain about what that threshold is. Even metaphorical uses work: if I say Jack is pretty strong and Ben is a beast, in this context it's reasonable to interpret "Ben is a beast" as saying Ben is really, really strong, say 80, where Jack was maybe 60 or greater. In a different context, if I'm talking about whether you should date someone and I say Ben is a beast, I'm probably saying stay away. So that kind of context-sensitive metaphorical interpretation is something this meaning function is very good at. I'll mostly skip over this, but in work with Ben Lipkin, along with Lionel and Gabe and others, and this is really all Ben's work, there are some very nice quantitative studies showing that these contextual aspects of understanding strength in the tug-of-war context can match human judgments very nicely, not always, but in a lot of cases. So it suggests that the distributional aspects of meaning we're capturing here are, if not psychologically real, at least consistent with the judgments people make.

Mostly, though, what I've tried to do with this example is show how this kind of framework can implement what is in some sense a classical idea: that language isn't directly the medium of thought, or at least not the only one, but a way of expressing and communicating thoughts, internalizing and externalizing them, and that statistical, distributional mechanisms can be a powerful way of learning the mappings from the signs and symbols we externalize to an internal, compositional, structured language of thought.

A lot of what goes on in this paper, and I'll just show you one or two examples, such as work with Cedegao Zhang in an intuitive physics domain, or some intuitive psychology work with Lance Ying and collaborators, shows how we can take the same kinds of things I showed you in the first part of the talk and effectively reconstruct them, but now for worlds that are described only by language. Take the task with the red and yellow blocks: before, I showed you an image of a scene and then used language to ask a question. What if I just use language to describe the whole world, so I don't show you an image at all, I just describe what's there? We're still going to do a probabilistic mental simulation in a physics engine. There are many cases in our daily lives where we use language to describe the physical world as we experience it; it can be extremely expressive, often very complementary to images or photos, and it has vagueness and uncertainty, which is interesting. So we might describe a scene like this: imagine a table, and there are some blocks on it; there are some red blocks in the center; there are many tall stacks of yellow blocks on the side of the table; if the table is bumped hard enough... and we ask the same question, but for scenes described in language, and people make the same graded one-to-seven judgment of whether more red or yellow blocks will end up on the floor.

We can test this with the same kind of quantitative study, with a large number of participants, all online, all just reading, using different kinds of language expressing exact as well as approximate number, approximate quantifiers, logical quantifiers, vagueness (the stacks could be tall or very tall), and different kinds of spatial relations. Across many different stimuli we mix more or less complex sentences using these different kinds of language, and in each case we use a relatively small LLM to translate, sentence by sentence, into statements for conditioning and querying in our probabilistic language of thought. Then we run a small number of mental simulations, in this case in a 2D physics engine, compute the outputs, and compare them with people. What I'm showing here is the same kind of scatter plot I showed before: on the vertical axis are human judgments, and on the x-axis are the predictions of this language-informed thinking model. It looks a lot like what I showed you before: those earlier plots were the judgments from the Battaglia et al. work on the red and yellow task, where the model and the people were both given visual scenes and the model did the same kind of probabilistic mental simulation in the physics engine. In the current work, on the left, the scene description is constructed from language, using the tools I've shown you, by conditioning a prior on scenes. The main point is that the models fit human judgments pretty well in both cases, and about equally well, which is interesting. We can also compare with zero-shot or few-shot baseline LLMs, language models that don't have an explicit mental model of physics and don't do explicit simulation, and they are much worse fits to people. What's plotted on the y-axis here is the distance between the distribution of human responses and that of the various models, so higher is worse; the blue one is the rational meaning construction model.

You can do a similar thing in the intuitive psychology domain, for example in the kinds of settings I showed you from Julian Jara-Ettinger's work. This is work that Lance Ying and colleagues did. I won't go into the details, but again we can describe worlds with various goal objects that an agent might want, and with constraints: you have to go through doors, the doors can be locked, you have to use keys, there could be a red key that unlocks the red door, and so on. People can be told either that you need a key of the right color to open each door of the same color, or they can be told weird things, like keys only unlock doors of other colors. In these kinds of worlds, the rational meaning construction model again does a very good job of capturing people's judgments. For some kinds of judgments, the easy cases, even GPT-3.5 does reasonably well and GPT-4 does better. But as the situation gets more complex or unusual, things break down. Especially if we say that in this world keys only unlock doors of different colors, which is an easy thing to say to somebody, and you just change your mental model: when you do that, GPT-3.5 becomes anti-correlated with people, and GPT-4 drops from being highly correlated to being at chance. So these are examples of what happens when you go out of distribution from your training experience if you're just using a machine learning, function-approximation approach, which, as remarkable as systems like GPT-4 are, is what they're doing; whereas if the function approximation is instead approximating a much more modular translation function, and building on our mental-model tools, we can do much better.
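A toy version of the recipe behind these language-described-world experiments might look like the sketch below: vague quantifiers and adjectives become distributions over scene parameters, a scene is sampled from that prior, and a crude stand-in for the 2D physics engine scores the outcome of a hard bump. Every number and rule in it is an invented illustration, not the paper's actual model.

```python
import random

# Toy sketch: language like "some red blocks in the center" and "many tall
# stacks of yellow blocks on the side" is treated as a distribution over
# scenes, and a crude stand-in for the physics engine scores a hard bump.

def sample_count(word):
    return {"a few": random.randint(1, 3),
            "some": random.randint(2, 5),
            "many": random.randint(5, 12)}[word]

def sample_scene():
    # "There are some red blocks in the center of the table."
    red = [{"place": "center", "height": 1} for _ in range(sample_count("some"))]
    # "There are many tall stacks of yellow blocks on the side of the table."
    yellow = [{"place": "side", "height": random.randint(4, 8)}
              for _ in range(sample_count("many"))]
    return {"red": red, "yellow": yellow}

def blocks_knocked_off(stacks):
    # Stand-in for the bump simulation: stacks near the edge and taller stacks
    # are more likely to spill, and a spilled stack drops all of its blocks.
    total = 0
    for s in stacks:
        p_spill = min(0.95, (0.7 if s["place"] == "side" else 0.2) + 0.03 * s["height"])
        if random.random() < p_spill:
            total += s["height"]
    return total

def p_more_yellow(n_sims=500):
    hits = 0
    for _ in range(n_sims):
        scene = sample_scene()
        if blocks_knocked_off(scene["yellow"]) > blocks_knocked_off(scene["red"]):
            hits += 1
    return hits / n_sims

print(p_more_yellow())   # a graded judgment, comparable to people's 1-7 ratings
```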
These are examples of how going out of distribution from your training experience hurts you if you're using a pure machine-learning, function-approximation approach — which, as remarkable as systems like GPT-4 are, is what they're doing — but also of how, if the function approximation is instead approximating a much more modular translation function and building on our mental-modeling tools, we can do much better.

The last thing I'll talk about, very briefly, is this: any work like this has to raise more interesting questions. If anything like this is on the right track, it isn't answering the most interesting questions at this point; it's raising them, and perhaps highlighting ways you might get at them. In all the work I've been doing for a couple of decades now on probabilistic inference in mental models, you start by showing how you can do inference with a mental model, but then you have to ask where you get the model from. Learning from experience is one answer, and I've done a lot of work on that, as have others in our group and many colleagues — a lot of that book I mentioned with Tom Griffiths and Nick Chater is about how we can learn, using hierarchical Bayes and probabilistic programs, by doing inference over a space of programs to make sense of our data, maybe even small amounts of data. But much of our learning — probably the most powerful form of human learning — comes through language: more abstract, generic language, as many folks in cognitive science and cognitive development have shown. The power of language not only to update your beliefs but to give you new world models is really incredible, and that's probably the real human singularity: the ability of language to let us learn and think about situations we haven't directly experienced. Think about the tug-of-war, for example. I don't know if you're like me, but I've done maybe one or two tug-of-war games in my life; most of my knowledge about tug-of-war — my beliefs, at least — doesn't come from direct experience. It comes from things people have told me and from analogous situations; more generally, many of our mental models and intuitive theories come from what people tell us.

So the same approach we talked about for updating beliefs from language can also be used to acquire new mental models. Here, what's going on is that we're modeling how somebody might explain to you the way tug-of-war works — and in our experiments with humans, the ones Tobi Gerstenberg and Noah Goodman and colleagues did, that's exactly what we would do: we would tell people how this works, that people have various strengths, that strength varies from person to person, and so on. Now we use the LLM to translate those English sentences — generic sentences about the domain, describing the world model we want our participants to use — into the same kind of probabilistic program code, but now as define statements. These don't condition on a specific world; they define the general distribution over worlds. Again they're contextual and distributional — there could be different ways of resolving the vagueness in the language — but the basic idea is that we can describe a world model in English and the code LLM can construct that world model. The define statements it constructs are not exactly the ones we used in the original paper, but they have the same functional role, and they can support the same kinds of inferences.
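As a rough illustration of the difference between define statements and conditioning statements, here is a toy Python sketch of a tug-of-war world model built from generic sentences, which then supports conditioning on a specific observation and querying a belief. The particular functions, distributions, and numbers are assumptions for illustration; they are not the code the LLM actually produces in this work.

```python
# Toy sketch: generic sentences like "each person has a strength that varies from
# person to person" and "sometimes a person doesn't pull as hard as they can" become
# *define* statements -- code that fixes a distribution over worlds -- which can then
# be conditioned on observations and queried. Illustrative only.
import random

def define_person():
    # "People have various strengths; it varies from person to person."
    return {"strength": random.gauss(50, 15)}

def define_pull(person):
    # "Sometimes a person doesn't pull as hard as they can."
    lazy = random.random() < 0.3
    return person["strength"] * (0.5 if lazy else 1.0)

def define_match(team1, team2):
    # "The team that pulls harder wins."
    return sum(define_pull(p) for p in team1) > sum(define_pull(p) for p in team2)

def p_jack_stronger_than_ben(n=50000):
    """Condition on an observation ('Jack beat Ben') and query a belief about strength."""
    hits = total = 0
    for _ in range(n):
        jack, ben = define_person(), define_person()
        if not define_match([jack], [ben]):
            continue                      # rejection: condition on Jack winning
        total += 1
        hits += jack["strength"] > ben["strength"]
    return hits / total

print(p_jack_stronger_than_ben())         # noticeably above 0.5 once we observe the win
```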
So this is just showing how we can move toward what I think are the most interesting ways that language informs our thinking: not just as a way to convey specific beliefs about situations, but as a way to convey new world models. In some of our most recent work, still in progress, with Tyler Brooke-Wilson, Katie Collins, and a number of the others I've mentioned — Tyler is a brilliant philosopher who recently graduated from MIT and has just accepted a faculty job at Yale, so he'll be there in a year or so — and building on the early stages of this in Tyler's thesis, we've been exploring ways the same approach can support constructing new models even when I don't explicitly tell you in language how the world works, but just by, again, using your associative memory and marshaling implicit knowledge to construct a model of a new situation. So I could tell you about a relay race — a new domain; I'm not really telling you how races work, I'm just giving you some information — and we're exploring ways in which the LLM can be queried to construct possible background knowledge and write probabilistic program code suitable for reasoning about this domain. On its own the LLM isn't enough — you have to do some reasoning about the models it suggests — but those models, suitably reasoned upon, can support novel reasoning in the domain, and even sensible updating. So we're getting at least demos of steps toward computational models that can capture the richness of how we're able to think about new situations, even ones we haven't really thought about much before, or that we haven't been explicitly told how to think about, using the combination of language, the associative knowledge that's in it, and an underlying ability to construct probabilistic models of the world and update them.

The last thing I want to leave you with — ridiculously, it does have to be the last thing, and it is the last thing — is a set of thoughts that I think will set up some of the discussion with Virginia and others. Many people in cognitive science, whether in linguistics or other areas engaged with meaning, have been interested in what you could call a unified account of meaning, and we're trying at least to point toward steps in that direction with the framework I've talked about. Just to raise controversial points for discussion, if you like: the idea is that we can capture the meaning of a word in context, as well as its meaning more generally. You might think of it as a form of dynamic semantics, if you're familiar with that: the meaning in context, in a discourse, is the incremental contribution to the probability distribution over PLoT expressions in the problem we're thinking about — the problem under discussion. And the meaning of a word or phrase or sentence, or any other unit of language, in general, is a higher-order stochastic function that takes as input a discourse context and returns as output a meaning in context.
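To make that last claim a little more concrete, here is a toy rendering in Python — an illustrative sketch, not an implementation from the paper — of a general meaning as a higher-order function from discourse context to a distribution over probabilistic-language-of-thought expressions, using the earlier "Ben is a beast" example. The contexts, candidate expressions, and probabilities are all made up.

```python
# Toy rendering of "meaning in general" as a higher-order stochastic function:
# context -> distribution over PLoT expressions (a meaning *in* context). Illustrative only.
import random

def meaning(word):
    """General meaning: a function from discourse context to a distribution over expressions."""
    def in_context(context):
        if word == "beast" and context == "tug-of-war":
            return {"(condition (> (strength ben) 80))": 0.9,
                    "(condition (> (aggression ben) 80))": 0.1}
        if word == "beast" and context == "dating advice":
            return {"(condition (> (aggression ben) 80))": 0.8,
                    "(condition (> (strength ben) 80))": 0.2}
        return {"(no-op)": 1.0}
    return in_context

def sample_interpretation(dist):
    """Draw one in-context interpretation from the distribution over expressions."""
    exprs, probs = zip(*dist.items())
    return random.choices(exprs, weights=probs, k=1)[0]

beast = meaning("beast")
print(sample_interpretation(beast("tug-of-war")))     # usually the strength reading
print(sample_interpretation(beast("dating advice")))  # usually the 'stay away' reading
```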
And the idea is that if we think about the different approaches — these are just four traditional ways of thinking about meaning and language, which all have great value and have often been seen as competitors — there are ways in which we can really bring them together. The PLoT ideas we've talked about can integrate the compositional, logical aspects of meaning that formal semantics and other parts of the language-of-thought tradition have emphasized; and, in the context of a probabilistic language of thought and mental models of the world, they can give a powerful form of grounding — grounding not in sense data but in our models of the world; that's what the PLoT provides. The LLM — or, more generally, statistical, distributional sequence models — can capture the distributional, statistical aspects of meaning: both the distributional-usage approach and the more general semantic association needed to make sense of language so flexibly, as well as some of the very flexible pragmatic, communicative ways we use language. I'll just leave it at that. Okay.

Could you turn off your share, please? First of all, thank you very much. There's not a chance in the world that I'm going to forget Virginia this time, and I'll tell you why: because I'm gobsmacked at how many different areas you're an expert in. So anyway, here's another one: Virginia Valian, from CUNY Hunter. It's all yours.

Thanks, Josh — that was such a great talk, and so rich. In ten minutes, which is how much time you and I have, we'll only scratch the surface. So, departing from the comments I sent you: with respect to the unified theory of meaning, many years ago Jerry Katz suggested that the question "what is meaning?" could be separated into questions like: what is sameness of meaning, what is contradiction, what is anomaly, what is entailment, and so on. It occurs to me that it would be interesting to try this with your model — to see just how well it can detect synonymy, contradiction, anomaly, and so on.

Yeah, I think that's a great connection. I know a little bit about that work, but you're inspiring me to go back and reread it and learn much more. Just in the context of the last thing I said: in this framework, at least, what the statistical language model is doing is effectively capturing the notion of sameness — because if the distributions over code in the probabilistic language of thought are similar, and that could be measured in different ways, then you might say two things have the same meaning, either in context or in general; and that distribution function can be contextualized, or made higher-order. So that's really interesting — but it doesn't compute entailment or other conceptual or inferential relations; those come from reasoning in the probabilistic language of thought. It would be great to explore whether this can both account for that different set of phenomena and unify them in that sense. I think that would be really interesting to explore.
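As a toy illustration of that "sameness as similar distributions over code" point — a sketch of one way such similarity could be measured, not a method from the paper — one could compare the distributions over PLoT expressions that two phrases induce:

```python
# Toy: two phrases count as (near-)synonymous in a context if they induce similar
# distributions over code in the probabilistic language of thought. The stub
# distributions and the choice of total-variation distance are illustrative assumptions.

def total_variation(p, q):
    """Half the L1 distance between two distributions over code expressions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(e, 0.0) - q.get(e, 0.0)) for e in support)

# Stub translator outputs: distributions over PLoT expressions for three phrases.
jack_is_strong   = {"(condition (> (strength jack) 60))": 0.7,
                    "(condition (> (strength jack) 70))": 0.3}
jack_is_powerful = {"(condition (> (strength jack) 60))": 0.6,
                    "(condition (> (strength jack) 70))": 0.4}
jack_is_weak     = {"(condition (< (strength jack) 40))": 1.0}

print(total_variation(jack_is_strong, jack_is_powerful))  # small -> near-synonyms
print(total_variation(jack_is_strong, jack_is_weak))      # large -> different meanings
```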
So, going back to some of the more mundane — which I guess is a pun — aspects: the way I'm understanding what you've said about how the theories intersect, that is, how LLMs intersect with Bayesian models, is that you're basically using the LLM as a kind of tool to translate from one vocabulary to another vocabulary. Is that accurate?

Yeah — at the most practical level, that's what we're doing.

Okay, more about that later, but let's go with that for now. So the critical difference I see between what you're doing and what LLMs are doing is — well, I guess there are two critical differences, but for me the most important one is the use of symbols; the other is the Bayesian mechanism. Let's separate them. When you put symbols in, to me you're putting a lot of content into the mechanism. Do you agree with that?

I agree that adding symbols adds a lot of content, but — and I didn't have time to unpack all of this very well — when you say "you're putting in symbols," in the different things I'm talking about here, the "you" and the "putting in" are different. Still, yes: whenever you add in symbols, that adds a lot of content and structure.

Right, content. So I'm thinking about the implications of that for nativism; it seems to me this is a nativist theory.

I would say it's compatible with some forms of nativism that I find plausible, although I don't have a horse in that game. Through a lot of interactions with friends and colleagues — Liz Spelke, Susan Carey, many others — I've come to find certain aspects of conceptual nativism plausible, especially in core domains shared with other animals: intuitive physics, the sense that the world is three-dimensional, that there are objects and some kind of physical interactions that our bodies engage with; and also some forms of intuitive psychology — not necessarily higher-order belief reasoning, but the idea of efficient action, that agents have goals and pursue actions in the physical world, grounded in physics, to achieve those goals efficiently. Honestly, there's evidence here: Shari Liu, who did this work in Spelke's lab and continues to build on it, has about as striking evidence as you can get in three-month-olds — every experiment with three-month-olds is very small and a lot needs to be built on it, but it's as striking as it gets — showing that certain aspects of not only physics but efficient, goal-directed action understanding seem to be present. That doesn't mean they're innate, but they're present well before language and probably build on some innate stuff. So it's compatible with that notion. (Sorry — I'm not sure who that is, but could you mute? That was an accident; I think it's turned off now.)

Okay. So yes, certain kinds of symbols are used in our framework to describe some of those core systems. But in contrast to, say, a Fodorian nativism, which says all concepts are innate — or to some rather different versions of radical conceptual nativism; I don't know what Jerry would actually have said about this, and people like Paul Pietroski have suggested maybe he would love it, I don't know — most of the concepts in this framework are not innate. They're written in a language of thought that could be somehow innate, or somehow bootstrapped through natural language. I find plausible some of the ideas that Susan Carey and Liz Spelke, in their different ways, along with Jesse Snedeker, have developed about how language acquisition and mental languages of thought might bootstrap each other, and many things in the Gleitman tradition reflect that idea too. We still need to show this — it's the most interesting thing to do — but the aim is to show ways in which the things I was presenting at the very end could be used to explain and model how natural language can start off being grounded, semantically and logically, in a limited symbolic vocabulary of probabilistic world models that reflect core knowledge, and how that can then
support bootstrapping and introducing new concepts via the mechanisms I was talking about at the end — not only new concepts but also new domain theories, which we get explicitly or implicitly through our linguistic interactions with other people who we think know more than we do.

Okay, that sounds great. It also suggests to me that this is a highly modular system, even though it's also probabilistic — probabilistic within each of these different modules — and that depending on what you think the symbols are that you start with, you can iterate on what you think the modules of the mind are.

Yeah, I think that's right. This framework is not, on its own, a proposal for how our minds start, but it can be used to instantiate, build, and test such proposals, and it does suggest you could have different sub-languages of thought for different domains. But crucially, the picture you get for adults from this is in some ways strikingly super-modular and in other ways completely holistic. The striking super-modularity is that the actual reasoning you do in a discourse — when I'm thinking about a situation and we're talking in a conversation — is very modular; it might even be specific to this one context. This is what Tyler Brooke-Wilson calls bespoke model construction: the idea that we might construct a model on the fly to think about a particular situation — that last thing we've been working on. It's super-modular in the sense that the model is relatively small and all the inference I'm doing happens just there. So I sidestep, or avoid, the classic problems that have made Bayesian inference intractable — the idea that if I actually had to maintain and update a distribution over all possible worlds I could think about, that would be completely intractable; many people, Dan Osherson and others, have written about that — and this is a way around it, or, as Tyler puts it in his thesis, in a sense a certain kind of solution to the frame problem. But it's also very holistic, in the sense that the world knowledge used to construct that bespoke model comes from something like all the code you've ever written and all your semantic associations — a gigantic, holistic, almost Quinean web of language and code. We've sometimes called this the "GitHub in the mind" view: there's the game engine in the head, and the GitHub in the mind. GitHub is that thing on the web which was crucial for training language-and-code models, and you can imagine your own mind has lots of chunks of code — some maybe innate, many not — with natural language interwoven among them, and a content-addressable, associative memory that can use all of that to marshal out the relevant symbolic, probabilistic models for reasoning about a particular situation. In that sense it's extremely holistic, and maybe that's necessary to grapple with what is pretty clearly some of the holism of human cognition.
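As a purely illustrative toy of that "content-addressable memory marshaling relevant code" picture — a sketch, not anything implemented in the work discussed here — you can imagine retrieving model fragments by associative overlap with the current situation and assembling a small bespoke model from them:

```python
# Toy "GitHub in the mind": a content-addressable store of code fragments, queried
# associatively by word overlap with the current situation. The fragments and the
# crude keyword-overlap retrieval are illustrative assumptions only.

MEMORY = {
    "strength teams pulling tug war": "def pull(team): return sum(p['strength'] for p in team)",
    "runner speed baton relay race": "def leg_time(runner): return runner['distance'] / runner['speed']",
    "keys doors locked unlock colors": "def can_unlock(key, door): return key == door",
}

def marshal(situation, k=1):
    """Retrieve the k fragments whose keywords best overlap the situation description."""
    words = set(situation.lower().split())
    ranked = sorted(MEMORY.items(),
                    key=lambda item: len(words & set(item[0].split())),
                    reverse=True)
    return [code for _, code in ranked[:k]]

print(marshal("a relay race where each runner hands off a baton"))
# -> the relay-race fragment: a small, situation-specific piece of model code pulled
#    out of a big holistic store, ready to be assembled into a bespoke model
```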
So, going back a few steps: the intuitive-physics part — in principle, animals could do that as well, right?

Yes.

So a chimp could have intuitive physics.

Yeah. Josep Call and others have studied that, Amanda Seed and a number of others have studied other nonhuman primates, and we're actually collaborating with Cartmill and others studying nonhuman primates on intuitive physics. I know colleagues who are studying analogous kinds of intuitive physics in rats and, I think, in even simpler organisms.

Okay. So the difference between humans and other animals is going to be that language allows you to go further — language allows you to go beyond intuitive physics — whereas nothing is going to help these other animals go beyond intuitive physics.

Yeah, I think that's right. Again, I wouldn't say that all the mental models of other animals are just intuitive physics — they also have models of their social world; Cheney and Seyfarth's "baboon metaphysics" is mostly a social theory — but yes, I would say the key thing here, the real human singularity, is that language allows us both to enrich our intuitive physics, to think about aspects of the physical world that are not initially intuitive to us and maybe still aren't, and also to construct things that go totally beyond any core domain that evolution gave us.

Right. Okay, so one question this brought up for me: what you're talking about are all the ways that humans are so good at what they do, but there are also ways they're not so good at what they do, and far transfer is one of those examples. There are some things that are hard for people to learn, or places where it's hard to get from A to Z even though you think you've learned A. There's a famous little story about someone going to their logic teacher and asking about understanding "if A then B"; the logic teacher spends a lot of time on it, and the student says, "Okay, I think I get it — could we try it with R and S now?" That's something that just shouldn't occur, but it does, even with smart people. When you try to teach people about experimental design, it's often hard for them to see confounds, hard for them to get at what's wrong with some design — and it's not that they lack general intelligence, or that they don't have some principles; it's as if what they've learned is just too far away, as far as the string of examples goes, for them to get to the next one. So I'm wondering how those kinds of limitations would be modeled on your system.

Yeah, that's a great point, a great question, and a great pointer to work we could and should do more of. The stuff I talked about at the very end — the work with Tyler and Katie Collins and Lance Ying, and very much with Leo Wong — is getting at exactly this: how we can describe a situation in language, maybe even very implicitly — just start talking — and then see whether this architecture can be used to construct the mental model needed to reason about that situation. You could also make it better and more robust by giving explicit instructions, as in the educational contexts you're describing — trying to explain logic or experimental design to someone. Actually, another collaborator on that project, Ced Zhang, is very interested, for his thesis, in how we learn logic and learn to reason through language. So our framework possibly provides a way to do that, but for it to work, at least in the current system, we do the kind of thing LLM folks are generally doing these days, which is few-shot prompting: we prompt the system with examples of other mental models, and language
for describing them, in related domains — and those domains could be similar or further away. What we can already see — and the interesting question is whether you can generalize to new domains — is that our system is somewhat able to do that, but this is exactly where near versus far transfer becomes relevant. So far, as you might expect, you need a source domain that's at least reasonably close: we can transfer from, say, a tug-of-war to a relay race or some other sports setting, or from a couple of sports settings to a new sport, or perhaps from a sports setting to something like a math competition. So really interesting ideas about abstraction, metaphor, analogy — the kinds of things the transfer literature has studied — are going to be relevant here. We're just beginning, but what you're pointing to is a set of phenomena we should really engage with, and I expect, at least based on our current system, that it will definitely struggle sometimes with far transfer. That might be because, at that level — I drew this contrast at the very beginning between the pattern-recognition, data-driven-learning approach and the mental-models approach — we are doing a kind of pattern recognition after all; it's just not patterns in the world, it's patterns in thoughts, in thought structures and ways of expressing them. And if that really interesting but much more abstract kind of pattern recognition is what's driving your ability to construct new models of new situations — at least initially, before you've had formal instruction, or when you're just at the beginning of instruction — then we should expect it to have the same kinds of fragility with far transfer that any machine-learning approach does.

Okay, sorry to be a killjoy here — we have about ten minutes left, but we have at least four people who want to ask questions. So what I want to suggest to Alina and Julia, and also Le Jin and Stefan Carlson, is to raise your hands again and I'll recognize you in the order you raise them. Okay, Julia, go ahead.

Hello. So I was wondering — maybe you could use the LLM translator plus the probabilistic models, that whole system, to model how the beliefs of a reader would change over the course of being told a story, or reading a book. Then you could look across genres or books to get a sense of the kind of model of the stories being told — since the stories are themselves a world model. And something you didn't touch on in your talk, which I'm sure you didn't have time for: if you have a guess about where you are in the trajectory of the story you expect, then you have a guess about what you think is going to happen next.

Okay, wrap it up, because we —

So I was just wondering, do you think —

That's a great question — let me try to answer it really quickly. Leo Wong would love that question, because in addition to being a great cognitive scientist they are also a writer — they write stories and even novels — and one of the things we've been working on is little mini three-act structures and the like, following either classic narrative structures or other patterns. And, maybe not surprisingly, those
are places where we see a big gap between human story understanding and LLMs, even the state-of-the-art ones. But exactly — we're trying to use these models to capture how that kind of journey of understanding might unfold, and also how a creator might create it. Another student we've worked with, Kartik Chandra, had work at last year's CogSci conference on storytelling as inverse inverse planning: if someone understands another character's journey by doing inverse planning, as in those theory-of-mind models, then a storyteller can try to invert that inverse planner to convey the emotional or mental journey the character goes through. That's a way to use this toolkit for story creation as well as story understanding. Again, those are mostly promissory notes, but it's a great question, and I think future research from Kartik and Leo and others will address it.

Thank you.

Thanks. Hi — I'm going to ask a question in person. Josh, this is Eva.

Hi — yeah, go for it.

I really liked your talk, by the way; I watched it on video.

Cool, great.

Okay, so, following the work you were showing — and based also on what you were saying in the question period — where you were few-shot prompting these language models to produce probabilistic world models: I have sort of a more meta question, which is, do you see that as just a way to help step-by-step reasoning in these models, or do you think it's evidence for early possible-world modeling being learned in these models?

I'm not sure I see it as either of those — either as doing step-by-step reasoning in language models or as evidence that they do possible-world modeling. It might be related. Obviously, as you and others have discussed, there's certainly a lot of evidence that LLMs, used as end-to-end reasoning systems, can benefit from step-by-step reasoning, and there might be some emergent world-modeling capacities. But I look at that and see a really interesting and mixed pattern of successes and failures, and depending on who's writing the paper and what their agenda is, you can highlight the successes or you can highlight the failures and the gaps. To me, the objective reading is that it's patchy and fragile, although extremely impressive and really interesting. So the way I think of what we're trying to do is to say: yes, those models have various kinds of interesting, approximate, implicit knowledge that can sometimes be used to succeed at complex sequences of reasoning, or at world modeling. But I think a more robust way to use that knowledge — more human-like, and more robust for AI — is the way we're using it: to condition and construct these bespoke world models, where the long chains of sequential reasoning, or the actual coherent world modeling, are there by construction. There are other limits, though. As I was saying, Bayesian inference in really complex models is very difficult, and I don't think people do it — there's a lot of evidence they don't — yet they seem to do it quite generally in just the right small model, at least in cases where they have the relevant world knowledge to construct those models
— and when they don't, they don't, right? Our minds have mixed patterns of successes and failures too, but I think this toolkit is better matched to the mixed pattern we see in humans — that's what I'm trying to get at, if that makes sense — as opposed to the weird, head-scratching pattern of super-intelligence in some places and super-dumbness in others that you see in a pure sequence model.

I'm told Alina can go next.

Thanks for your answer.

Yes, hello — thank you for the insightful talk, Josh. So, you said — and correct me if I misheard — that language allows us to construct meaning that is totally out of bounds of the evolutionary game. If that is what you said, I think it's pretty profound.

I'm sure I'm not the only person who has said that — many people have.

Sure, okay — it's just a reminder of a profound truth. So, the actual question: you mentioned that the meaning of a word is constructed contextually and incrementally. Could you please elaborate on how this process occurs and how it impacts our understanding of complex concepts — the transition from the incremental construction of meaning to generalizations?

I can tell you how it works in the models we've built so far — and again, this is not to say it works exactly like this in the mind, though I think it might be something like this; a lot more work needs to be done. The way it works in our models is something like what you're familiar with if you've used ChatGPT or another conversational AI system: you type something, it types something back, and in the middle some wheels turn in the black box; then you type something more, the wheels turn again, and it types more. It's basically like that, in that each sentence is translated into some expression in the language of thought, roughly at the sentence level — and I think sentences are real units of meaning; I'm very respectful and admiring of many insights from linguistics, including that words are real and sentences are real and there's real syntactic structure to be understood there, which is only approximated in some ways by these models. But the key is that the contextual process operates at the discourse level: you go sentence, code, sentence, code, sentence, code, and each sentence-to-code translation is conditioned on the previous conversational history — the discourse you have been interpreting so far. That's just a first approximation. Another thing you might want to do, and surely will have to do, is go back and edit previous code: if you realize you misunderstood something, you can't just add new code, you have to go back and edit the code you wrote before — which is also something code LLMs can do; I'm not saying they'll always do it right, but there are further processes involving checking, refining, and fixing models written this way. Maybe that gives you some sense of the contextual dynamics of how language is understood in this model.
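A minimal toy sketch of that sentence-by-sentence loop, in Python — an illustration rather than the actual system — with a lookup table standing in for the contextual code LLM, and one sentence that triggers a revision of earlier code rather than an append:

```python
# Toy: each sentence is translated into language-of-thought code, conditioned on the
# discourse so far; the accumulated code is what gets reasoned over. The lookup-table
# translator and the sentences/code strings are illustrative stand-ins only.

DISCOURSE = [
    "There is a table with blocks on it.",
    "The red blocks are in the center.",
    "Actually, I misspoke: the red blocks are on the edge.",
]

def translate(sentence, history, code_so_far):
    """Stand-in for the contextual code LLM: (sentence, discourse so far) -> updated code.
    A real translator would condition on `history`; this stub only inspects the sentence."""
    if "misspoke" in sentence:
        # Revision: go back and edit previously written code instead of appending.
        return [line.replace("'center'", "'edge'") for line in code_so_far]
    table = {
        "There is a table with blocks on it.": "scene = sample_scene()",
        "The red blocks are in the center.": "condition(position('red') == 'center')",
    }
    return code_so_far + [table.get(sentence, f"# unparsed: {sentence}")]

code, history = [], []
for sentence in DISCOURSE:
    code = translate(sentence, history, code)
    history.append(sentence)

print("\n".join(code))
# scene = sample_scene()
# condition(position('red') == 'edge')   <- the earlier line was revised, not appended to
```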
Last question: Le Jin.

Hi, Joshua — thank you for the presentation; it's very interesting. I have an engineering background, but I'm not an expert in artificial intelligence. I have a few questions. From your presentation, I understand that LLMs are a kind of gold mine of human knowledge, and you seem to say that they have modeled real-world facts and logic fairly well. If you ask a question directly in human language, they currently don't seem to give a good answer, because of some missing algorithmic capacity; but if you first translate the human language into functional code, as in the work you're doing, and then execute that code, then we can get results that are quite close to human behavior. Is that correct?

Yeah, that's more or less correct — that's what we've been doing in these examples and what I've been showing. But I don't want to claim — I don't think it's true, and certainly I don't have the evidence for it being true, or for how true it is — that LLMs always get this right. There are a lot of ways in which language is only an imperfect reflection of the ways we think. Certain things, partly based on evolutionarily ancient core systems, language is just not very good at expressing — that includes spatial reasoning, and you can see even the best language models break down there, and multimodal language-vision models have had persistent problems there from the beginning. So yes, language is a treasure trove of knowledge, explicit and implicit, about the world, and for us as human beings it's an incredibly valuable resource — that's why it's no accident that language models, trained to capture patterns in basically all the language humanity has ever produced and put on the web, start to have remarkable properties. At the same time, it's only some part of our knowledge about the world. There are key parts of actually understanding and being in the world that basically nobody ever talks about, and even when they do, they talk about it very incompletely and imperfectly — and yet our brains are designed to understand in those terms. So I don't want to convey the idea that it's all there in language and that's all you need — not by any means.

Wonderful. Before we let people applaud: there are several panels coming up — are there any of them that you can join?

Not today, unfortunately. I need to check my calendar and coordinate some family things, and I'll try very hard to join one of them.

Okay — now we'll applaud you for it.

Thank you. Thanks — and thanks so much for the discussion, Virginia, and for all the questions; they were great questions. I hope to engage more in one of the panels, and in person, if anyone…