Machine Creativity: Spark or Fizzle?
Episode Summary
Is creativity uniquely human—or can machines share in the spark? In this episode of The Emergent Podcast, Justin Harnish and Nick Baguley are joined by Chris Brousseau to tackle one of the most intriguing frontiers in the Age of AI: creativity itself.
Together, they unpack the messy, magical, and sometimes mechanical ways that ideas emerge. From “innovation voids” in machine learning to the golden goat thought experiment, the conversation explores how humans remix and recombine concepts—and whether large language models are beginning to do the same.
Justin, Nick, and Chris debate whether AI’s “creativity” is novelty, derivative recombination, or something that could one day surprise us in ways we can’t yet measure. Along the way, they draw analogies to quantum physics, protein folding, and telescopes for the mind.
What You’ll Learn in This Episode
- Why creativity is so slippery to define—and why that matters for AI.
- The concept of “innovation voids” and how machines might someday fill them.
- Human imagination vs. machine recombination: is one more “authentic” than the other?
- How analogies, metaphors, and mistakes drive breakthroughs in science and art.
- Why generative AI might be our James Webb Telescope for the mind.
- What it means to co-create with AI—and why the future may be about collaboration, not competition.
Books & Ideas Mentioned
- Programming the Universe – Seth Lloyd
- The Stuff of Thought – Steven Pinker
- I Am a Strange Loop – Douglas Hofstadter
- AlphaFold & breakthroughs in computational biology
- Innovation benchmarks like Kaggle challenges
Key Takeaway
Creativity isn’t a bolt of lightning from nowhere. It’s a dance of patterns, recombinations, and leaps into the unknown. As AI joins the dance, maybe the real story isn’t whether machines are “truly creative,” but what new things we can create together.
Transcript
Nick
Welcome to the Emergent Podcast. Today we're exploring a topic at the intersection of neuroscience, artificial intelligence, and leadership. Cognitive flexibility. This concept is just as vital for human decision makers as it is for the next generation of AI systems. In essence, it's the mental agility to adapt when things change, whether that's a business pivot or an AI model adjusting to a new problem. I'm here with my co-host Justin Harnish, and we're joined by a special guest, Chris Brousseau, a leading expert in language and audio AI. Together we'll unpack what cognitive flexibility really means, how it shows up in both human and machine intelligence, and why it's becoming a core leadership capability in the AI era. Chris, do you mind introducing yourself and telling us a little bit about your background and the key areas that you focus on?
Chris
Not at all. I'm happy to. So, I'm Chris Brousseau. I am currently the VP of AI at Vioxx, where we're dealing with artificial engineered intelligence. Essentially, we're algorithm researchers. And it's really, really fun to have this great, I'm going to call it an intersection of some of my specialties, where I got my undergrad in linguistics, and I've continued to study that since. And I specialize specifically in language modeling. How do we represent human language in the language model? And it goes much deeper than just transformers or autoregressive behavior.
Nick
Awesome. What an exciting time to be in that exact space between large language models and every advancement out there, right? Well, let's start with the basics today. You know, what do we mean by cognitive flexibility? In neuroscience, cognitive flexibility refers to the brain's ability to shift mental states, or really to take on that context shifting. In data science, machine learning, and large language models, we call it set shifting. It also means being able to adapt to new rules and consider multiple concepts simultaneously. One of the core executive functions that we rely on day in, day out is this cognitive flexibility, especially as we move into leadership roles, or we have complex workflows within our own daily lives, or as we're trying to work within relationships. And many of these times, this kind of mental gearbox that we have to shift, where we have to think about different tracks, really provides the opportunity for us to handle complex situations and thought processes. It's one of the key areas for researchers to study around large language models today and to think about adapting as we move forward. Not only is it a new and relatively emergent behavior that these models are able to achieve, but it's also one that gives us insight into flexibility in workflows, whether those are knowledge workflows or more complex workflows, or really allowing for that context shifting. And many of the latest models being released right now, like Grok 4, or even GPT-5 within this last week, as well as its open-source version, have started creating ways to work across this cognitive flexibility and allow the models to shift: right out of the box they can work with tools, and they're designed specifically to handle reasoning, thinking, and other processes that allow them to take on those additional tasks and shift very quickly at the speed of business, in the ways that we need to work in our daily lives. So Chris, I'd love to pass to you first, and then we can come to Justin as well. But, you know, I'd love to hear some of your thoughts in general around cognitive flexibility: as we describe it, what are the core things that really stand out to you within AI today?
Chris
Okay, so I'd actually love to start with human language, which I think both of you guys are also familiar with here. The line that we want to draw between cognitive flexibility and AI, I think stems from a couple of principles. The first thing, this is from Saussure. Language is inherently abstract, right? It's representing abstract feelings that we have inside our heads before we ever translate it into language. And the flexibility here, I think, like the comparison that keeps coming to my mind is Chomsky, right? He wrote about transformational generative grammar, a set of rules that expands and contracts on the fly, which we use to translate those thoughts into language. And I think that that is where this flexibility comes in, where it has to be able to expand and contract as someone is speaking, as someone is writing, as someone is expressing language. And AI at this moment is trying to solve this problem.
Justin
So a lot of the research, and maybe a brief history of AI time here, takes us through where, you know, these LLMs are, you know, paying attention to the next word, the next piece of information that they need to fill in the Mad Lib. They're doing this at the speed of light, and they're doing this across hundreds and hundreds of dimensions on the fastest chipsets that have ever been known to man. Just recently, the history of AI has brought us to more of a chain of thought. So following along some logical paths in order to do broader research spikes, and more adhering to principles that have been put in place in order to add more structure to thoughts and reduce the illogic that might be coming out of these. And, you know, with agents and crews of agents that can align themselves to chain of thought, you maybe get a hive mind that's capable of reasoning. So in a traditional, philosophical, rational theory of mind, you get from language to logic to reason pretty algorithmically. But a lot of folks talk about almost an élan vital, a creative spark that requires something more. And so the question, and some of the feedback from neuroscience, is that, you know, that creative spark really comes from this neuroplasticity, this actual physical movement of neurons in the brain to make broader connections. And so I'm just curious if we really feel like this is a distinction from where LLMs are now. Do we anticipate that they're going to have to rewrite the way that the model works, or the embeddings, where they're placed? Go with some hallucinatory embeddings that don't make sense for the next word? Or is this already happening? Are LLMs already creative?
Chris
Jeez, that's a huge question, man. So yeah, I can give you some opinions here. I can tell you, first off, following chain of thought as it was being developed is incredibly murky. If you start looking at what chain of thought actually is, it ends up being a little bit of a combination between some probability manipulation, through top-k, top-p, and through beam search, aligned with some rhetorical prompting strategies.
Justin
There's a lot in there.
Chris
And it doesn't go much deeper than that. Those strategies are like "describe the solution step by step," or starting the very prompt with "a more elegant solution would be," stuff like this, where basically we have found that some of the best-quality language data that these LLMs trained on starts with these types of tokens. And so we can get higher-quality language data out of the models autoregressively, token by token, if we start our prompts, whether that's a system prompt or a user prompt, likely both of them, with these types of things. So I think actually autoregression, the idea that we're taking it token by token, is getting in the LLM's way of actual reasoning, right? If we want to talk about cognitive flexibility, one of the main things we can point to for people, as far as that spark goes that you're mentioning, is that people have emergent developments of neurons in their brains, right? But I'm not the neuroscience expert here. I think that would be Justin for explaining why that's important. Why is it important that we have different connections, even though we have similar centers in our brains? LLMs don't have that, right? They have a semblance of it. They have a simulacrum of it. But the main thing that they're missing is the pragmatic application of it, right? What this emergent spark can do for us is give us all the ability to put context around our language. That way we can assume things. We can say, what's a good example, right? "He sat in the chair." I didn't have to tell you who he was because, you know, I'm talking to you and there's only one other person here, so "he" must refer to the other person here. If I'm talking to Nick, that's referring to Justin; if I'm talking to Justin, that's referring to Nick. Right? This gives us the ability to solve semantic issues as humans that models don't have the ability to approach through anything other than that creative sampling. A good example of that is if I say, "I'm married to my ex-wife," immediately there's a semantic problem. You can't be married and exes at the same time, but there are pragmatic explanations to that, which we get from having an emergent brain living in a real world, such as that they remarried, right? A model can answer those questions correctly, but it's just through sampling and repetition of seeing people answer those questions, right? There's no spark behind it.
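As a rough illustration of the probability manipulation Chris mentions, here is a minimal sketch of top-k and top-p (nucleus) filtering over a toy next-token distribution; the vocabulary and probabilities are invented for illustration and are not drawn from any real model.

```python
import numpy as np

def top_k_top_p_sample(probs, k=3, p=0.9, rng=None):
    """Sample one token index after top-k and top-p (nucleus) filtering."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                     # tokens, most to least probable
    keep = order[:k]                                    # top-k cut
    cumulative = np.cumsum(probs[keep])
    keep = keep[: np.searchsorted(cumulative, p) + 1]   # nucleus (top-p) cut
    renorm = probs[keep] / probs[keep].sum()            # renormalize what survives
    return rng.choice(keep, p=renorm)

# Toy next-token distribution (made up for illustration).
vocab = ["the", "a", "step", "solution", "banana"]
probs = np.array([0.40, 0.25, 0.20, 0.10, 0.05])

idx = top_k_top_p_sample(probs, k=3, p=0.9)
print("sampled token:", vocab[idx])
```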
Nick
Yeah, fantastic explanation. There's been some really deep research into these areas lately, a few studies that go into the integrated cognitive skills it takes, where you actually combine that reasoning along with the context management that the models are now able to handle, and they're starting to get into emergent properties based on that large-scale next-word-prediction training or, you know, other tasks that these models are getting trained on at this point. For example, Grok is trained to use tools as one of its core tokens as well before it moves into the additional steps. There is a particular article on arXiv, you can find it, archived under the title "LLM and Cognitive Science: A Comprehensive Review of Similarities, Differences, and Challenges." And you'll notice in there that, as they talk about it, they describe how multi-head attention allows these models to pay attention to different parts of the conversation and, now beyond that, to different parts of the context that are necessary. And so just like you were talking about there, Chris, they may be picking up, you know, different things that they need to understand from items that they've been trained on in the past, but to actually go on and infer something that really doesn't exist inside of the data set is really, typically, what we call hallucinations. And it's fascinating that as we talk about creativity and everything else, we're at the same time working as best as we can in the field to limit the hallucinations, to limit that creativity or that imaginative process that a model goes through, where it's substituting something that it has no reason to consider fact. And I would say this is a very common pattern for humans. It's something that we not only do in our speech, in our interactions with others, but all the way down to our subconscious and even into our sleep. From a neural standpoint, you mentioned some of that, Chris, and I was recently studying up on this topic. As I was going through details and having GPT-4 really kind of sort it out, we really got into the agility around the human brain being able to shift back and forth, and a lot of that is rooted in our prefrontal cortex and its networks. When we think about how neurons fire and how we're able to work across synapses, many of these are very interesting behaviors, where they may fire from one area of the brain to another, but as they light up, the main area for this type of task that we're talking about falls into that prefrontal cortex, and specifically back into the dorsolateral prefrontal cortex. And so as we think about different regions like the anterior cingulate and the parietal cortex firing as they support this overall prefrontal cortex process, the key function they're trying to accomplish is really about working memory associated directly with the task switching. And as our brain moves through this process, it starts going beyond what the models are able to handle today, because as they handle multi-head attention, when we look at them pre-ModernBERT, FlashAttention-2, and RoPE, they are currently focusing on certain masks that are given to them.
Now that we have FlashAttention-2 and RoPE, they're creating opportunities to really randomize that attention and pay attention to different parts of the context, different parts of the sentence, and go beyond the tokens that they were given for one particular section, into models that they've distilled additional knowledge from, into reinforcement learning patterns that now automatically add additional layers of tokens that are associated back with that given set of context. And so they start to mirror this process more and more. But as we've talked about today, they don't necessarily have that spark, like Chris talked about, to let them move into true cognitive flexibility. There's a famous test called the Wisconsin Card Sorting Test, where you sort the cards by a rule. Let's say it's the color, or maybe it's the suit of a card, for example. And then, without warning, the rule changes. And now you have to sort by that suit, or that shape of the card, or something else within the card. It now becomes a completely different shift and a different rule. As we go through that context shifting or set shifting, a really healthy brain, especially a really well-developed brain, can adapt to the new rule very quickly. But someone with frontal lobe damage, or even, as we talk about, someone whose frontal lobe is not fully developed in their younger years, will actually really struggle with this piece and stick to the old rule and have a hard time sorting on a new rule as it comes. And as we look at this in leadership, especially in areas like finance, we see that the same thing applies, where we often talk about companies that might be a little bit more conservative, or maybe too conservative, that are trying to go through transformation, that are trying to shift their context. And in finance, the entire world is changing, especially when we talk about capital markets. They move and change so rapidly, and it can be off of sentiment, it can be off of an actual rule that exists out there in the system, or it can be off of bigger, broader, emergent things like the economy itself, which are really difficult to adapt to and build into your rules as you think about how you're going to pivot as a leader. And so this core skill set that we're talking about really is driven home by your ability to allow your whole brain to fire in support of that prefrontal cortex as it makes these decisions. And with all of this, it's not just the brain structure. It's also the chemistry. So dopamine, actually, one of the neurotransmitters that we're all kind of familiar with, right, is really heavily involved in that reward and that learning process, similar to the reinforcement learning process that we use within models. And it's really critical for that cognitive flexibility. In fact, when we look at PET scan research that was completed recently, we're able to confirm that people engage better with a task that requires mental switching when they actually receive that dopamine boost. The more dopamine is released, the more efficiently people are able to adapt and complete the new task. And so oftentimes when we talk about reinforcement from, you know, a disciplined standpoint, or positive reinforcement, much of that is actually enabling us to learn better.
And so finding the ways in your daily life to drive home rewards for yourself, for your team, and within the functions that you serve, whether it's within the company or even broader across the economy, and making sure that those rewards really add to that learning, will help that context shift. It's what we call celebrating failure. It can also be about celebrating the wins. And we often forget that.
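For a concrete feel for the set shifting Nick describes with the Wisconsin Card Sorting Test, here is a toy sketch in which a hidden sorting rule changes without warning and a simple sorter abandons its rule only after negative feedback; the attributes, trial counts, and feedback model are simplified assumptions, not the clinical protocol.

```python
import random

ATTRIBUTES = ["color", "suit", "number"]

def run_wcst(trials=30, switch_every=10, seed=0):
    """Toy Wisconsin-style sorting: the rule changes without warning."""
    rng = random.Random(seed)
    true_rule = rng.choice(ATTRIBUTES)       # examiner's hidden rule
    agent_rule = rng.choice(ATTRIBUTES)      # the sorter's current guess
    correct = 0
    for t in range(trials):
        if t > 0 and t % switch_every == 0:  # unannounced rule change
            true_rule = rng.choice([a for a in ATTRIBUTES if a != true_rule])
        # Feedback is simplified to "did you sort by the examiner's attribute?"
        if agent_rule == true_rule:
            correct += 1
        else:
            # A flexible sorter abandons the old rule after negative feedback;
            # perseverating on it is the classic frontal-lobe failure mode.
            agent_rule = rng.choice([a for a in ATTRIBUTES if a != agent_rule])
    print(f"correct sorts: {correct}/{trials}")

run_wcst()
```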
Justin
Yeah, I'm still curious if we think that the spark is real. And let me bring this to the present day for me, because I was watching, so I do my research like we all do now, and hit a model, and do a deep research task. And, you know, I've got my favorite authors that I feel like I understand. And so I plug them in and I say, LLM of choice, go and do this research, given these authors and this topic. And I was watching the activity for the first time. And so if I didn't lead up to it by saying that an LLM did this, but I just said, you know, this is my twenty-something-year-old intern that I've asked to do this task, and they went off, and because the websites were blocked, they were behind the paywall, they went to the Spanish-language sites, I would have said, that's a pretty creative solution. That's what I would have said. That's a pretty creative solution. Well, that's exactly what was happening in the activity stream of my deep research. And so is that not a creative solution? Do we think that that was coded into the way that, you know, if you're having problems with English-based paywalls, go to Spanish-based ones? You know, is that not a creative solution anymore because the machine can do it?
Chris
I kind of want to take a step back to answer that question, because I think it's a really good question. I don't know if that's a creative solution. The place I want to start with this is the almost impossibility of the fact that we get anything coherent out of LLMs. Right, the first part of that is, like, take any language... I love the idea here is that, you know, we...
Justin
We basically have this uncanny valley of a place where LLMs are doing things that we never would have imagined, and once they do something, we ascribe it to the old "well, that was never intelligence anyway; that was never creativity anyway." So the question really is: is taking a path in a research task to utilize a different language in order to overcome a paywall a creative solution to that research task?
Chris
Yeah. Yeah. Okay. Back online. I love this question, first of all, because the word creative there is doing a ton of work. And I want to frame this first with the fact that you can take any language, all languages; they're all infinite, and they're infinite in terms of meaning, right? We understand that. But they're infinite in terms of vocabulary, too. And if you're not sure, remember, there is an English name for every single number. So English is infinite, Chinese is infinite, French is infinite. They're all infinite, which is bonkers: we have been able to whittle down probabilities of a language that contains infinite numbers of infinite sets and get anything coherent out of them. So I'm going to reference a recent paper, The Illusion of Thinking. This is from Apple, right? They showcased that LLMs are able to solve the first several layers of the Towers of Hanoi game, right? And that's awesome. It's very cool. So how do we tell whether it is a creative solution, like you said, where you are following a set of instructions that are kind of known, or following a set of possibilities that maybe are not as probabilistic, right? It was able to solve the first four levels. So it's able to solve up to four disks of the Towers of Hanoi. But then it starts breaking down, and it can't solve five, it can't solve six, it can't solve seven, eight, nine, ten, and so on. It can't solve any of them. And I like to compare this to basically a Rubik's Cube, where the LLM is finding creative ways to emulate a spark up to a certain level, but it doesn't have any ability to think algorithmically, which some people call reasoning. Some people would maybe say that's the spark. I think there may be a little bit more to it myself. But if you understand what I mean: if you have a Rubik's Cube and you can solve that Rubik's Cube, but you can only solve it from three pre-known configurations, and from any other configuration you can't solve it, it kind of shows that maybe you're not reasoning through the steps the way that you should be, or, "should" is a strong word, the way that you would want to in order to solve the task. You know what I mean? So we want to shift LLMs into this cognitively flexible space. And we're doing a lot of things for that, right? You have the diffusion LLM variants, you have some of the LSTM LLM variants, where we're kind of pushing the model, instead of going token after token after token, to guessing the full solution at once. I mentioned beam search earlier; this is helping even with that autoregression to be able to do that, where you're guessing the entire solution at once based on probabilities and then re-evaluating as you're going through it. It's a good push in that direction. We haven't hit it yet, though.
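As a reference point for the algorithmic thinking Chris contrasts with pattern matching, here is the standard recursive Tower of Hanoi solution; the same few lines scale from four disks to ten, the regime where, as he notes, the models in the Apple paper break down.

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the full move list for n disks; always 2**n - 1 moves."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)   # clear the way
    moves.append((source, target))               # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # rebuild on top of it
    return moves

for disks in (4, 7, 10):
    print(disks, "disks ->", len(hanoi(disks)), "moves")  # 15, 127, 1023
```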
Justin
Yeah, I think that's really interesting. And in another part of this space, you know, what I do believe is our best definition of impactful creativity is the scientific method. And so I break down the scientific method into conjecture, criticism, and correction. Those three steps. And you can add a fourth, which is like theory, where something becomes more of a complete description or explanation. But really, those first three steps, for every experiment that's not the general theory of relativity, is where most of science comes down. And so, you know, that first step is really a guess, right? That first step requires what you were talking about earlier, Chris: a knowledge of more than a language proxy of thought, but an understanding of previous conjectures and theories and things, the landscape of science, but also an understanding of reality. What does it mean that there is no locality in the quantum realm? What has proven that? Right, a knowledge of, you know, in the case of physics, the physical world, right, the things of existence. Good enough, right? We can still have proxies, or we can have a science of the digital world, right? We can have computer science that doesn't necessarily have that. And we could force these LLMs into that. But the creativity that is existent in that first step, the hypothesis or conjecture step of science, is where we need the LLMs to be, right? You know, we need to science the shit out of all of our problems, like they say in The Martian, in order to overcome the existential and less-than-existential threats of this world that we're in. But we need these LLMs to help us with science. And so the next question is, you know, what's AlphaFold doing, right? Is it making conjectures, right? Because it's responsible for some new science, right? It's responsible for a whole bunch of new proteins. Now, it's early days of its ability to do this, and it's created more, you know, new proteins and new science than any single human being in the last couple of years, right? And so, is it doing science? Is it doing conjecture? Is it hypothesizing? Nick, do you want to take this one?
Nick
I'll take it quickly, but really to tee up Chris more than anything, which is that I think he has a really strong parallel with the company that he's working with right now, where, similar to proteins, they're working on new algorithms and finding ways to develop and create them. As I tee you up here, Chris: if we go back and really just talk, like, at this point, we've kind of established that LLMs really appear flexible, but it's difficult to quantify. And really getting all the way up to that point of actual creativity, and am I creating something new, is just one of many different ways that we can interpret the word creativity itself, which I think was part of your point, Chris, on creativity doing a lot of work in that sentence.
Chris
Oh, absolutely. Yeah. At Vioxx, we're attempting to get models to invent other models, right? And one of the problems that we're grappling with is one that I think you'd really like, Justin. We're calling them innovation voids, where if you encode, or if you embed, all of the current algorithms that work for machine learning, and then you perform PCA on those so you can put them on a 2D plane, you see a big old cloud of dots, right? And you see holes in those dots. You see these big old circles where no algorithms exist. And unfortunately, because we've done PCA, principal component analysis, we don't know what's in there. We don't know if those are useful at all, but we're trying to create ways to push an LLM, or even, like, big old suites of models, these ensemble models, beyond just one LLM, beyond only LLMs, to create new algorithms that sit in those innovation voids. Now, without going into that too much beyond this, the reason that we're trying to solve it that way is because of what you're mentioning, Nick: creativity. We know that these algorithms exist, but the LLM doesn't. That's why we have to embed them somehow. But we're not super confident in our ability to capture all of the information about these algorithms in order to give us a fully representative plane of information to feed to the LLM, to push in those directions, into those innovation voids. And I see that reflected in a lot of these other problems, whether that's Towers of Hanoi or chess or, I mean, go on Kaggle, look at any of the bounty problems. All of those are attempting to elicit some form of creativity, but they're not eliciting creativity from models. They're doing it from people, because people have the ability to do derivative thinking, but they also have the ability to overgeneralize. And we're trying to, going back to that impossible paradigm that we have the models in, where it's not actually impossible, but it's very difficult to nail, because on the one hand, we're asking the models to be creative, be creative, be creative. But, like you mentioned earlier, Nick, we're also trying to limit that creativity, to only stay in this same vector, in this direction that we want you to be creative in, you know, and it's extremely difficult to nail that balance. And I don't know, it's one of those things that makes it so cool for me when I ask a model to do something and it actually does it correctly. For example, a couple of days ago, I asked one model to train another model, and it wrote the training code and it deployed the environment and the training ran. And then I had two models. It was very cool. The problem is that the second model didn't do a whole lot, you know, because, going back, it was all functional. It was very cool, but it wasn't creative. It wasn't pushing the boundary of that model, whether that was within the data representation portion or the tokenization portion, right? How do we get words into vectors that the model can interact with, and make sure that those vectors are actually representative of the language and the meaning that we want? Right. I have a lot of hope in the actual learning algorithms that we have, but I have a lot of doubt in the ways that we are representing things. I think that we have a greater ability to push those creative boundaries if we're able to represent the data in a bit more coherent way.
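Here is a rough sketch of the innovation-voids idea as Chris describes it: embed each known algorithm, project the embeddings onto a 2D plane with PCA, and look for empty regions. The embeddings below are random stand-ins, since how to represent the algorithms faithfully is exactly the open problem he points to.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in embeddings: one vector per known ML algorithm.
# In reality these would come from some learned representation.
algorithm_embeddings = rng.normal(size=(200, 64))

# Project onto a 2D plane, as described.
plane = PCA(n_components=2).fit_transform(algorithm_embeddings)

# Find "voids": grid cells on the plane that contain no algorithms.
bins = 10
x_edges = np.linspace(plane[:, 0].min(), plane[:, 0].max(), bins + 1)
y_edges = np.linspace(plane[:, 1].min(), plane[:, 1].max(), bins + 1)
counts, _, _ = np.histogram2d(plane[:, 0], plane[:, 1], bins=[x_edges, y_edges])

empty_cells = np.argwhere(counts == 0)
print(f"{len(empty_cells)} empty cells out of {bins * bins}: candidate innovation voids")
```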
Nick
Yeah, that's great. Yeah, and I'm going to use that to tee up another conversation, but bear with me for a minute. So first, I want to go back to just this general concept of creativity and try to argue that creativity is not always about novelty. There are times when you can take a creative solution by simply applying something that already exists to something new. There are times where, even within your own given ecosystem, your microcosm, your own little world, creating something that others have done, even in that exact same type of space, but that you are unaware of, actually becomes its own form of creativity. Even coming into new thoughts that are beneficial for us, I would argue, could be considered creative. And I think finding ways to manipulate things where all of the parts are already something that truly exists, but by pulling them together you create an emergent new thing, is very fascinating, and something that the three of us have spoken about a few times over the years. You know, as I pull this back to AlphaFold, for example, you know, I went out and just kind of checked a few different things around it. And there are some pretty amazing achievements that have come down the line. One is just really its ability to have much, much better accuracy as it looks into core protein development and protein folding. It's also able to find and discover all sorts of new drugs based off of these proteins. And it's making really amazing advancements on new research as well. And a lot of that gets into new computational work and other things that exist outside of that protein piece, but also creating new opportunities for, of course, things within the biospace as well. And so really, really exciting. It's revolutionizing all things in structural biology, the way that we think about proteins, the way that we think about our core structures, and really accelerating that drug development and broadening the broader biological understanding that we have. When I think about this as a form of creativity, and I pull it back to the core things that we are trying to do in a space, you know, as Chris talks about his space, really saying, okay, I have a given Euclidean problem space, or I have a given set that I really want to work on. Discovering the new thing is valuable, but if I apply this to patents, when we think about a patent, we don't just look for novelty. We look to make sure that it's actually applicable for the given use case. That it is, well, it may not be for the given use case, but that it actually has applicability out there in the real world. If you have a patent that has no function or no usability at all, you really can't get a patent around it. You have to be able to show how it's applicable, and you have to be able to make it so that it's something that the people that are in that industry would actually be able to understand and go and implement themselves as well. So you have to explain it further, and not with just some pie-in-the-sky details. An example: the other day my son asked me if maybe now we could use LLMs to create time travel, and if we should be able to create a time machine, and so I opened up a project in GPT-4.5 and started working on that. I saw no real creativity or progress beyond what's already readily available out there in thought experiments and theoretical approaches, and even some of the scenarios that people have pushed on. But as I was chatting with my son about it, he at some point said, you know what?
I actually believe that a lot of people have already time traveled, but there's just no way for us to tell. They've moved on, and there was no way for them to come back, and so we can't even tell that they're gone. And that's an interesting form of creativity that humans, and children especially, are often really highly capable of. And so I think it's good for us to put this into perspective and say: are we thinking about imagination? Are we thinking about novelty? Is it sufficient to consider the golden goat, or do we actually have to come up with things that are completely new? Or when we talk about algorithms, is it really a process of discovery, something that Justin and I have talked about in the past? If we go back to that Wisconsin Card Sorting Test that I mentioned before, GPT-4 was actually able to solve it really well under the right conditions. And when all of those different solutions, Gemini, Claude, and others, were able to go up against this particular challenge, when the AI was prompted with chain-of-thought reasoning, meaning that we actually give it step-by-step thinking and processes all the way through, it achieved human-level or even superhuman-level performance on that set shifting task. So really, when the AI is able to take that rule, sort correctly, and then adapt when that rule changes, because it can go through its reasoning, that function is very, very similar to what a human is doing, like it would with that dopamine-type release, like that approach that happens within our brain. But really, is it just mirroring it? Is it something that is just appearing as that core cognitive flexibility? And I think that's at the crux of our question, and one that's very difficult to answer. So as we go into it, and we think about quantifying, what are all of the different ways to consider creativity? What are the different ways to think about this set shifting and context shifting? I think we need to shift out of just the large language models themselves and expand into the different tools that support them. So as we talk about chain-of-thought reasoning, or graph-of-thought reasoning, or other types of approaches as well, you supplement those with RAG, such as LightRAG, or, we've talked in the past about Weaviate and Pinecone and other types of solutions, with vector databases, knowledge bases, knowledge graphs, ontologies. You can supplement the AI with these core tools, as well as things like MCP and other servers that can now say, hey, you do have access to this, you don't have access to that, including things like the internet. And being able to go way beyond just core, simple tool development or chain-of-thought processes to start saying, okay, in this given workflow, for this particular set of tasks, these are the things that I need to accomplish. And if something shifts, here's how I can adaptively approach that problem, change my set shift, and then move forward from here. One of the early fun ones was seeing large language models start shifting from just general chat to all of a sudden having Socratic-like arguments and being able to really talk things through with you in ways that felt creative, felt as though they were really approaching something new.
But at that time, to go all the way back to the way that you framed the question, Justin, really at that time, those models were taking patterns they had already seen and just creating a repeat of those with variations, with substitutes, to be able to provide the correct solution. Now, as time has gone on, I think as we dive deeper into the latest models and the next generation of models, I do think that creativity now not only mirrors what humans are able to do, but actually starts surpassing it, because we can start saying: if I was truly given enough context as a model to understand the entire internet of all things that have been approached in the past, and to also think about all tools that I could potentially have access to that all relate to all of this data, there will be a point where the type of creativity, the type of solution that comes out, would be completely different, unexpected to humans. You would not be able to go and review that entire data set, that entire context window, especially as they expand, and be able to quickly provide that back, even with a photographic memory. And so there is a point where this cognitive flexibility that we're focusing on here, for set shifting or other context shifting, leads to creativity and starts giving us a glimpse into where models can truly head. And I think that's one of the key things that we want to look at: what is adaptive intelligence as it moves into superintelligence, and what does that mean for us as humans as we try to find out how to apply that to our world and make sure that applicability is actually useful, instead of, like Chris talked about, maybe not as useful as what we already had?
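For the retrieval-augmented setup Nick mentions, here is a toy sketch of the basic move: find the most relevant passage in a small knowledge base and prepend it to the prompt the model sees. The passages, query, and TF-IDF similarity are placeholders for a real vector database such as the ones named above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy knowledge base; a real system would use a vector DB (Weaviate, Pinecone, ...).
passages = [
    "AlphaFold predicts protein structures from amino acid sequences.",
    "The Wisconsin Card Sorting Test measures set shifting in humans.",
    "Chain-of-thought prompting asks a model to reason step by step.",
]

query = "How do we measure cognitive set shifting?"

vectorizer = TfidfVectorizer().fit(passages + [query])
doc_vectors = vectorizer.transform(passages)
query_vector = vectorizer.transform([query])

best = cosine_similarity(query_vector, doc_vectors).argmax()

# The retrieved context gets prepended to the prompt the LLM actually sees.
prompt = f"Context: {passages[best]}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```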
Chris
Oh, I was just going to say, I really love that. I have a test that I do whenever new models come out that's basically digging into what you just said, right? It's a creative test to see if it's performing up to snuff, because it's really difficult to measure. It's difficult to represent. And so we have benchmark performance that all LLMs get compared against, but that doesn't necessarily detail how well a model will complete your task, right? And one of the tasks that I use is: I'll download an LLM, I'll get it up and running, and I'll ask it to create a brand new state-of-the-art algorithm for diffusion LLMs and then implement it using vanilla PyTorch. That ends up being very, I'm going to say, revealing towards the LLM's capabilities, right? How well does it use tools? Because it has access: it has access to the internet, it has access to documentation about how to build LLMs, it has access to my RAG knowledge base. It's got access to whatever it wants. Which tools it calls ends up showing what it deems to be most probable, and those probabilities are really what I'm grading. Sometimes, like today, I was testing GLM 4.5, I was testing Kimi K2, and I was just testing the new GPT-OSS. These are three vastly different sized models with vastly different training sets, and they're meant for different things. And so I got different but maybe viable solutions from all three. It was just really interesting to see that, especially Kimi K2; that one ended up modifying its own behavior on the fly where I was calling it through Cursor. It initially had some errors where Cursor was presenting information in its Jinja template, for what tools were available to the model, in a way that was incompatible with the server. And, well, it wasn't incompatible with the actual server. It wasn't throwing a 503 or anything. There was an error with how Kimi K2 was responding to it. But it was able to feed those errors back into the model, and Kimi K2 on the fly adapted to the circumstance and started generating different tokens right at the beginning so that it could complete the call. And that was wild to watch, because that had happened with a bunch of other models. That has happened a bunch of other times. And that's the first model I've seen adapt like that, right then, as part of the conversation: all of a sudden it's going to change its entire template structure for how it responds and go with this one that it knows works. Now, that didn't mean that it performed better than the other models at any other time. It got caught in infinite loops of generation pretty frequently. It didn't clearly understand the task pretty frequently as well. A little under 25% of the time it was doing something that I didn't ask for, or solving for some ambiguity that was in the question in exactly the wrong way; it was supposed to solve it a different way. It was just wild to watch it change on the fly. And you reminded me of that with what you were talking about, Nick, where creativity isn't always novelty; sometimes it's just, like, changing the template of what you're thinking about, changing the tools you have access to. It makes me wonder, though, and I understand this is a deeper topic so we don't have to get into it if we don't want to, but how many people are committed to, like, equal opportunity tooling for LLMs and are not committed to equal opportunity tooling for humans? Just bonkers to me.
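A minimal sketch of the kind of ad hoc test Chris describes, assuming a locally served model behind an OpenAI-compatible endpoint; the URL, model name, and prompt wording are placeholders, and the actual grading of tool choices and output quality stays with the human.

```python
from openai import OpenAI

# Assumed local OpenAI-compatible server (e.g., one serving GPT-OSS or Kimi K2).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

TASK = (
    "Create a brand new state-of-the-art algorithm for diffusion LLMs "
    "and implement it using vanilla PyTorch."
)

response = client.chat.completions.create(
    model="local-model",  # placeholder model name
    messages=[{"role": "user", "content": TASK}],
    temperature=0.7,
)

# The interesting part is reading the answer: which tools it reached for,
# whether the design is coherent, and whether the PyTorch even runs.
print(response.choices[0].message.content)
```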
Justin
I wonder, too, like, to that point, I guess the yes-or-no question is: did this model that showcased this interesting, maybe proto-creative behavior, did it get a reward for showcasing that? Because you said, like, it had a lot of resultant error. So I'm taking away that it must have gotten rewards during training for solving those, because the only quote-unquote reward that it got in this context was that it didn't see the word error in the conversation after that. Sure. So, you know, again, even if it had gotten the error and that was rewarded, because again, like, building the counterfactual space is as important in science as building the factual space, right? You know, both of those need to develop the path that is most optimal, most impactful to go down, right? And so maybe part of the problem here is that we don't reward the counterfactual space, because we don't know what to use it for, because we don't understand the path that we're on any more than the LLMs do. It's just moving so fast. One of the things that, you know, Nick and I get into, you know, frowny-face arguments about from time to time is the use of analogy and our ability to create in that space. And that's where, when you were talking about these innovation gaps, in real time, my small corpus of things that I glom onto went immediately to a picture of the microwave background radiation, the clearest picture we have of the edge of the universe, the edge of the bone. The reason it goes there is because these gaps are now fairly well explained from the quantum fluctuation that many believe was the impetus for the Big Bang. This probabilistic, you know, happening that is a piece of the something that you can get from nothing that happens in a quantum universe. And so Seth Lloyd, brilliant physicist, in his work Programming the Universe, where he makes the conjecture that the universe is a quantum computer, that first quantum fluctuation is the bit bang that created this whole thing. And the reason that we see those troughs in the microwave background radiation, the reason that we have regions of stars and galaxies and conscious entities, and regions where there is nothing, there's just the vacuum of space, is because of the remnants in the space-time continuum of that initial fluctuation. Now writ large across the whole universe, building these troughs in the quantum surface and this giant computer that we're living in. And so the explanation there is a very creative review of this fellow's whole life of conjecture and criticism and building experiments and reasoning in the mathematics of the physical universe to build this out. And so by analogy, I'm thinking, well, maybe what Chris needs is a quantum computational, you know, analogy for why there are these regions of emergent innovation and why there are these regions of the vacuum, of the lack of innovation. And so this epistemological universe that you've created might have the same structure. And I'm willing to be criticized on this analogy, because that's what goes beyond just creativity into the realm of actually helping Chris to solve this problem. Which, right now, the LLMs aren't doing any of that. They're not making this conjecture, even though their corpus holds a hell of a lot more than the microwave background radiation picture that came to mind in my head, and, you know, one of the many books that I've read. So I want to be written in on the IP, Chris, if that's what it is. So I guess that, you know, I'll take the pause as, you know, a hope to wrap.
And I guess that I want to wrap with a p-value from each of you as to what's the probability that these LLMs are, by whatever definition you think is rigorous, creative, but creative different, right now. So give me your p-value on that. And if you feel like it's low, right, let me know what you think is the best path forward. So we'll start with you, Nick.
Nick
All right. This is maybe the first time that with very, very limited data, I'll throw out a p-value. I feel like I'm about to get my hand slapped by, you know, a client or, you know, somebody.
Justin
We can p-hack it in private. All right.
Nick
Yeah, so I'm going to call it a 0.81, you know, out of one. So we'll translate that to 81%. The reason that I think there is a high probability, but not all the way up into our 90th percentile, that models are truly creative different at this point, is that when I consider the different aspects of creativity in my own life, whether that's in lateral thinking or other types of moments, it may take the prompting, and that prompting may be from its own chain of thought and reasoning, or it may be my own prompting, but I do believe that the models are able to come up with new creative solutions, new algorithms, new solutions in general, new forms of things, whether that's discovering things from a protein-folding perspective, discovering algorithms, as we've discussed, or solving problems that maybe we haven't thought of before or that we've recently been trapped in our own rigid thinking around. And so as I consider what is truly valuable around creativity, going beyond just the core novelty and thinking about how it may need to apply in my life, I think this is key to what I do, and I have seen these models be able to move into this arena. One of the main things that I talk about frequently with clients is that we have shifted from traditional machine learning or deep learning, where we had a set of tasks that models were particularly good at, where we would train those models to accomplish those given tasks and then they were set in stone, to now having models that adapt to that particular use case and are able to shift their context based off of the information that we give them, to be able to achieve multiple tasks and to move forward, much closer to how a human might, to actually reaching the outcomes. I do think that with meta-reasoning and a handful of additional solutions that are just right now on the cutting edge, we're going to start creating opportunities for an AI or a large language model, similar to a human, to really be able to understand: here is the direction that I want to head, here are the outcomes that I want to achieve, and then be able to walk that back and say, here are potential solutions that I can approach across those, and then follow some form of vertical or lateral thinking, horizontal thinking, to go in and approach those problems. Even today, they're pretty fantastic at brainstorming and thinking about different ways to kind of be the devil's advocate for you. And so finding ways to be able to change your own thought, I think, is a critical usage of these models. But only if you are critically thinking. Chris, how about you?
Chris
Dude, this is a good question too. I'm not going to go for a 0.95 on that scale, but I want to couch it. I think that models already are creative by any definition that we could come up with. The problem is that they're set up for failure. I think autoregression gets in the way. I think that the way that we represent language gets in the way. And I don't just mean tokenization and embedding there; there are problems with both of those. Like, positionality is a huge problem with how we're representing, and we even have positional embeddings, but flipping back and forth between sine and cosine is not helpful. But beyond even tokenization and embeddings, there's the fact that we're doing this based off of text and we're not doing this based off of audio. We're not doing this based off of other representations of language that are much richer data formats. We're not using video. We're not using video plus audio. We may be able to get to this point. And this is what I was saying earlier: I think that the learning algorithms that we have and the generation algorithms that we have are fantastic. The issue that I see is with the data type, the data quality, and the representationality of that data. Is it representative of the problems that we're asking the LLMs to be creative on? And I think the answer there is no.
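Since Chris singles out the sine/cosine positional embeddings, here is the classic sinusoidal formulation from the original Transformer paper as a short sketch, so the alternation he objects to is concrete; the sequence length and model dimension are arbitrary.

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Classic Transformer positional encodings: sine on even dims, cosine on odd."""
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                       # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encodings = np.zeros((seq_len, d_model))
    encodings[:, 0::2] = np.sin(angles[:, 0::2])             # even indices: sine
    encodings[:, 1::2] = np.cos(angles[:, 1::2])             # odd indices: cosine
    return encodings

pe = sinusoidal_positions(seq_len=8, d_model=16)
print(pe.shape)   # (8, 16)
```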
Nick
Awesome. I'm going to pass it back to you, Justin, but I'm going to do a little bit of a card shift, maybe not the color to the shape, but I'm going to pass it to you and say: please give us your probability, but tell us what you think is necessary for us to solve these gaps that we have today.
Justin
Yeah, I think that, you know, I'm in the position of the guy on The Price Is Right... safety. It's a big part of AI.
Chris
Well, if that's all you think that we need to do, have I got news for you, Justin? We are working on it. That's like the whole point of this company. If we could condense that question into one sentence, what would you say? Like, what do you want me to really address here?
Justin
Yeah, so give me a problem statement or an objective statement for a generative, generative AI working on the creativity adaptive knowledge problem.
Chris
B from:
Nick
Well said. I think this cognitive flexibility overall, the way that we think about it in humans, it's this rich concept. And it really underpins creativity, it underpins learning and resilience in humans. But starting to think about how that could truly be the secret sauce emerging in AI really takes us and helps us understand, you know, what problems should we be solving? What are the goals? What are the outcomes? What things are best tuned to do so? And how do we start thinking about this? Not as broad models that can just scoop up and hoover in all sorts of data and all of a sudden create emergent properties, but taking advantage of those really amazing advancements and pulling that back to say, yes, but this particular thing is really, really good at this. Let's find a way to say certain things are actually solved problems, and then bolt those onto the bigger, broader solutions as time goes on to create something that can surpass what all of the others are capable of. This entire Emergent Podcast is really about this concept of: are we taking ideas, things, whatever they may be, and taking those subcomponents and having them emerge into one greater thing? Or do they actually still maintain their own subcomponent parts, but move and act as though they're one greater part of that particular thing? And I think as we consider all of this, we should go back to The Price Is Right. And I'll just, you know, end on the note saying, don't forget to spay and neuter your cats and dogs, like old Bob Barker. All right. Thank you so much, Chris, for joining us today and for sharing your expertise. Thanks, Chris. Yeah, thank you both. Justin, thanks as always as well.
Justin
Yeah, absolutely. Now back to work on generative generative AI, Chris. No rest.
Chris
What do you think I've been doing this whole time? We need him to be creative. Yeah, thanks, guys. This has been awesome.
Justin
We gave you plenty of unstructured data, so you can work from that.
Chris
Awesome, thanks.
Justin
You bet.
