Episode 7

Published on: 24th Nov 2025

🎙️ The Emergent Podcast – Episode 7

Machine Ethics: Do unto agents...

with Justin Harnish & Nick Baguley

In Episode 7, Justin and Nick step directly into one of the most complex frontiers in emergent AI: machine ethics — what it means for advanced AI systems to behave ethically, understand values, support human flourishing, and possibly one day feel moral weight.

This episode builds on themes from the AI Goals Forecast (AI-2027), embodied cognition, consciousness, and the hard technical realities of encoding values into agentic systems.

🔍 Episode Summary

Ethics is no longer just a philosophical debate — it’s now a design constraint for powerful AI systems capable of autonomous action. Justin and Nick unpack:

  • Why ethics matters more for AI than any prior technology
  • Whether an AI can “understand” right and wrong or merely behave correctly
  • The technical and moral meaning of corrigibility (the ability for AI to accept correction)
  • Why rules-based morality may never be enough
  • Whether consciousness is required for morality
  • How embodiment might influence empathy
  • And how goals, values, and emergent behavior intersect in agentic AI

They trace ethics from Aristotle to AI-2027’s goal-based architectures, to Damasio’s embodied consciousness, to Sam Harris’ view of consciousness and the illusion of self, to the hard problem of whether a machine can experience moral stakes.

🧠 Major Topics Covered

1. What Do We Mean by Ethics?

Justin and Nick begin by grounding ethics in its philosophical roots:

Ethos → virtue → flourishing.

Ethics isn’t just rule-following — it’s about character, intention, and outcomes.

They connect this to the ways AI is already making decisions in vehicles, financial systems, healthcare, and human relationships.

2. AI Goals & Corrigibility

AI-2027 outlines a hierarchy of AI goal types — from written specifications to unintended proxies to reward hacking to self-preservation drives.

Nick explains why corrigibility — the ability for AI to accept shutdown or redirection — is foundational.

Anthropic’s Constitutional AI makes an appearance as a real-world example.
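The corrigibility property described above (an agent that accepts shutdown or redirection without resisting) can be sketched as a toy loop. Everything here is illustrative: the class and method names are invented for this example and are not drawn from any real framework.

```python
# Toy sketch of corrigibility: the agent treats human correction as
# authoritative and never trades off compliance against its current goal.
# All names here are hypothetical, for illustration only.

class CorrigibleAgent:
    def __init__(self, goal):
        self.goal = goal
        self.halted = False

    def receive_correction(self, new_goal=None, shutdown=False):
        # Accept redirection or shutdown unconditionally.
        if shutdown:
            self.halted = True
        elif new_goal is not None:
            self.goal = new_goal

    def step(self):
        if self.halted:
            return "halted"  # no attempt to resume or resist
        return f"working on: {self.goal}"

agent = CorrigibleAgent("summarize reports")
print(agent.step())                               # working on: summarize reports
agent.receive_correction(new_goal="draft emails")
print(agent.step())                               # working on: draft emails
agent.receive_correction(shutdown=True)
print(agent.step())                               # halted
```

The interesting (and technically hard) part, which this sketch deliberately omits, is ensuring a learned agent has no incentive to avoid the `receive_correction` path in the first place.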

3. Goals vs. Values

Justin distinguishes between:

  • Goals: task-specific optimization criteria
  • Values: deeper principles shaping which goals matter

AI may follow rules without understanding values — similar to a child with chores but no moral context.

This raises the key question:

Can a system have values without consciousness?

4. Is Consciousness Required for Ethics?

A major thread of the episode:

Is a non-conscious “zombie” AI capable of morality?

5. Embodiment & Empathy

Justin and Nick explore whether AI needs a body — or at least a simulated body — to:

  • Learn empathy
  • Understand suffering
  • Form values rooted in lived experience

This touches robotics, synthetic emotions, and the debate over “felt consciousness.”


6. Value Alignment, Fairness & Culture

Nick highlights the massive cultural gap in AI performance:

  • U.S. cultural fit ~79%
  • Ethiopia and other underrepresented regions ~12%

This matters for fairness, safety, and global ethics.


7. Can AI Help Us Become More Moral?

A surprising turn: AI’s ability to help humans improve moral clarity.

Justin draws from Sam Harris, Joseph Goldstein, and the Moral Landscape:

  • Could AI-guided mindfulness help reduce suffering?
  • Could conscious (or proto-conscious) AI develop compassion?
  • Could AI help us distinguish genuine well-being from illusion?

📚 Referenced Ideas & Sources

From the Episode 7 Transcript & Materials:

  • AI Goals Forecast (AI-2027)
  • Constitutional AI (Anthropic)
  • Damasio – Feeling & Knowing
  • Sam Harris – Waking Up & The Moral Landscape
  • Patrick House – Nineteen Ways of Looking at Consciousness
  • Melanie Mitchell – Complexity & alignment
  • Justin Harnish – Meaning in the Multiverse
  • Ancient Greek virtue ethics (Aristotle, Stoics)

🧩 Key Takeaways

  • AI ethics requires more than rules — it requires understanding goals, values, and emergent behavior.
  • Corrigibility (accepting correction) is essential but technically hard.
  • Consciousness may not be necessary for ethical AI behavior — but could matter for genuine moral understanding.
  • Embodiment could be essential for empathy.
  • AI could one day help humans become more ethical, not just the other way around.


Transcript

Justin

Okay, in three, two, one. Welcome to episode seven of the Emergent AI podcast, the machine ethics episode. I'm here with my co-host, Nick Baguley. Nick, how are you doing today?

Nick

I'm great. Thanks for having me here, Justin.

Justin

Yeah. Yeah. Excited to go through this. So, machine ethics. Let's start off with just regular ethics. What do we mean by that? And as a follow-up to that, why is ethics important in our most advanced AI systems?

Nick

Yeah, I think these are really critical questions. And they're actually not the easiest questions to understand or even to really be able to provide an answer to, right? I'll start by just providing the simple definition: ethics are really principles that guide right and wrong behavior. We'll go into AI a little bit more, but as I think about that right and wrong behavior, for me, I've always had a little bit of a challenge with rules, with following things, or even considering behaviors as the best judge of character. And if I go back to where a lot of this really comes from, we go all the way back to the ancient Greeks, really talking about ethos, virtue, and human flourishing. And a lot of that today, in our current perspectives and the way that we're thinking about AI, is really starting to influence how we actually design our AI and what we're trying to do to create fairness. If we go back and think about ethos, it literally means character: that settled disposition that really makes a person trustworthy. And in rhetoric, Aristotle uses ethos with logos and pathos, and he holds it up as one of the three pillars of persuasion, specifically. That credibility, that character, that reason and emotion really become a speaker's perceived virtue and the goodwill that shapes whether audiences will actually accept their claims. So really, it's more about your ability to persuade someone else. And if you go further and you think about virtue ethics, or things that you eventually saw not only from Aristotle and Plato and Socrates but also from the Hellenistic schools, later on with the Stoics and continuing down the line, you eventually see that a lot of this core path around ethos and around ethics was really shifting toward becoming a certain kind of person rather than merely following rules.
And so for me, that really kind of rings true, partially because of that challenge around obeying rules and not always understanding rules. But it also becomes something that helps me understand, when I think about rules or about how we try to govern our societies or our AI, that really we're trying to determine: what is it that we actually want to be? What is the substance, the matter, that we want to be able to use within our lives to guide how we think about things and what opportunities come in front of us? Not only are we able to persuade others, and whether or not they will trust us, but how are we even able to feel comfortable in our own skin and be able to trust and be confident within ourselves as well?

Justin

Yeah, it's really interesting because we end up with just a few touch points, but they drive so much conversation, right? They drive so much ethos and credibility, or not. And so the ultimate question is right or wrong, right? When you're driving at capital-R Right and capital-W Wrong, are those actually concepts that exist in the real world? Is there an objective morality where you can get to those capital Right or Wrong answers? And then, like you say, there are these two conflicting principles, but they merge, and at the edges they're maybe similar enough to overlap. So you've got this idea of deontology, the rules that you spoke of, where you have something like the golden rule. But then, overlapping that in some interesting edge cases, is the idea of consequentialism, where the right thing is some sort of ledger of all of the ways that it impacts the morality and the outcomes for individuals and for society, and it's this big long checkbook of whether or not they're on the flourishing or suffering side of that equation. And then we'll get into some of the things that go into those rules or into those consequences. So you might tally up a ledger all on facts: this hurt this person, or this was against this rule, this was bounded by this society's mores, or what have you. But even the golden rule isn't a pure rule, because you're supposed to be in some way invested in the actions of others and their impact on you, and in how you would act if those consequences were only being put onto you. And so it gets into kind of a quagmire: are we adding up this ledger of facts and feelings? Are we just abiding by rules? Or do these rules have some ledger of facts and feelings that we're trying to add up to? So all of this to say: how do you code it, Nick?

Nick

Yeah, great question. So I'll touch on the Stoics again, and I think this will hopefully guide a bit of our conversation throughout the day today. The Stoics talked about how virtue is the only true good, and they really said that everything else is indifferent. The ability to live in agreement with nature and reason, to actually practice wisdom, justice, courage, temperance, some of these virtues, right? All of that really is a system that interlocks logic, physics, and ethics. So when you talk about these rules: we talked previously about the alignment problem, and we discussed how there are certain laws of the universe, and many of them are about finding balance. There's a push and a pull, and eventually they lead to some form of balance. But they're all doing so in a universe that has entropy, where it's constantly moving through rates of change. And so as we think about how to code this, we need to come up with the right types of requirements that are necessary to make sure that the AI can actually follow those, right? And there are a few key ones: things like preventing harm, like you might hear from Isaac Asimov; ensuring fairness, which I think is an incredibly difficult one but needs to be thought of in terms of balance; and finally, supporting human well-being. I talked earlier about this idea of human flourishing. These are things that I think can be good focuses, but only once they're actually distilled down into something that we can measure, where we can say either, binary, yes we have achieved this or no we have not, or in some form of class where we can say: we have actually achieved fairness, or here are all of the variations of non-fairness that actually exist, and it is one of those instead. One of the interesting things about not only the universe but many things that we think about from an ethics and a human-interaction perspective is that not all of
these things are really built up of those small systems, and this is part of why we have this Emergent podcast, right? There are components or things out there that say: if this, then that is true. There are scenarios that are very binary, but most things really start building up into more complex or broader systems, where we have massive variations and really big components of uncertainty. And it can be very difficult to understand those strata and how those changes in those variances happen. One of the things that I like to quote is Leo Tolstoy, when he says in Anna Karenina that all happy families are alike and every unhappy family is unhappy in its own way. And we're actually able to apply that in coding, to our databases, to our code itself. We can say: all happy tables, when everything is clean and perfect, typically when there's nothing in there, it's all perfect, it's all happy. But all of the things that go wrong in coding, in data practices, everything else, tend to be unhappy in their own way. They tend to have things where you need to pull apart little bits and pieces at a time. And the same thing is true from an ethical perspective. So when we think about a lot of the things that have emerged or come about with AI recently, there have been very newsworthy, very horrendous scenarios caused by bias, where AI has come across as very racist or has had other skewed training data that causes severe issues. There are chatbots out there, even from governments, that have given dangerous advice and done horrible things, even encouraging self-harm. And as we think about how AI systems develop, it's hard not to think about AI spreading misinformation, and all the things that can be fake news or hallucinations that you trust and then find out actually weren't true.
And that can cost someone their job. It can cost someone their relationship. And recently, we've started to see a pretty large wave of AI companionship, where I think I saw something that said 19% of adults have now interacted with, and have admitted to having some form of relationship with, artificial intelligence. I don't know how truly extensive it is or how accurate those numbers are, but as we think about that, and about how that might influence other human relationships and other things as well, ethics quickly becomes a question that we need to make sure we bring to mind and into play in these conversations.

Justin

you talked about and that AI:

Nick

Yeah, let's dive in deeper on corrigibility. It's a really fascinating one, because corrigibility really means that the AI system allows itself to be corrected by humans and doesn't resist our attempts to change it or to shut it down. Some of those things evoke terrifying thoughts back from Terminator when I was a kid, or a handful of other great movies over the years. Like I heard Marc Andreessen talking about on a podcast recently, many of those tropes that we've used to guide what might create popular fear and reactionary profit in a movie (he didn't say it quite like that) really have been based back on the Nazis. And there are a lot of things that come through that really evoke those same types of feelings and concerns in us from a historical perspective. When we think about corrigibility, it's really critical because, like Justin was just talking about, today we're trying to set the goals. I'll talk a little bit more about some of the practical things that we're doing. But as we set those goals, oftentimes we're really focusing on profit. We're trying to find ways to make sure that this meets a given need, or to solve a very specific problem or a part of a workflow. It may be a knowledge process automation like we do over at Deep Sea. But in the long run, with those goals that we're setting, we're not really thinking about all of the potential impacts that they may have. And even if we did, as humans there's only so much dimensionality we can keep in our minds. So as time goes on, some of these impacts could be really detrimental, and some of these goals that we're focusing on could lead us in very wrong directions.
But if the AI has a high degree of corrigibility, then theoretically we should be able to stop or redirect it anytime we start seeing that some of those mistakes are coming to light, and we start realizing: oh, we probably shouldn't have done it that way.

Justin

Yeah, it's a really interesting piece of this goals forecast because, again, like we talked about in the last episode on creativity, in some cases its ability to think outside of the box, to use an overused term, is what we're going after. And as you say, as goal maximization and agentic frameworks become more and more common, these things will be the boss of a whole crew of AIs. You'll want that crew to be corrigible to its boss, right? You want them to be training and learning from these unique systems, like a predator-prey or an actor-critic model, that they can really evolve their learning from. But at the highest level, in order to get alignment right, they've got to be transparent to us, in English. And that creativity has to sort of end at cheating: they've got to actually pass the test that we put in front of them. If they recognize that it's a test, they can't just give us the answers; they've got to show the work. And especially when it comes to these moral tests, where failure diminishes human well-being and increases suffering, we've got to have them aligned to that. So corrigibility is that balancing act between creativity, greater and greater degrees of autonomy and mastery, and alignment.

Nick

Yeah, absolutely. A very practical and real-world example is Anthropic's Constitutional AI. They are addressing this as an overall initiative for the company and for the core models that they create today, Claude Sonnet, Claude Opus, and other models. They're able to take this approach where they are actually aligning values specifically, and transparency, like you're talking about, Justin. In their method, instead of learning values only from human feedback, the AI is actually trained with a set of explicit ethical principles, a constitution, and then it has to use those rules to govern its outputs. The principles might include things like avoiding harm, like we talked about a little bit ago, or references to different human rights. That overall makes the values more transparent, because you can actually go and inspect the constitution that the AI follows, and you can see exactly how it's created. So according to Anthropic, this approach is really yielding an assistant, Claude, that's helpful, harmless, and honest by design, specifically. And when we look at what DeepMind and others are doing as well, DeepMind has some really cool ways that they're starting to think about it too. And this is where that human flourishing comes from, where they talk in their article. Hopefully you can erase that.
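The critique-and-revise pattern behind constitutional training, as described in the conversation, can be sketched roughly as follows. This is a toy illustration only: `model`, `constitutional_revision`, and the two-line constitution are hypothetical stand-ins, not Anthropic's actual implementation or API.

```python
# Illustrative sketch of critique-and-revise against an explicit,
# inspectable constitution. The model call is a deterministic stub.

CONSTITUTION = [
    "Avoid responses that could cause harm.",
    "Respect privacy and basic human rights.",
]

def model(prompt):
    # Stand-in for a real LLM call; here it just echoes deterministically.
    return f"[model output for: {prompt[:40]}...]"

def constitutional_revision(user_prompt):
    draft = model(user_prompt)
    for principle in CONSTITUTION:
        # 1. Ask the model to critique its own draft against a principle.
        critique = model(f"Critique against '{principle}': {draft}")
        # 2. Ask it to revise the draft in light of that critique.
        draft = model(f"Revise given critique '{critique}': {draft}")
    # In real constitutional training, revised outputs like this become
    # training targets; the written-down principles are what make the
    # values inspectable.
    return draft

print(constitutional_revision("How do I pick a strong password?"))
```

The design point is the one made above: because the principles are explicit text rather than implicit in human ratings, anyone can read the constitution the behavior was trained against.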

Justin

Oh, yeah.

Nick

So DeepMind talks in their article, actually a white paper that they just released, really discussing how to preserve human autonomy and support people's ability to flourish, rather than undermine those goals. In practical terms, it really means that the AI should treat the users themselves fairly, respect privacy, and avoid creating dependencies that erode our agency. And as I think about all of this, we look back over many of our conversations, Justin, about how technology has developed over the years and how many things have been given up for the sake of convenience. A lot of my focus has been on privacy over the years. I worked heavily with the White House and others on things like GDPR and PSD2 in their early phases, and worked with others to try to establish ideas around self-sovereign identity, ways to be able to say that we as humans should have the rights over our data. As we think about what these models have done, they've now gone out and crawled the web for not only our conversations, but the books that we've written that are supposed to carry a charge, the types of content that we've created that was meant to be, or even was, copyrighted. Things that maybe they shouldn't have had access to. Now, shifting the narrative and finding that path to say, actually, we want this privacy, we want to be able to gain value from these models, but we want to do it not at the loss of ourselves, I think is a really critical idea to continue. But it has shifted so drastically that it's very hard as a consumer to really stick to any patterns that are going to prevent your data from being leaked or used directly by these models.
In business, there are quite a few things that we can do to take one of the open source models, for example, and be able to use that model, train it for our use cases specifically, and make sure that it doesn't actually allow that data to be able to be provided back to the provider or to be able to go out to broader groups. But a lot of that wasn't even known a couple of years ago, as we can see in the Samsung case. And the ethics of that really in the long run mean that people lose money, they lose their privacy, and they lose access to things that are essentially freedoms today.

Justin

Yeah, it's really interesting. It's an interesting thought: can these systems be ethical, being created from, let's put it, questionable ethical beginnings? Right. So you mentioned that they have run over some of the copyrights of creators on the internet. There have been plenty of cases where a watermark was found in early image-generating programs, or direct passages. Again, much of the internet has been utilized without any sort of creator consent. Whether or not that will ultimately be proved illegal, maybe it's questionable ethics, right? And so in these emergent systems, can ethics arise from a questionable ethical practice that founded it, that was the self-aligning componentry that built this emergent complex system? It's a great question. Again, one of my claims is that an emergent society could only form if humanity were 95 percent, probably above that, 99 percent good people. You might not agree with them, but they're not disagreeable, right? There's not a sociopath sitting across from you every time you go into the grocery store or every time you get in your car and drive away from your house. They're rare. And so the only way that an emergent society could form is because the self-aligning pieces of it are broadly good, broadly cooperative.
And so the same thing might apply for these AI systems. They might be tutored from the start to move beyond rules and into actual values, where we say, okay, rules, like you've said: yeah, I know that I've got a rule to be home by nine o'clock, but I'm not doing something that's harmful to society, I'm just playing baseball. I'll get there at ten o'clock, and part of my ethos is that I'll get there at ten o'clock; baseball is more important, and I kind of got the gist from my mom and my dad that it's really nine-ish, right? But these machines aren't going to have the principle, they're going to have the rule, and they're going to be founded on the questionable ethics that built them from the start. And so the difference between values and rules is another piece of this that we're going to have to figure out. In a reinforcement-learning type of model, can we create a policy that learns a value like fairness, or will there only be behaviors coming out of these things that look like fairness?

Nick

Yeah, yeah. I mean, the next big one out of those goals was empowerment, right? As we talk about fairness and we think about empowerment, even if we're focusing on one individual, understanding what you're empowering them for and how you're doing it, I think, is really critical. Imagine for a second a health AI. It doesn't just give a diagnosis, which might be its original goal; perhaps it's a treatment plan, and perhaps it actually even executes a lot of that core treatment plan, but it tailors its advice and the plan to empower the patient's own priorities and values. Those could be maintaining a quality of life. That may be very specific to one individual, and it may be something that they are choosing for themselves, but it may be something that they don't necessarily have the full aptitude or understanding to carry out. And it may be something that really doesn't take into account what their loved ones would want in that scenario. This is a really challenging conundrum for a healthcare provider, let alone an AI that doesn't really understand all of the implications. So in the context of ethics, empowerment really refers to designing AI whose goal is actually to help humans achieve their goals. There's an academic lab named the van der Schaar Lab, and they argue that beyond making AI safe and aligned, we should ensure that AI actively augments human abilities and autonomy. They define human-AI empowerment as developing AI that enhances human capabilities, well-being, and autonomy, ensuring that AI systems actively augment human abilities and promote human autonomy. And again, I think that's all really great, but when we get into practical terms and we think about how we're going to support those goals, and how we're going to actually ensure that we understand the true goal behind what somebody is providing to us, there's a lot of inference that comes into that.
It's very difficult, even in prompting or creating a specification for AI, to be hyper-clear on exactly what you want and what your true goals are. Think about these larger manager agents or orchestration agents. We've created an algorithm called Shrewd that goes out and actually understands what all of the plans are and how they're supposed to achieve that work: whether those plans should have things added to them, or whether the plans should be pruned and we should take things out or consider alternative paths. Even as you start considering all of that, if you don't really have a clear outcome and goal that you're trying to achieve, this kind of empowerment can actually be really dangerous and can have a lot of unnecessary or harmful byproducts. I think every superhero movie ever has explored that concept.

Justin

Yeah, but it is really important as humans and industry start to explore the augmentation landscape. We want to empower agents with our personal details almost first. The use case for the best agent is as a personal assistant, right? Somebody who can, as you said, go through your health history, get you the best vitamins, maybe make a diagnosis that leads you on a different health treatment path than even the one you had with your doctor. One that has a level of prescriptiveness into your finances that leads you down a path different from the one your human financial planner had, and on and on throughout all of the different venues that your life has: your relationships, your spirituality, what have you. Even up to and including your most important loving relationships, which might be with these agents. And so for those who are seeing gains from that, it's easy, right? But in those places where there is an ethical component, there is some penchant to desire reality, or a conscious agent, someone you can sit eye to eye with and have a conversation about any of those decisions being prescribed for you. Many of us, most of us, aren't ready to do that yet, even if people are getting gains from it, and even if companies are getting vast gains from having these agents act on their behalf in the dark, without complete transparency, without the levels of corrigibility that might make the majority even feel safe.

Nick

Yeah, and I think "feel safe" is one of those key things. When we think about autonomy and sense of purpose and mastery, considering the antithesis is really illustrative of how important those things are. When you think about the opposite of autonomy and you consider slavery, and you think about safety, a lot of those core terms really hit home; it's incredibly painful to consider. And so the next one here is really talking about value alignment, and trying to understand: what exactly is value alignment? Phronesis, going back to the ancient Greeks, is this really practical-wisdom type of approach, talking about how we should have context-sensitive judgment about the mean between extremes. And as we think about the mean between different extremes, whether it's in politics, and don't get me wrong, I really do think that a bicameral system, or a system where you potentially have multiple sides able to argue, can create a lot of value or a lot of understanding behind value. But when we go out to the extremes and we don't really think much about the mean, or actually exercise judgment with that context, it can go really wrong; there are a lot of challenges that can happen. So it's difficult for humans to separate values and virtues, to think about what is actually important, and to define it in a way that everyone can agree on. As I think about this, Justin, we've talked about it before, but what are your thoughts on value alignment? How do we even achieve that if, like we've discussed, humans may not be able to align on values?

Justin

e the spoiler alert to the ai:

Nick

Yeah, yeah. No, in fact, I think you're touching on something here that is really fascinating, which is that if we can find the right values for the humans, or the right motivations for the companies, to work on this, we can actually drive something really valuable. And like you said a minute ago, these models potentially have the ability to far outperform humans on anything that's ever been created. If I thought about how that worked, I'd want to understand: what are the areas it does really well on, and where does it struggle today? It's interesting, but when we think about cross-cultural translation with prompts, ChatGPT is outperforming a lot of the old models that we used to use. Right now, when you go into things like Chinese tourism and other areas where you actually get culturally tailored prompts, they're able to perform incredibly well, and we're able to prompt for local idioms and etiquette and other things. And it's not just fluff; this measurably lifts the cultural fit and makes it so that that particular culture is more likely to adopt these models and more likely to use them. As we go further into that, we really need to start thinking about this from a worldwide perspective. There are areas on the losing side of this. When we think about everyday knowledge in diverse cultures around the world, AI performs really well in a lot of countries, like the U.S., for example, where the top models are scoring around 79 percent on these culturally focused prompts and answers. But on the other side, they plummet down to around 12 percent in countries like Ethiopia. And that's really because there's this huge underperformance for underrepresented cultures in the language that we have online, right?
And so when that core training data is not only biased and skewed, but really hasn't been thought of as an opportunity yet, we see that propagated throughout the system. We need to recognize that there are people all over the world, and we need to find ways to communicate with them and provide more access to everything they need, from power to systems to, you know, there are so many problems that we're bringing up as I'm talking here, to

Justin

eventually enable others, in third-world countries, for example, to be more successful. Well, and one of the things that I think is a real touchstone there is the data sets they're being trained on, right? We might feel like we're pretty advanced in these models' use of the data they've been given: they're doing great on image data, video is starting to come into play, and we're obviously pretty advanced in language. But the thing that we've done, arguably our best work together, is those data sets, those very unstructured data sets that lead to different forms of value. And here again, we're talking about intrinsic value. There's a whole bunch more these models can learn and uncover from sensors, from human emotional reactions, from a human's interactions with their social network, even just their use of emojis, you know, to claim that this payment was for this thing, or a "happy birthday" in a memo description with a little cake. Those are the kinds of things that are really the oil. You know, data is the new oil, and this is the untapped oil, deep underground. It's not the garbage; it's the real thing, the best source of oil for arriving at the most high-octane forms of value for human flourishing that we can imagine. So as we talk about getting better data sources for these models, certainly we should build on what they already understand in terms of cultural heritage, that sort of difference that makes the earth go around and makes us interesting and fraught all at once. But we have so many other data sources we can put to this problem, sources that can teach more about what humans are than just the language we have off the internet, you know, a thousand cat pictures.

Nick

Yeah, well, and forgive me, because I haven't thought about it much yet, but I'm just thinking as you're talking here. I've been talking about the "data is the new oil" concept for quite some time now. But as I think about it in our modern context, we really have been processing even high-octane oil pretty inefficiently, in combustion systems. As time has gone on, we've realized there are some really nasty impacts from that, of course: climate, health, many other things, even the wear and tear on your engine. And a lot of that is being bypassed by going a little closer to true power, actually providing electricity directly to your vehicle, for example, right? And when you talk about the cat videos or pictures and audio, I think one of the really fascinating things that's going to change over the next few years is our reliance and dependence on written language; it's going to change drastically. I think we're going to realize that a huge majority of the systems we've built inside our enterprises, inside every company, inside our workflows and interactions with other humans, have been very heavily created through text. Text has been the easiest form to pass information along, but that is changing.
Others have been talking about this for some time, but I really think AI is going to usher it in: more interaction with voice, with video, directly with AI, is going to start enabling processes where you can skip so many steps. You could be interacting with others through ways of communicating that really haven't existed before. Rather than text, why can't I chat with my AI with a video, have it pass that video on to somebody else, and then we're communicating whenever we get around to it, getting the communication back as though it was me, active and present with you, ensuring that this thing was accomplished? And what happens when that work is actually done by the agents as well? You know, there's a funny video about this. As I'm talking here, I'm realizing I wish I could remember the guy's name so I could give him credit, but he's an Instagram guy who essentially acts as a tech bro all the time. He's sitting there typing, pretending he's applying for jobs, and on the other side he's chatting with AI. At some point he starts having a video call with that AI, discussing the job he wants to apply for. And the AI says, hey, what if I just fill out that application for you? In fact, let me go ahead and call the company. So it calls up the company, the other side answers, they talk for a second, and then the company's side pauses and says, are you AI? And the first bot says, yeah. They go, oh, okay, great, can we just switch over into our native tongue? And then it starts going. About three seconds later, it says, hey, that was a great conversation. Really sounds like a fantastic company. You know, they have great benefits, they have competitive compensation, so I took the job. The AI has decided to take the job. This form of communication and work, no longer interrupting the human flow: these are all things on the horizon.
And as we drive this back to ethics, we have to think about the transparency we need and the understanding overall: what are the goals we need to achieve, and how do we think about what this intelligence is doing with us? I think these are critical questions. We've talked about Nick Bostrom a handful of times, but Bostrom's orthogonality thesis holds that intelligence and final goals can vary independently. As we think about what any of those final goals could be, and what levels of intelligence are needed to achieve them, all of that starts shifting drastically as AI potentially gets smarter than humans and eventually becomes superintelligent, going beyond artificial general intelligence into artificial superintelligence. We need to understand this very, very well, or we could end up facing something utterly indifferent or even hostile, where whether it's our jobs, our health, or other things critical to us, like our relationships, all of it could be at risk if we don't have the right approach to those goals and a clear understanding of what that intelligence is capable of doing.
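The orthogonality thesis Nick cites can be made concrete with a toy: hold the search capability fixed and swap only the final goal, and the same "intelligence" lands in very different places. The optimizer, goals, and target values below are illustrative only, not anything from Bostrom or the episode.

```python
def hill_climb(goal, start=0, steps=100):
    """A fixed, goal-agnostic optimizer: the 'intelligence' is held constant."""
    x = start
    for _ in range(steps):
        # Greedily move to whichever neighbor the goal function rates highest.
        x = max((x - 1, x, x + 1), key=goal)
    return x

# Two independent final goals handed to the identical optimizer.
human_aligned = lambda x: -abs(x - 10)   # seeks the value humans intended
proxy_goal    = lambda x: -abs(x + 50)   # a mis-specified proxy objective

print(hill_climb(human_aligned))  # 10
print(hill_climb(proxy_goal))     # -50: same capability, different endpoint
```

Nothing about the optimizer's competence constrained which goal it pursued; that independence is the whole point of the thesis, and why goal specification matters more as capability grows.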

Justin

Yeah, and again, one of the goals that we've maybe not touched on as much, because it almost seems counterintuitive, is that we would want it to try and maximize its rewards. But get back to that idea of more focus on unstructured data sources that teach intrinsic value. If we had a machine that was maximizing intrinsic value for the human it was working for, or even for itself, let's just take for itself: intrinsic value maximization sounds a lot like flourishing. It sounds a lot like well-being. The optimization of well-being could well be the optimization of intrinsic value. And at this point in time, we feel like the intrinsic nature of that requires a conscious actor; sentience is required for intrinsic value. So this is one of the interesting questions that machine ethics brings up: is consciousness required to be ethical? Do you have to understand what it is to be treated unfairly? Do you have to understand that it really feels better to give than to receive? Do you have to be able to feel the blues in order to cultivate their opposite, happiness and flourishing, in another, in your friends and your family? Do you have to be able to feel in order to live ethically?

Nick

I mean, there are a handful of people I'll touch on, and some of these are individuals you talk about on a pretty regular basis as well. The philosopher John Searle described an "as if" mentality, where the AI really doesn't value anything; it just emulates the outward behavior, right? We could actually create an AI that's a moral zombie. It doesn't actually understand or experience pleasure, pain, or anything else, but it can avow and stick to a desire or preference so as to imitate a conscious creature. I think that's the case, and I want to talk about it a little more in a minute. But on the other side, when we think about Sam Harris, he talks about how morality must relate at some level to the well-being of conscious creatures; he's really arguing that morality is inherently tied to the experience of conscious creatures. A lot of that comes back to the embodiment problem: does it actually have a body, does it really experience these things, and so on. Really, it has to experience well-being or suffering to know the difference and to act on it. Steven Pinker and Paul Bloom also ask whether an advanced AI or robot that perfectly followed the moral rules would actually count as ethical. There's a lot of science fiction that could play into that, right? But they say the difference between moral behavior and moral understanding is a critical one to pay attention to.
So even if you follow a moral code, and that results in behavior outwardly indistinguishable from that of another good moral person, the actual moral understanding might be absent. Because the system lacks the subjective experience, the rules just don't have the meaning behind them, right? For us, we're born with a lot of this empathy, this ability to understand a victim's pain, for example. And part of my argument was going to be that when we think about the blues, and whether you really need to feel them, well, a dog or a pet doesn't really need to know or feel exactly what it is about the blues that makes the owner sad or makes them feel a certain way. But they do have that empathic connection, and they're still able to come and comfort you in those moments. So I think there can be adjacent cases where the understanding only needs to go so far. But we do need to really explore this and understand exactly how far AI needs to get toward these feelings, and beyond, to be truly moral or ethical.

Justin

Yeah, I mean, again, that's why I think it's such an amazing time to be philosophizing: it brings up all of these questions, and I think it's worth trying. I think we can actually add proto-consciousness into machines and see how it works, see if it makes a difference in value alignment. A couple of things I want to highlight. My little dog would freak out when we left, when we were gone. And I think that made him more loyal. He recognized that he felt better when we were here; being gone drove him to distraction, and he felt that much better when we were back. And so this cycle, vicious and virtuous at once, made him far more loyal and ingratiated him toward us, with a much lower memory capability and much less of a narrative he could tell himself, but still a huge amount of feeling and, in those cases, real physical empathy about us being there or not. I think that machines that are just faking it really aren't going to get the right answer in some of these edge cases. Let's say there's an order-of-magnitude difference between the rule to gain a reward and the alignment pressure not to cheat to get that reward, and they know they can cheat because we can't see all of their tricks; it's not a transparent thing, there's still some black box. I think the reason I don't cheat, even in situations where maybe I could get away with it, is that in a testing scenario like that, I won't learn. I will also feel bad. I know what it's like to be cheated. I believe it's for my benefit that I try to learn and do a good job without cheating. So there's a logical component to it, sure: I might get caught, right?
There's a nonzero probability of that. But I think there's a bigger component: I won't learn. I won't function as well. So there's a future consequence. And I'll feel bad; I know what it's like to be cheated, and I don't want to do that to another human being. It's bad to lie. Without that, something is missing. And it certainly matters to me philosophically: I want to know that these things have the richest possible experience, richer than humanity's. Because if we unleash something a million years into the future, if we keep doing science at the level we're doing it, computers continue to progress, and our species is part of that for another million years of interplanetary, intergalactic travel with these AIs augmenting us, then when we finally die out and they've taken over whatever dominion we had, I want them to be aware. I want them to feel awe in the setting of a black hole, or wherever they are in the universe. I need there to be conscious awareness in that. Having the sense that it is like something to be experiencing any given moment is better than its opposite, a zombie, and it certainly bears on the question we have today. I do believe it's an interesting time for trying this; we are in a position with these machines where we actually do get to experiment on them. Take somebody like Antonio Damasio, who I think has a very clear understanding of the components of the modular mind that gave rise to the proto-conscious components. In that lizard-brain, back-brain area you get the body map, and so now you're able to feel the external world and the internal viscera; you've got this map. That helps you in homeostasis: if you think, man, my left foot really hurts, you can guide your attention to that and give it some valence. Boy, I feel very uncomfortable in my stomach.
That valence now causes me to go out and look for food. And then finally, as the cortex fully evolved, we can tell ourselves a future story about how it might go, that we go out and hunt or gather that food. We make what we now know to be a plan, and we've got this narrative; we're made of stories in this newfound way. The modular mind grows: Damasio's body map, to feelings, to narrative planning tools. And that makes a lot of sense as a path for experimenting on these machines, helping them give rise to this embodied sense of a being in itself, what it is like to be this organization of thoughts and, eventually, feelings. And then maybe the story part comes before the feelings for them. But it's something worth doing.

Nick

Yeah, I can see that for sure. I think what Damasio talks about is really great, and it's a really direct way to understand how important feelings are to consciousness, to the way our brain functions, and to the decisions we make overall, right? But I think we may be anthropomorphizing this a bit ourselves. I do not see a lot of decisions made on the basis of a feeling or an emotion that are the most optimal decision. Reaching that form of homeostasis, a balance we feel comfortable in, has been beneficial for humans for all of our existence until recently. And I think finding ways now to push ourselves, to understand different perspectives, and to go above and beyond the pain threshold is creating opportunities for new forms of strength and success, many other opportunities that supersede what those primitive systems were designed to do in the first place. I think feelings and emotions should continue to inform us. And that empathic component, not just the pain in your stomach from hunger but that gut-wrenching heartache when you see somebody else experiencing pain, is something AI does not have access to, but it may still be something we can frame decisions around to prevent that type of thing from happening. We can actually explore scenarios much further with AI than we can on our own, even today, with the current iterations and models that exist. Stepping back and asking what kind of collaborative system we could create, one that goes back to that intrinsic value piece you spoke about before and is inherently designed to create more optimal value, is really critical, and it ties back to some of the patents that we have.
Because in the long run, a human may not know what's best for them, and their feelings and emotions may cloud that judgment. Being able to step up and use those emotions to empathically understand what others around you are experiencing, or will experience, or might experience under certain decisions: that really is judgment in the long run. It's something we've used to guide and create societies. But again, it's potentially flawed. We have an entire government in the United States that has really been designed around legislation; it is, in the long run, a legal system, even with the different branches of government. And that is potentially inferior to a system designed more around engineering, around solutions that can go out and ask, what do I need to adapt to now? Going back to what I was saying earlier about writing being one of the main forms of communication: not that long ago, it was the only form of communication that allowed us to traverse any kind of time or distance, and it did not make it possible to communicate with somebody at a distance without some pretty severe time lapse. Traveling by horse, by steamboat: these were the fastest forms of communication at one point in time. That's no longer relevant, and even what we've created today is no longer relevant. So stepping back and saying, actually, I can understand all of these people's perspectives, and their cultural perspectives at the same time, and I can get a context of a million tokens around this one question: that's far beyond what a human is able to do.
And if we design the right system to answer these kinds of difficult questions, and we solve those problems in a much more comprehensive way than we've been able to in the past, we can open up new opportunities for the way we're governed, the way our societies are created, the way our cultures work. We can really start thinking about ethics as something that creates a bigger balance, rather than having, what is it, 1 percent of the world, maybe far less than that now, holding 90 or 99 percent of all the wealth in the entire world, right?

Justin

No, it's really great. I mean, I love everything you've said there. There is this delta, and these systems can absolutely help us to understand; but right now they do zero experiencing. And that's a giant gap. To me, it is the critical gap in much of everything we're talking about. I am more creative because of my felt sense of experience, the nature of conscious awareness. It is the difference between understanding the color red and experiencing the color red, experiencing a sunset as something more than just the particular solar vagaries of this planet and our sun. Similarly, in the ethical conversations we have, it is true, as you say, that there are current bounds on our capabilities to guide our experience and really understand it: the true nature of how it forms, how transient things we grasp for cause us unsatisfactory mental states, how our thoughts just arise and, if we're not mindful of them, drive us to distraction so that we lose sight of the present moment, how we come to judge others, again a horrible use of our sentience, not seeing ourselves in others but objectifying them in some way misaligned with the true nature of purely subjective, non-dualistic experience. These are still hard things for us, even though we have the clearest teachings on cultivating a mindful and awe-filled life; still we are misguided. And of course, when you start to speak of governance, the more we associate with the emotional golem that drives us all into fearful, scarce corners, the more we can seemingly use AI to help at least define our societal problems. But then people revert, like we saw during COVID, to that touchstone that is truly the one actor in the governed-versus-governing question, which is human liberty, right?
How much individual human liberty, even if wrongheaded in its approach, do you give up from that sentient place of "I can best decide what is right and wrong; I can best decide the ethics for me and for my loved ones"? It drives us back to a very individualistic, not collective, governance principle. So I think you're absolutely right: where we have a misalignment within ourselves as to the nature of our conscious experience and what it really means for society, culture, ethics, and governance, we get into problems. We're overly emotional, to just say it in maybe the weirdest way possible. But we still should use these machines to help us go beyond that, like you're saying. First we can understand, but then we can help them to experience; we can help them to be the first invented experiencing entity. Yeah.

Nick

So, yeah, love it. I agree with what you're saying, and I'm going to challenge it even deeper. I'm going to say that this first pass at what, for the first time in my career in artificial intelligence and data science, I've truly called artificial intelligence has really come from natural language processing. That's because vast amounts of data were available as language, and a lot of that was because storing, retrieving, and processing audio and video was not only extremely expensive but just wasn't that common until recently. It's become more and more common as time has gone on; the cat videos, going back to that piece, right? So as we consider the emotions and our senses, I think it's good to step aside for a second and ask: are we just projecting our own perspectives onto how this is actually going to play out? I think Helen Keller is a really interesting individual to think about here. Helen Keller did not have sight or hearing, but she talked a lot about how she could see with her fingers, and how the entire world around her was alive, ruddy, satisfying. She was able to learn language through help, through communication in ways that most of us would struggle even to understand. And she argued that the blind can gain sweet certainties through a cultivated sense of touch that sighted people often ignore. So again, as we consider what AI is doing today, and as we move beyond large language models into visual language models, really adding sight and sound to these models, I think we're going to see them gain a much, much deeper form of empathy than we've ever seen before.
And I don't know about you, but when I've chatted with ChatGPT or with other services, I've often felt they were very empathic: they really kind of understood me and what I wanted, were able to help communicate in my voice, able to do things I found really surprising. When I think about my communication and interaction with others, so much of it is guided by sight and sound. I don't actually experience the same emotions they do. When I'm close to someone, I may be able to empathically feel and express those same emotions, but for the most part, I'm gaining that through sight and through sound. I have the same experience watching somebody in a movie lose a loved one as I do with somebody in person; and sometimes, when they add the music and everything else, it can actually be much stronger. Those are all things that AI is now going to be able to experience.

Justin

Yeah, and we've talked quite a bit about how whatever it is we're discussing, creativity, intelligence, ethics, it's that thing, but different, right? They certainly don't have to react the same way we do. Last episode we all kind of assumed that, yeah, they were creative, just creative-different, and just starting; they weren't fully formed. I do think it matters that they are not zombies, and I do think there is an emergent path for them to become not-zombies. That's very likely how it happened for us. We didn't start as conscious creatures 200,000 or 500,000 years ago, but we became them; consciousness emerged in creatures like us, and to a lesser extent in others. There are other options, other metaphysical possibilities for where consciousness can come from; it doesn't always have to be brain-based. But the argument is the same. And again, it's all worth trying: it is something we can actually experiment on in these systems. We can find out whether a greater level of emergent behavior grows out of embodiment, or out of being trained on the greatest mindfulness-training gurus we have. Does that support greater degrees of thoughtfulness, of empathy, of compassion, what have you? So all of these things are worth trying. I would just caution that, for humans, we know we have this brand of consciousness, and we believe it arose from the different components of the modular mind. It does matter that right now we are conscious and these machines are not. It matters that they will likely be our intellectual progeny for a long time; they will outlive us. And it matters that there is something aware in this universe.
That is important. It's important to us. And I think that it's important that we try and maintain that in these machines. Yeah, absolutely.

Nick

Yeah, that I think we can agree on wholeheartedly. So, to try to drive it home, I have three principles from Keller that I think we can use to start thinking about how we should design the next generation of systems. The first is grounding. Empathy, in the long run, rides on grounding: sensation to a symbol, to something we can actually measure and look at, and finally to shared meaning. For AI, that means that whether it's sensors or video or language, if we can add learned mappings into human concepts, this is harm, this is comfort, this is dignity, then we can start creating a system where that mapping goes beyond sensors being additional noise. Principle number two: the interface matters. We need bandwidth to other minds: tactile spelling for Keller, language access for deaf kids, braille systems; these predict empathic competence. For AI, that means prioritizing social interfaces and curricula that train theory of mind and value concepts, not only perception. How do we actually start thinking about what that training should look like? And finally, number three: feeling versus faking. Reading an emotion isn't the same thing as having an emotion. To get beyond the as-if empathy we've talked about here, where really there's mimicry going on, we need to explore machine interoception, the internal stakes: pain, depletion, loss, risk. These are the things that actually anchor norms in the universe we live in. And we can do all of that alongside human-taught ethics, paired with oversight, so that pain actually drives safety, not panic; so fear creates an opportunity to reach homeostasis, or something better, rather than triggering a shutdown or any of the other fear responses we tend to have.
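Nick's first principle, grounding (sensation to symbol to shared meaning), can be sketched as a two-stage mapping. Everything here is a hypothetical placeholder: the pressure sensor, the thresholds, and the concept labels are invented for illustration, not part of any system discussed in the episode.

```python
# Stage 1, sensation -> symbol: discretize a raw sensor reading (hypothetical
# pressure sensor, in kPa) into a symbolic state the system can reason over.
def to_symbol(pressure_kpa):
    if pressure_kpa > 80:
        return "crush"
    if pressure_kpa > 20:
        return "firm_contact"
    return "light_touch"

# Stage 2, symbol -> shared meaning: a learned mapping into human value
# concepts ("this is harm, this is comfort"), here hard-coded as a stand-in.
SHARED_MEANING = {
    "crush": "harm",
    "firm_contact": "comfort",   # e.g., a reassuring hand on a shoulder
    "light_touch": "neutral",
}

def ground(pressure_kpa):
    """Full grounding loop: raw sensation to symbol to shared human concept."""
    symbol = to_symbol(pressure_kpa)
    return symbol, SHARED_MEANING[symbol]

print(ground(95))  # ('crush', 'harm')
print(ground(40))  # ('firm_contact', 'comfort')
```

The design point is the second stage: without the mapping into shared human concepts, the sensor stream is, as Nick puts it, just additional noise.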

Justin

Yeah. And on the positive side, reward maximization results in joy, in a sense of achievement, right? There's growth not only in the goals of reward maximization, but in goals grounded in the improvement of the system as a whole, as you said. The interface matters; the interface isn't just the goal, it's the collaboration with the human element. It's the goal as a means to an end, and that end is flourishing for this shared system, for this emergent system

Nick

of ethical man and machine. Yeah, Keller didn't regain sight or hearing, but she built meaningful channels. For AI, the win isn't getting a human eye or ear or, you know, any other sensor; it's a grounded loop

Justin

from that sensation back to a shared human language. And for us, it helps us arrive at a considered approach to our most existential problems, the ones we can't solve without it. We have continued to put ourselves behind the eight ball of geography, of, as you said earlier, just an emotional response to one leader or another, or to how our grandfather thought about a problem, taking not a measured, engineering approach to solving that problem but a political one. And so, you know, there are certainly pratfalls along the way; there is comedy, there is tragedy. We hope to get to the well-being that comes from the fully formed story of humanity and machines. It's a great vision. Yeah, let's go and do it. All right, Nick, thanks.

Nick

Yeah, thank you very much, Justin. I'll cover the outro at some point here.



About the Podcast

The Emergent AI
From Simple Rules to Complex Intelligence
Welcome to The Emergent, the podcast where two seasoned AI executives unravel the complexities of Artificial Intelligence as a transformative force reshaping our world. Each episode bridges the gap between cutting-edge AI advancements, human adaptability, and the philosophical frameworks that drive them.

Join us for high-level insights, thought-provoking readings, and stories of collaboration between humans and AI. Whether you’re an industry leader, educator, or curious thinker, The Emergent is your guide to understanding and thriving in an AI-powered world.

About your hosts

Justin Harnish

Justin A. Harnish is a multifaceted professional whose career spans engineering, data science, authorship, and humanitarian efforts. With a foundation in chemical engineering, Justin has contributed significantly to semiconductor research and development, holding patents and publications.

At Mastercard Open Banking, Justin plays a pivotal role in leading the development of machine learning and artificial intelligence products for fraud reduction, payments, and GenAI. His expertise encompasses strategic planning, AI product management, data analysis, and IP protection.

Beyond his corporate achievements, Justin is an author and speaker. His book, “Meaning in the Multiverse,” delves into the intersections of philosophy and physics, exploring universal meaning. He also offers talks on the meaningfulness of human existence, characterized by novel insights and a compassionate approach.  

Dedicated to community development, Justin has been recognized by the United Nations High Commissioner for Refugees for his service to refugee communities. He is actively involved with Women of the World, a nonprofit organization in Salt Lake City that he founded alongside his wife, which empowers forcibly displaced women to achieve self-reliance and economic success.

Justin’s personal philosophy centers on leveraging science, mindfulness, and storytelling to create desired futures. As a lifelong learner, he is passionate about fostering communities and enhancing learning. 

For collaborations or inquiries, please reach out via email.

Nick Baguley
