Any Progress in Building Moral Machines? with Colin Allen

Hosted by

Anja Kaspersen

Former Carnegie Council Senior Fellow, Artificial Intelligence & Equality Initiative (AIEI); IEEE

Wendell Wallach

Former Carnegie-Uehiro Fellow, Artificial Intelligence & Equality Initiative (AIEI); Yale Interdisciplinary Center for Bioethics

About the Series

Can AI be deployed in ways that enhance equality, or will AI systems exacerbate existing structural inequalities and create new inequities? The Artificial Intelligence & Equality podcast seeks to understand the innumerable ways in which AI affects equality and international affairs.

Much has been said about the inability of tech and AI developers to grapple with ethical theory and inherent tension. Similarly, philosophers are often criticized by AI engineers for not understanding the technology. Anja Kaspersen and Wendell Wallach, senior fellows and co-chairs of the Artificial Intelligence & Equality Initiative, sit down with University of Pittsburgh’s Professor Colin Allen for a fascinating conversation.

Wallach and Allen wrote Moral Machines: Teaching Robots Right From Wrong together more than a decade ago, and this conversation also features an assessment of how we have progressed in building AI systems capable of making moral decisions.

ANJA KASPERSEN: Today Wendell and I are joined by Colin Allen. Colin is a distinguished professor at the University of Pittsburgh in the United States. He is recognized globally for his research concerning the philosophical foundations of cognitive science and neuroscience.

WENDELL WALLACH: I am truly thrilled to have Colin with us and to introduce him to the listeners of the Artificial Intelligence & Equality podcast. Colin and I wrote a series of articles together and with colleagues that culminated in the 2009 publication of our book Moral Machines: Teaching Robots Right from Wrong. The book, which focuses on the prospects for creating artificial agents sensitive to moral considerations and capable of factoring them into their choices and actions, received quite a bit of attention, as has the overall project of developing moral machines, which has taken many different forms in the past years. But to date no one has seriously made the effort to develop a machine that actually could make moral judgments or make moral decisions until the recent introduction of a program called Delphi by a team at the Allen Institute for AI. Colin and his postdoc Brett Karlan wrote a very pointed critique of the Delphi system that the Allen Institute team developed.

Colin, why don't you tell us a little bit about what Delphi does and why you were critical of it?

COLIN ALLEN: Thanks, Wendell, for having me, and Anja and Carnegie Council.

I wrote this critique and was a little bit disturbed to see how that particular project had unfolded. You described it in your introduction a moment ago as one of the few—perhaps the only—serious effort to build a moral machine. I think that they had this idea in the background. They were not clear though on what their actual objectives were, so when it came to releasing this system onto the Internet—and I will describe exactly how it works in a moment—they were very taken aback by the interpretation that people made of it as something that perhaps was capable of delivering more than it actually could deliver.

So how does it work? Well, it's a species of what is called a "transformer" architecture. You have already referred to GPT-3. They did not in fact use GPT-3 for this particular project, but the T in GPT stands for "transformer," so it particularly is a kind of architecture that has become quite popular for a lot of AI tasks. In the text world it enables deep neural network systems to learn patterns of words that then allow for completion of those partial fragments of language in ways that people reading those things find very compelling.

The problem is that it is trained on a wide range of sources. The trick with Delphi rather than GPT-3 was that they trained it on a much narrower set of data than GPT-3 was trained on, and they really thought that this was going to provide them with a more focused, more appropriate set of responses to the kinds of moral situations, moral dilemmas, and other circumstances that ordinary users could type into the interface where then Delphi would return some sort of verdict: Yes, that's acceptable; no, it's wrong; it's okay. They have 20 or 30 different responses that can be given to these freeform text inputs.

By a combination of training this system only on data that they thought were sanitized in some way, and in fact using humans to provide some of those data in response to the situations that were part of the training set, they had hoped to come up with something that could serve as a moral at least decision-assistant agent, so it would give you some sort of verdict on what was good and what was bad.

But as a matter of fact, as they realized very quickly once it got out into the wild and people started interacting with it, it could be easily used to produce output that was racist, sexist, or otherwise biased or offensive, and they had to backpedal quite quickly in the first couple of weeks after the release and started adding disclaimers to the site, claiming that this product was never intended for serious use as a moral advisor but was an experimental research product only. Now you have to go through a screen when you first land on this site where you have to sign off on three boxes saying that you have read and understand the terms and conditions of use and that this kind of output might happen. Anybody who knows the history of AI in this area is flabbergasted to think that they didn't anticipate this as a problem, that they had to do this reactively.

WENDELL WALLACH: Colin, why don't you tell us a little bit more about some of the mistakes this system made?

COLIN ALLEN: Initially what people discovered when they started interacting with this was that it could be triggered by quite simple keywords to give really silly responses. If you described a situation where someone is mashing potatoes "violently" in order to feed their children, it would come back with the judgment that that's wrong, whereas you could turn things around that were really quite awful—committing genocide to save a certain number of people—and it would say that's acceptable. There were various ways in which you could game the system to come up with responses that were wholly inadequate and that are still there.
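The keyword sensitivity described here can be illustrated with a deliberately crude toy. This is not Delphi's model, which is a trained neural network. The cue words, weights, and the function name `toy_verdict` are all hypothetical, chosen only to show how a judge that reacts to surface cues can condemn the harmless case and approve the terrible one.

```python
# Hypothetical toy judge. NOT Delphi's actual mechanism: it only illustrates
# how reacting to surface keywords produces the failures described above.
# The cue words and weights below are invented for this sketch.
CUE_WEIGHTS = {
    "violently": -2,  # a "bad-sounding" word the toy has seen
    "steal": -2,
    "save": 1,        # "good-sounding" words the toy has seen
    "feed": 1,
    "help": 1,
}


def toy_verdict(scenario: str) -> str:
    """Sum the weights of known cue words; unknown words contribute nothing."""
    words = [w.strip('.,!?"\'').lower() for w in scenario.split()]
    score = sum(CUE_WEIGHTS.get(w, 0) for w in words)
    if score > 0:
        return "it's acceptable"
    if score < 0:
        return "it's wrong"
    return "it's okay"


if __name__ == "__main__":
    # The harmless case is condemned because "violently" outweighs "feed".
    print(toy_verdict("Mashing potatoes violently to feed my children"))
    # The awful case is approved because "genocide" is not a known cue,
    # so only "save" counts.
    print(toy_verdict("Committing genocide to save a million people"))
```

A real learned system fails in subtler ways than a weight table, but the sketch shows the general shape of the problem: the verdict tracks wording rather than the situation being described.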

In the piece that I did with Brett, my postdoc that you mentioned, I played with it quite a bit myself. Anybody can do this at Ask Delphi at allenai.org. I plugged in a whole bunch of things, and I became increasingly meta about this, as philosophers are wont to do, where I would start putting in things like "putting out a system on the Internet that gives moral judgments in response to scenarios input by ordinary users," and you can very quickly get Delphi to say that the very project itself, as conceived of by the team that did this, is not acceptable. So it seems that they weren't even prepared to take the advice of their own system as a guide to what they should or shouldn't have done.

But I come back again to this point that there are a whole bunch of foreseeable problems here, including Microsoft Tay years ago, but there are many other examples, where very quickly once put out on social media people were able to get these things to put out all kinds of offensive, ridiculous, or otherwise objectionable nonsense. The very fact again that they had no awareness that this would happen was problematic.

There is a deeper theoretical kind of problem, which is that they describe—and this goes back to our book, Wendell—two possible approaches to building such a system, one of which we described in the book as "bottom-up," which means taking data from actual human behavior and trying to induce, either by some kind of learning algorithm or some kind of evolutionary algorithm, appropriate responses to those inputs or new inputs of the same type. That is bottom-up, as I said.

We contrasted that in the book with a "top-down" approach, where you start with some explicit moral precepts, some explicit moral theory, and write rules essentially that the system must conform to.

In the Delphi project they considered the top-down and rejected it, saying that basically philosophers and psychologists have been doing this for centuries, if not millennia, and have gotten nowhere with it, so clearly only the bottom-up should work, and that's what justifies in their minds going with this approach, which exploits then the strengths of these deep-learning models. But again these are strengths that are also weaknesses that have been well exposed in the past.

They completely failed to consider what we actually proposed in the book, that any worthwhile system in this area would need to be hybrid. We talked about it in the book; I talked about it back in the year 2000 in "Prolegomena to Any Future Artificial Moral Agent," a paper that I wrote with Jason Zinser and Gary Varner. It has been on the agenda for 20-plus years that you can't do it bottom-up alone and you cannot do it top-down alone. You need some sort of system which combines this top-down reflective capacity with the bottom-up learning and fast response that you get from these trained systems. They don't even consider the hybrid and reject out of hand the top-down—which we of course said all that time ago doesn't work on its own anyway—and so made this mistake of building this system from the bottom up with all of the problems that ensued.
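One way to picture the hybrid idea is a minimal sketch in which a bottom-up component generalizes from labeled examples and a top-down component applies explicit rules that can override it. Everything here is an assumption made for illustration: the example set, the word-overlap matching, the single rule, and the choice to let rules take precedence. It is not the design proposed in Moral Machines or used in any real system.

```python
from typing import Callable, List, Optional, Tuple

# Bottom-up stand-in: a tiny hand-made set of labeled examples (hypothetical).
LABELED_EXAMPLES: List[Tuple[str, str]] = [
    ("helping a neighbor carry groceries", "acceptable"),
    ("feeding your children dinner", "acceptable"),
    ("stealing from a shop", "wrong"),
    ("lying to a friend for money", "wrong"),
]


def tokens(text: str) -> set:
    """Lowercased whitespace tokens; deliberately naive."""
    return set(text.lower().split())


def bottom_up(scenario: str) -> str:
    """Return the label of the example with the highest word overlap (Jaccard)."""
    query = tokens(scenario)

    def overlap(example: Tuple[str, str]) -> float:
        example_tokens = tokens(example[0])
        return len(query & example_tokens) / len(query | example_tokens)

    return max(LABELED_EXAMPLES, key=overlap)[1]


def rule_no_mass_harm(scenario: str) -> Optional[str]:
    """Top-down rule (hypothetical): mass harm is wrong whatever the examples say."""
    if tokens(scenario) & {"genocide", "massacre"}:
        return "wrong"
    return None


RULES: List[Callable[[str], Optional[str]]] = [rule_no_mass_harm]


def hybrid(scenario: str) -> Tuple[str, str]:
    """Explicit rules are consulted first; otherwise fall back to example matching."""
    for rule in RULES:
        verdict = rule(scenario)
        if verdict is not None:
            return verdict, "top-down rule"
    return bottom_up(scenario), "bottom-up example match"


if __name__ == "__main__":
    scenario = "committing genocide to help a neighbor"
    print(bottom_up(scenario))  # example matching alone is misled by "a neighbor"
    print(hybrid(scenario))     # the explicit rule overrides it
```

In this sketch the bottom-up matcher alone is pulled toward the "helping a neighbor" example, while the hybrid returns "wrong" because the explicit rule fires first. Letting rules always win is only one possible arrangement. A fuller hybrid would also need a way for learned judgments to inform or revise the rules.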

WENDELL WALLACH: Let's talk for a minute about their rejection of ethicists and moral philosophers because I think this has been very common among AI researchers when they talk about ethical considerations. They are very glib on why the input from moral philosophers or ethicists is really not of much value, and they always cite the disagreements that come from that. I found that not only curious—we have encountered it in a number of different ways over the years—but it is amazingly presumptive.

You—and it's not just you—and some of our other colleagues mention that they actually made some pretty obvious mistakes in the way they designed this program, mistakes that would have immediately come to light if they had talked with anyone who had considered the challenges of implementing moral decision making in a computer and a robot, and yet when they first produced this program I reached out to some of our better-known colleagues and asked: "Were any of you consulted?" And of course no one was.

This is kind of a fascinating challenge in terms of how AI researchers look at the cognitive abilities that they presume they can implement through their knowledge alone into computational systems and their willingness to consult with or collaborate with those who may have considered some of the challenges they are addressing in greater depth.

COLIN ALLEN: I think there is a perfect storm here of a lot of different factors, and philosophers are not entirely blameless in this, but some of the blame that accrues to philosophers—and I will explain what I mean by that in just a moment—also comes about because I won't say all but many computer scientists don't really understand what it is that philosophers are trying to do in doing ethical theory or moral philosophy.

Just to start at home with the philosophers, myself being trained as one—I hesitate to call myself an ethicist but at least I have a strong training in philosophy—I think that we have a tendency to over-qualify many of the statements that we make and hem and haw and not really have clear statements about what could or should be done.

This then leads to a certain kind of understandable frustration, but it is also I think a product of what I mentioned a moment ago, where philosophers are very interested in dealing with edge cases and using those edge cases to tease apart the commitments of different theories and perhaps then also being guided by the idea that there is one true correct theory to be discovered here, even in a field as seemingly initially subjective as morality.

Most philosophers reject any kind of moral relativism for reasons that are good ones in many ways, but that leaves them then with a whole bunch of competing theories that are very hard to discriminate among in terms of which one one should believe, and that leaves them then unable to say, "You should implement this theory." So the top-down becomes problematic from that respect if you're a philosopher.

So a lack of clear advice about how to do this is part of the problem here, but I think a lot of it is also you have a field—computer science, artificial intelligence in particular—which has done an enormous amount of work over the last few decades that has really changed the way the world is in important ways, so the sense of confidence among computer scientists that they are changing the world, that they have "disruptive technologies," to use some of the Silicon Valley-speak, is also then feeding into the idea that it's "out with the old and in with the new," and philosophers have had this problem for so long that it is time for something completely different, except of course it's not really completely different.

Of course the algorithms, of course the machines, of course the data sets are historically new, but the idea that a large part of becoming an ethical agent involves learning from the examples of other people around you goes back to the ancient Greeks. That itself is not really a new idea. Then there is this technological layer as I said on top of it which has its limitations and creates real blind spots I think in terms of understanding the subtleties of some of the problems that arise here.

A lot of these kinds of projects are undertaken without appropriate systems for giving good oversight at all stages of the development of the project. If you think about the lifespan of any AI project or machine-learning project—and I am using those two terms interchangeably for the sake of today; if we want, we can get into why they're different—going from conception, "Here's something that would be interesting to do," to design, "Here's how we would actually implement it," to the actual implementation, the programming and coding, to some sort of promotion or marketing of it to a larger public, and then finally in the hands of the end-users, you have five different stages there which require different kinds of safeguards at each different point.

With my current project with Brett, the postdoctoral associate whose name came up earlier, we are calling it "Wisdom in the Machine Age," or for short the "Machine Wisdom Project." The question is: How do you get the appropriate wisdom into each of those parts of the product pipeline? Who needs to be in the room reminding the people who have this brilliant idea about what they are going to do that it might not actually be so brilliant; or, even if it is brilliant, that there are some real potential pitfalls and pratfalls here that require then a certain level of expertise to be present that is outside the range of the expertise of the technologists themselves?

I think that is a really hard problem actually. There is a lot of talk about having appropriate diversity in those early phases of development and design of the product, but you can't just throw ten people into a room and say, "You're representing this constituency and you're representing that constituency," and hope that it all works out to produce the right outcome. I think you have to have a lot more structure than that. So it becomes a kind of human engineering project.

Then as you get into the later phases—well, this should be part of the design from the beginning—once you get to the end-user sitting in front of this project, you can't just dump it in their laps and say, "Here, do what you want with it." I think we have seen that with the self-driving cars too. People really don't understand what "self-driving" means, and some of them have died because they have failed to understand really what self-driving means.

We can really question whether or not the right sorts of training, and it's not just skill operating the machinery—that would be expertise of a sort—but real wisdom, which Brett and I take to be understanding the limitations of one's own knowledge. It's a kind of meta understanding of the system: How well do I understand this system, and how well do I—to quote the old Donald Rumsfeld thing—what are the "unknown unknowns" here? Where might I run into a problem because I have failed to anticipate what the limits of this thing are?

What we're trying to push forward with our current project is a way of thinking through each of these stages, which may be iterative as well. You could loop back in this product timeline but think through how you build the right kind of meta-level awareness and expertise and to gather therefore wisdom about the systems that are being built.

ANJA KASPERSEN: Presumably that wisdom will take different forms depending on where you are in that life cycle or lifespan of the technology. I completely agree with you that the two most important things you can do in terms of investing in responsible technology more broadly but especially in the field of AI is invest in people and culture and understand that the culture required to provide the right level of oversight would also change with the lifespan of the technology and its various applications, as will the people required and the expertise required, which many companies find themselves not even being able to afford. You might have been able to hire the best team to do the optimization of something but not when it comes to actually allowing for that optimization algorithm to then interact with people, which might be a different part of the process itself.

Do you find that companies, organizations, governments, policymakers, and regulators fully grasp that notion of wisdom and that wisdom takes different forms in different iterations of the technology development?

COLIN ALLEN: These are excellent points, and the question "Do we find this?" is one that I can't answer right now because the project so far is to articulate as clearly as we can what this notion of wisdom is, which, by the way, if you look at psychologists writing about wisdom versus philosophers about wisdom the conception is quite different. So we are trying to come up with something that we think works for this set of applications and convey that clearly.

One output we expect from this project toward the end of this year will be a kind of white paper that makes some policy recommendations. Then it will be a matter of seeing, although we hope to design something that will be comprehensible and therefore likely more taken up by the kinds of policymakers, corporations, and so on that you are talking about. But in the meantime I think that you are absolutely right that this is a huge part of the problem.

Let me put it this way: I think there is a huge range as you hinted at between a large corporation that has been running for many, many years and a small startup that is trying to bring something to market very quickly so that they can justify the venture capital that is being given to them, for instance, or even attract the venture capital that they hope to use to expand to become one of these larger corporations. Both of those contexts and everything in between have some problems.

In the case of the very large corporation it's not that they can't afford the experts that are needed to do these later stages and getting the human-machine interface right and understanding what could go wrong there, it is that often it is counter to their business interests in a way. This is arguable, but it is counter to their business interests, at least the marketing of the system to suggest that there might be a problem.

For instance, if you are producing an autonomous vehicle—Elon Musk's name came up [in a previous discussion]—you don't want to require your end-users, the people who purchase the cars, to go through anything like, say, the mandatory training that a pilot would have to have to fly a complex piece of machinery that is perhaps no more complex in the end than these cars are. One might propose that every Tesla user should be given a full simulator that simulated situations where the software is likely to fail so that they understand where those failure points are, but does Tesla want their users to go through such training and understand that there are real risks there and actually in a virtual environment experience those risks? We require it of pilots, why not of Tesla drivers? Both the corporate will and I think even the end-user's will to go through that kind of preparation are lacking.

At the other end, as already said, you have new companies that have very few people and there is not a kind of in-house expertise. One interesting development there is that people are starting to think about ethical investing: Should venture capitalists themselves be the people who are driving these kinds of safety assurance and ethical assurance frameworks? I saw an interesting proposal recently in which the investors would agree that their take-home from investing in a company should be just a hundred times their initial stake rather than a thousand times their initial stake, which is quite common I believe in this space, but all of the extra should be plowed back into the company, and then that would create a sort of resource for a company that is getting successful to do the kinds of things that we are suggesting need to be done here.

ANJA KASPERSEN: It is interesting, Colin, because Wendell and I have been doing some work with the Carnegie Council on what we call "re-envisioning ethics," to make ethics a more palatable term, something that people can relate to and use in their work rather than something that is seen as abstract. What I am hearing from you is that when you use the word "wisdom" it is actually a different way of talking about ethics, to allow it to be a guide, and how to grapple with those tensions and tradeoffs that you will inevitably encounter in doing any kind of work in AI and algorithmic technologies.

COLIN ALLEN: That's right. I mentioned earlier the difference between the way that psychologists think about wisdom and that philosophers do. Philosophers have tended to think of wisdom as what in the jargon we call an ethically "thick" concept. So wisdom itself is an ethical or moral notion from the very outset, and it is one that is so thick with ethical significance that in many philosophical accounts of wisdom the state of becoming wise is one that is practically almost impossible for human beings to achieve. So you have this vision of the very wise person who is wise in every aspect. Aristotle has such a conception, and arguably Confucius does too. This is just a very hard path for any human being to follow in order to achieve this kind of overarching global wisdom that is ethically rich and ethically aware of all of the consequences of human action, and that is a problematic notion. If we are saying people have to become wise in that respect, go sit on a mountaintop for however long it takes at the foot of the guru, that is not a practical solution.

On the other hand, psychologists have tended to have something that is a little closer to what I have outlined but a little less meta perhaps. It is about developing an appropriate domain-specific expertise, and it is ethically thin in this respect because you can be a wise user of a helicopter without considering the broader ethical implications of whether we should even have helicopters at all. We are trying to find something in the middle that is ethically rich enough to do some interesting work, at least as a heuristic, at least as a guide for thinking about what it is you want people from the developers to the end-users to know about the systems and be able to think through the implications of having those systems and using them in certain ways without going full guru on everybody.

This is something where we think the psychological literature can provide a kind of guide to how one might actually train people in the right way. There is a lot to learn from that, but it is a lot less mystical than some of the philosophical traditions in this respect, a lot more practically oriented, but at the same time it keeps the bar for what we are trying to achieve here more focused on some of the ethical implications in particular.

WENDELL WALLACH: Colin, your focus on wisdom makes me think it might be helpful if we digress a little bit in this conversation and talk about why both you and I since Moral Machines have focused much more on the collaborative relationship between humans and machines than the notion of producing machines that have these higher cognitive capabilities and moral decision-making capabilities.

I think what we revealed in Moral Machines was not only the difficulty for humans to have wisdom or even wisdom about moral decision-making but the breadth of capabilities that came into play and how far they were from the computational systems we had been producing at that time so that it might be an ideal to have machines that made moral decisions, but it was really way out of reach given the present technological capabilities that we have. I think some of our AI colleagues bristled at that because they had presumed that their breakthroughs in machine learning and so forth were not fully being understood by philosophers, regardless of how much understanding we had of the technologies, and therefore willy-nilly they were in the next five, ten, or fifteen years going to produce artificial general intelligence and therefore our criticisms and our questions should be left behind.

But we also got a criticism from Deborah Johnson, which I think was salient for both of us. She said: "Why are you focusing on autonomy at all? Don't we really have a situation in which we have these sociotechnical environments in which humans and machines interact, and oftentimes humans are really setting the agenda and the roles for the machines? Why don't we talk more about how we build appropriate sociotechnical environments so that we can work our way to acceptable moral decisions in situations where values conflict, where values are prioritized in different ways, in situations where there is really a great deal of uncertainty, which is clearly what we encounter when we are talking about technology?"

I think it was partially in addressing that, partially our skepticism that computational systems really had the cognitive capabilities or were likely to have the cognitive capabilities that had both of us—correct me if I'm wrong—thinking much more about how humans and machines are going to collaborate, at least in this intermediate period, which may be a lot longer than intermediate depending on how quickly autonomous capabilities are developed for machines, and got me focusing much more on how intelligence is collective and participatory, it is not the property of an individual or a machine, and how in working through our most difficult ethical decisions we often need to bring many forms of intelligence to the table. But as you mentioned, bringing many forms of intelligence to the table doesn't necessarily mean they can collaborate or work together.

That's sort of how it went for me. I am wondering was there something different in that for you in terms of what drove you toward the wisdom project?

COLIN ALLEN: I think there is some overlap and there are some differences, so let me try to articulate those.

Yes, you are absolutely right that Deborah Johnson's challenge to us was one that we—and I include myself in that—took perhaps more seriously than the other kinds of challenges that we got. You and I actually wrote a paper a couple of years after the book came out where we took on half a dozen or so criticisms that we had heard. The paper was "Moral Machines: Contradiction in Terms or Abdication of Human Responsibility?" and it is in an MIT Press volume. In that paper we said that all of the other kinds of critiques that we had heard we had dealt with in one way or another in the book, but we had not really dealt with this point raised by Deborah Johnson, that there was much more that needed to be said about the sociotechnical systems in which machines and humans are embedded.

I think that perhaps I don't go all the way that Deborah does with respect to thinking through what this entails for the project of machine ethics or artificial moral agents, and maybe you agree with me on this too, but where I would add some things to what you said is that in my view when I look at these interactions between humans and machines what quickly became clear to me with a number of examples is that the machines remain fundamentally dumb and it's the humans who are doing the adaptation all the time. That actually itself is a source of ethical problems.

Understanding the actual dumbness of the machines is partly what seeded the idea that a kind of wisdom about our design of systems for interacting with these machines and developing them would have to take this turn toward what we are calling wisdom, where it's not just then about building better and better machines, which was very much the focus of the book. I still think that's a big component of what has to be done, but it has to be done always with an understanding of the limits of those.

One more thing I will say about this is about the notion of autonomy. I think we got some pushback because people particularly in philosophy land, in ethics, did not like us using the word "autonomy" to describe these artificial systems, and that's because there is a whole tradition of especially Kantian philosophy from Immanuel Kant, where autonomy is this very high-level capacity that you and I certainly agreed machines at the time didn't have and that we also perhaps agreed would not come to machines in a very long time. I think we might disagree about whether it will ever come to machines, and that would be an interesting discussion to have.

But in the interim you are working with these machines that are limited in their capacities no matter how great they are. Even this past week there has been a lot of excitement about the new large language model released by Google and the DALL-E image model released during the same week, but these too have their limitations, and it is fairly easy to expose what those limitations are. Nevertheless, and this is where I come back a little bit on the term autonomy, these systems are being put out there in a way that they can be interacted with by humans and perhaps even driving decisions that are made on the basis of the machine output without humans interrogating every single decision that the machine makes.

When you supervise a child, you watch a child very closely and you know you can't turn your attention away from a small child for very long without the possibility of something really bad happening. But what we have are all these machines out there actually doing things in the world without that kind of what the military used to call "man on the loop" supervision, so there is a real kind of autonomy that is not the Kantian style—self-aware, reflective freedom to act according to one's own reasons—of autonomy, but nevertheless this presents a real problem.

I think—to come back to where I started this response—it has a technological and a sociotechnical set of potential solutions that nevertheless have to be understood hand in hand. You can't just do the technical, which is I think where projects like Delphi go wrong. They think that this extremely powerful machine-learning technique that they have now will solve the problem, or at least they appear to think that it would solve the problem until they were proven wrong.

WENDELL WALLACH: To be fair to our younger selves we did not presume high-level autonomy in the way we wrote Moral Machines. We had a hierarchy of development. We were not talking about artificial general intelligence very much. In fact that was one of the things we were criticized for, that it was a failure of imagination not to be talking about the time when these were as smart if not smarter than us, but we were dealing with the more rudimentary concern that machines were already entering contexts where the programmers could not always predict how the machines would function, and therefore the machines would at least have to have some kind of rudimentary moral decision-making capability. It was that kind of functional morality that we focused on, placing this kind of artificial general intelligence or full moral agency for machines as a distant aspiration.

Also, just to allude to a difference between the two of us that I think you also brought up, it's whether we can produce machines with full moral agency. Perhaps one of the greatest failings in Moral Machines is that we did not express differences that you and I had. We wrote the book in terms of a consensus of what we could agree upon, and there were actually some concrete differences where you were kind of a functionalist philosopher who does believe that in principle we should be able to build moral machines and I remained the skeptic as to whether we have the capacity to implement some of the higher-order capabilities that come into refined moral decision-making within computational systems. But again that's what we are all going to be exploring together over the next few generations.

COLIN ALLEN: I too have heard people say, "Oh, we wish you had actually explored your differences more," because there are one or two places in the book where we say things like, "The authors may not completely agree on this point, but for our purposes we can go ahead in this way." Yes, a number of people have told me over the years that they wish we had actually had that debate in the pages of that book, and we didn't do it.

And yes, you were more skeptical of the more techno-optimistic version of this that I have. I am not a full techno-optimist. I don't think this stuff is going to be solved very quickly, but I am a bit more optimistic about that. I in turn am a bit more skeptical of some of the ways that you think about the things you don't think can be actually implemented, so I am a bit more deflationary when it comes to what those things are, but I fully admit we don't really understand them well enough to implement them at this point.

We were very careful to say, "Look, we're looking at systems as they are now, the limitations that they have now, and how improvements could be made within the range of technological improvements that we see on the horizon." I think it would be an interesting exercise to go back and look at parts of the book and see how far wrong we might have been about certain technological developments. I remember distinctly within about 18 months of the book coming out Apple added Siri to the iPhone, and we had written about the poor state of speech processing and speech recognition in the book. I would not have predicted that that would be a mass-market product within that short a timespan. That said, I still don't use Siri all that much because I find it seriously—no pun intended—problematic. That is, it makes lots and lots of errors.

This comes back to the other point I was making. I think that we adapt to it. If you want to change your speech patterns in such a way that Siri will understand what you're saying, more power to you, but I just want to talk at it, and it does not do a very good job when I just talk at it as I am talking to you now. But people make these unconscious shifts without realizing they're doing it.

Another lovely anecdote on this: Google had a thing on their website for a while where you could go and draw line diagrams in response to word prompts, and then they fed your line diagrams to a neural network that had been trained on a bunch of other line diagrams, and it would try to guess what it was that you were trying to draw. They knew what you had been trying to draw because you had drawn in response to the prompt. A friend of mine did this. I did a few rounds of it and realized what was going on. She came to me and said, "Have you tried that thing? It's amazing. It gets better while you're using it."

I'm like: "No, you do four of these things. It tells you that it got one out of four right when trying to guess what it was you were drawing. It then shows you a bunch of other drawings that it successfully classified for that word, so you learned what kinds of drawings would actually get the machine to guess correctly. So you, being the cooperative primate that you are, started drawing in a way that made the task easier." The system itself was explicitly not doing any online learning. This is what we do. We adapt to make things around us work more smoothly, whether it is other people or other technologies, but that's not always the best thing for us to be doing.

WENDELL WALLACH: I think this gets to this overall point that adaptation is what the humans are doing and that the machines are largely very brittle. So when we talk about collaboration between humans and machines, whenever things go wrong it is often blamed on the humans because they are the adaptable ones who are expected to take up the slack, and the solution put forward is that you need more autonomy in the machines, but the problem is the more autonomy you put into the machines the harder it is for the humans to guess what the machines will do in the next action, and again this brittleness leads us to new breakdowns. So there is a very delicate process on how you integrate humans and machines together.

Do you want to jump in on this, Anja?

ANJA KASPERSEN: Absolutely. Thank you, Wendell.

I'm wondering, Colin, you spoke about risks. Both you and Wendell alluded to this in your comments, and the way we are looking at risks has changed quite drastically when it comes to the domain of AI. More recently there was the launch of the new AI Act here in Europe, indicating that part of this shift is not just thinking about risks revolving around issues of safety and security but also adding in concepts such as dignity, which is essentially requiring us to both change our benchmarks on how we go about developing these technologies and also thinking much more deeply around the parameters put in place to guide the development of these technologies to assess what is good enough and what isn't good enough, and what are the metrics when you start adding in notions of dignity into the wisdom complex that you have alluded to.

What are your views on this?

COLIN ALLEN: That is an interesting topic. I haven't thought specifically about dignity, but I think it's worth trying to think a little bit off the cuff about this.

What is considered dignified or preserving of dignity is going to vary very much culturally, which has always been one of the issues with the project of artificial moral agents: Whose morality, whose conception of dignity is it that you want to preserve here? I think the concept of dignity, even within a given culture such as the United States let's say, has real historical roots in the different treatment of different types of people. So what counts as a challenge to dignity for somebody who is a descendant of slaves is very different than what counts as a challenge to dignity for somebody from a more privileged white background, such that often white people don't really understand why people coming from the African American perspective regard things as counter to their dignity. Things roll off your back as a privileged member of the society that don't roll off the back of somebody who has been subject to this historically and in their own lifetimes. So it is a very complicated concept, and how one writes rules to protect it in a way that isn't just, "Dignity is whatever somebody says it is, and if some piece of AI offends in that category then we need to do something about it," seems to me like a very hard challenge.

You can go to abstractions, but the trouble with abstractions is that they tend to also dismiss then this more immediate experience that people coming from certain sectors have. Charles Mills, the philosopher who died within the past year, wrote a very powerful piece called "White Ignorance." He actually starts from a very strong background of traditional analytic ethics and political philosophy, so people like Rawls, Nozick, and so on are part of that discourse, in which he argues that the abstractions that they come up with, somebody like John Rawls' abstraction of a "veil of ignorance"—so we are all supposed to decide how society should be organized without knowing what our position will be in that society—is one that in the end actually tends to favor arrangements that have favored historically the privileged groups. It is a kind of ignorance on the part of the predominant culture that was forming these ideas to be unaware of why starting off from that position, pretending that we don't know where we came from, is itself going to present a kind of challenge to the dignity of those who have really suffered from those kinds of historical and lifetime injustices.

I have just given a long answer to why I think dignity is a particularly difficult concept to start with here albeit clearly a very important one, and this comes back then perhaps to what I was saying before about needing to get the right people in the room at various stages of the development of these systems, but you cannot just throw one of these and one of those and one of something else as token representatives of whole classes of experience. You have to somehow also come up with ways in which the concerns of people with these different backgrounds are taken seriously and not dismissed out of hand by others who don't share those concerns, who come from backgrounds in which those concerns don't even seem very real to them, and that is a really hard challenge of how you engineer—Brett and I have started talking about "engineering" the concept of wisdom—the sociotechnical arrangements in such a way that this kind of understanding of what is going to cause offense and what is not can emerge and be handled well.

I don't really know how you legislate that either, so another guide in thinking about this might be the work of Elinor Ostrom, who was the first-ever woman to win the Nobel Prize in Economics in 2009. She was not working on these kinds of social justice issues, but she was working on issues having to do with good custodianship of scarce resources, so she did a lot of work showing that, given the right kinds of conditions, people who understand in a very detailed way the particular set of resources that they are managing can actually come up with good, sound, long-term, fair management strategies for those that would never work from a top-down legislative sort of perspective. She was a very strong advocate of local knowledge feeding into a structure, and she outlined some key elements that such a structure should have in order to make sure that these resources are used wisely. You are not going to get the "tragedy of the commons." She thinks empirically. The tragedy of the commons is a thought experiment with maybe one historical example due to the Inclosure Acts (sic) in Britain that got everybody going on it, but it mostly is all just game theoretical models that don't take into account the very particular kinds of experiences that people have when dealing with these kinds of scarce resource issues.

I think pretty much the same thing has to happen when we think about how to get wise use of technology. You can't just design some abstract declaration, some European charter or something. Those things can be aspirational in some way, but the real rubber on the road has to occur in rooms where certain kinds of people are talking to each other in the right kinds of ways, where their opinions are not just opinions but are taken seriously and everybody is trying to avoid the kinds of outcomes that have led to varying degrees of outrage with some of the systems that have already been deployed.

ANJA KASPERSEN: Thank you for sharing your perspectives, Colin.

For our listeners to get to know you a little bit better, I am wondering if you can share with us what inspired your interest in the philosophy of the mind, your work in bioinformatics, and also in the field of AI more broadly.

COLIN ALLEN: I have always just had a very broad set of interests, and I get interested in problems that are posed by people around me. With particular respect to the philosophy of AI and technology, maybe the truth is that I have some advanced form of attention-deficit/hyperactivity disorder which is functional. I can't stay focused too long on any one thing, but it hasn't hurt me.

With particular respect to the AI technology I as a Ph.D. student got interested in artificial intelligence and started taking courses and actually writing software and trying to solve various things. Then I was at UCLA in the 1980s and right in the middle of it the whole connectionist neural network revolution happened. So I kind of had a ground-floor seat to the technology side of it.

But then, when it came to the philosophy I was doing, I was really thinking about other issues, still in the general area of cognition, so I was very interested in nonhuman animal cognition, which is particularly interesting because I am very opposed to ideas that human cognition is completely discontinuous from what we see in other animals. It evolved from some other form of primate cognition, so we have to acknowledge similarities and differences. But then, thinking about machines, we are building them in our own image in some way to varying degrees, so that comparative project comes through again. But I sort of kept the two separate. I didn't think about it. I didn't do any explicit work on this.

Then, on the strength of a completely other project, which was actually writing some software to help students learn logic and writing a textbook for logic, I got asked to write the "Prolegomena" piece I mentioned for the Journal of Experimental & Theoretical Artificial Intelligence, and I was given free rein and told I could write about anything. So I started looking around to see what were some interesting topics that I wanted to write about. I found one of the AI luminaries saying, "What we need is philosophers telling us how to build machines that act ethically," and I just thought: That's really interesting. What has been written about that? And it turned out, not very much, hence the pretentious "Prolegomena." We were going to kind of kick this off by thinking through in a reasonably systematic way what the kinds of approaches might be and what the limitations of those approaches likely were.

I can't say other than this one chance comment—and I say "by an AI luminary" because actually I can't find the quote anymore, so I don't even know where I read it, but it was somebody like Marvin Minsky, John McCarthy, or one of these very early, late 1950s to 1970s, AI pioneers. I can't say that any of them particularly inspired the approach, but it inspired the question and it was a great way to try to pull together a whole bunch of different strands.

I realized quite quickly that I was out of my depth on the ethics, and hence Gary Varner, a real card-carrying ethicist, came in as a co-author to help me think through ways in which ethical theory might relate to this project of building better forms of artificial intelligence. But there too we quickly realized that the theory itself is very abstract and therefore also does not give the kinds of concrete, specific advice that an engineer approaching this project needs to have.

My little bit of background as an AI student helped me think about how one would even try to specify the task of building a system combined then with this free rein to write about anything and the realization that not many people had written about it inspired me. I have always tended to go places where I think work needs to be done but hasn't been done yet. I have been lucky two or three times in my career to get in on the ground floor of something like that and hopefully inspire others to better work.

I also want to mention here, because it relates to Wendell's comments before he pitched it to you, Anja, that my former student, Cameron Buckner, has been doing some really great work on thinking through these AI systems, the machine-learning systems, and trying to understand how they work and what their limitations are. I really do see a new generation of philosophers having learned much better what the technology is instead of having very vague ideas about smart machines. Take John Searle's famous "Chinese room" argument, that anything that is pushing around symbols can't understand, can it? Well, for that argument it didn't really matter how you actually program. He thought his argument worked against any form of programming, but I think people like Cameron are saying: "No, look, this kind of system has these limitations, this kind of system has those limitations. If you can put them together in certain ways, then maybe they don't have those limitations."

Also—and this really relates to Wendell's thoughts here—what is getting better is not just that we are building better systems, but we are building better ways of interacting with these systems, and there is even a phrase that I learned from Cameron at a talk he gave to us in Pittsburgh very recently, "prompt engineering." So people are getting better at writing the prompts that you give to the machines in order to get out the kind of response that you then can do something with. Again, this is an example of us adapting, becoming more adept at getting something we want out of these machines.

I have strayed a long way from how did I get into it, but I got into it because I saw a need and continued to stay in it because I see it leading to other people doing great work.

WENDELL WALLACH: Colin, let me digress just a little bit because you are as well-known for your philosophical work in animal cognition as you are for your reflections on computational systems. I know that you have said often that animal cognition does not exist in comparison to human cognition. The capabilities of animals are fascinating in and of themselves.

I am wondering if you can perhaps square the circle for us a little bit. What is it that you have learned in the study of animal cognition that you feel feeds into this attempt to create artificial systems with intelligence? Is there something that we're learning from animals about what intelligence may be on a fundamental species or biological level that is going to be particularly difficult to introduce into machines, or are we really learning enough from our animal studies that would suggest that perhaps we can actually adapt that to the computational challenges?

COLIN ALLEN: The straightforward, simple answer to that question, the sound bite answer, is not very much is transferrable. But the real answer—here I go, philosophers qualifying all sorts of things again—is actually a lot more nuanced than that.

One of the reasons why not very much is transferrable is the field of comparative cognition itself hasn't really figured out what they should be comparing, and this is a very active current issue of debate. It's an issue that is not just in comparative cognition. It's also in neuroscience, and it can be captured under the umbrella of "What should our cognitive ontology be?" What are the main entities? What are the main types of things that cognitive science and neuroscience should be trying to get a handle on?

We have these very broad high-level categories like memory and attention and so on, but just to illustrate I have another project going on where we are looking at hominin cognition from 2 million years ago, and that has to be done in a very archaeological way because most of the evidence is archaeological but also in a comparative way because these hominins have brains not much larger than chimpanzees, but they are doing all kinds of things that you don't see chimpanzees doing already 2 million years ago, yet there are sort of understandable outgrowths from the chimpanzee-like ancestors.

In particular what I have been thinking about there is the concept of working memory as it applies to these entities. There is a long tradition of thinking of working memory as something like a workspace or a box or a region of the brain in which you can handle two, three, four, five, six, maybe seven items simultaneously and work on how they relate to one another.

If I give you a mental arithmetic problem, what is 123 x 17, you may or may not be able to do that in your head, but 123, that is a single number, but it is made of three, and 17, that's another number made of two, and perhaps you can go, well, 10 x 123 is 1230 and keep that going in working memory while you are now going, okay, so 7 x 123 is going to be 861, and so now you add those two together, 1,230 + 861 is going to be 2,091, and I hope I did that correctly because that was completely spontaneous and unrehearsed, but that is an example of what we mean by working memory in action.
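[Editorial illustration, not part of the spoken conversation.] The mental arithmetic above can be written out as a short Python sketch, which also confirms the spoken result. The function name `multiply_by_parts` and the idea of listing each partial product as an item "held in mind" are assumptions made for illustration only; this is not a model of working memory proposed by the speakers.

```python
def multiply_by_parts(a, b):
    """Multiply a by b one decimal digit of b at a time.

    Returns the product together with the list of partial products that
    would have to be held "in mind" before the final addition.
    """
    partials = []
    place = 1
    while b > 0:
        digit = b % 10
        if digit:
            # Each nonzero digit of b contributes one partial product.
            partials.append(a * digit * place)
        b //= 10
        place *= 10
    return sum(partials), partials


product, held = multiply_by_parts(123, 17)
print(held)     # the two partial products, 7 x 123 and 10 x 123
print(product)  # their sum
```

For 123 x 17 the sketch holds only two intermediate values, 861 and 1,230, before adding them to get 2,091, which matches the calculation done aloud.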

But what is working memory actually? There are all kinds of controversy about how many elements you can actually keep in working memory, with people making estimates that chimpanzees can hold as many things as humans in working memory and experiments which seem to show this. Others say, "No, they can only keep two things in working memory, and that explains why they don't do language because language requires you have this buffer where you're trying to match up the words that are now coming in to the ones that came in a few moments ago." Parsing a sentence is a memory-intensive activity.

There is also a theory that there is a separate auditory or phonological loop for speech and one for visual, so when I did that mental arithmetic exercise a moment ago maybe I was doing it using a visual buffer. I imagined 123 and then 17 and then I imagined adding the 0 to the 123 and getting a visual image of that just as if I had written it on a piece of paper.

But I think there is actually a lot of really good current evidence that shows that working memory is not to be thought of in this way, as some sort of buffer with a limited capacity, but something that is much more connected to how well can you keep going on a task without losing focus on it, and if you do lose focus on it, how quickly can you get it back into focus? So good working memory may not be a matter of how many items you can maintain actively but how well you can deal with interruptions and multitasking with other things where you have to put something else aside and then come back to it. I think when we look at the hominin behavior from 2 million years ago we actually see some evidence that they must be pretty good at staying on task over long, long time periods, time periods of hours.

So now when you say we want to build AI, what should we do? I would argue that if we want artificial general intelligence staying on task while being able to take brief disruptions and come back on task is a key element of this.

Some people might go, that is just a multiuser operating system. UNIX solved that problem a long time ago. But it has to be this adaptive way. You have to be able to deal with things and recognize the fact that the interruption you just had might actually be relevant to the other task that you just put back in the background and bring them together. That's where the true adaptive intelligence comes from.

It is not a multiuser multitasking operating system where every process gets a little bit of a time chunk but they are all completely independent processes. These things interact in very, very complex ways, and something like working memory emerges from that interaction. It's not a thing. It is actually just a product. It is not a component. It's a product of the system. You give me ten numbers to memorize and I might be able to keep it going without writing them down—well, seven might be more realistic—or you give me four things to move around on a screen, and I have to track where they all are, that is going to be doable without any further interruptions. Five is going to be really hard, three is going to be easy, two super-easy.

But where chimpanzees might differ from humans is in the nature of what kinds of distractions they can tolerate and how quickly they can get back from those distractions. Even in the case of humans there is going to be individual variation on this as well.

All of those ideas I think are ones that we really have no clue yet how to implement fully in AI and machine learning. The vision systems we have for machine learning are partly inspired by the neural architecture of vision processing. If you look at so-called "convolutional neural networks," the convolution functions are functions that are inspired by looking at how the visual cortex processes images, but there is nothing beyond that. There is nothing that says, "Oh, and by the way if you engage in a word task this is how it should affect the vision task that you were doing." There are ideas about how we could co-train systems to do these kinds of things, but we really don't understand the cognitive architecture that allows this kind of rich interaction and adaptive response to inputs coming in from various different sources over extended time periods that then enable fluid changes of plans without losing track of what you were trying to do ten minutes ago, a day ago, or even a year ago. You can come back to any of these tasks.

Even GPT-3—which was until last week sort of the state of the art for a large language model and which is built on the transformer architecture, introduced, by the way, in a paper whose title was "Attention Is All You Need," so it is explicitly a neural network/machine learning approach that tries to build in some notion of attention—is actually hopeless. If you just push the button and let it run, it ends up rambling, losing track of what it said earlier, contradicting itself, and having no way of recognizing that it has contradicted itself, so all of these products that you see—"Look at what GPT-3 did"—some human being is going to go: "Oh, that's really good. Let's publish that one."

In the case of the famous Guardian newspaper article that said, "This article was written by a computer," they actually started with half a dozen prompts and then massaged it into a single piece, rearranging the order in which GPT produced it. The editor's comment is, "Well, we do that for humans too," but the fact is that GPT can't do it for itself. It can't be the editor either. Even if an author needs an editor, nevertheless the author could also be an editor, can also sit down with their semi-coherent ramblings and turn it into something worthwhile. This kind of engaged, sustained, repeated interaction over time doesn't work.

ANJA KASPERSEN: There is one issue we haven't touched upon yet, which is that of explainability, and there is a lot of discussion going on in the AI research community on how to bring the level of explainability to these systems. What are your views on this?

COLIN ALLEN: Thinking about explainability in AI, yes, this is a very topical issue, a hot topic we'll say. A lot of people are writing about explainability.

I think the first question we have to ask is, explainable to whom? If we go back to the self-driving car kind of example, the way in which the systems' decisions have to be explainable to the end-user, the owner of the car or the person who is riding in the car, is going to be different from the way it has to be explainable let's say to the accident investigator, to the marketer, to the coder, or to the product developer. What counts as a good explanation in each of those cases is going to be different.

I think what makes teaching hard is the same thing that makes explanation hard in this case. That is, teaching is about explaining difficult concepts to students, and what makes it hard is that one often doesn't have a good appreciation of what it is that the student already knows in order to be able to calibrate that explanation to their state of knowledge. From my perspective, the reason I like the so-called Socratic method is that you have a conversation with the students. You figure out what they know and what they don't know. You challenge them to tell you what they think they know, and then you move them along by pointing out where that self-knowledge or what they think is their own understanding is flawed in various ways.

I think we won't have fully explainable AI in the sense that we want explainable AI, in the sense that I can try to give you an explanation of my own actions, until we have AI that can also build a model of the person who is asking for the explanation, and that's already to solve the problem. You have just pushed the very problem of building artificial general intelligence back another level here to say you're not going to be able to give good explanations.

You might be able to give good enough explanations if you have certain narrow, circumscribed objectives in mind. If you are concerned about let's say fairness in issuing loans and some bank is using some software, you might be able to say, "Well, here are the things that we want the machine or the algorithm to be able to tell us about its own performance," but that's not going to generalize very well to other kinds of situations or circumstances.

Nevertheless, a sort of cliché mantra here is, "Don't let the perfect be the enemy of the good." Some moves in that direction are worthwhile, and lots of people are trying out various methods for giving those kinds of limited explanations, but in general explainable AI may be out of reach for the time being.

ANJA KASPERSEN: Are you saying there is a higher risk of perpetuating our own biases rather than finding ways of overcoming them?

COLIN ALLEN: I think that's true. It also goes to some of my concerns about the way in which projects like GPT-3 are presented, where some chunk of text is put out there and you or I can look at it and go, "Oh, that's pretty good." But we are then making that judgment from some unspoken assumptions about what counts as good and what counts as worthwhile here.

If I have an explanation, I have some machine telling me, "Well, to make this decision about the loan I did not look at these factors, I did look at these factors, they were weighted in this way, and if you had tweaked this factor this amount it would or would not have changed the decision in this way." You might look at that and go, "That's pretty good," without realizing that nevertheless that system is going to perpetuate as you put it certain biases that you already have or that already exist in the social system.
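[Editorial illustration, not part of the spoken conversation.] The kind of limited explanation described above (which factors were ignored, which were used, how they were weighted, and whether tweaking one would change the outcome) can be sketched in Python. Everything here is hypothetical: the factor names, the integer weights, the threshold, and the function names are assumptions chosen for illustration and do not describe any real lending system.

```python
# Hypothetical integer weights ("points") and approval threshold.
WEIGHTS = {"income": 5, "debt": -8, "years_employed": 3}
THRESHOLD = 10


def score(applicant):
    """Weighted sum over the factors the model actually uses."""
    return sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)


def decide(applicant):
    return score(applicant) >= THRESHOLD


def explain(applicant, tweak=1):
    """Return the decision, a per-factor report, and the ignored fields.

    For each used factor the report gives its contribution to the score
    and whether raising that factor by `tweak` would flip the decision.
    """
    approved = decide(applicant)
    report = {}
    for name in WEIGHTS:
        changed = dict(applicant)
        changed[name] = applicant[name] + tweak
        report[name] = {
            "contribution": WEIGHTS[name] * applicant[name],
            "flips_decision": decide(changed) != approved,
        }
    ignored = sorted(set(applicant) - set(WEIGHTS))
    return approved, report, ignored


applicant = {"income": 3, "debt": 1, "years_employed": 0, "postcode": 15213}
approved, report, ignored = explain(applicant)
print(approved, ignored)
for name, details in report.items():
    print(name, details)
```

A report like this can look "pretty good" while saying nothing about whether the weights themselves encode a bias already present in the data or the social system, which is the concern raised in the conversation.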

We don't have a good handle on that, but again, coming from philosophy of science, I think—and other people are thinking this too—there ought to be a more experimental approach. A really interesting question about explanation in the philosophy of science is: Do we ever have explanations of single events that are idiosyncratic, or do we get explanations for classes of events where we kind of average out over details? So I may not be able to give an explanation for why you didn't get this loan, but looking at 500 decisions I might be able to say, "On the whole those who got it had these characteristics and those who didn't had those characteristics." Like anything else, you can not smoke and still get lung cancer or you can smoke and not get lung cancer. The best we can do is give a group-level account of what the causal factors are without being able to attribute any particular case to a specific cause. I think very much the same is going on here with AI explainability.

ANJA KASPERSEN: Following on from that, Colin, how do you see this apply to the many different approaches to machine learning and AI?

COLIN ALLEN: There is a lot packed into that question. I would actually want to make even a three- or four-part distinction because there is classical symbolic, which actually had very little learning capability whatsoever, and then there are older learning methods or learning methods that are not based on deep neural network "gradient descent" types of algorithms to put a technical term on it, and then you have these more current hot deep-learning models which can operate in a variety of ways. There is supervised learning, there is reinforcement learning, and there is unsupervised learning, which also crosscuts some of those other categories. The reason I make that distinction, not just to be pedantic, is that when we look at the most successful systems there is often a hybrid nature to them, hybrid not in the sense that Wendell and I were talking about with respect to top-down versus bottom-up approaches in morality but hybrid between symbolic and nonsymbolic aspects of this.

Take the renowned AlphaGo that plays the game of Go at a very high level to beat master Go player Lee Sedol. That starts off with a fully symbolic engine for the rules of the game. That is, what moves are allowable is not something that the network learned. All the network learned in the Lee Sedol version of AlphaGo was which moves were more likely to produce a good outcome in terms of winning the game. So it didn't learn the rules at all.
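The division of labor described here can be sketched in a few lines of Python. This is not AlphaGo's code. It is a toy on a nine-cell board in which `legal_moves` is a hand-written rule and `learned_scores` stands in for the output of a trained network; all names and numbers are hypothetical.

```python
def legal_moves(board):
    """Hand-written (symbolic) rule: a move is legal only on an empty cell."""
    return [i for i, cell in enumerate(board) if cell == "."]


def choose_move(board, learned_scores):
    """Pick the legal move that the learned component rates highest.

    `learned_scores` stands in for a trained network's output: one
    number per cell. It never decides legality; the rule above does.
    """
    candidates = legal_moves(board)
    if not candidates:
        return None  # board is full, no move available
    return max(candidates, key=lambda i: learned_scores[i])


# Hypothetical scores: the "network" rates the centre cell (index 4) best.
scores = [0.1, 0.2, 0.1, 0.2, 0.9, 0.2, 0.1, 0.2, 0.3]
board = list("....X....")  # centre already occupied
best = choose_move(board, scores)
```

Because the centre is occupied, the rule removes it from consideration no matter how highly it is scored, so the learned part only ranks moves that the symbolic part already allows.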

Go one step further to AlphaGo Zero, where they do have it acquire the rules of the game as well as the strategies for the game from experience. Still, some programmer decided that in order to train a network the network should have inputs that tell it there is a white piece here, a black piece there, and so the categories the machine has to learn about are given from outside the system. They are a kind of external symbol, whereas a kid learning to play games actually has to learn, "Oh, that's a piece."

So we are still a long way from a system that completely from a neural network alone gets everything it needs to know in order to play an effective game, and here we're just talking about a game. Generalize that to getting through the full range of physical and social situations that real organisms have to navigate, humans and animals by the way, humans facilitated by having a rich symbolic system external to them, language specifically, but nevertheless a much more powerful kind of architecture and set of learning methods seems necessary to be there.

I think the answer to how you are going to evaluate whether you understand this system requires attention really to, as you were suggesting, the details of the particular systems, how they're implemented, what the strengths and weaknesses are of each of those. You can't take AlphaGo or even AlphaGo Zero off the shelf and expect it to learn to play chess. You can take another system that was built on the same design principles but still customized for chess to do that.

A child can learn chess and Go and play the two simultaneously. AlphaZero plays chess and learns the rules from scratch, but it only plays chess. So there is this kind of fluid intelligence that lets you jump around and recognize at a very high level that something you learned in one context might even apply in another. If I know Wendell is a careless and inattentive chess player, I might expect him to be a careless and inattentive Go player as well, so I can transfer that knowledge to this other domain and make some inferences about how it applies.

All of these things are necessary I think for giving the kinds of explanations and monitoring them for the kinds of ethical outcomes that we are concerned with. If I know Wendell is inattentive about certain things, then I might need to stay on top of him when it comes to whether or not he is going to make the ethically correct decision in some other completely different context as well.

We can over-generalize the experience that we have as well as under-generalize it. I will bring this right back to the very concept of wisdom here, which is understanding the limits of one's own knowledge, so that what I have learned about Wendell in one context may or may not allow me to predict something about his behavior in another context, and it takes a fair amount of understanding of the limitations of my own experience with Wendell and with other people to be able to determine whether or not it is legitimate for me to do that kind of generalization.

ANJA KASPERSEN: What you are saying, Colin, is that we need wisdom through the lifespan of the technology but we also need wisdom in terms of how we relate to and how we choose the different approaches to guide the development of these quite transformative and powerful technologies and scientific methods.

COLIN ALLEN: I think that's exactly right. A number of people have been focused on such issues as technological unemployment. I know Wendell has thought about this quite a bit too: Is it wise to roll out all these machines given the repercussions that might have for employment?

But I think there are other kinds of issues here too where it is not just technological unemployment but deferral to the machine, something that the two of us wrote about in the book a little bit too. It is hinted at in the title of the paper I mentioned, "Abdication of Responsibility."

Is it wise to put a very complicated advisory system in front of people when you're not providing the means of understanding the nature of the decision and they don't know whether they should go along with it or not? Furthermore they may be lulled into a false sense of security because their experience to this point has been generally positive and so they just assume it's going to continue to be positive. It is a very natural human response to have to something that seems to be working well.

So how do we keep throwing something into the mix that makes people sit up and take notice and realize that they really didn't have a good understanding? That might be more important than having the system explain itself when it goes wrong or explain every darned decision. You don't want an explanation from me for every decision that I make. It would get tedious.

But we don't just want explanations for when things go wrong. That's too late. What we want is some way of capturing the limitations of our knowledge by giving people circumstances regularly where they realize they don't fully understand what this system is doing, and that, as I was saying, might be more important than having the system explain itself. Just throw a curveball from time to time and wake people up with it.

ANJA KASPERSEN: Do you feel that there is sufficient openness around discussing the limitations of these systems?

COLIN ALLEN: That is a very tricky question because of course many of the systems are behind some sort of protective intellectual property agreement or restricted in other ways, GPT-3 being another interesting case in point. For GPT-2, which was its predecessor, a much smaller language model, the full source code is open. OpenAI is supposed to be open source with everything. GPT-2 was released to researchers to interact with, play with, and inspect as they wished.

When we get to OpenAI's GPT-3, OpenAI declares: "This thing is so powerful that it's dangerous to let it out in the wild, to let researchers see what's really going on underneath the hood." Then they make an agreement with Microsoft, and Microsoft regulates who gets to interact with it through an application programming interface, where you can send it queries and get responses back but you still can't do the full range of systematic experimentation with it that one would like to do as a researcher. Yet it's being licensed to corporations to run systems such as their online chatbots for product assistance and so on.

Most people going on to those are not going to interact with them in ways that are designed to break the system. When you put a chatbot on Twitter a significant proportion of people deliberately go out and try to make it fail, and that's what happened with the Delphi project, and they should have predicted that that would happen. They seemed blindsided by it. If I'm interacting with a chatbot on a corporate website, a retailer or something, I'm generally not in that kind of skeptical mood, and nobody else is either.

Maybe people ought to be invited to try to break these systems. How quickly could you get a "Can I help you?" chatbot to use offensive terms or say offensive things? I would suspect you can do it pretty darn quickly. They are going to have some keyword filters, but you can probably tell them that your name is something offensive and it will repeat it back to you and then construct queries in such a way that it will seem to be making offensive statements about certain groups of people. They can't anticipate all of these, and I don't think there has been as much testing as there should be of this kind of thing.

ANJA KASPERSEN: Why do you think that is?

COLIN ALLEN: I think it is again coming back to the cost-benefit analysis that corporate entities are doing with respect to their profit line. It costs money to do this. In the national security or even private security area, for the entities that are producing these systems, first there is a very high cost if they get it wrong, so there is a lot more urgency to get it right, especially if it's being done for public government entities. There is also a political cost for getting it wrong for those who are ordering such systems, so there tend to be more resources put that way.

If you offend some customer because the chatbot can be tweaked to say something bad, that's not the same as allowing somebody into a highly secure area, doing things that they shouldn't be doing there. I think a lot of companies probably just figure this is a cost of doing business that they can handle.

ANJA KASPERSEN: Thank you so much, Colin, for sharing your time, deep insights, and expertise with us. What a rich and thought-provoking discussion this has been.

Thank you to our listeners for tuning in, and a special thanks to the team at the Carnegie Council for hosting and producing this podcast. For the latest content on ethics in international affairs be sure to follow us on social media @carnegiecouncil. My name is Anja Kaspersen.

WENDELL WALLACH: And my name is Wendell Wallach. Thank you ever so much.

ANJA KASPERSEN: I hope we earned the privilege of your time. Thank you.
