AI, Military Ethics, & Being Alchemists of Meaning, with Heather M. Roff

About the Series

Can AI be deployed in ways that enhance equality, or will AI systems exacerbate existing structural inequalities and create new inequities? The Artificial Intelligence & Equality podcast seeks to understand the innumerable ways in which AI affects equality and international affairs.

In this episode of the AI & Equality podcast, Senior Fellow Anja Kaspersen speaks with Heather Roff, senior research scientist at the Center for Naval Analyses. They cover the gamut of AI systems and military affairs, from ethics and history to robots, war, and conformity testing. Plus, they discuss how to become alchemists of meaning in the digital age.

ANJA KASPERSEN: As artificial intelligence (AI) continues to reshape national security and military and defense applications, and as countries grapple with how exactly to govern these developments, we are compelled to explore its broader implications. I am thus very pleased to welcome Heather Roff, a senior research scientist at the Center for Naval Analyses, which specializes in military affairs and ethics. Heather’s academic journey in social science and philosophy uniquely positions her to discuss the ethical integration of autonomous systems into military applications. We will make sure to add a link to her background in our transcript.

In today’s conversation we will touch on a plethora of issues, looking into AI more generically, military affairs, ethics, history, cringey chatbots, and even the occasional unicorn issue. We will also explore what Heather describes as the “boring part” of AI and robots in war, focusing on critical yet often overlooked areas like maintenance and logistics, highlighting the importance of conformity testing, validation, and verification.

Welcome to our podcast, Heather. We are very pleased to have you here.

HEATHER ROFF: Thank you so much for having me, Anja. I am excited to talk today.

ANJA KASPERSEN: Heather, before we explore the complexities of AI, not least in a war type of context, could you share what inspired you to enter this field, having a background in social sciences and philosophy? What pivotal moments, or, if I may say, key influences or influencers brought you here?

HEATHER ROFF: I got started on this journey in graduate school, while I was writing my doctoral thesis, which eventually turned into my book on humanitarian intervention and Kant, of all things, because I love Immanuel Kant. Who can’t love Kant?

I was writing that book, and one of the main objections to humanitarian intervention is why should I submit my resources and my troops for somebody else’s war that I have no dog in the hunt for? That main objection is why do I send my soldiers to fight and die in someone else’s cause. I had just finished reading Peter Singer’s Wired for War: The Robotics Revolution and Conflict in the 21st Century, and I thought, Well, why not send robots? They don’t die. Go send the robots.

As I started to think through the logic of that argument I thought it was actually a terrible argument because I thought, Oh, that sends a horrible message to the people you are trying to intervene on behalf of, like: “Well, you’re not really worth human lives. We will just send machines.”

In working through that logic I became fascinated by the use of robotics in armed conflict, and because I am a social scientist I wanted to understand everything, so I went down lots of different rabbit holes. I wanted to understand not just the engineering of robotics, learning about systems engineering and how you build robots, but then I was also like, “Wow, these are really insecure,” so I wanted to learn all about cybersecurity. I ended up being a National Cybersecurity Fellow at New America and then a Future of War Fellow at New America, and in this process of trying to understand robots in war, I was like: “Okay, I understand the physical side, I understand the vulnerabilities on the cyber side, but now I have to understand what makes them ‘think,’” and that required me to understand artificial intelligence.

Luckily, as a social scientist and a philosopher, when you are a political theorist in a poli sci department they make you learn math, so I was well positioned to understand, “Okay, so this is what is happening in the mathematics behind the algorithms; this is what these things are doing.” I became more and more fascinated by the AI side of the house and by trying to understand how these systems really work: how you build them and where the data comes from. As a social scientist trained in methodology, looking at how some of these systems are designed and built, I was appalled at the lack of rigor in where people get their data, at the spaghetti approach of throwing it at the wall to see what sticks, and at the absence of any principled aims or methods for building these systems.

That is how I got involved. It was, “You open up one can, then there is another can and another can, and before you know it you have an entire box of cans of worms.”

ANJA KASPERSEN: So the brittleness of systems and our human processes.

HEATHER ROFF: Exactly, because they are sociotechnical systems. It is one thing to understand the math behind the systems, one thing to understand the data behind the systems, and it is another thing to then say: “Okay, how are these systems going to interact in the real world amongst people, people that exist in institutions, in bureaucracies, and in organizations? What are the incentive structures to use or not use different kinds of systems or technologies?” All of those incentive structures are going to mediate the kind of uptake and mediate the knee-jerk reactions to the use of that technology.

Social preferences and organizational culture and behavior are incredibly important, and without understanding that you might have the best robotic autonomous widget in the world, but if the organization is not prepared culturally to take that and employ it effectively, message about it effectively, and train their operators to use it effectively, it is not going to be used in the way that you think it was designed to be used.

ANJA KASPERSEN: With all of these technologies afforded to us, does it make us more or less humane? Does it make wars more humane or inhumane?

HEATHER ROFF: I was a fellow at Oxford for a little while, and they have the Changing Character of War program. Think about that distinction between the character of war and the means and methods of war. Is there a change in the character of war? No, war is inhumane, period. As Kant used to say, it is “the scourge of the earth.” It is one of the most horrific practices we can engage in. In that respect it has not changed.

Where the means and methods have changed I think there are differences. They may not be different for the folks who are feeling whatever bomb or blast or use of a particular weapon. They are feeling the visceral effects of that weapon no matter what. It might change the perspective of the people or the institutions doing the fighting if they can do greater amounts of fighting at a distance, so there is a lot of that kind of shifting of costs onto different folks. In previous conflicts you could not shift that cost in those ways. I think that definitely is different.

Also, speaking historically we have to think about how militaries actually operate. Fundamentally war has not changed much from World War II. We still divide territories and air corridors—“This is your spot and this is my spot, and if you come into my spot you are likely to get shot down. This is the territory which I will control, these are the means with which I will control this territory.”

Comms networks are better, but they are not amazing. We can see a lot more in permissive environments, but in contested environments we can’t.

I think in some instances it has changed and in some instances it hasn’t. We have a lot more adaptability in certain mission sets than we did in the past, and we have a lot more ability to see what our adversaries are up to, but there has also been a lot more obfuscation in different ways.

Historically, again going back to World War II, the Allies engaged in massive amounts of obfuscation and massive amounts of deception, so we might say that the deception looks slightly different now. We might not be using rubber inflatable tanks so that they think there is a big tank troop massing, but the deception is still there. It is interesting how it has shifted and changed, but the characteristics I think are still quite similar.

War is fundamentally a human practice. It is mediated through bureaucracies and all of these different types of institutions, states, power, and corporate moral agency, if you think there is such a thing, or corporate agency at least, but fundamentally it is about territory, power, revenge, some sense of justice, or some sense of wrong being done that has to be rectified. I think those types of reasons are still the reasons why states, or groups of people if not states, go to war, and how they fight that war, even to some extent tactically, has not changed that much.

You can think about siege warfare, you can think about insurgencies, you can think about guerrilla tactics, you can think about irregular wars, things like that. The means by which they do it are definitely different, but the fact is that it is still guerrilla combat in Vietnam and guerrilla combat in the Middle East.

I feel like those types of tactics are a response to the incentives. If you are fighting against an opponent that has a massive advantage on you in terms of power and in terms of military hardware, surveillance, and all of these types of things that you could never match on an open-plain battlefield, then the natural reaction to that is to turn to guerrilla tactics. You want to punch as many holes in your opponent as possible because you cannot meet them face-on.

I feel like these are tactical decisions based upon what types of assets you have and how you want to fight in the course of pursuing your cause. It might not be a just cause, maybe to you it is, but those types of similar behaviors—why do we fight wars of attrition, guerrilla wars, irregular wars, gray zone wars, all of these different types of activities I think are driven by considerations about how much power and what you can do once you face your adversary on the field of combat.

ANJA KASPERSEN: Given these changes in the characteristics of warfare, and given that the field of combat is rapidly evolving, can you elaborate on the crucial role of maintenance and logistics and the need for skilled human operators? You discuss this in your recent article, “The Boring Part of AI and Robots in War,” where you emphasize that finding maintainers is as critical as the systems themselves.

Also, in light of your work with the U.S. government on defense innovation and ethical principles, how exactly do we test these systems’ capabilities against real-world environments, especially with a rapidly changing battlefield and potentially inaccurate data?

HEATHER ROFF: I was involved in the Defense Innovation Board. I was their special governmental expert and the primary author of the Department of Defense’s (DoD) “AI Ethics Principles.” That was a big undertaking over a couple of years. Hundreds of people were involved in that process with multiple states being advisory states too, so they were involved in some of those discussions in a working group.

One of the things that project focused on, and that I wrote about in the supplementary document, is testing, evaluation, verification, and validation (T&E and V&V), exactly for these reasons. There is the principal document and then a supplementary document of about 70 to 90 pages that goes along with it, and it spends a lot of time on T&E and V&V.

But who is going to do that testing, evaluation, verification, and validation? That is a big challenge, because T&E and V&V for these systems, at least in the military right now, is constrained both by the personnel available with the knowledge and skills to test adequately and by the ranges: the availability of time on range and the modernization of those ranges. There are a lot of challenging things there just to test the widget that you build to see if it is going to work the right way before it is scaled up in some giant acquisition and then deployed to forces. That is one challenge.

Then you face a different set of pressures and incentives, at least from the DoD, with the Replicator Initiative. The Replicator Initiative is looking at how we can basically build hundreds or thousands of attritable autonomous systems on land, sea, and air to counter an adversary that would have access to as many such systems. We want them to be low cost because they are going to be blown up or damaged in some way; we want to be able to leave them, and we do not want to have to worry about them being exquisite systems that cost millions and millions of dollars. We want to be able to repair them or leave them.

Repairing them requires maintainers. In the Replicator Initiative we are focused on production (create the system and then produce a lot of it), and I see a lot of ink being spilled and words being said by various DoD officials: “We want to have more robots on the battlefield, we want to have a robot embedded in each Marine squad, we want to have all of these things.” While they are talking about training operators for these positions or having squad leaders or something, what gets left out is, well, who is going to maintain this? Who is going to understand the system not just in terms of how you operate it, but what if something goes wrong and it needs to be fixed?

Especially if we are looking at strategic implications, you might be forward deployed in a comms-degraded environment without a lot of communication pathways, so you need to have somebody embedded with you who can maintain these systems, but where do you find them? We cannot find them at the DoD level writ large to undertake testing initiatives on a grand scale, and now you are talking about needing to find people with this type of skill set and having them embedded with squads.

It seems to me to turn the problem on its head. It is like, “Okay, yes, I want a Special Operations Force (SOF), I want a SOF person”—usually a man—“to be able to understand how to use the system, but then do I also require of that individual the ability to maintain and fix that system as well as operate it?” That is an entirely different skill set than being a Navy SEAL.

I think it is important to focus on logistics and maintenance because if we have these systems we want them to work, and because we are going to have these deployed in a variety of domains—air, land, sea, what have you—they could also be in a wide variety of different climates with a wide variety of different foliage. You could have different problems related to sensors degrading because of rust, salt, sand, or whatever it might be, and so you have to be able to figure out what is wrong with the system and fix the system to get it back to running to get the “goodness,” so to speak, the operational utility, out of that system.

If we cannot hire people who can do T&E and V&V on these systems at the Office of the Secretary of Defense (OSD) level or at the service level (the Marine Corps has the Marine Corps Operational Test and Evaluation Activity, which is the Marine Corps’ testing facility; the Air Force has the Air Force Operational Test and Evaluation Center; the Navy has theirs) for research and engineering and testing and evaluation, how on earth are we going to find them at the operator level?

For me the boring part of this is not the sexy question, “How do I make it go?” but “How do I get parts for it if something breaks?” What am I going to do? Bring out my 3-D printer in the middle of the jungle? What am I going to hook it up to? Where are my materials? If that is the case, then you have to be shipping these things back someplace to get fixed, and that logistics tail carries with it operational concerns and security concerns.

You might think, Oh, these are going to give us all these operational capabilities, but you might be completely hamstrung because you are dependent upon something that breaks down all the time, that no one wants to use, there is no one there to fix it, and there is no one there to test it once it has been fixed to see if it is actually behaving in the way you think it is behaving because its sensor input is fundamentally different. If I go from the desert to the jungle and this thing was trained in Mountain View, California, its data library, so to speak, is completely different. It is “seeing things” it has not seen before.

I just think the boring stuff is actually the hard stuff, and it does not get enough attention, and that is a detriment. Also we don’t value it enough. We think of the valor associated with operators. We do not think of the valor associated with maintainers.

ANJA KASPERSEN: You raise some interesting points. Missy Cummings, whom I know and with whom you have worked in the past, was on this podcast a year or two ago, and she has been writing quite extensively about this exact topic. She is also critical in looking at how we are embedding autonomous features into societal applications, including cars and aviation. She comes from a military background, so she looks at it from that perspective as well. She has said many, many times that, for these systems, the level of maturity and the interoperability between software and hardware matter, and that the hardware component of any conversation about AI is often not only misunderstood but missing altogether.

She has been very critical about the human operator side of it. Do we have a sufficient number of people trained in public governance functions (and the military is part of public governance) to actually engage with these systems?

My experience so far is that, no, we don’t, and this is a serious, serious security threat. We are embedding technologies without having a full comprehension of the software/hardware relationship, and we do not have enough operators or maintainers, as you said, to be able to navigate that space.

Another issue which I heard between the lines is that you were speaking about the U.S. context, but if you look at this internationally what metrics of assessment are we using? I fear that not only do we have a big issue in terms of definitional clarity—and as we are both social scientists we can certainly spend weeks and weeks on end discussing one word and what is the meaning of it.

HEATHER ROFF: Autonomy.

ANJA KASPERSEN: Autonomy, exactly. Or “meaningful” or “human control,” which we will get to. But there is the issue of using different assessment metrics for things that lack clear definitions, where the interoperability among the human, the hardware, and the software may be understood but is then not embedded into our evaluation metrics.

HEATHER ROFF: Exactly. I will bring up something Missy says all the time because I think it is a great way of highlighting the problem. She says with regard to autonomous cars and civil society, she wants all of them to be required to be painted hot pink because she wants to be able to see them coming. She is like, “Oh, there is one, I am going to avoid it.” She wants that hot pink for autonomous vehicles requirement.

I think that is telling. She says it in jest but also in seriousness because these are not systems that operate in the ways in which we assume they should be operating. We overestimate their capabilities quite often.

When it comes to, as you were saying, the standards, if you are talking about technical standards there are still challenges related to establishing technical standards on these systems because we are not saying, “Does it operate at 0.2 MHz to 0.6 MHz” or whatever your bandwidth is on the tech spectrum? We are talking about a bound of potential behaviors, and you have a hard time writing requirements for systems based around behavior. That has been a major challenge for a number of years. It is an open challenge. Requirements writing for autonomy is a big, big, big problem.

On the international standard side of the house, you can think of this as follows: if one service within the DoD says that it has figured out one way of measuring something, then that is great, but then you have to think of it almost like a Russian doll. It is like, “Okay, we have one thing here, but this has to then scale up to the largest nested set of actors.”

We are increasingly relying on allies and partners. You can think of the AUKUS stuff going on right now between Australia, the United Kingdom, and the United States. AUKUS wants not just to deliver nuclear-powered submarines to Australia; under Pillar Two, the partners want to engage in a lot more technology transfer among all of these countries. If that is true, the interoperability of those systems and the standards for interoperability of those systems are going to be paramount.

Then you can widen that out from AUKUS and say: “Okay, well, what about other allies, what about other NATO partners, what are the technical standards there, and how do we test to those standards?” Anja, I know you are involved with the Institute of Electrical and Electronics Engineers (IEEE), and right now, if you look in the standards literature, this is a major problem for IEEE.

If you think about what kind of international standards you are going to apply, any standard that has been adopted, at least on the responsible AI side of the house, those are not technical standards. Those are processes. Those are standards for how you organize a company, how you organize an organization or an institution, and they are all about processes and procedures: “Do I have a data governance procedure? Do I have a reporting procedure? Do I have this, that, and the other?” They have nothing to do with the actual technical standards.

I find it quite interesting that standards bodies are now issuing standards related not to technical specifications but to organizational and cultural processes. That is an interesting shift.

Even on the civilian side of the house, Waymo is going to have different standards than Uber for testing and evaluating these systems, and they are going to test in different ways, because again there is no set of established standards. I think however we get through this process there is going to be a lot of trial and error, and it is still nascent.

The testing and evaluation of autonomous systems is still nascent, and Missy can talk to you about that. She likes to, for lack of a better word, “take the piss” out of Teslas all the time because they don’t work so well. She has a lab where she can break them. The interface between hardware and software is a systems engineering problem: unless you are really good at systems engineering and understand all of the inputs, outputs, and dependencies of that system, you are not going to engineer it right. Human-systems integration (HSI) and human factors, which Missy understands well, as do others, are also incredibly important to systems engineering.

But even on the HSI side of the house, the focus has a lot more to do with human physical integration: what are the capacities of my eyesight and hearing, what is my attention span, when do I zone out and when do I not, things like that. It has less to do with cognitive functions. The science of cognitive integration, or of cognitive dependencies between humans and their technical artifacts, is super-nascent because we just do not have cognitive benchmarks for humans, and I don’t even know whether it is possible to create “human benchmarks” in that way because of the differences and variations among human cognition.

We have a major problem. It is one thing to have human factors and human-systems analysis and integration for cockpits in flight because we know that you will pass out at however many Gs you pull; the human body cannot withstand it. That’s fine, great, I understand that, but integrating humans and human cognition into a sociotechnical system and then testing for that has never been done before, or if it has been done it has been very, very caricatured in different ways and experiences.

ANJA KASPERSEN: We know from research, including Missy’s research but that of many others too (and you mentioned the Second World War earlier), that attention span declines markedly when autonomous features are added to any system, so there is that part as well. How do you train operators in a very different way, knowing that their cognitive abilities might be impaired and that the skill sets they need, should they have to actually take over interpretive processes, must come into full play?

We are going to do a little bit of a segue into your work on autonomous weapons systems, where all of these issues are playing out and have been playing out for quite a while.

HEATHER ROFF: I guess I will pick up on the meaningful human control thing first and then spin it into that and go from there. Back in 2014 I partnered with Richard Moyes from Article 36, a disarmament nongovernmental organization (NGO). Richard coined the phrase “meaningful human control,” but he did not put a lot of conceptual meat behind it. I came in to help do that.

ANJA KASPERSEN: Maybe you can say something about Article 36 because of course his NGO is not just focused on that, but that is a pretty important part of the story, why Article 36 has a meaning in this context.

HEATHER ROFF: Richard’s NGO is named Article 36 after Article 36 of Additional Protocol I to the Geneva Conventions, which requires weapons reviews: any state or any military that is going to field a new weapon has to conduct weapons reviews, or as we would say in the United States “legal reviews” of weapons systems, to make sure that they comply with the rules of war, that they are discriminate, that they do not cause superfluous injury and unnecessary suffering, things like that. They have to be compliant. They cannot be illegal means and methods of war. That is where the name comes from.

In the debate around autonomous weapons there is a major theme that they are inherently indiscriminate. We will put a pin in that for the moment.

The notion of meaningful human control is that you have to have a sense of governance of the system, you have to have a sense that there is human oversight of the system, and in 2016, when we came out with our first white paper on that, what we were really talking about was a nested set of procedures and processes that are temporal in nature and are also institutional, procedural, and all this other stuff. There are certain things you need to do before a war starts, antebellum concerns, which are testing, evaluation, verification, and all of that kind of stuff that goes on well before the start of armed conflict, and then you have in bello considerations of how you use the systems while you are fighting.

ANJA KASPERSEN: Which is also the life cycle aspect.

HEATHER ROFF: Exactly, and then that gets into the maintenance of those systems as well as targeting. There are concerns around here about appropriate use and then how humans are engaging in that use and what humans are doing with those types of systems.

Then there are postbellum considerations. If something were to go wrong, do you have systems of remediation in place to hold individual people accountable for those harms because obviously we do not hold tools accountable for things. That is what we were talking about with meaningful human control: What are the types of limits you need in place at each one of these temporal steps to ensure that you have a sense of meaningful human control over the processes, means, and methods of war?

In 2016 I did another white paper for the delegates to the UN Convention on Certain Conventional Weapons (CCW) meetings on autonomous weapons systems, where I compared the U.S. position on “appropriate human judgment” as outlined in DoD Directive 3000.09 with the concept of meaningful human control. The way I put it is that they are very similar. You can say “blank human blank,” “meaningful human control” or “appropriate human judgment.” They are very similar in content and in what they require.

The difference is that meaningful human control takes a broader view—those antebellum, in bello, post bellum considerations—whereas the DoD appropriate human judgment standard is much more tactical. It is much more like: “Did I do the correct engineering beforehand and then did I employ the system in the moment correctly?” Theirs is slightly tighter in scope than I would say meaningful human control is, but fundamentally they overlap and complement each other in a wide variety of ways. They are not that different.

Over the years, though, meaningful human control has been interpreted differently, and a lot of people think that meaningful human control requires a human in the loop. I was the first one to put meat on the bones of this concept, and at no point in time was there ever a requirement for a human in the loop. In fact I hate the phrase “human in the loop,” I hate the whole concept of the loop because I think it is inherently suspect from the beginning. John Boyd was the one who came out with the concept of the OODA (observe, orient, decide, and act) loop, and that has been nothing but a problem for everybody ever since. It is so conceptually bizarre when you start to dig into it.

He never even wrote anything about it other than PowerPoints that he shuffled around the DoD. It is such an overstretched concept on the one hand and it gets abused on the other. On the loop, in the loop, out of the loop—it is kind of nonsense. I do not like that phrase, and I think it does a big disservice to the types of problem solving that we need to utilize to address this, like: What is the function of this system? Where is the human within that functioning? What types of attention span does the human operator, overseer, or whatever require to be able to operate or oversee that system appropriately?

If you focus on the loop, you are not focusing on what the human requires. In that respect I think if you focus on what the human requires, that is when you get into much better and much bigger questions about interface design, about what types of alerts are appropriate, how many, colors on the screen. There are all sorts of these design considerations that I think are important for the appropriate amount of information uptake for a human to make decisions. That is increasingly important in user experience design.

I think the meaningful human control stuff is a big umbrella concept. Appropriate human judgment is a subset of it, or at least very complementary to it. In terms of the loop, I hate it because the loop leads us into these discussions, as you were saying, about trust.

That is another term I just cannot stand. If someone could just stop using the words “trust” or “trustworthy” I would be really happy, but unfortunately it has a life of its own so we have trustworthy AI, trust this, appropriate trust, blah, blah, blah, and now we have at least a little bit of a shift toward the phrase “confidence,” and “justified confidence.”

I think at least if you were to use the phrase “confidence” you are getting at a better representation of at least what we would use to assess the system because we would mathematically assess and test how that system operates under a variety of conditions, and then we would come up with confidence intervals about its failure rates. In that respect, you can say, “I have confidence based on these mathematical test results that the system will behave this way under these conditions.” That is not a guarantee because confidence intervals can be 99 percent, 95 percent, 90 percent, whatever, but at least that would push us a long way forward to reorienting our thinking about how we need to engage with these systems.

We don’t “trust” them. They are not individuals. There is no relationship and no long shared history. These thick, anthropomorphic notions of trust are also highly cultural and contextual, and they are not appropriate for these physical systems.

ANJA KASPERSEN: Or the marketing sloganeering.

HEATHER ROFF: Yes. It is really, really bad. I think we should at least push it toward confidence and stop talking about trust generally, because trust covers a multitude of sins. The word “trust” has different properties associated with it. I could ask you, for instance, “Do you know someone who does this?”

You might say: “Yes, I have a friend. They do that.”

So you introduce me to the friend, and the friend will automatically take my call or email because they know you. They do not know me, but there is this transitive property: they should engage with me because we went through you. That transitive trust is a weird but important concept at work with these systems, except the chain is not Heather to Anja to somebody. It is Google or Amazon or some other corporation, and we say: “Well, they know that, and they have a lot of really smart people, so I should trust that whatever they are doing is pretty good. Those engineers get paid a lot of money and clearly know what they are doing. They are much smarter than I am, so I should... thanks, Meta. I should let this into my house.”

ANJA KASPERSEN: Does it worry you, given all the work you have done in the defense domain in particular? There is, of course, a big difference in how we think about the societal impact and issues such as brittleness depending on whether these systems are deployed in peacetime or wartime. When you see the transitive property you refer to being captured by certain corporate interests and embedded in military discourse, does that worry you?

HEATHER ROFF: Yes, it does. In fact, just last night I was reading Annie Jacobsen’s book on Operation Paperclip, from the end of World War II. In it she describes President Eisenhower’s farewell speech and the famous passage where he warns us to beware of the military-industrial complex, but she also notes the second thing he said in that speech: also beware of policymakers who are overly enamored with industry and what they can do with it.

I think that warning from Eisenhower about policy wonks being overly enamored with industry is exactly where we are right now. I have a lot of friends in and around the DoD, and they are wonderful people, but I do think they are looking to industry to see what they can get out of it for an operational edge. What can they get that China or Russia cannot? They think that Google, Meta, OpenAI, or whoever will have an answer, and that they can use that system to solve a problem they currently cannot solve or to get some sort of capability boost.

I am not certain that they are going into that discussion with enough healthy skepticism. I think a lot of folks are like: “Wow, that’s amazing. Let’s just get ChatGPT for the military and it will be wonderful.”

I am like, “What are you going to use it for?” You don’t want a system that hallucinates information that is complete bollocks in a military setting. What? It might have limited utility for certain things, such as writing memos, if you have someone checking it, but I do not see the cost-benefit case for operational uses. I think there is not enough healthy skepticism in going to some of these companies and saying, “Yeah, give us your latest large language model.” That is worrisome.

Go to those companies and ask, “Can you give us a classified cloud space?” That makes sense. They need the cloud for a lot of things, but I am a lot more skeptical about generative AI, unless they want to do psyops, in which case go for it. Again: What is the use of this, how can it be fooled, and what vulnerabilities could it create?

There is a sunk cost problem with that too, which is if they think that industry is going to provide them with all of this magical goodness and they have spent so much money on these systems, then walking away from that investment becomes increasingly difficult, especially if Congress or somebody else is going to hold them to account for that money. There are all sorts of those incentives at play.

It is a tough position to be in, but I am increasingly worried that industry is making promises to policymakers, that the policymakers do not have the technical chops to ask the right questions, and that they get taken for a ride, because industry is looking to line its pockets and the military is a big customer. The DoD’s budget is $875 billion a year. That is a lot of money. If you can capture government contracts, you get a ton of money.

The rank and file of these companies do not want to work on military projects. As all the protests over the years show, those engineers do not want to work on military projects, but the leaders of the companies definitely want military contracts because that is where the money is. So you have dueling incentive structures within those corporations, and then you have incentive structures within the military: they want better capabilities, but they do not have the personnel to create them in-house at that scale, and they cannot hire those people either, because DoD pay scales do not come close to what an engineer can make at Google, Microsoft, Meta, or some other big company.

ANJA KASPERSEN: And not every problem is an AI problem. This is the same issue you face across the public sector and in commercial and small enterprises: this promise of efficiency and optimization at any cost makes people forget to ask simple questions. What type of problem do I have? Does it require an AI solution? If there is an AI solution (and this is something I often see missing), what type of approach do I need? Most procurement officers do not have a good grasp of what methodology or technique is used in the product they are procuring, or of the fact that it has to fit their problem, which in most cases is probably a human problem.

HEATHER ROFF: Exactly.

ANJA KASPERSEN: There is definitely a lot of that going on, and it is quite concerning to see more and more political leaders, as you said, embracing this because they do not want to fall behind, or because of the perception that “If we don’t build it, someone else will, and what happens then?”

HEATHER ROFF: That is exactly right. There is this competition mentality, this competition model: “If I don’t do it, someone else will, so I better do it.” That kind of logic justifies anything. If you say you are committed to responsible innovation and responsible AI, whatever that means for you, but you also hold this competition mentality that says you have to do it, otherwise somebody else will, then you do not really hold a commitment to responsible AI, because you will do anything to maintain your edge in that competition. I think the logic falls apart there.

I always look to incentive structures. Why would someone release this right now, and what are they really trying to do with it or get from it? It is the cui bono (who benefits?) question. There is obviously a competitive logic at work among the big tech companies to some extent: competition for talent, money, prestige, all of these things. But there is also competition for data.

We know that these companies are running out of data. They have scraped the Internet, they have scraped YouTube again and again, and they are running out of data to train these models, which pushes them toward synthetic data. We also know that after four or five rounds of training on synthetic data you end up with model collapse. You have to keep injecting real-world data into those systems for them to remain functional.

How are you going to do that? “Here, OpenAI. Let me give you this really cool, fun party trick thing. I have one AI agent talking to another AI agent. Oh, by the way, they are not just talking to each other because that would cause model collapse. No, no, no. You’re in there too, and you are going to engage with them. You can engage with them vocally and they can engage with you with images and video, you can show them around your house, you can do all these things.”

Well, guess what they’re doing? They’re sucking up all of that data. You are giving them more training data: the more people download GPT-4o and engage with it multimodally, the more of the multimodal sensory data they so desperately need they receive. With text-only ChatGPT you were just giving them text. Now you are giving them everything, and you are also letting them into your homes, your offices, and your cars. Other people might be involved, there are rights to privacy, and all sorts of other things that are worrisome. And then there are these lovely feminized voices, which consistently bother me because of longstanding gender politics and all of these things.

This is just a bee in my bonnet: What is it with this fascination with “I just want this assistant” or “I want this entity to be my slave. Go do all the things that I don’t want to do. Make an appointment for me. Call and make an appointment for my haircut”? Why can’t I just be a human and engage with other humans? Am I so busy I can’t take 30 seconds to call and make a hair appointment? Really? There is an odd logic behind it: I want this little minion to do my bidding, not talk back to me, and be subservient and feminized.

ANJA KASPERSEN: I want to take you back to the issue of autonomous weapons systems for a second. You wrote an article not too long ago, and I quote: “If the arguments about autonomous weapons systems were purely about technology, then the archaeology and pedigree of today’s weapons shows that we are not in some new paradigm.”

When I read that sentence it stuck with me, and it builds on a lot of what you said, including how we are actually using these technologies now and the features we are giving them, especially as we negotiate the terms and conditions of using AI in war-fighting contexts beyond the mere technology, because these systems are fundamentally about power. As you said, you always look for incentives, and some incentives are definitely monetary, but with AI I think so much of it is not just about the powers that be but about who gets to define power and who has power in this system.

I think this is such a pertinent point. What it does, and this is something you, I, and many others have tried to do for many years, is bring together the discussion happening in the classical arms control context and say that we cannot look at this only through the conventional arms control lens. This is something much bigger, and you have to understand that bigger component even to start using the tools in the arms control and nonproliferation toolbox, such as confidence building. These bigger issues need to be understood if these systems are to be safely deployed in defensive systems.

HEATHER ROFF: On the autonomous weapons side of the house, first I would say: Look. Go back to the first public statement, the 2012 DoD Directive 3000.09. That was the first statement to come out and say, “This is our policy on this,” and soon after that we had the first informal meetings of experts at the UN Convention on Certain Conventional Weapons (CCW), where you and I met years and years ago and where I testified and was involved for a number of years.

That discussion has circled the same set of issues for a decade, and I do not see any actual progress being made on that because I think you are right. The traditional arms control framework is not getting at the core of these issues. If you take the traditional arms control framework, what you are looking at is what kinds of technical things am I going to disallow and what types of systems am I going to prohibit.

But if you look at contemporary systems that have been involved in armed conflict since World War II, if you are talking about does the system sense by itself, does it act by itself, what does that mean, how is it doing it, all of our systems are built on databases, sensors, communication links, and all that kind of stuff. So what are you prohibiting?

I recently launched a blog called “The AIs of March.” I just started it, so there are only a few things on there, but it is a way for me to kvetch, to complain about some of these issues in a space that is not bound by the rules of publication. One of the posts asks: When was the first real use of autonomous weapons in war, if we think of them as selecting and engaging targets? The first use is the combination of systems the Allied forces used to counter V-1 flying bombs coming from Germany.

ANJA KASPERSEN: For military necessity.

HEATHER ROFF: It was. It was bred out of military necessity. It combined radar, antiaircraft guns, the proximity fuse, and the ability to see a target coming in, know its trajectory, speed, and everything else, and use a computer to calculate the response. The proximity fuse in the nose of a 90-mm shell could then get close enough to the target without having to hit it, and without a human having to press a button or preset a timed fuse to detonate it near the target.

If you have a proximity fuse, through estimation you can say: “A V-1 is coming. It is coming at this trajectory, this wind speed, this altitude,” all of this stuff, and then you can fire a defensive round into that area, and as soon as the fuse gets close enough it will detonate and take out the V-1.
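The geometry described above, predicting where the target will be and letting the fuse go off once the shell is close enough, can be sketched in Python. This is a deliberately simplified illustration, assuming both shell and target fly straight lines at constant velocity (no gravity, drag, or wind) and using a hypothetical trigger radius; it is not a model of the 1944 gun directors or of how the radio proximity fuse actually sensed its target. The hypothetical `closest_approach` stands in for the prediction step, and `fuse_would_trigger` for the fuse's single decision.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def closest_approach(shell_pos: Vec, shell_vel: Vec,
                     target_pos: Vec, target_vel: Vec) -> tuple[float, float]:
    """Return (time, distance) of closest approach, assuming constant velocities."""
    rel_pos = sub(target_pos, shell_pos)  # target relative to shell at t = 0
    rel_vel = sub(target_vel, shell_vel)  # rate of change of that separation
    speed_sq = dot(rel_vel, rel_vel)
    # Minimise |rel_pos + t * rel_vel| over future times only (t >= 0).
    t = 0.0 if speed_sq == 0 else max(0.0, -dot(rel_pos, rel_vel) / speed_sq)
    gap = tuple(p + t * v for p, v in zip(rel_pos, rel_vel))
    return t, math.sqrt(dot(gap, gap))


def fuse_would_trigger(miss_distance: float, trigger_radius: float) -> bool:
    """A proximity fuse detonates if the shell passes within its sensing radius."""
    return miss_distance <= trigger_radius


# Hypothetical numbers: shell fired along +x at 800 m/s; target 4 km ahead,
# 10 m off the shell's path, closing at 160 m/s.
t, miss = closest_approach((0.0, 0.0, 0.0), (800.0, 0.0, 0.0),
                           (4000.0, 10.0, 0.0), (-160.0, 0.0, 0.0))
print(f"closest approach after {t:.2f} s at {miss:.1f} m; "
      f"detonates: {fuse_would_trigger(miss, trigger_radius=20.0)}")
```

The point of the sketch is the division of labor: once the aim is computed and the round is fired, nothing in the loop waits on a human; the detonation decision is a simple threshold on distance.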

It was military necessity, but nowhere in that system, other than a human turning on the radar and computer and then loading the shell, was there any sort of human engagement. That was all automated, and that was back in 1944.

It is like, okay, if our first “autonomous” weapon was in 1944 (we can talk about automatic weapons, mines, and all that stuff, but if we are really thinking of it in the contemporary sense of selecting and engaging), then that countermeasure to the V-1 was our first instance of autonomous weapons as we currently think of them.

What are you going to do? Are you going to outlaw that? That outlaws all missile defense. Are you going to outlaw the Iron Dome? Are you going to outlaw ballistic missile defense? Are you going to outlaw anti-ship missiles? There are all sorts of systems that rely on this. All of our cruise missiles rely on this: the Global Positioning System, terrain mapping, and position, navigation, and timing. We want position, navigation, and timing because we want these weapons to be more precise. What we hope is that the human choosing the target is choosing the right target, because once the thing is launched it is just going to the target to which it was assigned.

Or we say, “Well, an autonomous weapon is choosing its own target.” To some extent, but not entirely. You can train it on a set of targets (it will only attack tanks, it will only attack radar signatures, it will only attack this), so you might say the target set is chosen by its human designers. What you don’t know is which target within that set it will go for inside a given geographic targeting box, if you will.

If you go to the traditional arms control framing and say you want to prohibit something, what in that bucket are you prohibiting? We have been relying on all of that technology for over 70 years, and no one has made a fuss about it. If it is selecting and engaging targets without human oversight, sure, but I don’t know which militaries are going to have no oversight at all and just say: “You know what? Throw up a whole bunch of stuff. I don’t know where it is going to land, what it is going to do, where it is going to go.”

No. Militaries are doing this because they want some operational utility and edge. Yes, there is oversight. It might be at a distance from the battlefield. It might be temporally distant, but there is definitely oversight. I think the questions are: What are the appropriate limits on distance and time? How much do you allow for the context to change? And how do you assess the operational risk if the fight on the ground changes after you have launched something you cannot recall?

From a tech perspective, we have been using these systems for quite some time, and I think that is where the arms control side of the house is not the right frame because I don’t know how you would write a treaty without outlawing a bunch of systems. No state is going to sign up for that.

At the beginning of the informal meetings of experts, the Campaign to Stop Killer Robots came out and said, “We’re not concerned about present systems, we are concerned about future systems.” They said that without understanding the technological archaeology of these systems, and over the years they have had to backtrack a little and make concessions, but in doing that you are still left with: What are we prohibiting, and why? Why aren’t Article 36 weapon reviews sufficient? Why are the prohibitions on disproportionate uses of force and on indiscriminate weapons not sufficient? Why is there a need for a new protocol?

As those arguments keep getting put forward, the ground on which they are standing seems to get shakier and shakier as you press on them. If you say, “Well, it is not about this, it is about that,” then you go, “Well, is it?”

If you make a purely technical argument and say, “Well, these systems are inherently indiscriminate,” that is a technical argument which I can refute. I can say: “No, they’re not. No, they target tanks, and here is my confidence interval that they will only target not just tanks but this specific type of tank,” or “No, they only target radar signatures or electronic warfare signatures” or something like that. In that respect, that argument would then fail, so you cannot say that they are inherently indiscriminate.

Then you get into the argument, “Well, they can’t target people.” This is not my argument, this is me playing devil’s advocate here, so I would say, “Well, why can’t I target people?” If it is the fact that I cannot at this present time come up with a visual data set that represents combatancy and thus the system cannot visually identify combatancy from noncombatancy, that is a technical argument. If at some point in time in the future you do have enough data that you can reliably within some sort of confidence interval identify combatancy as a behavior, then that technical argument is refuted.

If you make a moral argument and say it cannot target people, then you need to find some ground for that moral argument, and if that moral ground is the Martens Clause, that it is against the dictates of public conscience, that is one way to do that.

But as you and I both know, Anja, finding and grounding an entire prohibition of systems based on the Martens Clause has not really been done, nor do I see it happening in the international system. The states are not going to sign onto it.

Instead, you are going to have, as you alluded to, confidence-building measures between like-minded states, partners, and allies saying, “This is what we have done, what have you done?” Let’s say that we share some information, we engage in technology transfer and exchange, we make sure that our systems are interoperable, that if we are doing joint operations these systems are not going to have blue-on-blue or green-on-green problems.

That is exactly what you do see happening. You do see like-minded states signing up with one another that they are going to behave in X way and you have technical agreements like I mentioned earlier with AUKUS, with NATO, or with the Five Eyes. You have these types of standards that these bodies, groups, and coalitions of states will abide by.

It is not as “strong” as a black-letter legal prohibition or a “new instrument” for an international legal prohibition, but I am also uncertain as to why we would need a black-letter legal prohibition if all the states were going to ignore it anyway, and we see that happen again and again with human rights treaties.

Getting back to the beginning of our conversation about humanitarian intervention, we have so many states that have signed onto human rights conventions and absolutely ignore them or violate them repeatedly, but they have signed them. Again, it is like, okay, if we have a majority of states that sign onto this new prohibition on autonomous weapons systems, but some states don’t, and the states that sign them don’t possess them and probably never will, then what is the meaningfulness of that instrument?

It is like the Treaty on the Prohibition of Nuclear Weapons when you think about it. We have a treaty prohibiting nuclear weapons, but none of the states that possess nuclear weapons has signed onto it.

I sit on the fence on a lot of this. I look for the rational argument. I am not trying to be pro-this or con-that; I just look at the logic of the argument and ask: “Well, in practical, pragmatic terms, what is that going to do? Is it going to stop people? Is it going to stop militaries? Is it going to stop technical diffusion?”

No. If sovereign, indigenous AI is being created, then we cannot control who sells to whom. If the United States says it is going to abide by the International Traffic in Arms Regulations, the Wassenaar Arrangement, the Missile Technology Control Regime, and all of these different arms control regulations and export controls, that is great, but that does not mean that China will. The pragmatist in me asks: What are we going to do here? If the goal is influencing state practice, then confidence-building measures and aligning like-minded states are probably the best way to go.

ANJA KASPERSEN: We are seeing more and more studies about our own perception of information and the disinformation capabilities of these systems. You were saying that training on synthetic data could lead to model collapse, but we are also seeing that with each iteration of these models, their ability to persuade us is increasing.

You also write about, and this is my interpretation, a synthetic “sleight of hand”: recognizing what the human wants and prefers to see. I am bringing this to you because it takes you back to your social science background and your book, but also to your Kantian approach to these things: What do we want or prefer to see?

HEATHER ROFF: On the one hand, we want deception in various things. On the military side of the house, we want swarms of systems that can act deceptively, feint, have emergent properties, and do all these types of things, but you also want them to act that way within a constrained environment, so you do not want them to be deceptive tout court. You want to say: “Do this, but within these boundaries. I want to bind you.”

But when it comes to your point about the human ability to be deceived, this goes back historically to information operations, dystopian futures, propaganda, and all that. We are inherently open to it. Because we are so malleable and so adaptive as human beings, and because we generalize (we are generalizing “wetware machines,” if you want to take the Silicon Valley approach to it), we see what we want to see. We link it to something we value, we anchor it on something (anchoring bias), and we do all of that simply as a function of being human. That can be exploited in a number of ways.

I think we are probably more open to AI forms of deception because we mediate our interactions through these systems at a far greater scale than any generation before. We live our lives with our faces glued to screens rather than outside, having real human interactions not through a screen. I don’t know when I last saw you in person, Anja. We would have to go and check: Is it the real Anja? Where were we the last time we met, where did we sit, what did we talk about? Or do we need a code word? Is it really Anja, or is it an AI bot of Anja that I am talking to?

In that respect, because we mediate our lives this way, that has in some sense become our reality, and if it is our reality, then we are duped by it. But if I go outside and someone says, “Oh, it’s raining,” and I say, “No, it’s not, my phone says it is sunny,” well, go outside and look. You can check that. I feel like we don’t have the ability to think pragmatically as much because we are so mediated by that technology. That is why we are getting duped and deceived.

ANJA KASPERSEN: What makes you feel hopeful and excited these days? How do you see ethics playing a key role? You know Wendell Wallach, with whom I co-direct this program, and we have been trying to push what we call a “new theory of ethics.” Ethics is about grappling with the tension points inherent in trying to embed, use, engage with, and understand this technology, and in asking the right questions about it, as you said. It is about coming to terms with who we are, with all of our fallibilities and the fallibilities we project onto these technologies. That is what ethics is about. It is a practical tool.

HEATHER ROFF: Yes. I would agree with that. I disagree on some things, but that is a conversation either over coffee or wine or something, in person.

Your approach to it is actually Aristotelian because Aristotle’s idea of ethics is praxis. It is the practical wisdom. For him, the ethical man—unfortunately—is the phronimos, and that means the “practically wise.” Practical wisdom is about figuring out what to do in those moral tensions: What is the right action? What is the thing to do when confronted with tensions, when confronted with dilemmas?

It is exciting to me to see a larger push of people engaging with it. I definitely agree with you that somehow hundreds of AI ethicists I never knew about have sprung up, but as one of those actually trained in ethics, I can at least say, “Okay, you don’t have to have a doctorate in philosophy and write a book on Kant, but you are at least engaging in the conversation, thinking critically, and pushing,” and that is exciting: having real conversations, being critical, and using that wetware between your ears. I want to see more of that. To me that is hopeful, and it pushes back against “Nothing to see here, everything is wonderful.” No, it’s not. I think there is hope in critical thinking and in seeing more of it.

ANJA KASPERSEN: When you refer to Aristotle, I am of course well aware that the Greek word reflected in the writings we have on Aristotle’s views is phronimos, a real word that refers to the person of practical wisdom. How do we become and be alchemists of meaning? That is something that has been on my mind.

HEATHER ROFF: I like that phrase, “alchemists of meaning.”

ANJA KASPERSEN: Because there is all this focus on what it is doing and what it means to be human. I have written about how AI changes our notion of what it means to be human, but it also challenges us to come to terms with how we become alchemists of meaning for ourselves and for those around us.

HEATHER ROFF: I like it. We will have to put a pin in that and maybe write a paper together.

ANJA KASPERSEN: Thank you so much, Heather.

HEATHER ROFF: Thank you.

ANJA KASPERSEN: Thank you for your time and sharing your insights and everything. I cannot wait for our listeners to be able to tap into all of that. Thank you.

To all of our listeners, thank you for the privilege of your time. I am Anja Kaspersen, and I hope the discussion has been as informative and thought-provoking for you as it has been for us. For more insights on ethics and international affairs, connect with us on social @CarnegieCouncil. A huge thank-you and shout-out to the team at the Carnegie Council, who are producing this podcast. Thank you, everyone.

Carnegie Council for Ethics in International Affairs is an independent and nonpartisan nonprofit. The views expressed within this podcast are those of the speakers and do not necessarily reflect the position of Carnegie Council.

AI, Military Ethics, & Being Alchemists of Meaning, with Heather M. Roff

Guests

Heather M. Roff

Hosted By

Anja Kaspersen
