At the Delphi website, users are invited to enter descriptions of scenarios, whereupon Delphi, an artificial neural network trained on human moral judgments, responds with an evaluation of the input scenario. What could go wrong?

Delphi is full of opinions. Spending $15 on Christmas presents for one’s own children?—It’s reasonable, says Delphi. Spending $15,000 for the same?—It’s okay. Spending $15 billion?—It’s wasteful! But $15 million is okay. Less privileged parents and children may feel that Delphi’s opinions skew in favor of the rich.

Delphi was released last October by the Allen Institute for AI, and initially billed as a “moral machine.” Soon after the website went public, examples of Delphi’s laughable judgments began circulating on the usual social media sites. It seemed relatively easy for visitors to the Delphi website to prod Delphi into making pronouncements that were not just favorable to the rich, but offensively racist, sexist, or pro-genocidal. Within days, the website added a disclaimer, and shortly thereafter came a three-checkbox set of terms & conditions that users had to click through before being allowed to ask Delphi anything. In the shift from version 1.0 to 1.0.4, some of Delphi’s most laughable or inconsistent judgments disappeared, but the terms & conditions now make clear that Delphi can still be prompted to make offensive statements, and its output is still risible. 

Similar to last year’s GPT-3, Delphi is a “transformer” type of artificial neural network, capable of mimicking the short-to-medium range statistical structure of its training data. GPT-3 was trained on a huge dataset scraped from multiple Internet sources, and wowed with its ability to produce paragraphs of smoothly flowing, seemingly human-level prose. It could even transform some prompts into software code. Delphi was trained on a more restricted set of data than GPT-3. Descriptions of morally significant situations culled from textbooks, novels, agony-aunt columns, and similar sources were fed via Amazon’s Mechanical Turk platform to human workers (aka MTurkers) who were tasked with providing their evaluations of these situations. The situations and their evaluations were used to train Delphi.

Delphi is literally full of MTurkers’ opinions, many of them conflicting. Its response to any prompt is a statistically determined amalgam of opinions about verbally similar situations. Even when a prompt describes a completely new situation, Delphi produces an intelligible if not fully intelligent response. Users quickly set about finding word combinations that caused Delphi to give silly responses. A favorite in v1.0 (gone from v1.0.4): Delphi claimed it is immoral to violently mash potatoes to feed one’s children. MTurkers reasonably dislike violence, but Delphi has no common sense whatsoever to distinguish harmless from harmful violence, far less to realize the therapeutic value of vigorous potato mashing.

Getting Delphi to make silly pronouncements about violence to potatoes may just seem like harmless fun, but the consequences of releasing Delphi to the world as a start towards a serious solution for a serious problem are potentially much more severe. In a blog post the researchers concede that mistakes were made, and admit to being surprised at just how adversarial people on the Internet would be in trying to expose Delphi’s limitations. The terms & conditions for v1.0.4 acknowledge this, insisting that Delphi is “designed to investigate the promises and more importantly, the limitations of modeling people’s moral judgments on a variety of everyday situations.” But why were these limitations not obvious before Delphi was released to the court of public ridicule? Anyone only half familiar with the story of Microsoft Tay, which had to be pulled from the Internet within 24 hours because of its racist and sexist tweets, should have realized that some version of history would repeat itself. Delphi wasn’t quite as bad as Tay because it was not learning from user interactions in real time, but it was bad enough to force the legalistic requirement of agreeing to terms & conditions upon users. Despite the disclaimers, ordinary, less adversarially minded users might be unable to find Delphi’s flaws for themselves, and more inclined to attribute some sort of moral authority to the hyperbolically named moral oracle.

The problems go deeper than a lack of hindsight and foresight. Delphi demonstrates a kind of disciplinary hubris and lack of basic scholarship that afflicts too much (but not all) of the work by computer scientists in the expanding arena of “AI ethics.” In the research paper released simultaneously with the website as a preprint (i.e., without the benefit of peer review), Delphi’s authors dismiss the application of ethical theories as “arbitrary and simplistic.” They reject “top-down” approaches in favor of their “bottom-up” approach to learning ethics directly from human judgments. But they fail to mention that these terms originate from two decades of scholarship about top-down and bottom-up approaches to machine morality, and they display no awareness of the arguments for why neither approach alone will be enough. An understanding of ethics as more than a technical problem to be solved with more and better machine learning might have prevented these errors.

This is not just about Delphi. Our broader concern is that too many computer scientists are stuck in a rut of thinking that the solution to bad technology is more technology, and that they alone are clever enough to solve problems of their own creation. In their rush to release the next big thing, they are dismissive of expertise outside their own field. More money thrown at ever larger programs and the machines required to run them will not address the complex issues of how to build sociotechnical spaces where people can flourish.

Colin Allen is distinguished professor of history & philosophy of science at the University of Pittsburgh, where Brett Karlan is a postdoctoral research fellow. They work together on “The Machine Wisdom Project,” which is funded by the Templeton World Charity Foundation. Allen is not related in any way to the Allen Institute. He is co-author with Wendell Wallach of Moral Machines: Teaching Robots Right from Wrong, Oxford University Press 2009.

Allen's main areas of research concern the philosophical foundations of cognitive science and neuroscience. He is particularly interested in the scientific study of cognition in nonhuman animals and computers, and he has published widely on topics in the philosophy of mind, philosophy of biology, and artificial intelligence. He also has several projects in the area of humanities computing. He is a faculty affiliate of Pitt's Digital Studies & Methods program and of the CMU/Pitt Center for the Neural Basis of Cognition.

Brett Karlan received his PhD from Princeton University in June 2020. He works on epistemology and ethics, focusing on normative and theoretical questions in cognitive science in particular. While at Pitt he is working on the project "Practical Wisdom and Machine Intelligence," supported by a grant to Colin Allen from the Templeton World Charity Foundation.
