Benjamin Weiser at NYT:
The lawsuit began like so many others: A man named Roberto Mata sued the airline Avianca, saying he was injured when a metal serving cart struck his knee during a flight to Kennedy International Airport in New York.
When Avianca asked a Manhattan federal judge to toss out the case, Mr. Mata’s lawyers vehemently objected, submitting a 10-page brief that cited more than half a dozen relevant court decisions. There was Martinez v. Delta Air Lines, Zicherman v. Korean Air Lines and, of course, Varghese v. China Southern Airlines, with its learned discussion of federal law and “the tolling effect of the automatic stay on a statute of limitations.”
There was just one hitch: No one — not the airline’s lawyers, not even the judge himself — could find the decisions or the quotations cited and summarized in the brief.
That was because ChatGPT had invented everything.
The lawyer who created the brief, Steven A. Schwartz of the firm Levidow, Levidow & Oberman, threw himself on the mercy of the court on Thursday, saying in an affidavit that he had used the artificial intelligence program to do his legal research — “a source that has revealed itself to be unreliable.”
The problem was not unique to this case. Gerrit De Vynck explains at WP:
Recently, researchers asked two versions of OpenAI’s ChatGPT artificial intelligence chatbot where Massachusetts Institute of Technology professor Tomás Lozano-Pérez was born.
One bot said Spain and the other said Cuba. Once the system told the bots to debate the answers, the one that said Spain quickly apologized and agreed with the one with the correct answer, Cuba.
The finding, in a paper released by a team of MIT researchers last week, is the latest potential breakthrough in helping chatbots to arrive at the correct answer. The researchers proposed using different chatbots to produce multiple answers to the same question and then letting them debate each other until one answer won out. The researchers found using this “society of minds” method made them more factual.
“Language models are trained to predict the next word,” said Yilun Du, a researcher at MIT who was previously a research fellow at OpenAI, and one of the paper’s authors. “They are not trained to tell people they don’t know what they’re doing.” The result is bots that act like precocious people-pleasers, making up answers instead of admitting they simply don’t know.
The researchers’ creative approach is just the latest attempt to solve for one of the most pressing concerns in the exploding field of AI. Despite the incredible leaps in capabilities that “generative” chatbots like OpenAI’s ChatGPT, Microsoft’s Bing and Google’s Bard have demonstrated in the last six months, they still have a major fatal flaw: they make stuff up all the time.
Figuring out how to prevent or fix what the field is calling “hallucinations” has become an obsession among many tech workers, researchers and AI skeptics alike. The issue is mentioned in dozens of academic papers posted to the online database Arxiv and Big Tech CEOs like Google’s Sundar Pichai have addressed it repeatedly. As the tech gets pushed out to millions of people and integrated into critical fields including medicine and law, understanding hallucinations and finding ways to mitigate them has become even more crucial.
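To make the "society of minds" idea concrete, here is a minimal sketch of what that debate loop might look like, written in plain Python. The function name, the prompt wording, and the majority-vote aggregation are my own illustrative choices, not the MIT paper's actual code; each `agent` is assumed to be any callable that sends a prompt to a chat model and returns its text reply.

```python
from collections import Counter
from typing import Callable, List

def multiagent_debate(
    question: str,
    agents: List[Callable[[str], str]],  # each agent maps a prompt to a text answer
    rounds: int = 2,
) -> str:
    """Toy sketch of a multi-chatbot debate.

    Every agent first answers the question independently, then for a few
    rounds each agent is shown the other agents' latest answers and asked
    to reconsider. The answer most agents converge on is returned.
    """
    # Round 0: each agent answers on its own.
    answers = [agent(question) for agent in agents]

    for _ in range(rounds):
        new_answers = []
        for i, agent in enumerate(agents):
            others = [a for j, a in enumerate(answers) if j != i]
            prompt = (
                f"Question: {question}\n"
                "Other agents answered:\n"
                + "\n".join(f"- {a}" for a in others)
                + "\nConsidering these answers, give your best final answer."
            )
            new_answers.append(agent(prompt))
        answers = new_answers

    # Simple aggregation: pick the answer the most agents settled on.
    return Counter(answers).most_common(1)[0][0]
```

In practice each agent would wrap an API call to a model such as ChatGPT, and the researchers' actual protocol uses more carefully engineered prompts and evaluation than this sketch; the point is only that agreement is reached through iterated critique rather than a single generation.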