AI could (literally) preserve our culture

With help from Mohar Chatterjee

Amid this week's anxiety about AI's misuse in elections and for targeted harassment, you might have missed another, slightly more feel-good viral story about artificial intelligence.

The Vesuvius Challenge, a crowdsourced project with the goal of reviving unreadable, 2,000-year-old scrolls buried by the eruption of Mount Vesuvius, announced a $700,000 grand prize for three young scientists who, for the first time, produced readable passages of text from a scroll’s completely charred remains.

“Working independently, each member of our team of papyrologists recovered more text from this submission than any other,” the challenge’s organizers wrote on their website, noting that they recovered more than 2,000 individual ancient Greek characters using artificial intelligence software, far surpassing the organizers’ predictions at the start of the contest.

The three researchers, Youssef Nader, Luke Farritor, and Julian Schilliger, passed on their handiwork to that group of papyrologists who, the project’s organizers say, have now been able to read roughly five percent of the burned scroll they analyzed. It appears to be a heretofore undiscovered text from an Epicurean philosopher, praising the virtues of food and general abundance while implicitly criticizing the rival Stoics, if nothing else fueling the fire (no pun intended) of a now millennia-old philosophical and historiographical argument.

Which is all well and good for the past, but what does the ability to resuscitate texts previously thought unreadable and destroyed portend for the future?

Get ready for ‘virtual unwrapping’ in medicine — and new ways to save digital media.

I spoke today with Stephen Parsons, the Vesuvius Challenge project lead and a researcher in the University of Kentucky laboratory of Brent Seales, one of the challenge’s founders. At Seales’ lab, Parsons and his colleagues used cutting-edge digital imaging and CT scanning techniques to see inside the burned scrolls without unrolling them, then posted the scans for the Vesuvius Challenge researchers-slash-competitors to further parse.

Parsons says that the AI revolution of the past decade, specifically the convolutional neural networks that have transformed image recognition (and helped enable the aforementioned political deepfakes), has unlocked a new frontier in his field, not just for historical research but for media preservation writ large and even, potentially, medicine.

“We expect that the segmentation and machine learning approaches [to digital imaging] will have a lot of other applications,” Parsons told me today. “It’s a little early to say, but with this concept of virtual unwrapping, you could do a CT scan and get a different view of the ‘unwrapped’ colon, and inspect the colon lining with a better visualization than is otherwise possible.”
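The core idea behind "virtual unwrapping" can be illustrated with a toy model (this is a simplified sketch, not the actual Vesuvius pipeline, which works on 3D CT volumes with trained machine learning models): treat a cross-section of a rolled scroll as a spiral inside a 2D scan grid, then read intensities along that spiral to flatten the rolled surface into a single readable strip. All names and parameters below are illustrative assumptions.

```python
import math

# Toy illustration of "virtual unwrapping" (NOT the real Vesuvius pipeline):
# a cross-section of a rolled scroll is modeled as an Archimedean spiral in
# a 2D "CT slice" grid; sampling intensities along the spiral path flattens
# the rolled-up surface into a single strip, as if the scroll were unrolled.

SIZE = 128      # grid resolution of the toy slice (assumption: square slice)
TURNS = 3       # number of windings in the toy scroll
SAMPLES = 2000  # points sampled along the spiral

def spiral_path():
    """(x, y, theta) grid points along the spiral r = 3 * theta, centered."""
    pts = []
    for i in range(SAMPLES):
        theta = 0.5 + (TURNS * 2 * math.pi - 0.5) * i / (SAMPLES - 1)
        r = 3.0 * theta
        x = int(SIZE / 2 + r * math.cos(theta))
        y = int(SIZE / 2 + r * math.sin(theta))
        pts.append((x, y, theta))
    return pts

def make_slice(path):
    """Paint the spiral into a blank slice: papyrus = 0.5, fake 'ink' = 1.0."""
    img = [[0.0] * SIZE for _ in range(SIZE)]
    for x, y, theta in path:
        ink = math.sin(theta * 5) > 0.6  # periodic fake "characters"
        img[y][x] = 1.0 if ink else 0.5
    return img

def unwrap(img, path):
    """Sample the slice along the known spiral: the 'unwrapped' strip."""
    return [img[y][x] for x, y, _ in path]

path = spiral_path()
img = make_slice(path)
strip = unwrap(img, path)
```

In the real project, the hard parts are exactly what this sketch assumes away: finding the curled papyrus surface in a noisy 3D scan (segmentation) and detecting carbon-based ink that is nearly indistinguishable from the carbonized papyrus, which is where the prize winners' machine learning models came in.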

That decidedly up-close-and-personal example aside, Parsons says the research community still has a lot of work to do on the Herculaneum papyrus scrolls, named for the Roman town in which they were found. Much of it is extremely technical — finding ways to condense and make more accessible the many, many terabytes of data occupied by an ultra-high-resolution scan of the inside of a scroll, and continuing to assemble those scans into something sequential and readable, an ongoing project for the Vesuvius research community despite the challenge winners’ breakthrough.

“It seemed much more likely when we started out that either we would figure out there was nothing in there, or someone would crack it open and find a method that just read the whole thing,” Parsons said. “It turned out they landed right on our mark [for incremental progress], which is incredible.”

Their technique could also help with newer and equally endangered forms of storage: Researchers in the United Kingdom have used it to recover television broadcasts thought otherwise hopelessly lost to the sands of time. And machine learning techniques similar to those used in the Vesuvius Challenge have been applied not just to actual, physical documents, but to language itself, as with DeepMind’s Ithaca project, which uses a predictive transformer model to restore lost chunks of ancient text otherwise irretrievable by photographic means.

Today’s most heated debates about AI, over both policy and the technology itself, are focused on its ability to fabricate convincing falsehoods, whether a slanderous or misleading political deepfake, outright sexual harassment, or anything in between. If nothing else, the Vesuvius Challenge is a welcome reminder to the AI community that the powerful tools for computation and prediction that make those harms possible are also more than capable of performing helpful and humanistic tasks — more redolent of a bygone Microsoft Encarta-style vision of the future than a cyberpunk dystopia.

Kyle Rector, the CIA’s deputy AI director, broke down what the clandestine intelligence agency is learning about using and testing generative AI over a virtual coffee chat Thursday. The event was hosted by the Intelligence and National Security Alliance and sponsored by Microsoft.

Turns out, the CIA dedicates more energy to testing the output of powerful AI models than it puts into bulletproofing the input training data — especially since AI models trained with good data can still produce bad results, Rector said.

For an enterprise as vast as the CIA, “it’s difficult if not impossible at times to fully track authenticity of data,” Rector explained. “We instead believe that you should probably and primarily focus on looking at the outputs of models. So that’s been really the focus for us in recent months and years.”

To experiment with generative AI, the intelligence agency is turning to the open source ecosystem. “Randy Nixon and those over in the open source enterprise have really taken a major lead for the agency in leveraging large language models and generative AI,” Rector said. “The open source enterprise is really one of the best examples of where that needs to happen.”

One reason for the agency to be so bullish on AI is that the CIA is living through a “data explosion,” Rector said.

“We really have to turn to things like AI to help us triage and work through the large volume of data that’s coming in each and every day,” he told the audience. — Mohar Chatterjee

Europe is turning to the Big Tech companies to safeguard against false AI-generated election content.

POLITICO’s Clothilde Goujard reported yesterday on the new demand from the European Union that platforms watermark AI content, to be enforced through the bloc’s Digital Services Act. Commissioner Thierry Breton said the EU “can’t have half-baked measures” with this year’s upcoming elections, but didn’t specify when the new requirements would take effect.

OpenAI and Meta have already committed to labeling such content. Breton says the commission will issue broader guidelines for fighting online false information by March, which will require in his words a “rapid reaction mechanism for any kind of incident.” — Derek Robertson