Researchers from Oxford and Stanford have built "Data Journalist Agent" (Data2Story), a Claude Code skill that turns a CSV file into a full interactive online article. The output includes research context, statistics, graphics, and a built-in feature linking every visible statement, chart, and interactive element to its evidence, be it code, data sources, or external URLs. The skill is a predefined task set that Claude Code loads and runs on command, orchestrating several specialized agent roles.
Data journalism eats time like few other beats. A single investigation can keep a team busy for weeks. A new AI pipeline aims to automate most of that work without sacrificing verifiability.
The authors demo the system on a dataset that has drawn little coverage so far: the 2026 FIFA World Cup schedule. From the schedule and host cities, the system generates a climate-focused article with an interactive map.
About four in ten matches are slated for locations the players' union FIFPRO classifies as extremely high heat risk, with humidity rather than air temperature as the main driver. The authors stress these are typical climate conditions, not a forecast for the actual tournament.
An "Inspector" panel makes every claim traceable
The system's core feature is the "Inspector," a panel showing structured evidence for each sentence and asset. Every annotated sentence, chart, and interactive element gets its own index card displaying either the exact line of code (plus the data file behind it) or the external URL backing a claim.
As a result, 93 percent of all visible statements can be traced back to their origin. That doesn't mean they're correct, the researchers stress, just verifiable. Doubt a figure? Run the code. The baseline for human-written articles is 25 percent, partly because journalists rarely publish analysis code. The gap reflects both a hole in journalism practice and a strength of the system, the researchers claim.
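To make the idea of an evidence card concrete, here is a minimal Python sketch. It is not the authors' implementation: the `EvidenceCard` class, its field names, and the `traceable_share` helper are hypothetical, and the four example cards are invented toy data. It only illustrates the rule described above, where a statement counts as traceable if it points to a line of code plus its data file, or to an external URL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvidenceCard:
    """Hypothetical evidence record for one visible statement or asset."""
    statement: str
    code_line: Optional[str] = None   # exact line of analysis code, if any
    data_file: Optional[str] = None   # data file that code reads
    source_url: Optional[str] = None  # external URL backing the claim

    def is_traceable(self) -> bool:
        # Traceable means a reader can find the origin, not that the claim is correct.
        has_code = self.code_line is not None and self.data_file is not None
        return has_code or self.source_url is not None


def traceable_share(cards: list[EvidenceCard]) -> float:
    """Fraction of statements that carry code-plus-data or URL evidence."""
    if not cards:
        return 0.0
    return sum(card.is_traceable() for card in cards) / len(cards)


# Invented toy cards, not figures from the study.
cards = [
    EvidenceCard("Claim backed by code", code_line="df['risk'].mean()", data_file="matches.csv"),
    EvidenceCard("Claim backed by a URL", source_url="https://example.org/source"),
    EvidenceCard("Claim with no evidence attached"),
    EvidenceCard("Another coded claim", code_line="len(df)", data_file="matches.csv"),
]
share = traceable_share(cards)
print(f"{share:.0%} of statements are traceable")
```

Keeping correctness out of `is_traceable` mirrors the researchers' distinction: the check says a reader can follow the claim to its origin, nothing more.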
Seven agents, one editorial workflow
Behind each article sits a chain of seven specialized agents the team calls a "virtual newsroom." The "Detective" runs web searches for context, since a table alone rarely tells the full story. For the World Cup data, it links host cities to FIFPRO heat risk ratings and Open-Meteo climate data.
The "Analyst" runs code instead of guessing numbers. The "Editor" picks which findings drive the narrative. The "Designer" chooses the right medium, say a map for geography or an audio clip for music. The "Programmer" builds the HTML page, the "Auditor" checks layout for errors, and the "Inspector" ties everything back to sources.
The base model is Claude Opus 4.7 running on Claude Code. For images, video, and audio, the system pulls in OpenRouter models like gpt-5.4-image-2, seedance-2.0, and lyria-3-pro-preview.
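The chain of roles described above can be pictured as a simple sequential pipeline. The Python sketch below is an illustration under stated assumptions, not the skill's actual code: the strict ordering is inferred from the order in which the article lists the roles, and the `make_stage` and `run_newsroom` functions, the state keys, and the placeholder outputs are all hypothetical. In the real system each stage would be an agent call rather than a stub.

```python
from typing import Callable

# Each stage takes the shared article state and returns an updated copy.
Stage = Callable[[dict], dict]

# Role names come from the article; the output keys are hypothetical labels.
PIPELINE = [
    ("Detective", "context"),         # web research around the dataset
    ("Analyst", "findings"),          # figures computed with runnable code
    ("Editor", "storyline"),          # which findings drive the narrative
    ("Designer", "media_plan"),       # map, chart, audio, and so on
    ("Programmer", "html"),           # builds the page
    ("Auditor", "layout_report"),     # checks layout for errors
    ("Inspector", "evidence"),        # ties statements back to sources
]


def make_stage(name: str, output_key: str) -> Stage:
    """Build a stub stage that records its name and writes a placeholder result."""
    def stage(state: dict) -> dict:
        return {
            **state,
            output_key: f"<{name} output>",
            "log": state["log"] + [name],
        }
    return stage


def run_newsroom(csv_path: str) -> dict:
    """Run all stages in order, passing the growing state along the chain."""
    state: dict = {"csv_path": csv_path, "log": []}
    for name, output_key in PIPELINE:
        state = make_stage(name, output_key)(state)
    return state


result = run_newsroom("schedule.csv")
print(" -> ".join(result["log"]))
```

Passing one state dictionary down the chain is only one possible design. It has the convenient property that the final stage can see everything earlier stages produced, which is what source-linking needs.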
53 readers rate agent articles higher than human originals
The researchers paired 18 public datasets with matching human-written originals from three distinct sources: the concise briefings from The Economist, the lavishly designed long reads from The Pudding, and the community datasets from TidyTuesday. A group of 53 recruited readers rated both versions across five categories: visual design, narrative rhythm, data transparency, verifiability of claims, and insight gained.
Data2Story won all five categories. The biggest lead was in transparency, at +1.49 on a seven-point scale. Overall, 74 percent preferred the agent article, 25 percent the human version, and 2 percent called it a draw (the shares total 101 percent, presumably because of rounding).
By source, the picture shifts. The agent won clearly in data-heavy Economist briefings and TidyTuesday pieces. Against Pudding reports, which design teams often spend weeks crafting, it was a statistical tie. The agent couldn't beat handcrafted presentation.
When measuring which statements from the human-written article also appear in the agent-generated article, Data2Story covers about half. Conversely, only 35 percent of the agent’s statements are found in the human text.
The agent adds plenty of its own angles but only partly captures the editorial core. Overlap is highest in short, formulaic Economist briefings, where the agent reproduces 73 percent of human findings, likely because those texts hew closely to standard statistics the agent calculates anyway.
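The two overlap figures above measure the same intersection from opposite directions, which is why they differ. The Python sketch below shows the arithmetic on invented toy data. It assumes statements have already been matched and reduced to shared identifiers, and the `coverage` function and the example sets are hypothetical. How the researchers actually match statements is not described here.

```python
def coverage(reference: set[str], candidate: set[str]) -> float:
    """Share of reference statements that also appear in the candidate text."""
    if not reference:
        return 0.0
    return len(reference & candidate) / len(reference)


# Invented toy statement IDs, not data from the study.
human = {"h1", "h2", "shared1", "shared2"}
agent = {"shared1", "shared2", "a1", "a2", "a3", "a4"}

human_covered = coverage(human, agent)   # human statements found in the agent article
agent_in_human = coverage(agent, human)  # agent statements found in the human article

print(f"Human statements covered by agent: {human_covered:.0%}")
print(f"Agent statements found in human text: {agent_in_human:.0%}")
```

Because the agent set is larger here, the same two shared statements make up a smaller share of it. That matches the pattern reported above, where an agent that adds many angles of its own scores lower in the second direction.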
Where humans still win
The researchers flag three areas where human authors stay ahead. On editorial perspective, reporters explain things the data can't. A Repair Cafe report traces low repair rates to manufacturers of phones, cars, and tractors deliberately blocking access to diagnostic tools and parts. That's a theory grounded in reporting, not data. The agent shows what breaks, but the "why" stays hidden.
On creative design, a Pudding piece on stand-up comedy turns the full transcript of an Ali Wong show into a user interface. Next to each line sits a circle sized to the length of the laugh. For the same content, the agent just embeds a static YouTube thumbnail.
On dense single graphics, an Economist visualization on the space race layers government and commercial providers, success rates, and annotations into one image. The agent scatters the same data across several charts, and the main point gets lost.
A collaborator, not a replacement
The authors frame Data2Story as a newsroom tool. Humans bring perspective and reporting; agents handle computation, graphics, and machine-verifiable sourcing.
It could prove most useful for topics newsrooms can't cover for lack of capacity, such as niche datasets that would otherwise never become a readable story. One limitation is that Data2Story currently runs on full autopilot. A version with human-in-the-loop feedback is left for future work. The site is live at data2story.github.io, and the code is on GitHub.
Machine-verifiability is exactly where current AI systems keep stumbling. A recent Peking University benchmark found that leading models often give the right answer in document analysis but cite the wrong sources, a problem the researchers call "attribution hallucination."
Another study suggests AI search agents often don't research at all but mostly confirm what they already know from training. Data2Story tries to close this gap by having the Analyst calculate figures with runnable code instead of guessing and having the Inspector link every statement to its source. Perplexity takes a similar tack with "Search as Code," where models write their own web searches instead of calling a black-box API.