Volume 1, No. 65 | Thursday, May 7, 2026 | AI News Daily

The AI Dispatch

“All the AI News That’s Fit to Compile”


EU Regulation

EU Strikes Digital Omnibus Deal: High-Risk Deadlines Deferred, Nudifier Apps Banned

European Parliament and Council reached a provisional political agreement Thursday, pushing the most contentious AI Act deadlines to 2027 and 2028 — and adding a new outright ban on AI systems designed to generate non-consensual intimate imagery.

European negotiators announced a provisional political agreement on the Digital Omnibus on Thursday, salvaging a reform package that legal analysts had pronounced effectively dead just nine days ago. The deal delays compliance deadlines for Annex III high-risk AI systems — including those used in employment, education, credit scoring, and access to essential services — to December 2027, and pushes Annex I regulated-product AI obligations to August 2028. Watermarking requirements for AI-generated content, originally slated for August 2026, will now take effect in December 2026.

The reversal is dramatic. This newspaper’s April 28 edition reported the collapse of the trilogue track, citing analysis from IAPP, Modulos, and DLA Piper that no realistic path remained to defer the August 2 compliance deadline for high-risk AI systems before Parliament’s summer recess. That analysis assumed a particular procedural sequence; the deal struck Thursday rewrites it. Negotiators moved the trilogue forward, agreed on a single consolidated package rather than the staged reform originally proposed, and traded substance for speed. Most notably, they added an entirely new prohibition that did not appear in the European Commission’s original Digital Omnibus draft.

That new ban targets so-called “nudifier apps”: AI systems whose primary purpose is to generate non-consensual intimate images. The prohibition applies across all twenty-seven member states, with a compliance deadline of December 2026. It is the first EU-wide ban specifically targeting a single AI application category by its design intent, rather than by deployment context or risk classification. Civil-society groups had pushed for the addition during the late stages of negotiation; the Parliament rapporteur publicly framed it as a non-negotiable price of agreement.

The trade-off the package represents is straightforward. Industry obtains relief on the timeline pressure that had driven the most intense compliance scramble of the past six months — particularly for Annex III deployers who had argued that the technical documentation and post-market monitoring obligations could not realistically be operationalized by August. In exchange, the regulation gains a hard-edged consumer-protection win that polls well across the political spectrum and that lawmakers can point to as evidence that the AI Act framework can still produce enforceable safeguards even as it accommodates implementation realities.

The agreement remains provisional. It requires formal adoption by both the European Parliament and the Council of the EU before entering into force; analysts expect adoption votes in late June or early July. Until then, the original August 2, 2026 deadline for Annex III high-risk systems technically remains in effect, though enforcement bodies in several member states have signaled they will exercise discretion during the transition. The legal posture, in other words, has flipped from “comply with the original or face penalties” to “comply with the original or watch the deal’s ratification calendar carefully.”

Research Scoop

ReasonMaxxer Matches Full RL Training at 1/1000th the Cost

Researchers from the University of Southern California and the U.S. Army’s DEVCOM Army Research Laboratory published a paper this week titled “Rethinking RL for LLM Reasoning” that calls into question the necessity of expensive reinforcement learning training pipelines used to produce frontier reasoning models. Their analysis of training trajectories across six benchmarks and three model families found that RL affects only 1 to 3 percent of token positions during reasoning — and that at every one of those positions, the post-RL model selects from the same top-five alternatives the base model already considered.
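
The kind of measurement described above can be illustrated with a small sketch. It is not the authors' analysis code: the function names, the toy distributions, and the dictionary-based representation of a next-token distribution are all assumptions made for illustration. It counts how many positions a post-RL model changes relative to the base model's top choice, and how often the changed token still sits inside the base model's top five.

```python
from typing import Dict, List, Sequence


def top_k_tokens(dist: Dict[str, float], k: int) -> List[str]:
    """Return the k most probable tokens, highest probability first."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return [tok for tok, _ in ranked[:k]]


def divergence_stats(
    base_dists: Sequence[Dict[str, float]],
    rl_choices: Sequence[str],
    k: int = 5,
) -> Dict[str, float]:
    """Compare post-RL token choices with base-model distributions.

    base_dists[i] is the (hypothetical) base-model distribution at position i,
    rl_choices[i] is the token the post-RL model emitted at that position.
    """
    changed = 0            # positions where RL departs from the base top-1
    changed_in_top_k = 0   # of those, positions where RL stays in base top-k
    for dist, rl_tok in zip(base_dists, rl_choices):
        base_top = top_k_tokens(dist, k)
        if rl_tok != base_top[0]:
            changed += 1
            if rl_tok in base_top:
                changed_in_top_k += 1
    n = len(rl_choices)
    return {
        "changed_fraction": changed / n if n else 0.0,
        "changed_within_top_k": changed_in_top_k / changed if changed else 1.0,
    }


# Toy data: four positions, one of which RL re-ranks.
toy_base = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.4, "b": 0.35, "c": 0.25},
    {"x": 0.99, "y": 0.01},
    {"p": 0.8, "q": 0.2},
]
toy_rl = ["a", "b", "x", "p"]
stats = divergence_stats(toy_base, toy_rl)
```

On the toy data, only the second position changes, and the new token was already the base model's second choice. The paper's reported figures correspond to a changed fraction of 0.01 to 0.03 with every change falling inside the top five.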

The implication, the authors argue, is that RL is not teaching new skills. It is reweighting choices at a small number of high-entropy decision points — the places where the base model was already uncertain among a handful of plausible next steps. Everywhere else, RL changes nothing. Their proposed method, ReasonMaxxer, applies a targeted contrastive loss only at those high-entropy positions, replacing the full RL pipeline with what amounts to a surgical fine-tune.
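
The idea of restricting a loss to high-entropy positions can be sketched as follows. This is a minimal illustration under stated assumptions, not ReasonMaxxer itself: the function names, the top-fraction selection rule, and the pairwise logistic loss used as a stand-in for the paper's contrastive loss are all hypothetical choices. Distributions are plain lists of strictly positive probabilities indexed by token id.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_mask(dists: Sequence[Sequence[float]], fraction: float = 0.03) -> List[bool]:
    """Flag the top `fraction` of positions by entropy (at least one position)."""
    n = len(dists)
    if n == 0:
        return []
    keep = max(1, math.ceil(fraction * n))
    ents = [entropy(d) for d in dists]
    ranked = sorted(range(n), key=lambda i: ents[i], reverse=True)
    chosen = set(ranked[:keep])
    return [i in chosen for i in range(n)]


def masked_pairwise_loss(
    dists: Sequence[Sequence[float]],
    preferred: Sequence[int],
    rejected: Sequence[int],
    mask: Sequence[bool],
) -> float:
    """Mean of -log(sigmoid(log p[preferred] - log p[rejected])) over masked positions.

    Assumed stand-in for a contrastive objective; positions outside the mask
    contribute nothing, which mirrors the "train only where uncertain" idea.
    """
    total, count = 0.0, 0
    for dist, good, bad, use in zip(dists, preferred, rejected, mask):
        if not use:
            continue
        margin = math.log(dist[good]) - math.log(dist[bad])
        total += math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
        count += 1
    return total / count if count else 0.0


# Toy sequence: only the middle position is genuinely uncertain.
toy_dists = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
    [0.90, 0.05, 0.03, 0.02],
]
toy_mask = high_entropy_mask(toy_dists, fraction=0.03)
toy_loss = masked_pairwise_loss(toy_dists, [0, 2, 0], [1, 3, 1], toy_mask)
```

In the toy sequence, the mask selects only the uniform middle position, so the two confident positions are skipped entirely. A real implementation would compute the same quantities on model logits with a tensor library rather than Python lists.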

The performance numbers are the most striking part. On Qwen2.5-7B, ReasonMaxxer reached 70.6 percent on MATH-500, compared with 65.6 percent for SimpleRL-Zoo (a widely used full-RL baseline), while requiring roughly one thousand times less compute. The training run completed in minutes on a single GPU, against days on a multi-GPU cluster for the full pipeline. Across all six benchmarks tested (including MATH-500, GSM8K, and four additional math reasoning sets) and across all three model families (Qwen, Llama, and Phi), ReasonMaxxer either matched or exceeded the full-RL comparison at every checkpoint.

If the result generalizes, it forces a hard re-examination of what frontier post-training pipelines are actually doing. The conventional narrative is that RL teaches reasoning — that the gap between a base model and a reasoning-tuned model represents new capability acquired through search and reward signal. The USC/ARL result suggests the gap is smaller and more localized than that: the reasoning capability was largely latent in the base model, and RL was an expensive way to surface it. The cheaper, more targeted alternative the authors propose does the surfacing directly.

Independent replication will be required before the field updates its training playbook, and the paper’s own caveats — small-model focus, math-heavy benchmark suite, and the open question of how the result interacts with much longer reasoning chains — will need to be tested. But the central finding is sharp enough to demand a response. If RL is really just touching 1 to 3 percent of tokens, the question is no longer whether ReasonMaxxer-style methods can compete, but how much of the past two years of frontier compute spend was necessary at all.

RL affects only 1–3% of token positions during reasoning — always selecting from the base model’s existing top-five alternatives, not teaching new skills. — Authors of “Rethinking RL for LLM Reasoning” (arXiv:2605.06241)

Policy & Models

The Mid-Week Wire

State legislatures continue to outrun Washington on deepfakes, while Zyphra previews the first MoE diffusion model converted from an autoregressive LLM.

State Laws

Deepfake Performer Bill Sent to Governor as 30 States Cover Election Deepfakes

A state legislature on Thursday approved HB 2137, an AI deepfake bill prohibiting harmful uses of digital imitations and requiring disclosure of synthetic performers in advertising. The bill now heads to the governor’s desk. The vote lands alongside a Biometric Update analysis published the same day that counts thirty U.S. states with enacted election-deepfake laws — up from twenty-eight at the end of 2025 — as state legislatures race to put protections in place before the 2026 midterm cycle hits its general-election phase. Federal legislation remains stalled in committee; the practical regulatory ceiling for synthetic political media is now being set state by state. Several of the existing laws have drawn First Amendment challenges, and the constitutional fight over compelled disclosure obligations is expected to reach the federal appellate level before the November elections. The pace of state-level enactment, however, has continued through every adverse ruling so far.

Open Models

Zyphra Teases ZAYA1-8B-Diffusion-Preview: MoE Converted From Autoregressive

Days after its ZAYA1-8B autoregressive launch, Zyphra previewed a diffusion variant — ZAYA1-8B-Diffusion-Preview — that the company is positioning as the first mixture-of-experts model converted from an autoregressive LLM into a diffusion architecture. Zyphra reports up to a 7.7× throughput speedup on generation tasks compared with the standard autoregressive model, attributing the gain to the parallel decoding properties of the diffusion approach combined with sparse expert routing. The release is explicitly framed as an early research preview rather than a production drop. No public weights accompanied the initial announcement, and no third-party benchmarks have been published yet. If the conversion technique generalizes to other autoregressive MoE models — a question Zyphra’s preview deliberately leaves open — the implications for inference cost on open-weight stacks could be significant.

Toolbox

Codex CLI v0.129.0 Ships Modal Vim, Resume/Fork Redesign, /hooks Browser

OpenAI’s Codex CLI version 0.129.0, released Thursday, is one of the most workflow-dense single-version drops the tool has shipped this year. The release leans into power-user ergonomics: a long-requested modal Vim editor for the composer, a redesigned session picker, and a new in-TUI lifecycle hook browser. Highlights:

  • Modal Vim editing for the composer input area, toggleable per session, with full Vim keybindings (normal/insert/visual modes, registers, dot-repeat). Off by default; opt in via /config.
  • Redesigned resume/fork session picker with improved scrollback rendering and inline diff workflows when forking from a mid-session checkpoint.
  • New /hooks browser for viewing and editing lifecycle hooks inline inside the TUI, without leaving the active turn. Supports both per-project and user-level hooks.
  • Expanded plugin access controls, including remote sync of plugin configurations across machines and finer discoverability toggles for shared plugin registries.
  • Python SDK republished as openai-codex on PyPI (importable as openai_codex), with pinned runtime-generated types and concurrent turn routing for parallel agent sessions.

The Vim editor addition is the surface-level headline, but the /hooks browser is arguably the more consequential change: lifecycle hooks have been the primary mechanism for customizing Codex behavior at organizational scale, and editing them previously required leaving the TUI for an external editor. The shift to inline editing brings Codex closer to parity with Claude Code on hook-driven extensibility while preserving its distinct session model.

Briefs

From the Desk

A cross-model expansion for Copilot CLI’s Rubber Duck reviewer, and a synthetic-reasoning framework that exposes a blind spot in current RL evaluation.

Rubber Duck Goes Cross-Model

GitHub’s Copilot CLI Rubber Duck, the second-opinion review agent that critiques the primary agent’s plans and diffs, expanded Thursday to cover both major model families. GPT-orchestrated sessions now receive a Claude-powered critic; Claude-orchestrated sessions have their reviewer upgraded from GPT-5.4 to GPT-5.5. Previously, Rubber Duck was exclusive to Claude sessions, which limited its usefulness for teams standardized on Copilot’s GPT pathway. The agent fires automatically after file edits, planning checkpoints, and explicit second-opinion requests, and it requires the /experimental flag to be toggled on. The change is small in surface area but consequential: it makes the second-pair-of-eyes pattern symmetric across model families, removing what had been one of the more visible asymmetries in the Copilot CLI feature set.
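
The symmetric pattern can be pictured with a short sketch. It is purely illustrative and does not reflect how Copilot CLI is implemented: the family labels, event names, and function name are all hypothetical. The critic is drawn from the opposite model family, and a review runs only for the trigger events listed above when the experimental mode is on.

```python
from typing import Optional

# Hypothetical family labels; these are not Copilot CLI identifiers.
CRITIC_FAMILY = {"gpt": "claude", "claude": "gpt"}

# The three triggers named in the article; the string labels are made up here.
REVIEW_TRIGGERS = {"file_edit", "planning_checkpoint", "second_opinion_request"}


def pick_critic(orchestrator_family: str, event: str, experimental: bool) -> Optional[str]:
    """Return the critic's model family, or None when no review should run."""
    if not experimental:
        return None  # feature is gated behind the experimental toggle
    if event not in REVIEW_TRIGGERS:
        return None  # only the listed events trigger a second opinion
    return CRITIC_FAMILY.get(orchestrator_family)
```

Because the mapping is its own inverse, neither family is privileged: whichever model orchestrates, the other one reviews.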

ScaleLogic on Expressiveness

A second arXiv preprint this week, ScaleLogic (arXiv:2605.06638), introduces a synthetic reasoning framework with independent control over proof planning depth and logical expressiveness. The framework allows researchers to disentangle two variables that are usually conflated in math-heavy benchmark suites: how many inferential steps a problem requires, and how rich the underlying logical vocabulary is. The authors find that RL reliably improves performance on low-expressiveness tasks (first-order logic with bounded quantifiers) but struggles to generalize to high-expressiveness domains requiring second-order reasoning or modal operators. The implication: the math- and arithmetic-dominated benchmarks the field relies on may systematically underestimate the structural limits of current LLMs, because they fix expressiveness at a level RL handles well. Paired with the ReasonMaxxer result above, the two papers point to the same uncomfortable conclusion from opposite directions: current RL is doing less than the field had assumed, and is being evaluated on the easier half of the problem space.
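
What "independent control" over the two variables means can be shown with a toy generator. This is a sketch only and not ScaleLogic's actual framework: the level names, the templates, and the `make_problem` function are assumptions. Depth sets the length of an implication chain, while the expressiveness level changes the logical vocabulary the same chain is written in.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical expressiveness levels: (rule template, fact template).
# "[]" stands for the modal necessity operator.
TEMPLATES = {
    "propositional": ("P{i} -> P{j}", "P{i}"),
    "first_order": ("forall x. P{i}(x) -> P{j}(x)", "P{i}(a)"),
    "modal": ("[](P{i} -> P{j})", "[]P{i}"),
}


@dataclass(frozen=True)
class Problem:
    premises: List[str]
    goal: str
    depth: int
    expressiveness: str


def make_problem(depth: int, expressiveness: str) -> Problem:
    """Build a chain problem that needs `depth` inference steps to reach the goal."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    rule_tpl, fact_tpl = TEMPLATES[expressiveness]
    premises = [fact_tpl.format(i=0)]  # starting fact
    premises += [rule_tpl.format(i=i, j=i + 1) for i in range(depth)]
    return Problem(premises, fact_tpl.format(i=depth), depth, expressiveness)


# Same depth, different vocabulary: the two knobs move independently.
shallow_fo = make_problem(3, "first_order")
shallow_modal = make_problem(3, "modal")
```

Holding depth at 3 while switching from `first_order` to `modal` changes only the notation of each premise, which is the kind of controlled comparison a math-only benchmark suite cannot provide.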

GitHub Trending — Thursday Snapshot

Repo | Language | Today’s Signal | What it does
NousResearch/hermes-agent | TS / Python | v0.13.0 today | Self-hosted autonomous AI agent framework; “Tenacity” release ships May 7.
mattpocock/skills | TypeScript | +44.5K May | Curated Claude Code skills collection: reusable agent capabilities packaged as composable units.
nexu-io/open-design | TypeScript | +38K May | Local-first open-source design system generator; produces brand tokens, components, and docs from a single config.
antoinezambelli/forge | Python | ~91K stars | Reliability framework for self-hosted LLM tool-calling: retry, validation, and schema-enforced output for OSS models.
colbymchenry/codegraph | TypeScript | New | Pre-indexed code knowledge graph for Claude Code; serves repo-wide symbol relationships at agent-friendly latencies.
jesseduffield/lazygit | Go | Trending | Terminal UI for git; perennial favorite, riding fresh attention from the agent-orchestration community.