Home Stories About RSS Feed
2 min read

Government Pre-Release AI Safety Reviews Mandated

Back to News

In a landmark agreement that fundamentally alters the trajectory of AI deployment, the world’s leading artificial intelligence laboratories have formally agreed to a mandated pre-release safety review framework managed by a joint US-EU AI Safety Institute consortium.

The End of “Release and Patch”

For years, the industry operated on a model of rapid deployment, where safety issues and “jailbreaks” were often discovered by the public and patched retroactively. With frontier models now possessing capabilities touching on cybersecurity, bio-risk, and critical infrastructure management, governments have stepped in to end this practice.

Under the new framework, companies developing models above a specified compute threshold (currently targeting the GPT-5 and Claude 4 class of models) must provide the Safety Institute with early access to the models during the final stages of training.

The Red-Teaming Process

Government-backed researchers will spend several weeks rigorously “red-teaming” the models, attempting to coax them into generating harmful outputs, providing instructions for illegal acts, or demonstrating autonomous malicious behaviors.

If a model fails these safety evaluations, the developer must implement structural mitigations and submit the model for re-evaluation before it can be deployed commercially or open-sourced.

Industry and Open Source Reactions

The major labs—including OpenAI, Google DeepMind, and Anthropic—have largely supported the measure, viewing it as necessary for public trust and regulatory stability.

However, the open-source AI community has voiced strong opposition, arguing that the compute thresholds are arbitrary and that the mandated review process will stifle innovation and consolidate power among the few massive tech companies capable of navigating the bureaucratic hurdles.


Source: whitehouse.gov, commission.europa.eu