Microsoft has launched three new MAI (Microsoft AI) models, available through Microsoft Foundry on Azure. The release represents Microsoft’s clearest signal yet that it intends to build — not just distribute — frontier AI capabilities.
The Three Models
MAI-Transcribe-1
A speech-to-text model supporting the 25 most widely used languages:
- 3.9% average Word Error Rate on the FLEURS benchmark — state-of-the-art accuracy
- 2.5x faster batch transcription than previous Azure offerings
- Engineered for noisy, real-world audio environments
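For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. A 3.9% WER therefore means roughly four word errors per 100 reference words. The sketch below is a minimal illustration of that definition, not Microsoft's evaluation pipeline: the function name and sample sentences are made up for this example, and real benchmark runs typically also normalize casing and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (all but the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only (not benchmark data).
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"{wer:.1%}")  # one substitution + one deletion over 9 words -> 22.2%
```

An average figure such as the one quoted for FLEURS is this score aggregated across the benchmark's languages, so per-language results can sit above or below it.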
MAI-Voice-1
A high-fidelity speech generation model:
- Produces 60 seconds of natural, expressive audio in under one second on a single GPU
- Supports custom voice creation from just a few seconds of audio samples
- Designed for enterprise voice applications, accessibility, and content creation
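The speed claim can be read as a real-time multiplier: 60 seconds of audio in under one second of compute is more than 60x faster than playback. The arithmetic below is a rough sketch of what that implies for a larger job. The ten-hour workload is hypothetical, and the estimate assumes the quoted rate holds for long jobs, which the announcement does not state.

```python
def realtime_speedup(audio_seconds: float, compute_seconds: float) -> float:
    """How many seconds of audio are produced per second of compute."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# "Under one second" makes 1.0 s an upper bound on compute time,
# so 60x is a lower bound on the speedup.
speedup = realtime_speedup(audio_seconds=60.0, compute_seconds=1.0)

# Hypothetical workload: ten hours of narration on a single GPU.
narration_seconds = 10 * 3600
gpu_seconds = narration_seconds / speedup  # assumes the rate scales linearly

print(f"at least {speedup:.0f}x real time")
print(f"at most {gpu_seconds / 60:.0f} minutes of GPU time for 10 hours of audio")
```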
MAI-Image-2
A text-to-image model focused on quality and control:
- Debuted at #3 on the Arena.ai leaderboard for image model families
- Excels in natural lighting, accurate skin tones, and texture fidelity
- Generates clear in-image text — a persistent weakness in competing models
- An efficient variant, MAI-Image-2e, followed on April 14 for high-volume production workflows
Microsoft Foundry
All three models are available through Microsoft Foundry, the company’s central platform for model discovery, evaluation, fine-tuning, and deployment. The platform also hosts models from OpenAI, Meta, and other partners, with built-in responsible AI guardrails and governance controls.
Notably, Microsoft expanded Foundry Local into public preview the same week — allowing enterprise and sovereign customers to run AI workloads on-premises without cloud connectivity.
The Strategic Angle
These are the same models powering Copilot, Bing, and PowerPoint. By making them available as standalone APIs, Microsoft is:
- Reducing dependency on OpenAI for its own product stack
- Building a cost-competitive alternative for enterprise developers who don’t need frontier reasoning
- Creating a vertically integrated AI platform where Microsoft controls the full stack from model to application
The message is clear: Microsoft is no longer content to be just the infrastructure layer behind someone else’s models.
Sources: microsoft.ai, microsoft.com, redmondmag.com