Microsoft Enters the Model Race with MAI-Transcribe, MAI-Voice, and MAI-Image

Microsoft has launched three new MAI (Microsoft AI) models, available through Microsoft Foundry on Azure. The release is Microsoft’s clearest signal yet that it intends to build frontier AI capabilities, not just distribute them.

The Three Models

MAI-Transcribe-1

A speech-to-text model supporting the 25 most-used languages.

MAI-Voice-1

A high-fidelity speech generation model.

MAI-Image-2

A text-to-image model focused on quality and control.

Microsoft Foundry

All three models are available through Microsoft Foundry, the company’s central platform for model discovery, evaluation, fine-tuning, and deployment. The platform also hosts models from OpenAI, Meta, and other partners, with built-in responsible AI guardrails and governance controls.

Notably, Microsoft expanded Foundry Local into public preview the same week, allowing enterprise and sovereign customers to run AI workloads on-premises without cloud connectivity.

The Strategic Angle

These are the same models powering Copilot, Bing, and PowerPoint. By making them available as standalone APIs, Microsoft is:

  1. Reducing dependency on OpenAI for its own product stack
  2. Building a cost-competitive alternative for enterprise developers who don’t need frontier reasoning
  3. Creating a vertically integrated AI platform where Microsoft controls the full stack from model to application

The message is clear: Microsoft is no longer content to be just the infrastructure layer behind someone else’s models.


Source: microsoft.ai, microsoft.com, redmondmag.com