Guide

Learn more about how to use AI models with TypingMind

Filtered by: openrouter
xAI: Grok 4.20 Multi-Agent Beta via OpenRouter

Access xAI: Grok 4.20 Multi-Agent Beta via OpenRouter

Grok 4.20 Multi-Agent Beta is a variant of xAI’s Grok 4.20 designed for collaborative, agent-based workflows. Multiple agents operate in parallel to conduct deep research, coordinate tool use, and synthesize information across complex tasks. Reasoning effort behavior (see the request sketch below):

- low / medium: 4 agents
- high / xhigh: 16 agents
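
A minimal sketch of selecting the agent tier through OpenRouter's standard `reasoning.effort` request parameter; the model slug below is an assumption, not something this page confirms:

```python
# Hypothetical request selecting the 16-agent tier via reasoning effort.
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "x-ai/grok-4.20-multi-agent-beta",  # assumed slug
        "messages": [{"role": "user", "content": "Survey recent work on sparse attention."}],
        "reasoning": {"effort": "high"},  # low/medium -> 4 agents; high/xhigh -> 16
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```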

xAI: Grok 4.20 Beta via OpenRouter

Access xAI: Grok 4.20 Beta via OpenRouter

Grok 4.20 Beta is xAI's newest flagship model with industry-leading speed and agentic tool-calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently precise and truthful responses. Reasoning can be enabled or disabled using the `reasoning.enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens)
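
As a sketch, the toggle can be passed through any OpenAI-compatible client via `extra_body`; the model slug here is an assumption:

```python
# Hypothetical example: disabling reasoning for a latency-sensitive call.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",
)
resp = client.chat.completions.create(
    model="x-ai/grok-4.20-beta",  # assumed slug
    messages=[{"role": "user", "content": "Give a one-line summary of HTTP/3."}],
    extra_body={"reasoning": {"enabled": False}},  # set True to enable reasoning
)
print(resp.choices[0].message.content)
```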

Hunter Alpha via OpenRouter

Access Hunter Alpha via OpenRouter

Hunter Alpha is a frontier intelligence model with 1 trillion parameters and a 1M-token context window, built for agentic use. It excels at long-horizon planning, complex reasoning, and sustained multi-step task execution, with the reliability and instruction-following precision that frameworks like OpenClaw need. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

Healer Alpha via OpenRouter

Access Healer Alpha via OpenRouter

Healer Alpha is a frontier omni-modal model with vision, hearing, reasoning, and action capabilities. It brings the full power of agentic intelligence into the real world: natively perceiving visual and audio inputs, reasoning across modalities, and executing complex multi-step tasks with precision and reliability. **Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.

NVIDIA: Nemotron 3 Super (free) via OpenRouter

Access NVIDIA: Nemotron 3 Super (free) via OpenRouter

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token-generation throughput than leading open models. The model features a 1M-token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning.

Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified. Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere, from workstation to cloud.

ByteDance Seed: Seed-2.0-Lite via OpenRouter

Access ByteDance Seed: Seed-2.0-Lite via OpenRouter

Seed-2.0-Lite is a versatile, cost-efficient enterprise workhorse that delivers strong multimodal and agent capabilities at noticeably lower latency, making it a practical default for most production workloads across text, vision, and tools. Engineered for high-frequency visual understanding and agentic workflows, it is an ideal choice for latency-sensitive deployment at scale.

Qwen: Qwen3.5-9B via OpenRouter

Access Qwen: Qwen3.5-9B via OpenRouter

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.

OpenAI: GPT-5.4 Pro via OpenRouter

Access OpenAI: GPT-5.4 Pro via OpenRouter

GPT-5.4 Pro is OpenAI's most advanced model, building on GPT-5.4's unified architecture with enhanced reasoning capabilities for complex, high-stakes tasks. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs. Optimized for step-by-step reasoning, instruction following, and accuracy, GPT-5.4 Pro excels at agentic coding, long-context workflows, and multi-step problem solving.

OpenAI: GPT-5.4 via OpenRouter

Access OpenAI: GPT-5.4 via OpenRouter

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling high-context reasoning, coding, and multimodal analysis within the same workflow. The model delivers improved performance in coding, document understanding, tool use, and instruction following. It is designed as a strong default for both general-purpose tasks and software engineering, capable of generating production-quality code, synthesizing information across multiple sources, and executing complex multi-step workflows with fewer iterations and greater token efficiency.

Inception: Mercury 2 via OpenRouter

Access Inception: Mercury 2 via OpenRouter

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving >1,000 tokens/sec on standard GPUs. Mercury 2 is 5x+ faster than leading speed-optimized LLMs like Claude 4.5 Haiku and GPT 5 Mini, at a fraction of the cost. Mercury 2 supports tunable reasoning levels, 128K context, native tool use, and schema-aligned JSON output. Built for coding workflows where latency compounds, real-time voice/search, and agent loops. OpenAI API compatible. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury-2).
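
A minimal sketch of the schema-aligned JSON output, using OpenRouter's structured-outputs request shape; the model slug and schema are illustrative assumptions:

```python
# Hypothetical request: force Mercury 2's output to match a JSON Schema.
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "inception/mercury-2",  # assumed slug
        "messages": [{"role": "user", "content": "Extract city and date: 'Berlin, 2026-03-01'."}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "extraction",  # hypothetical schema
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}, "date": {"type": "string"}},
                    "required": ["city", "date"],
                    "additionalProperties": False,
                },
            },
        },
    },
)
print(resp.json()["choices"][0]["message"]["content"])  # e.g. {"city": "Berlin", ...}
```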

OpenAI: GPT-5.3 Chat via OpenRouter

Access OpenAI: GPT-5.3 Chat via OpenRouter

GPT-5.3 Chat is an update to ChatGPT's most-used model that makes everyday conversations smoother, more useful, and more directly helpful. It delivers more accurate answers with better contextualization and significantly reduces unnecessary refusals, caveats, and overly cautious phrasing that can interrupt conversational flow.

Google: Gemini 3.1 Flash Lite Preview via OpenRouter

Access Google: Gemini 3.1 Flash Lite Preview via OpenRouter

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across key capabilities. Improvements span audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion. Supports full thinking levels (minimal, low, medium, high) for fine-grained cost/performance trade-offs. Priced at half the cost of Gemini 3 Flash.

ByteDance Seed: Seed-2.0-Mini via OpenRouter

Access ByteDance Seed: Seed-2.0-Mini via OpenRouter

Seed-2.0-Mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports a 256K context window, offers four reasoning effort modes (minimal/low/medium/high) and multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority.

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview) via OpenRouter

Access Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview) via OpenRouter

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced contextual understanding with fast, cost-efficient inference, making complex image generation and iterative edits significantly more accessible. Aspect ratios can be controlled with the [image_config API parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration).
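
A hedged sketch of aspect-ratio control; the exact `image_config` fields and the model slug are assumptions, so check the linked docs for the authoritative shape:

```python
# Hypothetical image-generation request with a fixed aspect ratio.
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "google/gemini-3.1-flash-image-preview",  # assumed slug
        "messages": [{"role": "user", "content": "A banana-yellow retro robot, studio lighting."}],
        "modalities": ["image", "text"],           # request image output
        "image_config": {"aspect_ratio": "16:9"},  # assumed field name
    },
)
# Generated images are typically returned on the message's "images" field.
print(resp.json()["choices"][0]["message"].get("images"))
```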

Qwen: Qwen3.5-35B-A3B via OpenRouter

Access Qwen: Qwen3.5-35B-A3B via OpenRouter

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.

Qwen: Qwen3.5-27B via OpenRouter

Access Qwen: Qwen3.5-27B via OpenRouter

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.

Qwen: Qwen3.5-122B-A10B via OpenRouter

Access Qwen: Qwen3.5-122B-A10B via OpenRouter

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.

Qwen: Qwen3.5-Flash via OpenRouter

Access Qwen: Qwen3.5-Flash via OpenRouter

The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the Qwen3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.

LiquidAI: LFM2-24B-A2B via OpenRouter

Access LiquidAI: LFM2-24B-A2B via OpenRouter

LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per token, it delivers high-quality generation while maintaining low inference costs. The model fits within 32 GB of RAM, making it practical to run on consumer laptops and desktops without sacrificing capability.

Google: Gemini 3.1 Pro Preview Custom Tools via OpenRouter

Access Google: Gemini 3.1 Pro Preview Custom Tools via OpenRouter

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party or user-defined functions are available. This specialized preview endpoint significantly increases function calling reliability and ensures the model selects the most appropriate tool in coding agents and complex, multi-tool workflows. It retains the core strengths of Gemini 3.1 Pro, including multimodal reasoning across text, image, video, audio, and code, a 1M-token context window, and strong software engineering performance.
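
To illustrate, here is a user-defined function supplied in the standard OpenAI-style `tools` array, the kind of third-party tool this variant is tuned to prefer over a generic bash tool; the slug and tool are hypothetical:

```python
# Hypothetical setup: offer a purpose-built tool alongside the prompt.
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical user-defined tool
        "description": "Run the project's test suite and return any failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Test directory"}},
            "required": ["path"],
        },
    },
}]

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "google/gemini-3.1-pro-preview-custom-tools",  # assumed slug
        "messages": [{"role": "user", "content": "Fix the failing tests in ./tests."}],
        "tools": tools,
    },
)
print(resp.json()["choices"][0]["message"].get("tool_calls"))
```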

OpenAI: GPT-5.3-Codex via OpenRouter

Access OpenAI: GPT-5.3-Codex via OpenRouter

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results on SWE-Bench Pro and strong performance on Terminal-Bench 2.0 and OSWorld-Verified, reflecting improved multi-language coding, terminal proficiency, and real-world computer-use skills. The model is optimized for long-running, tool-using workflows and supports interactive steering during execution, making it suitable for complex development tasks, debugging, deployment, and iterative product work.

Beyond coding, GPT-5.3-Codex performs strongly on structured knowledge-work benchmarks such as GDPval, supporting tasks like document drafting, spreadsheet analysis, slide creation, and operational research across domains. It is trained with enhanced cybersecurity awareness, including vulnerability identification capabilities, and deployed with additional safeguards for high-risk use cases. Compared to prior Codex models, it is more token-efficient and approximately 25% faster, targeting professional end-to-end workflows that span reasoning, execution, and computer interaction.

AionLabs: Aion-2.0 via OpenRouter

Access AionLabs: Aion-2.0 via OpenRouter

Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging. It also handles mature and darker themes with more nuance and depth.

Google: Gemini 3.1 Pro Preview via OpenRouter

Access Google: Gemini 3.1 Pro Preview via OpenRouter

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning details must be preserved when using multi-turn tool calling; [see our docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning).

The 3.1 update introduces measurable gains in SWE benchmarks and real-world coding environments, along with stronger autonomous task execution in structured domains such as finance and spreadsheet-based workflows. Designed for advanced development and agentic systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration while increasing token efficiency. It introduces a new medium thinking level to better balance cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it well-suited for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.
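
A simplified sketch of that preservation pattern, assuming the `reasoning_details` field documented by OpenRouter is echoed back verbatim on the assistant turn; the slug and tool handling are illustrative:

```python
# Sketch: preserving reasoning_details across a tool-calling turn.
import json, requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <OPENROUTER_API_KEY>"}
MODEL = "google/gemini-3.1-pro-preview"  # assumed slug

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

first = requests.post(URL, headers=HEADERS, json={
    "model": MODEL, "messages": messages, "tools": tools,
}).json()["choices"][0]["message"]

# Echo the assistant turn back unchanged, reasoning_details included.
messages.append({
    "role": "assistant",
    "content": first.get("content"),
    "tool_calls": first.get("tool_calls"),
    "reasoning_details": first.get("reasoning_details"),
})
messages.append({
    "role": "tool",
    "tool_call_id": first["tool_calls"][0]["id"],  # assumes a tool call came back
    "content": json.dumps({"temp_c": 12}),
})

final = requests.post(URL, headers=HEADERS, json={
    "model": MODEL, "messages": messages, "tools": tools,
}).json()
print(final["choices"][0]["message"]["content"])
```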

Anthropic: Claude Sonnet 4.6 via OpenRouter

Access Anthropic: Claude Sonnet 4.6 via OpenRouter

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.

Qwen: Qwen3.5 Plus 2026-02-15 via OpenRouter

Access Qwen: Qwen3.5 Plus 2026-02-15 via OpenRouter

The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the Qwen3 series, these models show a leap forward in both pure-text and multimodal capabilities.

Qwen: Qwen3.5 397B A17B via OpenRouter

Access Qwen: Qwen3.5 397B A17B via OpenRouter

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers state-of-the-art performance comparable to leading-edge models across a wide range of tasks, including language understanding, logical reasoning, code generation, agent-based tasks, image understanding, video understanding, and graphical user interface (GUI) interactions. With its robust code-generation and agent capabilities, the model exhibits strong generalization across diverse agentic tasks.

MiniMax: MiniMax M2.5 via OpenRouter

Access MiniMax: MiniMax M2.5 via OpenRouter

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work, reaching fluency in generating and operating Word, Excel, and PowerPoint files, context switching between diverse software environments, and working across different agent and human teams. Scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp, M2.5 is also more token-efficient than previous generations, having been trained to optimize its actions and output through planning.

Z.ai: GLM 5 via OpenRouter

Access Z.ai: GLM 5 via OpenRouter

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading closed-source models. With advanced agentic planning, deep backend reasoning, and iterative self-correction, GLM-5 moves beyond code generation to full-system construction and autonomous execution.

Qwen: Qwen3 Max Thinking via OpenRouter

Access Qwen: Qwen3 Max Thinking via OpenRouter

Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it delivers major gains in factual accuracy, complex reasoning, instruction following, alignment with human preferences, and agentic behavior.

Anthropic: Claude Opus 4.6 via OpenRouter

Access Anthropic: Claude Opus 4.6 via OpenRouter

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective for large codebases, complex refactors, and multi-step debugging that unfolds over time. The model shows deeper contextual understanding, stronger problem decomposition, and greater reliability on hard engineering tasks than prior generations.

Beyond coding, Opus 4.6 excels at sustained knowledge work. It produces near-production-ready documents, plans, and analyses in a single pass, and maintains coherence across very long outputs and extended sessions. This makes it a strong default for tasks that require persistence, judgment, and follow-through, such as technical design, migration planning, and end-to-end project execution. For users upgrading from earlier Opus versions, see our [official migration guide here](https://openrouter.ai/docs/guides/guides/model-migrations/claude-4-6-opus).

Qwen: Qwen3 Coder Next via OpenRouter

Access Qwen: Qwen3 Coder Next via OpenRouter

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per token, delivering performance comparable to models with 10 to 20x higher active compute, which makes it well suited for cost-sensitive, always-on agent deployment. The model is trained with a strong agentic focus and performs reliably on long-horizon coding tasks, complex tool usage, and recovery from execution failures. With a native 256K context window, it integrates cleanly into real-world CLI and IDE environments and adapts well to common agent scaffolds used by modern coding tools. The model operates exclusively in non-thinking mode and does not emit `<think>` blocks, simplifying integration for production coding agents.

Free Models Router via OpenRouter

Access Free Models Router via OpenRouter

The simplest way to get free inference. `openrouter/free` is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that support the features your request needs, such as image understanding, tool calling, structured outputs, and more.
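
A minimal example; the `openrouter/free` slug is taken directly from the description above:

```python
# Route a request to a randomly selected free model.
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "openrouter/free",
        "messages": [{"role": "user", "content": "Summarize RFC 2616 in two sentences."}],
    },
)
data = resp.json()
print(data["model"])  # which free model the router actually picked
print(data["choices"][0]["message"]["content"])
```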

StepFun: Step 3.5 Flash (free) via OpenRouter

Access StepFun: Step 3.5 Flash (free) via OpenRouter

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. It is a reasoning model that remains remarkably fast even at long context lengths.

StepFun: Step 3.5 Flash via OpenRouter

Access StepFun: Step 3.5 Flash via OpenRouter

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. It is a reasoning model that remains remarkably fast even at long context lengths.

Arcee AI: Trinity Large Preview (free) via OpenRouter

Access Arcee AI: Trinity Large Preview (free) via OpenRouter

Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixture-of-Experts with 13B active parameters per token using 4-of-256 expert routing. It excels at creative writing, storytelling, role-play, chat scenarios, and real-time voice assistance, going beyond what the average reasoning model usually delivers, and it introduces Arcee's newer agentic capabilities: it was trained to navigate agent harnesses like OpenCode, Cline, and Kilo Code, and to handle complex toolchains and long, constraint-filled prompts.

The architecture natively supports very long context windows up to 512k tokens, with the Preview API currently served at 128k context using 8-bit quantization for practical deployment. Trinity-Large-Preview reflects Arcee’s efficiency-first design philosophy, offering a production-oriented frontier model with open weights and permissive licensing suitable for real-world applications and experimentation.

MoonshotAI: Kimi K2.5 via OpenRouter

Access MoonshotAI: Kimi K2.5 via OpenRouter

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens, it delivers strong performance in general reasoning, visual coding, and agentic tool-calling.

Upstage: Solar Pro 3 via OpenRouter

Access Upstage: Solar Pro 3 via OpenRouter

Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers exceptional performance while maintaining computational efficiency. Optimized for Korean with English and Japanese support.

MiniMax: MiniMax M2-her via OpenRouter

Access MiniMax: MiniMax M2-her via OpenRouter

MiniMax M2-her is a dialogue-first large language model built for immersive roleplay, character-driven chat, and expressive multi-turn conversations. Designed to stay consistent in tone and personality, it supports rich message roles (user_system, group, sample_message_user, sample_message_ai) and can learn from example dialogue to better match the style and pacing of your scenario. That makes it a strong choice for storytelling, companions, and conversational experiences where natural flow and vivid interaction matter most.
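
A hedged sketch of how those extended roles might look in a request; whether they pass through unchanged depends on the provider, so treat the slug and payload shape as assumptions:

```python
# Hypothetical example-dialogue priming with M2-her's extended message roles.
import requests

messages = [
    {"role": "user_system", "content": "You are Mira, a wry starship engineer."},
    {"role": "sample_message_user", "content": "Mira, the reactor is acting up again."},
    {"role": "sample_message_ai", "content": "Again? Hand me the hydrospanner and stop touching things."},
    {"role": "user", "content": "What's that blinking red light?"},
]

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={"model": "minimax/minimax-m2-her", "messages": messages},  # assumed slug
)
print(resp.json()["choices"][0]["message"]["content"])
```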

Writer: Palmyra X5 via OpenRouter

Access Writer: Palmyra X5 via OpenRouter

Palmyra X5 is Writer's most advanced model, purpose-built for building and scaling AI agents across the enterprise. It delivers industry-leading speed and efficiency on context windows up to 1 million tokens, powered by a novel transformer architecture and hybrid attention mechanisms. This enables faster inference and expanded memory for processing large volumes of enterprise data, critical for scaling AI agents.

LiquidAI: LFM2.5-1.2B-Thinking (free) via OpenRouter

Access LiquidAI: LFM2.5-1.2B-Thinking (free) via OpenRouter

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is designed to provide higher-quality “thinking” responses in a small 1.2B model.

LiquidAI: LFM2.5-1.2B-Instruct (free) via OpenRouter

Access LiquidAI: LFM2.5-1.2B-Instruct (free) via OpenRouter

LFM2.5-1.2B-Instruct is a compact, high-performance instruction-tuned model built for fast on-device AI. It delivers strong chat quality in a 1.2B parameter footprint, with efficient edge inference and broad runtime support.

OpenAI: GPT Audio via OpenRouter

Access OpenAI: GPT Audio via OpenRouter

The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced at $32 per million input tokens and $64 per million output tokens.
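
As a quick worked example of those rates:

```python
# Estimate gpt-audio cost from the listed per-million-token prices.
INPUT_RATE = 32.0 / 1_000_000   # $ per input token
OUTPUT_RATE = 64.0 / 1_000_000  # $ per output token

def audio_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# 50K input tokens and 20K output tokens:
print(f"${audio_cost(50_000, 20_000):.2f}")  # $1.60 + $1.28 = $2.88
```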

OpenAI: GPT Audio Mini via OpenRouter

Access OpenAI: GPT Audio Mini via OpenRouter

A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million tokens and output is priced at $2.40 per million tokens.

Z.ai: GLM 4.7 Flash via OpenRouter

Access Z.ai: GLM 4.7 Flash via OpenRouter

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning, and tool collaboration, and has achieved leading performance among open-source models of the same size on several current public benchmark leaderboards.

OpenAI: GPT-5.2-Codex via OpenRouter

Access OpenAI: GPT-5.2-Codex via OpenRouter

GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks. The model supports building projects from scratch, feature development, debugging, large-scale refactoring, and code review. Compared to GPT-5.1-Codex, 5.2-Codex is more steerable, adheres closely to developer instructions, and produces cleaner, higher-quality code outputs. Reasoning effort can be adjusted with the `reasoning.effort` parameter; read the [docs here](https://openrouter.ai/docs/use-cases/reasoning-tokens#reasoning-effort-level).

Codex integrates into developer environments including the CLI, IDE extensions, GitHub, and cloud tasks. It adapts reasoning effort dynamically, providing fast responses for small tasks while sustaining extended multi-hour runs for large projects. The model is trained to perform structured code reviews, catching critical flaws by reasoning over dependencies and validating behavior against tests. It also supports multimodal inputs such as images or screenshots for UI development and integrates tool use for search, dependency installation, and environment setup. Codex is intended specifically for agentic coding applications.
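
A sketch of the screenshot workflow using the standard OpenAI-style image content part; the model slug is an assumption:

```python
# Hypothetical request: attach a UI screenshot to a coding question.
import base64, requests

with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "openai/gpt-5.2-codex",  # assumed slug
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Why does the sidebar overlap the header? Suggest a CSS fix."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```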

AllenAI: Molmo2 8B via OpenRouter

Access AllenAI: Molmo2 8B via OpenRouter

Molmo2-8B is an open vision-language model developed by the Allen Institute for AI (Ai2) as part of the Molmo2 family, supporting image, video, and multi-image understanding and grounding. It is based on Qwen3-8B and uses SigLIP 2 as its vision backbone, outperforming other open-weight, open-data models on short videos, counting, and captioning, while remaining competitive on long-video tasks.

AllenAI: Olmo 3.1 32B Instruct via OpenRouter

Access AllenAI: Olmo 3.1 32B Instruct via OpenRouter

Olmo 3.1 32B Instruct is a large-scale, 32-billion-parameter instruction-tuned language model engineered for high-performance conversational AI, multi-turn dialogue, and practical instruction following. As part of the Olmo 3.1 family, this variant emphasizes responsiveness to complex user directions and robust chat interactions while retaining strong capabilities on reasoning and coding benchmarks. Developed by Ai2 under the Apache 2.0 license, Olmo 3.1 32B Instruct reflects the Olmo initiative’s commitment to openness and transparency.

ByteDance Seed: Seed 1.6 Flash via OpenRouter

Access ByteDance Seed: Seed 1.6 Flash via OpenRouter

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of up to 16k tokens.

ByteDance Seed: Seed 1.6 via OpenRouter

Access ByteDance Seed: Seed 1.6 via OpenRouter

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.

MiniMax: MiniMax M2.1 via OpenRouter

Access MiniMax: MiniMax M2.1 via OpenRouter

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world capability while maintaining exceptional latency, scalability, and cost efficiency. Compared to its predecessor, M2.1 delivers cleaner, more concise outputs and faster perceived response times. It shows leading multilingual coding performance across major systems and application languages, achieving 49.4% on Multi-SWE-Bench and 72.5% on SWE-Bench Multilingual, and serves as a versatile agent “brain” for IDEs, coding tools, and general-purpose assistance. To avoid degrading this model's performance, MiniMax highly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks).

Z.ai: GLM 4.7 via OpenRouter

Access Z.ai: GLM 4.7 via OpenRouter

GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while delivering more natural conversational experiences and superior front-end aesthetics.
