Guide

Learn more about how to use AI models with TypingMind

OpenAI: GPT-5 Image Mini via OpenRouter

Access OpenAI: GPT-5 Image Mini via OpenRouter

GPT-5 Image Mini combines OpenAI's GPT-5 Mini with state-of-the-art image generation capabilities. It provides the same instruction-following and safety-tuning benefits as GPT-5, but with reduced latency and cost. This natively multimodal model incorporates GPT Image 1 Mini's superior instruction following, text rendering, and detailed image editing, making it ideal for applications that require both efficient text processing and high-quality visual creation at scale.

3 min read
Anthropic: Claude Haiku 4.5 via OpenRouter

Access Anthropic: Claude Haiku 4.5 via OpenRouter

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance across reasoning, coding, and computer-use tasks, Haiku 4.5 brings frontier-level capability to real-time and high-volume applications. It introduces extended thinking to the Haiku line, enabling controllable reasoning depth, summarized or interleaved thought output, and tool-assisted workflows with full support for coding, bash, web search, and computer-use tools. Scoring above 73% on SWE-bench Verified, Haiku 4.5 ranks among the world’s best coding models while maintaining exceptional responsiveness for sub-agents, parallelized execution, and scaled deployment.

3 min read
Claude Haiku 4.5 from Anthropic

Connect and use Claude Haiku 4.5 from Anthropic with API Key

Claude Haiku 4.5 from Anthropic - text, image input, 200,000 token context

5 min read
Qwen: Qwen3 VL 8B Thinking via OpenRouter

Access Qwen: Qwen3 VL 8B Thinking via OpenRouter

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and long-context processing (native 256K, expandable to 1M tokens) for tasks such as scientific visual analysis, causal inference, and mathematical reasoning over image or video inputs. Compared to the Instruct edition, the Thinking version introduces deeper visual-language fusion and deliberate reasoning pathways that improve performance on long-chain logic tasks, STEM problem-solving, and multi-step video understanding. It achieves stronger temporal grounding via Interleaved-MRoPE and timestamp-aware embeddings, while maintaining robust OCR, multilingual comprehension, and text generation on par with large text-only LLMs.

3 min read
Qwen: Qwen3 VL 8B Instruct via OpenRouter

Access Qwen: Qwen3 VL 8B Instruct via OpenRouter

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon temporal reasoning, DeepStack for fine-grained visual-text alignment, and text-timestamp alignment for precise event localization. The model supports a native 256K-token context window, extensible to 1M tokens, and handles both static and dynamic media inputs for tasks like document parsing, visual question answering, spatial reasoning, and GUI control. It achieves text understanding comparable to leading LLMs while expanding OCR coverage to 32 languages and enhancing robustness under varied visual conditions.

3 min read
OpenAI: GPT-5 Image via OpenRouter

Access OpenAI: GPT-5 Image via OpenRouter

[GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's most advanced language model with state-of-the-art image generation capabilities. It offers major improvements in reasoning, code quality, and user experience while incorporating GPT Image 1's superior instruction following, text rendering, and detailed image editing.

3 min read
inclusionAI: Ling-1T via OpenRouter

Access inclusionAI: Ling-1T via OpenRouter

Ling-1T is a trillion-parameter open-weight large language model developed by inclusionAI and released under the MIT license. It represents the first flagship non-thinking model in the Ling 2.0 series, built around a sparse-activation architecture with roughly 50 billion active parameters per token. The model supports up to 128K tokens of context and emphasizes efficient reasoning through an “Evolutionary Chain-of-Thought (Evo-CoT)” training strategy. Pre-trained on more than 20 trillion reasoning-dense tokens, Ling-1T achieves strong results across code generation, mathematics, and logical reasoning benchmarks while maintaining high inference efficiency. It employs FP8 mixed-precision training, MoE routing with QK normalization, and MTP layers for compositional reasoning stability. The model also introduces LPO (Linguistics-unit Policy Optimization) for post-training alignment, enhancing sentence-level semantic control. Ling-1T can perform complex text generation, multilingual reasoning, and front-end code synthesis with a focus on both functionality and aesthetics.

3 min read
OpenAI: o3 Deep Research via OpenRouter

Access OpenAI: o3 Deep Research via OpenRouter

o3-deep-research is OpenAI's advanced model for deep research, designed to tackle complex, multi-step research tasks. Note: This model always uses the 'web_search' tool, which incurs additional cost.

3 min read
OpenAI: o4 Mini Deep Research via OpenRouter

Access OpenAI: o4 Mini Deep Research via OpenRouter

o4-mini-deep-research is OpenAI's faster, more affordable deep research model—ideal for tackling complex, multi-step research tasks. Note: This model always uses the 'web_search' tool, which incurs additional cost.

3 min read
NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 via OpenRouter

Access NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 via OpenRouter

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality. In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning/coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM) with Transformers/vLLM support and explicit “reasoning on/off” modes (chat-first defaults, greedy recommended when disabled). Suitable for building agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use matter.

3 min read
Baidu: ERNIE 4.5 21B A3B Thinking via OpenRouter

Access Baidu: ERNIE 4.5 21B A3B Thinking via OpenRouter

ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE model, refined to boost reasoning depth and quality for top-tier performance in logical puzzles, math, science, coding, text generation, and expert-level academic benchmarks.

3 min read
Synthetic

How to use Synthetic API Key for AI chat

Synthetic.new is a privacy-focused AI platform offering private access to multiple open-source LLMs through simple flat-rate subscriptions starting at $20/month for 125 requests per 5 hours or $60/month for 1250 requests. The platform provides access to 19+ always-on models including Llama 3 variants with up to 128K token context windows, specialized coding models, and task-specific LoRA adapters, with guaranteed privacy through no training on user data and automatic deletion within 14 days. Key features include OpenAI-compatible API for integration with tools like Roo, Cline, and Octofriend, web-based chat interface, on-demand model launching from Hugging Face repositories on cloud GPUs with separate per-minute billing, predictable pricing without per-token charges, and support for large context coding tasks. The platform prioritizes developer workflows and code generation with strong privacy guarantees and cost-effective access to powerful open-source models.
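
Since the platform advertises an OpenAI-compatible API, a key should be enough to get started with the standard openai Python SDK. A minimal sketch, assuming a placeholder base URL and model slug — check your Synthetic dashboard for the real values:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.synthetic.new/openai/v1",  # assumed endpoint; verify in your dashboard
    api_key="YOUR_SYNTHETIC_API_KEY",
)

resp = client.chat.completions.create(
    model="hf:meta-llama/Llama-3.3-70B-Instruct",  # hypothetical model slug
    messages=[{"role": "user", "content": "Summarize this repo's README in three bullets."}],
)
print(resp.choices[0].message.content)
```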

5 min read
Chutes

How to use Chutes API Key for AI chat

Chutes.ai is a decentralized serverless AI compute platform built on Bittensor Subnet 64, enabling developers to deploy, run, and scale AI models without managing infrastructure. The platform processes nearly 160 billion tokens daily, serving over 400,000 users, with up to 90% lower costs than traditional providers through a distributed network of GPU miners compensated with TAO tokens. Key features include always-hot serverless compute with instant inference, model-agnostic support for LLMs, image, and audio models plus custom code, fully abstracted infrastructure handling provisioning and scaling automatically, standardized API access with OpenRouter integration, and open pay-per-use pricing. The roadmap includes long-running jobs, fine-tuning capabilities, AI agents, and Trusted Execution Environments for enhanced privacy, with a startup accelerator offering up to $20,000 in credits.

5 min read
MoonshotAI

How to use MoonshotAI API Key for AI chat

Moonshot AI is a Beijing-based AI platform offering the Kimi large language model API, with the flagship Kimi K2 being a state-of-the-art Mixture-of-Experts (MoE) model featuring 1 trillion total parameters and 32 billion activated parameters per query. Key features include an exceptional 256,000-token context window (among the longest available for processing extended documents and conversations), strong coding and STEM performance competitive with GPT-4.1, native tool calling and function integration for agentic workflows, and stable large-scale training using the novel MuonClip optimizer on 15.5 trillion tokens. The platform provides OpenAI-compatible API access through the Kimi Open Platform with variants including Kimi-K2-Base (for fine-tuning) and Kimi-K2-Instruct (optimized for chat and autonomous tasks), supporting advanced multi-turn interactions, reasoning, research, and software development applications.
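
Because the Kimi Open Platform exposes an OpenAI-compatible API, the standard openai SDK works with a swapped base URL. A minimal sketch; the base URL and model slug below are assumptions, so confirm them against Moonshot's documentation:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",  # assumed international endpoint; verify in the docs
    api_key="YOUR_MOONSHOT_API_KEY",
)

resp = client.chat.completions.create(
    model="kimi-k2-instruct",  # assumed slug for Kimi-K2-Instruct; list models via the API to confirm
    messages=[
        {"role": "system", "content": "You are Kimi, a helpful assistant."},
        {"role": "user", "content": "Outline a plan for summarizing a 200-page report."},
    ],
)
print(resp.choices[0].message.content)
```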

5 min read
Google: Gemini 2.5 Flash Image (Nano Banana) via OpenRouter

Access Google: Gemini 2.5 Flash Image (Nano Banana) via OpenRouter

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state-of-the-art image generation model with contextual understanding, capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the [image_config API parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration).
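
A hedged sketch of what an aspect-ratio request might look like against OpenRouter's chat completions endpoint, based on the image_config parameter linked above; the model slug and the response field layout are assumptions to verify against the docs:

```python
import requests

payload = {
    "model": "google/gemini-2.5-flash-image",   # assumed slug; check the model page
    "messages": [{"role": "user", "content": "A watercolor banana on a desk"}],
    "modalities": ["image", "text"],            # request image output alongside text
    "image_config": {"aspect_ratio": "16:9"},   # the aspect-ratio control described above
}
r = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_API_KEY"},
    json=payload,
    timeout=120,
)
r.raise_for_status()
# Generated images typically come back as data URLs on the assistant message;
# the exact field layout may differ by model, so inspect the raw response.
print(r.json()["choices"][0]["message"].get("images"))
```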

3 min read
Qwen: Qwen3 VL 30B A3B Thinking via OpenRouter

Access Qwen: Qwen3 VL 30B A3B Thinking via OpenRouter

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, making it well suited to document AI, OCR, UI assistance, spatial tasks, and agent research.

3 min read
Qwen: Qwen3 VL 30B A3B Instruct via OpenRouter

Access Qwen: Qwen3 VL 30B A3B Instruct via OpenRouter

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, making it well suited to document AI, OCR, UI assistance, spatial tasks, and agent research.

3 min read
OpenAI: GPT-5 Pro via OpenRouter

Access OpenAI: GPT-5 Pro via OpenRouter

GPT-5 Pro is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like "think hard about this." Improvements include reductions in hallucination and sycophancy, and better performance in coding, writing, and health-related tasks.

3 min read
Groq

How to use Groq API Key for AI chat

Groq is the world's fastest AI inference platform powered by the proprietary LPU™ (Language Processing Unit) Inference Engine, purpose-built hardware designed specifically for running large language models at exceptional speed and low cost. The LPU architecture delivers 300-500 tokens per second with up to 18x faster processing than traditional GPUs through tensor streaming technology optimized for sequential computation and low-latency inference. GroqCloud provides API access to leading open-source models (Llama, Mixtral, Gemma) with Tokens-as-a-Service pricing, enabling developers to build production-ready AI applications with ultra-low latency and high throughput. Key features include deterministic performance, reduced memory bottlenecks, energy-efficient processing, real-time inference capabilities, and scalable cloud deployment with straightforward API integration.
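
GroqCloud's API is OpenAI-compatible, so streaming (where the LPU's token rate is most visible) needs only a base-URL swap. A minimal sketch; the model slug is an assumption to verify in the GroqCloud console:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",
)

# Stream the response so tokens print as they are generated.
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumed slug; check available models in the console
    messages=[{"role": "user", "content": "Explain tensor streaming in two sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```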

5 min read
GLM 4.6 Turbo from Chutes

Connect and use GLM 4.6 Turbo from Chutes with API Key

GLM 4.6 Turbo from Chutes - text input, 204,800 token context

5 min read
Z.AI: GLM 4.6 via OpenRouter

Access Z.AI: GLM 4.6 via OpenRouter

Compared with GLM-4.5, this generation brings several key improvements. Longer context window: expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Superior coding performance: higher scores on code benchmarks and better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages. Advanced reasoning: a clear improvement in reasoning performance, with support for tool use during inference, leading to stronger overall capability. More capable agents: stronger performance in tool use and search-based agents, and more effective integration within agent frameworks. Refined writing: better alignment with human preferences in style and readability, and more natural performance in role-playing scenarios.

3 min read
GLM 4.6 FP8 from Chutes

Connect and use GLM 4.6 FP8 from Chutes with API Key

GLM 4.6 FP8 from Chutes - text input, 204,800 token context

5 min read
GLM 4.6 from Synthetic

Connect and use GLM 4.6 from Synthetic with API Key

GLM 4.6 from Synthetic - text input, 200,000 token context

5 min read
Anthropic: Claude Sonnet 4.5 via OpenRouter

Access Anthropic: Claude Sonnet 4.5 via OpenRouter

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with improvements across system design, code security, and specification adherence. The model is designed for extended autonomous operation, maintaining task continuity across sessions and providing fact-based progress tracking. Sonnet 4.5 also introduces stronger agentic capabilities, including improved tool orchestration, speculative parallel execution, and more efficient context and memory management. With enhanced context tracking and awareness of token usage across tool calls, it is particularly well-suited for multi-context and long-running workflows. Use cases span software engineering, cybersecurity, financial analysis, research agents, and other domains requiring sustained reasoning and tool use.

3 min read
DeepSeek: DeepSeek V3.2 Exp via OpenRouter

Access DeepSeek: DeepSeek V3.2 Exp via OpenRouter

DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. Users can control the reasoning behaviour with the `enabled` boolean of the `reasoning` parameter; [learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config). The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs.
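
A sketch of toggling that boolean through OpenRouter's chat completions endpoint; the model slug and the shape of the returned reasoning field are assumptions to check against the linked docs:

```python
import requests

r = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_API_KEY"},
    json={
        "model": "deepseek/deepseek-v3.2-exp",  # assumed slug; check the model page
        "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
        "reasoning": {"enabled": True},          # the boolean described above
    },
    timeout=120,
)
r.raise_for_status()
msg = r.json()["choices"][0]["message"]
print(msg.get("reasoning"))  # thinking trace, when the provider returns one
print(msg["content"])
```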

3 min read
Claude Sonnet 4.5 from Anthropic

Connect and use Claude Sonnet 4.5 from Anthropic with API Key

Claude Sonnet 4.5 from Anthropic - text, image input, 200,000 token context

5 min read
DeepSeek V3.2 Exp from Chutes

Connect and use DeepSeek V3.2 Exp from Chutes with API Key

DeepSeek V3.2 Exp from Chutes - text input, 128,000 token context

5 min read
OpenAI

How to use OpenAI API Key for AI chat

ChatGPT is OpenAI's conversational AI assistant, now powered by the latest GPT-5 family of models released in 2025, featuring unified multimodal capabilities across text, images, and audio with advanced reasoning. The current lineup includes GPT-5 (flagship general-purpose model with real-time routing, faster responses, reduced hallucinations, and customizable personalities), o3 (advanced reasoning model for complex math, science, and programming with 88.9% AIME accuracy), o4-mini (cost-efficient reasoning at scale with 92.7% AIME accuracy), and GPT-5-Codex (specialized for dynamic coding tasks). Key features include autonomous tool use (web browsing, code execution, file operations), self-fact checking, multimodal unification, and variants from Nano to Pro for different use cases.
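
A minimal sketch of calling the API with the official openai Python SDK; the model name is an assumption, since availability varies by account:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

resp = client.chat.completions.create(
    model="gpt-5",  # assumed name; list models via the API to confirm access
    messages=[{"role": "user", "content": "Give me three test ideas for a CSV parser."}],
)
print(resp.choices[0].message.content)
```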

5 min read
Anthropic

How to use Anthropic API Key for AI chat

Claude is a next-generation AI assistant developed by Anthropic, featuring a family of state-of-the-art large language models trained to be safe, accurate, and helpful. The latest models include Claude Sonnet 4.5 (the world's best coding model with advanced agentic capabilities) and Claude Opus 4.1, both offering hybrid reasoning modes, 200K token context windows, and sophisticated vision capabilities. Key features include tool use for external API integration, code execution environments, multi-step workflow automation, files API, persistent memory management, and enterprise-grade security with deployment on AWS Bedrock and Google Cloud Vertex AI. Claude excels at complex reasoning, code generation, visual data interpretation, customer support, and building autonomous AI agents with natural, human-like conversations.
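
A minimal sketch using Anthropic's official Python SDK; the model alias is an assumption, so check Anthropic's current model list:

```python
import anthropic

client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_API_KEY")

msg = client.messages.create(
    model="claude-sonnet-4-5",  # assumed alias; verify against Anthropic's model list
    max_tokens=1024,            # required by the Messages API
    messages=[{"role": "user", "content": "Review this function for edge cases: def f(x): return 1/x"}],
)
print(msg.content[0].text)
```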

5 min read
Google

How to use Google API Key for AI chat

Google Gemini is Google DeepMind's advanced multimodal AI platform designed for the "agentic era," with the latest Gemini 2.5 family of models released in 2025 featuring breakthrough thinking and reasoning capabilities. The current lineup includes Gemini 2.5 Deep Think (most advanced reasoning model using parallel multi-agent reasoning for complex math and coding), Gemini 2.5 Pro (flagship thinking model with enhanced performance and 1M token context window), Gemini 2.5 Flash (fastest thinking model balancing speed and intelligence), and Gemini 2.5 Flash-Lite (optimized for cost-effective deployment). Key features include native tool use, multimodal understanding (text, images, audio, video), code execution integration, Google Search connectivity, reinforcement learning-enhanced reasoning, and the ability to think through responses before answering for improved accuracy.
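
A minimal sketch using the google-genai Python SDK; the model name is an assumption to verify against Google's current model list:

```python
from google import genai

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

resp = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed name; check the Gemini API model list
    contents="Explain the difference between latency and throughput.",
)
print(resp.text)
```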

5 min read
OpenRouter

How to use OpenRouter API Key for AI chat

OpenRouter is a unified API gateway that provides access to 400+ AI models from 50+ providers through a single OpenAI-compatible endpoint, eliminating vendor lock-in and simplifying multi-model integration. Key features include automatic fallbacks (seamlessly switching to backup models if primary fails), smart model routing with :floor (cheapest) and :nitro (fastest) options, standardized API normalization across all providers, and multi-modal support for text and images. OpenRouter uses pass-through pricing at exact provider rates plus a 5% platform fee (5.5% on credits), offers 13+ free models with daily limits, consolidated analytics dashboards, and enterprise-grade privacy controls with no code changes needed to switch between models like GPT-4, Claude, Gemini, Llama, and DeepSeek.
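
A short sketch of the routing suffixes using the openai SDK pointed at OpenRouter; the model slug is illustrative:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

# ":nitro" routes to the fastest provider for a model, ":floor" to the cheapest.
for variant in ("meta-llama/llama-3.3-70b-instruct:nitro",
                "meta-llama/llama-3.3-70b-instruct:floor"):
    resp = client.chat.completions.create(
        model=variant,
        messages=[{"role": "user", "content": "One-line summary of HTTP/3."}],
    )
    print(variant, "->", resp.choices[0].message.content)
```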

5 min read
Mistral

How to use Mistral API Key for AI chat

Mistral AI offers high-performance, cost-effective language models with a focus on efficiency and European AI development. Their models are known for strong reasoning and coding capabilities.

5 min read
DeepSeek

How to use DeepSeek API Key for AI chat

DeepSeek AI is a Chinese open-source AI company offering advanced large language models with the latest DeepSeek-V3.1 (released August 2025) combining both general-purpose and reasoning capabilities in a hybrid architecture. Key models include DeepSeek-V3.1 (flagship with 128K token context window, 43% improved multi-step reasoning, and dual thinking/non-thinking modes), DeepSeek-R1 (specialized reasoning model with chain-of-thought processing matching OpenAI o1 performance), and DeepSeek-VL2 (state-of-the-art vision-language model). Features include hybrid Mixture-of-Experts (MoE) architecture, extended context handling up to 1M tokens, enhanced tool calling for agentic workflows, 20-50% faster inference than previous versions, JSON output support, and fully open-source with MIT licensing. Access the platform at [deepseek.ai](https://deepseek.ai) with API documentation at [api-docs.deepseek.com](https://api-docs.deepseek.com).
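
A minimal sketch of the OpenAI-compatible API with the JSON output support mentioned above; the base URL and model names follow DeepSeek's published docs, but treat the details as assumptions to verify:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_DEEPSEEK_API_KEY")

resp = client.chat.completions.create(
    model="deepseek-chat",  # "deepseek-reasoner" selects the thinking model
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'answer' and 'confidence'."},
        {"role": "user", "content": "What year was the transistor invented?"},
    ],
    response_format={"type": "json_object"},  # the JSON output mode
)
print(resp.choices[0].message.content)
```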

5 min read
xAI

How to use xAI API Key for AI chat

Grok is xAI's flagship family of large language models designed to deliver truthful and insightful AI responses. The xAI API provides developers with access to powerful models including Grok-4 and Grok-2-1212 (with a 131K token context window), supporting multimodal capabilities like vision processing and image generation via Flux.1. Key features include function calling for API automation, compatibility with OpenAI/Anthropic SDKs, structured outputs, and enterprise-grade security with GDPR/HIPAA compliance. The platform offers Python and JavaScript SDKs for easy integration into applications ranging from conversational AI to complex workflow automation.
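
A hedged sketch of the function calling mentioned above, via the OpenAI-compatible endpoint; the get_weather tool is hypothetical and the model slug is an assumption:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="YOUR_XAI_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="grok-4",  # assumed slug; confirm via the xAI console
    messages=[{"role": "user", "content": "What's the weather in Austin?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the model's requested tool invocation, if any
```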

5 min read
Perplexity

How to use Perplexity API Key for AI chat

Perplexity AI is an AI-powered search engine that combines large language models with real-time web search to deliver accurate, cited answers with up-to-date information. Key features include Pro Search (access to GPT-5, Claude 4, Gemini 2.5 Pro, Grok 4, and proprietary Sonar models), Deep Research (autonomous multi-step research synthesizing hundreds of sources into comprehensive reports with PDF export), Comet Browser (free AI-powered Chromium browser with conversational web control and sidecar assistant for summaries and automation), and API access for enterprise integration. Pro subscribers enjoy unlimited Deep Research queries, advanced model selection, domain-specific searches, file uploads, and image generation/editing tools, while free users get limited access to core features with real-time citations. Access at [perplexity.ai](https://www.perplexity.ai) with API documentation available for Pro users.
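
A minimal sketch of a search-grounded API call; the sonar model name and the citations field follow Perplexity's docs, but verify both for your plan:

```python
import requests

r = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_PERPLEXITY_API_KEY"},
    json={
        "model": "sonar",  # Perplexity's search-grounded model family
        "messages": [{"role": "user", "content": "What changed in the latest Python release?"}],
    },
    timeout=60,
)
r.raise_for_status()
data = r.json()
print(data["choices"][0]["message"]["content"])
print(data.get("citations"))  # source URLs, when returned
```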

5 min read
Jina AI

How to use Jina AI API Key for AI chat

Jina AI is a specialized AI platform providing best-in-class search infrastructure through embeddings, rerankers, web readers, and small language models for multilingual and multimodal data. Key offerings include jina-embeddings-v3 (570M parameter model supporting 89 languages with 8K token context and task-specific LoRA adapters for retrieval, clustering, and classification), jina-embeddings-v4 (3.8B parameter multimodal model unifying text and images), Reader API (converts any URL to LLM-ready Markdown using ReaderLM-v2), Reranker API (improves search result accuracy), and DeepSearch (comprehensive search agent combining web search, reading, and reasoning with OpenAI-compatible API). The platform features FlashAttention 2 optimization, 1024-dimensional embeddings, late-chunking for better snippet selection, and integration with popular frameworks, making it ideal for RAG systems and semantic search applications.
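
A minimal sketch of the embeddings endpoint with a task-specific adapter; the request shape follows Jina's docs, though treat the field names as assumptions to verify:

```python
import requests

r = requests.post(
    "https://api.jina.ai/v1/embeddings",
    headers={"Authorization": "Bearer YOUR_JINA_API_KEY"},
    json={
        "model": "jina-embeddings-v3",
        "task": "retrieval.passage",  # selects the LoRA adapter for passage indexing
        "input": ["Jina embeddings support 89 languages."],
    },
    timeout=60,
)
r.raise_for_status()
vec = r.json()["data"][0]["embedding"]
print(len(vec))  # 1024 dimensions by default
```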

5 min read
Fireworks-AI

How to use Fireworks-AI API Key for AI chat

Fireworks.ai is a high-performance generative AI platform that provides the fastest inference for open-source LLMs and multimodal models through a developer-friendly API. The platform features a proprietary FireAttention engine delivering 50% faster speed and 250% higher throughput than standard engines, with support for popular models like LLaMA, Mixtral, DeepSeek, and Falcon. Key capabilities include serverless inference, advanced fine-tuning (LoRA, RLHF), function calling, batch processing, on-demand GPU access (NVIDIA H100/H200, AMD MI300X), and OpenAI-compatible APIs.
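
A minimal sketch via the OpenAI-compatible endpoint; note Fireworks' accounts/fireworks/models/... naming convention, and treat the specific slug as an assumption to check in the model catalog:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

resp = client.chat.completions.create(
    # Fireworks model IDs use a full account/model path
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",  # assumed slug
    messages=[{"role": "user", "content": "Name three uses for LoRA fine-tuning."}],
)
print(resp.choices[0].message.content)
```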

5 min read
TheDrummer: Cydonia 24B V4.1 via OpenRouter

Access TheDrummer: Cydonia 24B V4.1 via OpenRouter

An uncensored creative-writing model based on Mistral Small 3.2 24B, with good recall, prompt adherence, and intelligence.

3 min read
Relace: Relace Apply 3 via OpenRouter

Access Relace: Relace Apply 3 via OpenRouter

Relace Apply 3 is a specialized code-patching LLM that merges AI-suggested edits straight into your source files. It can apply updates from GPT-4o, Claude, and others into your files at 7,500 tokens/sec on average. The model requires the prompt to be in the following format: `<instruction>{instruction}</instruction> <code>{initial_code}</code> <update>{edit_snippet}</update>`. Zero Data Retention is enabled for Relace. Learn more about this model in their [documentation](https://docs.relace.ai/api-reference/instant-apply/apply).
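
A small helper that assembles that three-part prompt exactly as described above; the example instruction and code snippets are placeholders:

```python
def relace_prompt(instruction: str, initial_code: str, edit_snippet: str) -> str:
    """Wrap the three parts in the tags Relace Apply expects."""
    return (
        f"<instruction>{instruction}</instruction>\n"
        f"<code>{initial_code}</code>\n"
        f"<update>{edit_snippet}</update>"
    )

prompt = relace_prompt(
    "Add a docstring to the function.",
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    \"\"\"Return the sum of a and b.\"\"\"\n    return a + b",
)
print(prompt)  # send this as the user message to the Relace Apply model
```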

3 min read
Google: Gemini 2.5 Flash Preview 09-2025 via OpenRouter

Access Google: Gemini 2.5 Flash Preview 09-2025 via OpenRouter

Gemini 2.5 Flash Preview September 2025 Checkpoint is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in [the documentation](https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).
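
A hedged sketch of capping the thinking budget through OpenRouter; the model slug is an assumption to check on the model page:

```python
import requests

r = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_API_KEY"},
    json={
        "model": "google/gemini-2.5-flash-preview-09-2025",  # assumed slug
        "messages": [{"role": "user", "content": "Plan a migration from REST to gRPC."}],
        "reasoning": {"max_tokens": 2048},  # cap the thinking budget per the linked docs
    },
    timeout=120,
)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```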

3 min read
Google: Gemini 2.5 Flash Lite Preview 09-2025 via OpenRouter

Access Google: Gemini 2.5 Flash Lite Preview 09-2025 via OpenRouter

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the [Reasoning API parameter](https://openrouter.ai/docs/use-cases/reasoning-tokens) to selectively trade off cost for intelligence.

3 min read
Gemini Flash-Lite Latest from Google

Connect and use Gemini Flash-Lite Latest from Google with API Key

Gemini Flash-Lite Latest from Google - text, image, audio, video, PDF input, 1,048,576 token context

5 min read
Gemini Flash Latest from Google

Connect and use Gemini Flash Latest from Google with API Key

Gemini Flash Latest from Google - text, image, audio, video, PDF input, 1,048,576 token context

5 min read
Gemini 2.5 Flash Preview 09-25 from Google

Connect and use Gemini 2.5 Flash Preview 09-25 from Google with API Key

Gemini 2.5 Flash Preview 09-25 from Google - text, image, audio, video, PDF input, 1,048,576 token context

5 min read
Gemini 2.5 Flash Lite Preview 09-25 from Google

Connect and use Gemini 2.5 Flash Lite Preview 09-25 from Google with API Key

Gemini 2.5 Flash Lite Preview 09-25 from Google - text, image, audio, video, PDF input, 1,048,576 token context

5 min read
Qwen: Qwen3 VL 235B A22B Thinking via OpenRouter

Access Qwen: Qwen3 VL 235B A22B Thinking via OpenRouter

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

3 min read
Qwen: Qwen3 VL 235B A22B Instruct via OpenRouter

Access Qwen: Qwen3 VL 235B A22B Instruct via OpenRouter

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

3 min read
Qwen: Qwen3 Max via OpenRouter

Access Qwen: Qwen3 Max via OpenRouter

Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode.

3 min read
Qwen: Qwen3 Coder Plus via OpenRouter

Access Qwen: Qwen3 Coder Plus via OpenRouter

Qwen3 Coder Plus is Alibaba's proprietary version of the open-source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.

3 min read