Best Yi-VL-34B Alternatives in 2025
-

Qwen2-VL is a multimodal large language model series developed by the Qwen team at Alibaba Cloud.
-

C4AI Aya Vision 8B: An open-source multilingual vision model for image understanding, with OCR, captioning, and reasoning in 23 languages.
-

GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.
-

YiVal is an enterprise-grade generative AI platform for transforming businesses: build high-performing GPT-4 applications at lower cost.
-

GLM-4.5V: Empower your AI with advanced vision. Generate web code from screenshots, automate GUIs, and analyze documents and video with deep reasoning.
-

Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters.
-

CogVLM and CogAgent are powerful open-source visual language models that excel in image understanding and multi-turn dialogue.
-

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
-

DeepSeek-VL2, a vision-language model by DeepSeek-AI, processes high-resolution images, delivers fast responses with Multi-head Latent Attention (MLA), and excels in diverse visual tasks such as VQA and OCR. Ideal for researchers, developers, and BI analysts.
-

Unlock the power of YaLM 100B, a GPT-like neural network that generates and processes text with 100 billion parameters. Free for developers and researchers worldwide.
-

Step-1V: A highly capable multimodal model developed by Jieyue Xingchen, showcasing exceptional performance in image understanding, multi-turn instruction following, mathematical ability, logical reasoning, and text creation.
-

BAGEL: Open-source multimodal AI from ByteDance-Seed that understands, generates, and edits both images and text, with performance comparable to GPT-4o. A powerful, flexible foundation for building advanced AI apps.
-

With a total of 8B parameters, the model surpasses proprietary models such as GPT-4V-1106, Gemini Pro, Qwen-VL-Max and Claude 3 in overall performance.
-

Mini-Gemini supports a series of dense and MoE large language models (LLMs) from 2B to 34B parameters, combining image understanding, reasoning, and generation. The repo is built on LLaVA.
-

A new paradigm of development based on MaaS (Model as a Service), unleashing AI through a universal model service.
-

XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.
-

CM3leon: A versatile multimodal generative model for text and images. Enhance creativity and create realistic visuals for gaming, social media, and e-commerce.
-

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
-

A novel multimodal large language model (MLLM) architecture designed to structurally align visual and textual embeddings.
-

Qwen2.5 series language models offer enhanced capabilities with larger datasets, more knowledge, better coding and math skills, and closer alignment to human preferences. Open-source and available via API.
-

Discover EXAONE 3.5 by LG AI Research: a suite of bilingual (English and Korean) instruction-tuned generative models from 2.4B to 32B parameters, supporting long contexts up to 32K tokens with strong performance in real-world scenarios.
-

PolyLM, an open-source polyglot LLM, supports 18 languages and performs strongly across multilingual tasks. Ideal for developers, researchers, and businesses with multilingual needs.
-

A high-throughput and memory-efficient inference and serving engine for LLMs
-

Voyager is a professional pipeline for enhanced AI art generation. Voyager uses about 6X the resources of standard models like SDXL to produce unique and higher-end outputs.
-

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
-

RWKV is an RNN with transformer-level LLM performance that can be trained in parallel like a GPT. It combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" context length, and free sentence embeddings.
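To make the RNN claim concrete, here is a toy single-channel sketch of the general idea behind RWKV-style time mixing: an exponentially decayed weighted average of past values computed as a recurrence with constant-size state. This is an illustrative simplification, not the actual RWKV formulas (the function name, scalar shapes, and decay parameter `w` here are assumptions for the example).

```python
import math

def rwkv_like_step(state, k, v, w):
    """One recurrent step of a simplified RWKV-style time-mixing update
    for a single channel. Toy illustration only, not the real RWKV math.

    state: (weighted_sum, normalizer) carried across tokens
    k, v : key and value scalars for the current token
    w    : decay factor, 0 < w < 1
    """
    num, den = state
    num = w * num + math.exp(k) * v   # decayed weighted sum of values
    den = w * den + math.exp(k)       # decayed normalizer
    return (num, den), num / den      # new state, output for this token

# State size is constant per channel, so memory per token is O(1),
# unlike attention's KV cache, which grows with sequence length.
state, out = (0.0, 0.0), None
for token_value in [1.0, 2.0, 3.0]:
    state, out = rwkv_like_step(state, k=0.0, v=token_value, w=0.9)
```

Because each step folds the new token into a fixed-size running state, inference cost per token does not grow with context length, which is the property the blurb's "fast inference, saves VRAM" claims refer to.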
-

Molmo AI is an open-source multimodal artificial intelligence model developed by AI2. It can process and generate various types of data, including text and images.
-

VibeVoice: Free online AI text-to-speech that instantly creates realistic, multi-speaker audio conversations up to 90 minutes long, with no downloads or signup required.
-

Yuan2.0-M32 is a Mixture-of-Experts (MoE) language model with 32 experts, of which 2 are active.
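The "32 experts, 2 active" design can be sketched as generic top-2 Mixture-of-Experts routing: a router scores all experts per token, but only the two highest-scoring experts actually run. This is a minimal illustration of the general technique under assumed names (`top2_route`, `moe_forward`), not Yuan2.0-M32's actual router implementation.

```python
import math

def top2_route(logits):
    """Pick the 2 highest-scoring experts and softmax-normalize their
    gate weights. Generic top-2 MoE routing sketch."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits):
    # Only the 2 selected experts execute; the other 30 are skipped,
    # so per-token compute stays small while total capacity stays large.
    return sum(weight * experts[i](x) for i, weight in top2_route(router_logits))
```

This is why an MoE model can have far more total parameters than it activates per token: compute scales with the 2 active experts, while knowledge capacity scales with all 32.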
-

MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech).
