Best CogVLM & CogAgent Alternatives in 2025
-

GLM-4.5V: Empower your AI with advanced vision. Generate web code from screenshots, automate GUIs, & analyze documents & video with deep reasoning.
-

GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.
-

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
-

Yi Visual Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.
-

A new paradigm of development based on MaaS (Model as a Service): unleash AI with a universal model service.
-

BAGEL: Open-source multimodal AI from ByteDance-Seed. Understands, generates, edits images & text. Powerful, flexible, comparable to GPT-4o. Build advanced AI apps.
-

C4AI Aya Vision 8B: Open-source multilingual vision AI for image understanding. OCR, captioning, reasoning in 23 languages.
-

Enhance your RAG! Cognee's open-source semantic memory builds knowledge graphs, improving LLM accuracy and reducing hallucinations.
-

CM3leon: A versatile multimodal generative model for text and images. Enhance creativity and create realistic visuals for gaming, social media, and e-commerce.
-

Mini-Gemini supports a series of dense and MoE Large Language Models (LLMs) from 2B to 34B, combining image understanding, reasoning, and generation. The repo is built on LLaVA.
-

CogVideoX models are built on advanced large-scale model technology to meet the needs of commercial-grade applications.
-

With a total of 8B parameters, the model surpasses proprietary models such as GPT-4V-1106, Gemini Pro, Qwen-VL-Max and Claude 3 in overall performance.
-

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
-

CogVideoX-5B-I2V by Zhipu AI is an open-source image-to-video model. Generate 6-second, 720×480 videos from a picture and text prompts.
-

ChatGLM-6B is an open bilingual (Chinese-English) model with 6.2B parameters, currently optimized for Chinese QA and dialogue.
-

InternLM2 is a family of open-sourced models that excel at long-context tasks, reasoning, math, code interpretation, and creative writing, with strong tool-use capabilities for research, application development, and chat.
-

VoltAgent: Open-source TypeScript framework for building powerful, custom AI agents. Gain control & flexibility. Integrate LLMs, tools, & data.
-

Build next-gen LLM applications effortlessly with AutoGen. Simplify development, converse with agents and humans, and maximize LLM utility.
-

DeepSeek-VL2, a vision-language model by DeepSeek-AI, processes high-res images, delivers fast responses via MLA, and excels at diverse visual tasks such as VQA and OCR. Ideal for researchers, developers, and BI analysts.
-

OmniParser V2 solves GUI automation issues for LLMs. It tokenizes UI screenshots, has enhanced small element detection, 60% faster inference, and OmniTool integration. Ideal for software testing, web tasks, and customer support.
-

LightAgent: The lightweight, open-source AI agent framework. Simplify development of efficient, intelligent agents, saving tokens & boosting performance.
-

A novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings.
-

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models and consistently outperforms existing state-of-the-art open-source models.
-

AutoAgent: Zero-code AI agent builder. Create powerful LLM agents with natural language. Top performance, flexible, easy to use.
-

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
-

BuboGPT is an advanced Large Language Model (LLM) that incorporates multi-modal inputs including text, image and audio, with a unique ability to ground its responses to visual objects.
-

VLM Run: Unify visual AI in production. Pre-built schemas, accurate models, rapid fine-tuning. Ideal for healthcare, finance, media. Seamless integration. High accuracy & scalability. Cost-effective.
-

Create production-ready AI voice agents that sound human & handle complex calls. Build with no code or developer tools on Vogent.
-

A high-throughput and memory-efficient inference and serving engine for LLMs
-

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
