Best Phi-3 Mini-128K-Instruct ONNX Alternatives in 2025
-

ONNX Runtime: Run ML models faster, anywhere. Accelerate inference & training across platforms. PyTorch, TensorFlow & more supported!
-

Phi-2 is an ideal model for researchers to explore different areas such as mechanistic interpretability, safety improvements, and fine-tuning experiments.
-

Explore Local AI Playground, a free app for offline AI experimentation. Features include CPU inferencing, model management, and more.
-

MiniCPM3-4B is the third generation of the MiniCPM series. Its overall performance surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125 and is comparable with many recent 7B to 9B models.
-

Gemma 3 270M: Compact, hyper-efficient AI for specialized tasks. Fine-tune for precise instruction following & low-cost, on-device deployment.
-

Build high-performance AI apps on-device without the hassle of model compression or edge deployment.
-

NetMind: Your unified AI platform. Build, deploy & scale with diverse models, powerful GPUs & cost-efficient tools.
-

Nexa AI simplifies deploying high-performance, private generative AI on any device. Build faster with unmatched speed, efficiency & on-device privacy.
-

MiniMax-M1: Open-weight AI model with 1M token context & deep reasoning. Process massive data efficiently for advanced AI applications.
-

ggml is a tensor library for machine learning to enable large models and high performance on commodity hardware.
-

Build AI models from scratch! MiniMind offers fast, affordable LLM training on a single GPU. Learn PyTorch & create your own AI.
-

Nemotron-4 340B, a family of models optimized for NVIDIA NeMo and NVIDIA TensorRT-LLM, includes cutting-edge instruct and reward models, and a dataset for generative AI training.
-

Discover EXAONE 3.5 by LG AI Research: a suite of bilingual (English & Korean) instruction-tuned generative models from 2.4B to 32B parameters. Supports long contexts of up to 32K tokens, with top-notch performance in real-world scenarios.
-

Neural Magic offers high-performance inference serving for open-source LLMs. Reduce costs, enhance security, and scale with ease. Deploy on CPUs/GPUs across various environments.
-

Gemma 3n brings powerful multimodal AI to the edge. Run image, audio, video, & text AI on devices with limited memory.
-

Reka Flash 3: Low-latency, open-source AI reasoning model for fast, efficient apps. Powering chatbots, on-device AI & Nexus.
-

Shrink AI models by 87%, boost speed 12x with CLIKA ACE. Automate compression for faster, cheaper hardware deployment. Preserve accuracy!
-

Mistral Small 3 (2501) sets a new benchmark in the "small" large language model category below 70B, with 24B parameters and state-of-the-art capabilities comparable to larger models.
-

Stop struggling with AI infra. Novita AI simplifies AI model deployment & scaling with 200+ models, custom options, & serverless GPU cloud. Save time & money.
-

KTransformers, an open-source project by Tsinghua's KVCache.AI team and QuJing Tech, optimizes large language model inference. It lowers the hardware threshold, running 671B-parameter models on a single GPU with 24GB of VRAM, and boosts inference speed (up to 286 tokens/s pre-processing, 14 tokens/s generation). It is suitable for personal, enterprise, and academic use.
-

Neuton Tiny ML: Make edge devices intelligent. Automatically build extremely tiny models without coding and embed them into any microcontroller.
-

Amazon Nova is a suite of state-of-the-art foundation models for AI applications, offering both understanding and creative content generation capabilities.
-

Modular is an AI platform designed to enhance any AI pipeline, offering an AI software stack for optimal efficiency on various hardware.
-

Access AI models optimized and validated by Qualcomm.
-

MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings (2.7B in total).
-

The Jamba 1.5 Open Model Family, launched by AI21, is built on an SSM-Transformer architecture and offers long-text processing with high speed and quality. It is pitched as the best among similar products for enterprise users dealing with large data and long texts.
-

Gemma 3: Google's open-source AI for powerful, multimodal apps. Build multilingual solutions easily with flexible, safe models.
-

CogniSelect SDK: Build AI apps that run LLMs privately in the browser. Get zero-cost runtime, total data privacy & instant scalability.
-

Synexa AI is an AI platform with a simple, easy-to-use API that supports multiple AI functions, such as generating images, videos, and voices. It aims to help developers and enterprises integrate AI capabilities quickly and work more efficiently.
-

Ray is the AI Compute Engine. It powers the world's top AI platforms, supports all AI/ML workloads, scales from laptop to thousands of GPUs, and is Python-native. Unlock AI potential with Ray!
