30 Best Phi-3 Mini-128K-Instruct ONNX Alternatives in 2025

ONNX Runtime

ONNX Runtime: Run ML models faster, anywhere. Accelerate inference & training across platforms. PyTorch, TensorFlow & more supported!

Machine Learning Free

ONNX Runtime Alternatives

9

Phi-2 by Microsoft

Phi-2 is an ideal model for researchers to explore different areas such as mechanistic interpretability, safety improvements, and fine-tuning experiments.

Large Language Models Free

Phi-2 by Microsoft Alternatives

41

local.ai

Explore Local AI Playground, a free app for offline AI experimentation. Features include CPU inferencing, model management, and more.

Developer Tools Free

local.ai Alternatives

6

MiniCPM3-4B

MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models.

Large Language Models Free

MiniCPM3-4B Alternatives

0

Gemma 3 270M

Gemma 3 270M: Compact, hyper-efficient AI for specialized tasks. Fine-tune for precise instruction following & low-cost, on-device deployment.

Large Language Models Free

Gemma 3 270M Alternatives

12

Nexa AI

Build high-performance AI apps on-device without the hassle of model compression or edge deployment.

Machine Learning Free

Nexa AI Alternatives

4

Netmind Power

NetMind: Your unified AI platform. Build, deploy & scale with diverse models, powerful GPUs & cost-efficient tools.

Machine Learning Paid

Netmind Power Alternatives

5

Nexa.ai

Nexa AI simplifies deploying high-performance, private generative AI on any device. Build faster with unmatched speed, efficiency & on-device privacy.

Developer Tools Freemium

Nexa.ai Alternatives

4

MiniMax-M1

MiniMax-M1: Open-weight AI model with 1M token context & deep reasoning. Process massive data efficiently for advanced AI applications.

Large Language Models Free

MiniMax-M1 Alternatives

1

GGML

ggml is a tensor library for machine learning to enable large models and high performance on commodity hardware.

Developer Tools Free

GGML Alternatives

6

MiniMind

Build AI models from scratch! MiniMind offers fast, affordable LLM training on a single GPU. Learn PyTorch & create your own AI.

Machine Learning Free

MiniMind Alternatives

1

Nemotron-4 340B

Nemotron-4 340B, a family of models optimized for NVIDIA NeMo and NVIDIA TensorRT-LLM, includes cutting-edge instruct and reward models, and a dataset for generative AI training.

Large Language Models Free

Nemotron-4 340B Alternatives

0

EXAONE 3.5

Discover EXAONE 3.5 by LG AI Research: a suite of bilingual (English & Korean) instruction-tuned generative models from 2.4B to 32B parameters. Supports long contexts of up to 32K tokens, with top-notch performance in real-world scenarios.

Large Language Models Free

EXAONE 3.5 Alternatives

0

Neural Magic

Neural Magic offers high-performance inference serving for open-source LLMs. Reduce costs, enhance security, and scale with ease. Deploy on CPUs/GPUs across various environments.

Machine Learning Paid

Neural Magic Alternatives

7

Gemma 3n

Gemma 3n brings powerful multimodal AI to the edge. Run image, audio, video, & text AI on devices with limited memory.

Large Language Models Free

Gemma 3n Alternatives

0

Reka Flash 3

Reka Flash 3: Low-latency, open-source AI reasoning model for fast, efficient apps. Powering chatbots, on-device AI & Nexus.

Large Language Models Free

Reka Flash 3 Alternatives

1

Clika.io

Shrink AI models by 87%, boost speed 12x with CLIKA ACE. Automate compression for faster, cheaper hardware deployment. Preserve accuracy!

Developer Tools Free Trial

Clika.io Alternatives

4

Mistral Small 3

Mistral Small 3 (2501) sets a new benchmark in the "small" Large Language Models category below 70B, boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models!

Large Language Models Free

Mistral Small 3 Alternatives

0

Novita.ai

Stop struggling with AI infra. Novita AI simplifies AI model deployment & scaling with 200+ models, custom options, & serverless GPU cloud. Save time & money.

Developer Tools Paid

Novita.ai Alternatives

3

ktransformers

KTransformers, an open-source project by Tsinghua's KVCache.AI team and QuJing Tech, optimizes large language model inference. It lowers hardware requirements, runs 671B-parameter models on a single GPU with 24GB of VRAM, boosts inference speed (up to 286 tokens/s pre-processing, 14 tokens/s generation), and is suitable for personal, enterprise, and academic use.

Machine Learning Free

ktransformers Alternatives

1

Neuton TinyML

Neuton TinyML: Make edge devices intelligent. Automatically build extremely tiny models without coding and embed them into any microcontroller.

Machine Learning Free Trial

Neuton TinyML Alternatives

6

Amazon Nova

Amazon Nova is a suite of state-of-the-art foundation models for AI applications, offering both understanding and creative content generation capabilities.

Large Language Models

Amazon Nova Alternatives

41

Modular

Modular is an AI platform designed to enhance any AI pipeline, offering an AI software stack for optimal efficiency on various hardware.

Developer Tools

Modular Alternatives

11

Qualcomm AI Hub

Access AI models optimized and validated by Qualcomm

Developer Tools Paid

Qualcomm AI Hub Alternatives

5

MiniCPM-2B

MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings (2.7B in total).

Large Language Models Free

MiniCPM-2B Alternatives

0

Jamba 1.5 Open Model Family

The Jamba 1.5 Open Model Family, launched by AI21, is built on an SSM-Transformer architecture. It offers long-text processing with high speed and quality, and is aimed at enterprise users who work with large datasets and long texts.

Large Language Models Free

Jamba 1.5 Open Model Family Alternatives

7

Gemma 3

Gemma 3: Google's open-source AI for powerful, multimodal apps. Build multilingual solutions easily with flexible, safe models.

Large Language Models Free

Gemma 3 Alternatives

12

CogniSelect

CogniSelect SDK: Build AI apps that run LLMs privately in the browser. Get zero-cost runtime, total data privacy & instant scalability.

Productivity Free

CogniSelect Alternatives

0

Synexa AI

Synexa AI is an AI platform that provides a simple, easy-to-use API and supports multiple AI functions such as generating images, videos, and voices. Its goal is to help developers and enterprises quickly integrate AI capabilities and improve work efficiency.

Developer Tools Paid

Synexa AI Alternatives

2

Ray

Ray is the AI Compute Engine. It powers the world's top AI platforms, supports all AI/ML workloads, scales from laptop to thousands of GPUs, and is Python-native. Unlock AI potential with Ray!

Machine Learning Free

Ray Alternatives

9
