koboldcpp

Boost your AI text generation with KoboldCpp – easy-to-use software that offers a versatile Kobold API endpoint, backward compatibility, and a fancy UI.

What is koboldcpp?

KoboldCpp is an easy-to-use AI text-generation software for GGML models. It's a single, self-contained distributable from Concedo that builds off llama.cpp and adds a versatile Kobold API endpoint, additional format support, backward compatibility, as well as a fancy UI with persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios, and everything Kobold and Kobold Lite have to offer.



Usage

Download the latest .exe release from the GitHub releases page, or clone the git repo.

Windows binaries are provided in the form of koboldcpp.exe, which is a pyinstaller wrapper for a few .dll files and koboldcpp.py. If you feel concerned, you may prefer to rebuild it yourself with the provided makefiles and scripts.

Weights are not included. You can use the official llama.cpp quantize.exe to generate them from your official weight files, or download them from other places such as TheBloke's Hugging Face page.
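If you quantize weights yourself, the call generally has the shape shown in this minimal sketch (a Python wrapper around the tool for illustration; the file names are placeholders, and the exact argument format varies across llama.cpp versions, so check the tool's --help):

```python
import subprocess

# Hypothetical invocation of llama.cpp's quantize tool. File names are
# placeholders; the quantization-type argument depends on your version.
subprocess.run([
    "quantize.exe",
    "ggml-model-f16.bin",   # input: full-precision converted weights
    "ggml-model-q4_0.bin",  # output: quantized model to load in KoboldCpp
    "q4_0",                 # target quantization type
], check=True)
```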

To run, execute koboldcpp.exe, or drag and drop your quantized ggml_model.bin file onto the .exe, and then connect with Kobold or Kobold Lite. If you're not on Windows, run the script koboldcpp.py after compiling the libraries.

Launching with no command line arguments displays a GUI containing a subset of configurable settings. Generally you don't have to change much besides the Presets and GPU Layers. Read the --help output for more info about each setting.

By default, you can connect to http://localhost:5001
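Once the server is up, other programs can query it over the Kobold API. Here is a minimal Python sketch that sends a prompt to the standard /api/v1/generate endpoint (the payload fields shown are common Kobold API sampling parameters; adjust the port if you changed it at launch):

```python
import json
import urllib.request

# Assumes KoboldCpp is running locally on the default port 5001.
URL = "http://localhost:5001/api/v1/generate"

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,     # number of tokens to generate
    "temperature": 0.7,   # sampling temperature
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# The Kobold API returns generated text under results[0].text.
print(result["results"][0]["text"])
```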

You can also run it from the command line: koboldcpp.exe [ggml_model.bin] [port]. For more info, check koboldcpp.exe --help.

Default context size too small? Try --contextsize 3072 to get 1.5x the context size without much perplexity gain. Note that you'll also have to increase the max context in the Kobold Lite UI (click and edit the number text field).

Big context too slow? Try the --smartcontext flag to reduce prompt processing frequency. You can also run on your GPU using CLBlast, via the --useclblast flag, for a speedup.

Want even more speedup? Combine --useclblast with --gpulayers to offload entire layers to the GPU! Much faster, but uses more VRAM. Experiment to determine the number of layers to offload, and reduce it by a few if you run out of memory (see the sketch below).
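As a concrete sketch, a hypothetical launch combining these flags might look like this (a Python wrapper for illustration; the model path, OpenCL platform/device IDs, and layer count are placeholders to tune for your hardware):

```python
import subprocess

# Hypothetical launch combining the flags discussed above. Placeholders:
# the model file, the OpenCL platform/device IDs, and the layer count.
# Reduce --gpulayers by a few if you run out of VRAM.
subprocess.run([
    "koboldcpp.exe",
    "ggml_model.bin",
    "--useclblast", "0", "0",   # OpenCL platform ID and device ID
    "--gpulayers", "32",        # layers to offload to the GPU
    "--contextsize", "3072",    # optional: 1.5x the default context
], check=True)
```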

If you are having crashes or issues, you can try turning off BLAS with the --noblas flag. You can also try running in a non-avx2 compatibility mode with --noavx2. Lastly, you can try turning off mmap with --nommap.

For more information, be sure to run the program with the --help flag.


More information on koboldcpp

Launched: 2023
Pricing Model: Free
Month Visit: <5k
koboldcpp was manually vetted by our editorial team and was first featured on September 4th 2024.

koboldcpp Alternatives

  1. A RWKV management and startup tool: full automation, only 8MB, and provides an interface compatible with the OpenAI API.

  2. Generate AI images on your Windows GPU for free with NMKD Stable Diffusion GUI. Supports text-to-image, image-to-image, and more. No complicated installation.

  3. ggml is a tensor library for machine learning to enable large models and high performance on commodity hardware.

  4. A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (GGUF), Llama models.

  5. Discover Code Llama, a cutting-edge AI tool for code generation and understanding. Boost productivity, streamline workflows, and empower developers.