koboldcpp

Boost your AI text generation with KoboldCpp – easy-to-use software that offers a versatile Kobold API endpoint, backward compatibility, and a fancy UI.

What is koboldcpp?

KoboldCpp is an easy-to-use AI text-generation software for GGML models. It's a single, self-contained distributable from Concedo that builds off llama.cpp and adds a versatile Kobold API endpoint, additional format support, backward compatibility, as well as a fancy UI with persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios, and everything Kobold and Kobold Lite have to offer.



Usage

Download the latest .exe release from the GitHub releases page, or clone the git repo.

Windows binaries are provided in the form of koboldcpp.exe, which is a pyinstaller wrapper for a few .dll files and koboldcpp.py. If you feel concerned, you may prefer to rebuild it yourself with the provided makefiles and scripts.

Weights are not included. You can use the official llama.cpp quantize.exe to generate them from your official weight files, or download them from other places such as TheBloke's Hugging Face page.
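If you quantize weights yourself, the call generally has the shape shown in this minimal sketch (a Python wrapper around the tool for illustration; the file names are placeholders, and the exact argument format varies across llama.cpp versions, so check the tool's --help):

```python
import subprocess

# Hypothetical invocation of llama.cpp's quantize tool. File names are
# placeholders; the quantization-type argument depends on your version.
subprocess.run([
    "quantize.exe",
    "ggml-model-f16.bin",   # input: full-precision converted weights
    "ggml-model-q4_0.bin",  # output: quantized model to load in KoboldCpp
    "q4_0",                 # target quantization type
], check=True)
```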

To run, execute koboldcpp.exe, or drag and drop your quantized ggml_model.bin file onto the .exe, and then connect with Kobold or Kobold Lite. If you're not on Windows, run the script koboldcpp.py after compiling the libraries.

Launching with no command line arguments displays a GUI containing a subset of configurable settings. Generally you don't have to change much besides the Presets and GPU Layers. Read the --help output for more info about each setting.

By default, you can connect to http://localhost:5001
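Once the server is up, other programs can query it over the Kobold API. Here is a minimal Python sketch that sends a prompt to the standard /api/v1/generate endpoint (the payload fields shown are common Kobold API sampling parameters; adjust the port if you changed it at launch):

```python
import json
import urllib.request

# Assumes KoboldCpp is running locally on the default port 5001.
URL = "http://localhost:5001/api/v1/generate"

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,     # number of tokens to generate
    "temperature": 0.7,   # sampling temperature
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# The Kobold API returns generated text under results[0].text.
print(result["results"][0]["text"])
```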

You can also run it from the command line: koboldcpp.exe [ggml_model.bin] [port]. For more info, check koboldcpp.exe --help.

Default context size too small? Try --contextsize 3072 to get 1.5x the context size without much perplexity gain. Note that you'll also have to increase the max context in the Kobold Lite UI (click and edit the number text field).

Big context too slow? Try the --smartcontext flag to reduce prompt processing frequency. You can also run on your GPU using CLBlast, via the --useclblast flag, for a speedup.

Want even more speedup? Combine --useclblast with --gpulayers to offload entire layers to the GPU! Much faster, but uses more VRAM. Experiment to determine the number of layers to offload, and reduce it by a few if you run out of memory (see the sketch below).
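As a concrete sketch, a hypothetical launch combining these flags might look like this (a Python wrapper for illustration; the model path, OpenCL platform/device IDs, and layer count are placeholders to tune for your hardware):

```python
import subprocess

# Hypothetical launch combining the flags discussed above. Placeholders:
# the model file, the OpenCL platform/device IDs, and the layer count.
# Reduce --gpulayers by a few if you run out of VRAM.
subprocess.run([
    "koboldcpp.exe",
    "ggml_model.bin",
    "--useclblast", "0", "0",   # OpenCL platform ID and device ID
    "--gpulayers", "32",        # layers to offload to the GPU
    "--contextsize", "3072",    # optional: 1.5x the default context
], check=True)
```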

If you are having crashes or issues, you can try turning off BLAS with the --noblas flag. You can also try running in a non-avx2 compatibility mode with --noavx2. Lastly, you can try turning off mmap with --nommap.

For more information, be sure to run the program with the --help flag.


More information on koboldcpp

Launched: 2023
Pricing Model: Free
Month Visit: <5k
koboldcpp was manually vetted by our editorial team and was first featured on September 4th 2024.

koboldcpp Alternatives

  1. A RWKV management and startup tool: full automation, only 8MB, and provides an interface compatible with the OpenAI API.

  2. Generate AI images on your Windows GPU for free with NMKD Stable Diffusion GUI. Supports text-to-image, image-to-image, and more. No complicated installation.

  3. ggml is a tensor library for machine learning to enable large models and high performance on commodity hardware.

  4. A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (GGUF), Llama models.

  5. Discover Code Llama, a cutting-edge AI tool for code generation and understanding. Boost productivity, streamline workflows, and empower developers.