What is StarCoder?
StarCoderBase and StarCoder are Large Language Models (Code LLMs), trained on permissively-licensed data from GitHub. This includes data from 80+ programming language, Git commits and issues, Jupyter Notebooks, and Git commits.
We trained a 15B-parameter model for 1 trillion tokens, similar to LLaMA.
We refined the StarCoderBase for 35B Python tokens. The result is a new model we call StarCoder.
StarCoderBase is a model that outperforms other open Code LLMs in popular programming benchmarks. It also matches or exceeds closed models like code-cushman001 from OpenAI, the original Codex model which powered early versions GitHub Copilot. StarCoder models are able to process more input with a context length over 8,000 tokens than any other open LLM. This allows for a variety of interesting applications. By prompting the StarCoder model with a series dialogues, we allowed them to act like a technical assistant.
More information on StarCoder
StarCoder Alternatives
Load more Alternatives-
Making our our text to SQL model 30 percentage points more accurate over 5 months
-
DeciCoder 1B is a 1 billion parameter decoder-only code completion model trained on the Python, Java, and Javascript subsets of Starcoder Training Dataset.
-
This product is designed to assist programmers with their daily work while also providing a great le
-
Discover Code Llama, a cutting-edge AI tool for code generation and understanding. Boost productivity, streamline workflows, and empower developers.
-
Enhance language models, improve performance, and get accurate results. WizardLM is the ultimate tool for coding, math, and NLP tasks.