Which LLM model is best for generating Rust code?
Summary: #
We ran multiple large language models (LLMs) locally in order to figure out which one is the best at Rust programming. All models were asked the same question:
>>> generate an advanced rust function
Some models generated pretty good results, others terrible ones.
Note: we neither recommend nor endorse using LLM-generated Rust code.
Which LLM is best for generating Rust code? #
First, we tried some models using Jan AI, which has a nice UI. However, after some struggles syncing a few Nvidia GPUs to it, we tried a different approach: running Ollama, which works very well out of the box on Linux. LM Studio is also a popular option.
We ended up running Ollama in CPU-only mode on a standard HP Gen9 blade server.
How much RAM do we need? #
The RAM usage depends on the model you use and on whether it stores model parameters and activations as 32-bit floating-point (FP32) or 16-bit floating-point (FP16) values. FP16 uses half the memory of FP32, which means the RAM requirements for FP16 models can be approximately half of the FP32 requirements. For example, a 175-billion-parameter model that requires 512 GB - 1 TB of RAM in FP32 could potentially be reduced to 256 GB - 512 GB of RAM by using FP16. As a rule of thumb, you need 8 GB of RAM to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models (a back-of-the-envelope sketch of this arithmetic follows below).
Model Size | Amount of RAM (FP16)
---|---
2.7 billion | 3.1 GB
3 billion | 4 GB
7 billion | 8 GB
13 billion | 16 GB
15 billion | 32 GB
Check whether your GPU and setup can run a specific model using this tool:
https://huggingface.co/spaces/Vokturz/can-it-run-llm
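For a rough sanity check of the arithmetic above: memory is roughly parameter count times bytes per parameter (2 bytes in FP16, 4 bytes in FP32). A minimal illustrative sketch of that estimate (our own, not part of any tool; the rule-of-thumb numbers in the table are lower than raw FP16 math because Ollama models are typically quantized to around 4 bits per parameter):

// Back-of-the-envelope RAM estimate: parameters x bytes per parameter.
// Real usage is higher (activations, KV cache, runtime overhead), and
// quantized models (e.g. 4-bit) need proportionally less.
fn approx_ram_gb(params_billions: f64, bytes_per_param: f64) -> f64 {
    // 1e9 params x N bytes/param = N GB (taking 1 GB as 1e9 bytes)
    params_billions * bytes_per_param
}

fn main() {
    println!("7B   FP16: ~{:.0} GB", approx_ram_gb(7.0, 2.0));   // ~14 GB
    println!("7B   FP32: ~{:.0} GB", approx_ram_gb(7.0, 4.0));   // ~28 GB
    println!("175B FP16: ~{:.0} GB", approx_ram_gb(175.0, 2.0)); // ~350 GB
}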
Before we start, we want to mention that there is a giant number of proprietary "AI as a Service" offerings such as ChatGPT, Claude, etc. We only want to use models that we can download and run locally, no black magic.
Which LLM model is best for generating Rust code? #
The Stable Code authors have put together a nice performance comparison of small coding models:
Model | Size | Python | C++ | JavaScript | Java | PHP | Rust
---|---|---|---|---|---|---|---
Stable Code | 3B | 32.4% | 30.9% | 32.1% | 32.1% | 24.2% | 23.0%
CodeLlama | 7B | 30.0% | 28.2% | 32.5% | 31.1% | 25.7% | 26.3%
Deepseek Coder | 1.3B | 28.6% | 29.2% | 28.7% | 29.0% | 23.6% | 18.5%
Wizard Coder | 3B | 31.6% | 25.6% | 26.2% | 25.8% | 25.3% | 20.4%
StarCoder | 3B | 21.6% | 19.8% | 21.5% | 20.5% | 19.0% | 16.9%
Replit Code V1.5 | 3B | 23.0% | 25.9% | 26.2% | 23.6% | 23.2% | 21.5%
Deci Coder | 1B | 19.1% | 6.8% | 18.4% | 16.7% | 2.1% | 1.7%
Made by the Stable Code authors using the bigcode-evaluation-harness test repo.
Source:
https://huggingface.co/stabilityai/stable-code-3b#model-description
https://github.com/bigcode-project/bigcode-evaluation-harness
Ollama: #
Ollama lets us run large language models locally. It comes with a pretty simple, Docker-like CLI for starting, stopping, pulling, and listing models:
# ./bin/ollama
Usage:
ollama [flags]
ollama [command]
Available Commands:
serve Start ollama
create Create a model from a Modelfile
show Show information for a model
run Run a model
stop Stop a running model
pull Pull a model from a registry
push Push a model to a registry
list List models
ps List running models
cp Copy a model
rm Remove a model
help Help about any command
The first thing we want to do is start the Ollama daemon:
$ ./bin/ollama serve &
Now that we have Ollama running, let's try out some models.
Where can we find large language models? #
Models to download and run are easy to find on sites such as https://ollama.com/ and https://huggingface.co/models. The majority of models on Hugging Face can be run locally with Ollama using the following pull syntax to download them:
$ ./bin/ollama pull hf.co/<USERNAME>/<REPO>:<TAG>
Example: #
$ ./bin/ollama pull hf.co/beowolx/CodeNinja-1.0-OpenChat-7B-GGUF:Q4_K_M
Llama3.2 Model: #
Llama 3.2 is a lightweight (1B and 3B) version of Meta's Llama 3. These models are trained to understand and generate text across multiple languages, with special tuning for interactive conversations and tasks like information retrieval and summarization.
Link to model:
https://ollama.com/library/llama3.2
Download and run:
./bin/ollama run llama3.2:1b
>>> generate an advanced rust function
Here is an example of an advanced Rust function that demonstrates several advanced concepts:
/// Returns a vector of pairs containing the sum of two numbers as one
/// element, and the product of the two numbers as the other element.
///
/// # Examples
///
/// ```
/// use crate::functions;
///
/// assert_eq!(functions::sum_and_product(3, 4), (7, 12));
/// ```rust
pub fn sum_and_product(a: i32, b: i32) -> (i32, i32) {
let result = a + b;
let product = a * b;
// Using the fold method to create a vector of pairs
(result, product)
}
fn main() {
assert_eq!(sum_and_product(3, 4), (7, 12));
}
Llama3 8b Model: #
Llama (Large Language Model Meta AI) 3 is the next generation of Llama 2. Trained by Meta on 15T tokens (7x more than Llama 2), it comes in two sizes: the 8B and 70B versions.
Link to model:
https://ollama.com/library/llama3
https://huggingface.co/docs/transformers/en/model_doc/llama3
https://ai.meta.com/blog/meta-llama-3/
https://en.wikipedia.org/wiki/Llama_(language_model)
https://en.wikipedia.org/wiki/Yann_LeCun
Download and run:
$ ./bin/ollama pull llama3:8b
pulling manifest
pulling 6a0746a1ec1a... 100% ▕█████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕█████████████▏ 12 KB
pulling 8ab4849b038c... 100% ▕█████████████▏ 254 B
pulling 577073ffcc6c... 100% ▕█████████████▏ 110 B
pulling 3f8eb4da87fa... 100% ▕█████████████▏ 485 B
verifying sha256 digest
writing manifest
success
$ ./bin/ollama run llama3:8b
>>> generate an advanced rust function
Here's a simple implementation of a trie data structure in Rust, which is a type of prefix tree that can be used to store a dynamic set or associative array where the keys are usually strings.
use std::collections::HashMap;
use std::iter::FromIterator;
struct TrieNode {
children: HashMap<char, TrieNode>,
is_end_of_word: bool,
}
impl TrieNode {
fn new() -> TrieNode {
TrieNode { children: HashMap::new(),
is_end_of_word: false }
}
}
pub struct Trie {
root: TrieNode,
}
impl Trie {
pub fn new() -> Trie {
Trie { root: TrieNode::new() }
}
pub fn insert(&mut self, word: &str) {
let mut node = &mut self.root;
for c in word.chars() {
match node.children.insert(c,
TrieNode::new()) {
Some(_) => {}
None => {
                    return; // cannot insert duplicate word
}
}
node = node.children.get_mut(&c).unwrap();
}
node.is_end_of_word = true;
}
pub fn search(&self, word: &str) -> bool {
let mut node = &self.root;
for c in word.chars() {
match node.children.get(&c) {
Some(child) => node = child,
None => return false, // not found
}
}
node.is_end_of_word
}
pub fn starts_with(&self, prefix: &str) -> bool {
let mut node = &self.root;
for c in prefix.chars() {
match node.children.get(&c) {
Some(child) => node = child,
None => return false, // not found
}
}
true // found prefix
}
}
fn main() {
let mut trie = Trie::new();
trie.insert("hello");
trie.insert("world");
println!("{}", if trie.search("hello") {"found"}
else {"not found"});
println!("{}", if trie.starts_with("hel") {"found"}
else {"not found"});
}
This code creates a basic Trie data structure and provides methods to insert words, search for words, and check if a prefix is present in the Trie. The Trie struct holds a root node which has children that are also nodes of the Trie. Each node also keeps track of whether it's the end of a word.

The `insert` method iterates over each character in the given word and inserts it into the Trie if it's not already present. If a duplicate word is attempted to be inserted, the function returns without inserting anything.

The `search` method starts at the root node and follows the child nodes until it reaches the end of the word or runs out of characters. It then checks whether the end of the word was found and returns this information.

The `starts_with` method does the same thing as `search`, but it doesn't check for the end of a word.
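A quick editorial note from us (not part of the model output): as written, the generated `insert` replaces any existing child node (dropping its subtree) and returns early the first time a character is newly inserted, so most words are never actually stored. The conventional fix is `HashMap`'s entry API; a minimal self-contained sketch:

use std::collections::HashMap;

#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    is_end_of_word: bool,
}

#[derive(Default)]
struct Trie {
    root: TrieNode,
}

impl Trie {
    // entry().or_default() reuses an existing child instead of
    // overwriting it, so shared prefixes are preserved.
    fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for c in word.chars() {
            node = node.children.entry(c).or_default();
        }
        node.is_end_of_word = true;
    }

    fn search(&self, word: &str) -> bool {
        let mut node = &self.root;
        for c in word.chars() {
            match node.children.get(&c) {
                Some(child) => node = child,
                None => return false,
            }
        }
        node.is_end_of_word
    }
}

fn main() {
    let mut trie = Trie::default();
    trie.insert("hello");
    trie.insert("help"); // shares the "hel" prefix with "hello"
    assert!(trie.search("hello"));
    assert!(trie.search("help"));
    assert!(!trie.search("hel")); // prefix only, not a stored word
}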
Deepseek Coder V2 Model: #
Deepseek Coder V2 outperformed OpenAI's GPT-4-Turbo-1106 and GPT-4-0613, Google's Gemini 1.5 Pro, and Anthropic's Claude-3-Opus models at coding. It was made by Deepseek AI as an open-source (MIT license) competitor to these industry giants.
Download and run the model:
$ ./bin/ollama pull deepseek-coder-v2
$ ./bin/ollama run deepseek-coder-v2
Link to model:
https://ollama.com/library/deepseek-coder-v2
https://github.com/deepseek-ai/DeepSeek-Coder-V2
>>> generate an advanced rust function
Certainly! Below is an example of a more advanced Rust function that demonstrates several concepts such as higher-order functions, closures, generics, and error handling. This function calculates the factorial of a number in a safe and efficient manner using these techniques.
use std::num::ParseIntError;
// Define a trait for numeric types to allow generic calculations
trait Numeric: Sized + Copy {
fn mul(self, other: Self) -> Self;
fn one() -> Self;
}
impl Numeric for i32 {
fn mul(self, other: Self) -> Self {
self * other
}
fn one() -> Self {
1
}
}
impl Numeric for u64 {
fn mul(self, other: Self) -> Self {
self * other
}
fn one() -> Self {
1
}
}
// Define a generic factorial function using closures and higher-order functions
fn factorial<T>(n: T) -> Result<T, ParseIntError>
where
T: Numeric + std::str::FromStr,
{
if n < T::one() {
return Ok(T::one());
}
let mut result = T::one();
for i in 1..=n {
result = result.mul(i);
}
Ok(result)
}
fn main() -> Result<(), ParseIntError> {
    // Test the factorial function with different numeric types
    let num_str = "5";
    match num_str.parse::<u64>() {
        Ok(num) => match factorial(num) {
            Ok(fact) => println!("The factorial of {} is {}", num, fact),
            Err(e) => eprintln!("Error calculating factorial: {:?}", e),
        },
        Err(e) => eprintln!("Invalid number: {:?}", e),
    }
    let num_str = "5";
    match num_str.parse::<i32>() {
        Ok(num) => match factorial(num) {
            Ok(fact) => println!("The factorial of {} is {}", num, fact),
            Err(e) => eprintln!("Error calculating factorial: {:?}", e),
        },
        Err(e) => eprintln!("Invalid number: {:?}", e),
    }
    Ok(())
}
Explanation: #
- Numeric Trait: This trait defines basic operations for numeric types, including multiplication and a method to get the value one. It is implemented for both `i32` and `u64`.
- Factorial Function: The `factorial` function is generic over any type that implements the `Numeric` trait. It uses a closure to multiply the result by each integer from 1 up to `n`. This approach allows the function to be used with both signed (`i32`) and unsigned (`u64`) integers.
- Error Handling: The factorial calculation may fail if the input string cannot be parsed into an integer. Therefore, the function returns a `Result<T, ParseIntError>`.
- Main Function: Demonstrates how to use the `factorial` function with both `u64` and `i32` types by parsing strings to integers. This part of the code handles potential errors from string parsing and factorial computation gracefully.
This example showcases advanced Rust features such as trait-based generic programming, error handling, and higher-order functions, making it a robust and versatile implementation for calculating factorials in different numeric contexts.
End of model output.
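As printed, though, the generated code won't compile: the `n < T::one()` comparison needs a `PartialOrd` bound, `for i in 1..=n` can't iterate over a generic `T`, and `ParseIntError` has nothing to do with the factorial computation itself. For reference, a minimal compiling sketch of the same trait-based idea (our rework; the `add` method is our addition so the loop can count upward generically):

// Our rework of the model's idea: a factorial generic over a small
// numeric trait. `add` lets us count upward without the unstable
// Step trait that ranges over generic types would require.
trait Numeric: Sized + Copy + PartialOrd {
    fn mul(self, other: Self) -> Self;
    fn add(self, other: Self) -> Self;
    fn one() -> Self;
}

impl Numeric for u64 {
    fn mul(self, other: Self) -> Self { self * other }
    fn add(self, other: Self) -> Self { self + other }
    fn one() -> Self { 1 }
}

fn factorial<T: Numeric>(n: T) -> T {
    let mut result = T::one();
    let mut i = T::one();
    while i <= n {
        result = result.mul(i);
        i = i.add(T::one());
    }
    result
}

fn main() {
    assert_eq!(factorial(5u64), 120);
    println!("5! = {}", factorial(5u64));
}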
Stable Code Model: #
Stable Code is a 2.7B-parameter model by Stability AI, built as a code-completion LLM that competes with 7B models.
./bin/ollama pull stable-code
>>> generate an advanced rust function
Sure, here's an example of a more advanced Rust function that uses the `rayon` crate for parallel execution:
use std::error::Error;
use rayon::prelude::*;
fn divide_into_batches(
nums: &mut Vec<i32>,
batch_size: usize,
) -> Result<(), Box<dyn Error>>> {
// Divide the numbers into batches of the specified size.
let batches = nums
.chunks_mut(batch_size)
.collect::<Vec<_>>();
Ok(())
}
This function takes a mutable reference to a vector of integers, and an integer specifying the batch size. The function divides the numbers into batches of the specified size using Rayon's `chunks_mut` method.
Note that this is just one example of a more advanced Rust function that uses the `rayon` crate for parallel execution. There are many other ways to achieve parallelism in Rust, depending on the specific requirements and constraints of your application.
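Two notes from us (not the model): `chunks_mut` is actually a standard-library slice method rather than a Rayon one (Rayon's parallel equivalent is `par_chunks_mut`), and the generated function discards the batches without doing any parallel work. A minimal sketch of what actual Rayon batching could look like, assuming `rayon` is declared in Cargo.toml:

// Hedged sketch (ours, not model output): process a slice in
// parallel batches using Rayon's par_chunks_mut.
use rayon::prelude::*;

fn double_in_batches(nums: &mut [i32], batch_size: usize) {
    nums.par_chunks_mut(batch_size).for_each(|batch| {
        // Each batch may be processed on a different worker thread.
        for n in batch.iter_mut() {
            *n *= 2;
        }
    });
}

fn main() {
    let mut nums: Vec<i32> = (1..=10).collect();
    double_in_batches(&mut nums, 3);
    println!("{:?}", nums); // [2, 4, 6, ..., 20]
}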
Code Gemma Model: #
CodeGemma, made by Google, is a collection of compact models specialized in coding tasks, from code completion and generation to understanding natural language, solving math problems, and following instructions. Its lightweight design maintains powerful capabilities across these diverse programming functions.
Link to model:
https://ollama.com/library/codegemma
https://ai.google.dev/gemma/docs/codegemma
Download and run:
./bin/ollama run codegemma
>>> generate an advanced rust function
use std::collections::HashMap;
struct TurnState {
players: HashMap<String, i32>,
current_player: String,
dice_pool: Vec<i32>,
}
impl TurnState {
fn new(players: HashMap<String, i32>) -> Self {
let first_player = players.keys().next().unwrap();
TurnState {
players,
current_player: first_player.to_string(),
dice_pool: vec![1, 2, 3, 4, 5, 6],
}
}
    fn end_of_turn(&mut self) {
        // Roll the dice
        let dice_rolls = self.dice_pool.iter()
            .map(|die| rand::random::<i32>() % 6 + 1)
            .collect::<Vec<_>>();
        // Calculate the score for the current player
        let player_score = dice_rolls.iter().sum();
        // Add the score to the player's total
        self.players
            .entry(self.current_player.clone())
            .and_modify(|score| *score += player_score);
        // Advance to the next player
        let players_iter = self.players.keys();
        let current_player_index = players_iter.position(|&p| p == &self.current_player).unwrap();
        let next_player_index = (current_player_index + 1) % players_iter.count();
        self.current_player = players_iter.nth(next_player_index).unwrap().to_string();
    }
}
Usage:
let mut turn_state = TurnState::new(HashMap::from([("Alice", 0), ("Bob", 0)]));
loop {
turn_state.end_of_turn();
// Check if any player has reached a score of 100
if turn_state.players.values().any(|&score| score >= 100) {
break;
}
}
println!("Winner: {}", turn_state.current_player);
Features:

- Random dice roll simulation: Uses the `rand` crate to simulate random dice rolls.
- Player turn management: Keeps track of the current player and rotates players after each turn (see the sketch after this list).
- Score calculation: Calculates the score for each turn based on the dice rolls.
- Winner detection: Checks if any player has reached a score of 100.

Note:

- This code requires the `rand` crate to be installed.
- The `end_of_turn()` function assumes that the dice pool contains six dice with values from 1 to 6.
- The game logic can be further extended to include additional features, such as special dice or different scoring rules.
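A caveat from us: the generated `end_of_turn` re-uses the `players_iter` iterator after it has been consumed (which won't compile), and a `HashMap` has no stable key order to rotate through in the first place. A small compiling sketch (our own illustration) of turn rotation over an ordered `Vec`:

// Our sketch (not model output): rotate turns over an ordered Vec,
// since HashMap iteration order is unspecified.
fn next_player(players: &[String], current: &str) -> String {
    let idx = players.iter().position(|p| p.as_str() == current).unwrap_or(0);
    players[(idx + 1) % players.len()].clone()
}

fn main() {
    let players = vec!["Alice".to_string(), "Bob".to_string()];
    assert_eq!(next_player(&players, "Alice"), "Bob");
    assert_eq!(next_player(&players, "Bob"), "Alice");
}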
Code Ninja Model: #
Code Ninja is a 7B model, fine-tuned through supervised fine-tuning (SFT) on two expansive datasets encompassing over 400,000 coding instructions. It was made as a more advanced version of OpenChat 3.5.
Link to model:
https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B
Download and run:
./bin/ollama pull hf.co/beowolx/CodeNinja-1.0-OpenChat-7B-GGUF:Q4_K_M
./bin/ollama run hf.co/beowolx/CodeNinja-1.0-OpenChat-7B-GGUF:Q4_K_M
>>> generate an advanced rust function
fn advanced_function(x: i32, y: i32) -> i32 {
let sum = x + y;
let product = x * y;
let difference = x - y;
match sum > 100 {
true => return product,
false => return difference
}
}
Starcoder Model: #
Starcoder is a Grouped-Query Attention model that has been trained on 600+ programming languages from BigCode's The Stack v2 dataset, made with code completion in mind.
The model comes in 3B, 7B, and 15B sizes.
Let's save on resources and use the 7B one (starcoder2:7b was trained on 17 programming languages and 3.5+ trillion tokens).
Link to model: https://ollama.com/library/starcoder2
Download and run:
./bin/ollama pull starcoder2:7b
pulling manifest
pulling cf866ccad4e4... 100% ▕████████████▏ 4.0 GB
pulling 4ec42cd966c9... 100% ▕████████████▏ 12 KB
pulling 3b190e68fefe... 100% ▕████████████▏ 142 B
pulling 37c99d4c7b60... 100% ▕████████████▏ 41 B
pulling 2ac306bb5658... 100% ▕████████████▏ 345 B
verifying sha256 digest
writing manifest
success
./bin/ollama run starcoder2:7b
>>> generate an advanced rust function
use std::mem;
pub fn <|>
Starcoder2:7b wins the prize for shortest output! :)
Let's try the more advanced one: the 9 GB, 15B version trained on 600+ programming languages.
./bin/ollama pull starcoder2:15b
One would assume this version would perform better, but it did much worse…
Output:
./bin/ollama run starcoder2:15b
>>> generate an advanced rust function
"""
assert_generates(
"""
generate an advanced rust function
fn foo() {
bar();
baz();
}
""",
completion,
expected,
)
def test_completions_from_start():
completion = "generate an advanced rust"
expected = """fn foo
And then it crashed…
Let's try giving it something else:
>>> write 1+1 in python
# [](# "example")Example output:
write 2 in python
<BLANKLINE>
The last line can also be an empty string.
## [](# "test-suite")Test suite:
### [](# "test-suite@pass")`pass`:
write 1+1 in python
crashed…
Something seems pretty off with this model…
Mistral Model: #
Mistral 7B is a 7.3B-parameter open-source (Apache 2.0 license) language model that outperforms much larger models like Llama 2 13B and matches many benchmarks of Llama 1 34B. Its key innovations include grouped-query attention and sliding-window attention for efficient processing of long sequences. It can be deployed locally or on cloud platforms, and its chat-tuned version competes with 13B models. The model particularly excels at coding and reasoning tasks while using significantly fewer resources than comparable models.
Link to model:
https://ollama.com/library/mistral
https://mistral.ai/news/announcing-mistral-7b/
Download and run:
./bin/ollama pull mistral
pulling manifest
pulling ff82381e2bea... 100% ▕████████████▏ 4.1 GB
pulling 43070e2d4e53... 100% ▕████████████▏ 11 KB
pulling 491dfa501e59... 100% ▕████████████▏ 801 B
pulling ed11eda7790d... 100% ▕████████████▏ 30 B
pulling 42347cd80dc8... 100% ▕████████████▏ 485 B
verifying sha256 digest
writing manifest
success
./bin/ollama run mistral
>>> generate an advanced rust function
Here is an example of a more advanced Rust function, which uses recursion to find the Fibonacci sequence. This function takes an integer `n` as input and returns the `n`th number in the Fibonacci sequence.
fn fibonacci(n: u32) -> u32 {
    match n {
        0 | 1 => n, // base case: 0 and 1 are both Fibonacci numbers
        _ if n < 2 => 0, // the first number after 1 in the sequence is 0
        _ => fibonacci(n - 1) + fibonacci(n - 2), // recursive case: add the previous two numbers in the sequence
    }
}
This function uses pattern matching to handle the base cases (when `n` is either 0 or 1) and the recursive case, where it calls itself twice with decreasing arguments. The resulting values are then added together to compute the `n`th number in the Fibonacci sequence. This implementation follows the definition of the Fibonacci sequence as the sum of the previous two numbers, starting with 0 and 1.
You can call this function using:
let fib_5 = fibonacci(5); // fib_5 will be assigned the value 3 (the fifth number in the Fibonacci sequence)
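An aside from us: the doubly recursive version above takes time exponential in `n`. An iterative version (our own sketch, not model output) runs in linear time:

// Iterative Fibonacci in O(n): fibonacci_iter(0) = 0, fibonacci_iter(1) = 1, ...
fn fibonacci_iter(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}

fn main() {
    assert_eq!(fibonacci_iter(10), 55);
    println!("{}", fibonacci_iter(10));
}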
Codellama Model: #
Codellama is a model made for generating and discussing code; it has been built on top of Llama 2 by Meta, as an LLM made to complete coding tasks and help new developers.
We do not recommend using Code Llama or Code Llama - Python to perform general natural language tasks since neither of these models are designed to follow natural language instructions. Code Llama is specialized for code-specific tasks and isn’t appropriate as a foundation model for other tasks. - Meta
Link to model:
https://ollama.com/library/codellama
https://huggingface.co/meta-llama
https://ai.meta.com/blog/code-llama-large-language-model-coding/
Download and run:
./bin/ollama run codellama:7b
pulling manifest
pulling 3a43f93b78ec... 100% ▕██████████████████████████▏ 3.8 GB
pulling 8c17c2ebb0ea... 100% ▕██████████████████████████▏ 7.0 KB
pulling 590d74a5569b... 100% ▕██████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕██████████████████████████▏ 59 B
pulling 7f6a57943a88... 100% ▕██████████████████████████▏ 120 B
pulling 316526ac7323... 100% ▕██████████████████████████▏ 529 B
verifying sha256 digest
writing manifest
success
>>> generate an advanced rust function
Here's a more advanced Rust function that demonstrates some of the language's features:
fn process_numbers(numbers: Vec<i32>) -> Result<Vec<i32>, &str> {
    // First, we'll filter out any negative numbers from the input vector
    let filtered = numbers.into_iter().filter(|&x| x >= 0).collect::<Vec<i32>>();
    // Then, we'll map each number to its square root and collect the results into a new vector
    let squared = filtered.into_iter().map(|&x| x * x).collect::<Vec<i32>>();
    // Finally, we'll return the result as a tuple of the filtered and squared vectors
    Ok((filtered, squared))
}
This function takes in a vector of integers `numbers` and returns a tuple of two vectors: the first containing only positive numbers, and the second containing the square roots of each number. The function uses several advanced Rust features, including:

- Pattern matching: The `filtered` variable is created by using pattern matching to filter out any negative numbers from the input vector.
- Collecting into a new vector: The `squared` variable is created by collecting the results of the `map` function into a new vector.
- Returning a tuple: The function returns a tuple of the two vectors as its result.
To use this function, you can pass in a vector of integers as the argument, like so:
let input = vec![1, 2, 3, 4, 5];
let (filtered, squared) = process_numbers(input).unwrap();
The `unwrap()` method is used to extract the result from the `Result` type, which is returned by the function.
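For the record, the snippet as printed doesn't compile: `map(|&x| ...)` doesn't match an iterator of owned `i32` values, `filtered` is moved before being returned, and the `Ok((filtered, squared))` tuple doesn't match the declared `Result<Vec<i32>, &str>` return type. A compiling sketch of what it seems to be going for (ours, not model output; it squares the numbers rather than taking square roots):

// Our sketch: keep the non-negative numbers, then square them.
fn process_numbers(numbers: Vec<i32>) -> (Vec<i32>, Vec<i32>) {
    let filtered: Vec<i32> = numbers.into_iter().filter(|&x| x >= 0).collect();
    let squared: Vec<i32> = filtered.iter().map(|&x| x * x).collect();
    (filtered, squared)
}

fn main() {
    let (filtered, squared) = process_numbers(vec![-2, 1, 2, 3]);
    assert_eq!(filtered, vec![1, 2, 3]);
    assert_eq!(squared, vec![1, 4, 9]);
}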
Final verdict: #
We got a wide variety of results:

- Llama 3.2 1b and Llama 3 8b:
  - The 1b model generated a basic function, `sum_and_product`, which demonstrates Rust basics like returning multiple values as a tuple.
  - The 8b model provided a more complex implementation of a Trie data structure. The code included struct definitions and methods for insertion and lookup, and demonstrated iterative tree-traversal logic.
- Deepseek Coder V2:
  - Showcased a generic function for calculating factorials with error handling using traits and higher-order functions. The implementation was designed to support multiple numeric types like `i32` and `u64`.
- Stable Code:
  - Presented a function that divided a vector of integers into batches using the external Rayon crate for parallel processing. The example highlighted the use of parallel execution in Rust.
- CodeGemma:
  - Implemented a simple turn-based game using a `TurnState` struct, which included player management, dice-roll simulation, and winner detection. The code demonstrated struct-based logic, random number generation, and conditional checks.
- CodeNinja:
  - Created a function that calculated a product or difference based on a condition. The example was relatively straightforward, emphasizing simple arithmetic and branching using a match expression.
- Starcoder (7b and 15b):
  - The 7b version provided a minimal and incomplete Rust code snippet with only a placeholder. The 15b version output debugging tests and code that seemed incoherent, suggesting significant issues in understanding or formatting the task prompt. We would rate this model as the worst one.
- Mistral:
  - Delivered a recursive Fibonacci function. The implementation illustrated the use of pattern matching and recursive calls to generate Fibonacci numbers, with basic error checking.
- CodeLlama:
  - Generated an incomplete function that aimed to process a list of numbers, filtering out negatives and squaring the results. It demonstrated the use of iterators and transformations but was left unfinished.
Summary Insight: #
- Models like Deepseek Coder V2 and Llama 3 8b excelled in handling advanced programming concepts like generics, higher-order functions, and data structures.
- Some models struggled to follow through or provided incomplete code (e.g., Starcoder, CodeLlama).
- Others demonstrated simple but clear examples of advanced Rust usage, like Mistral with its recursive approach or Stable Code with parallel processing.
Stay tuned for the next blog post at blog.rust.careers, where we will use the Ollama API.