Which LLM is best for generating Rust code?

Summary: #

We ran multiple large language models (LLMs) locally to figure out which one is best at Rust programming. All models were asked the same question:
>>> generate an advanced rust function

Some models generated pretty good results, while others produced terrible ones.

Note: we do not recommend or endorse using LLM-generated Rust code.

Test setup: #

First, we tried some models using Jan AI, which has a nice UI. However, after some struggles syncing a few Nvidia GPUs to it, we tried a different approach: running Ollama, which works very well out of the box on Linux. LM Studio is also a popular option.

We ended up running Ollama in CPU-only mode on a standard HP Gen9 blade server.

How much RAM do we need? #

RAM usage depends on the model you use and whether it uses 32-bit floating-point (FP32) or 16-bit floating-point (FP16) representations for model parameters and activations. FP16 uses half the memory of FP32, which means the RAM requirements for FP16 models can be roughly half the FP32 requirements. For example, a 175-billion-parameter model that requires 512 GB to 1 TB of RAM in FP32 could potentially be reduced to 256 GB to 512 GB by using FP16. As a rule of thumb, you need 8 GB of RAM available to run 7B models, 16 GB to run 13B models, and 32 GB to run 33B models.

Model size     RAM (FP16)
2.7 billion    3.1 GB
3 billion      4 GB
7 billion      8 GB
13 billion     16 GB
15 billion     32 GB
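
As a rough sanity check, you can estimate the weights-only footprint as parameter count times bytes per parameter (4 for FP32, 2 for FP16, roughly 0.5 for the 4-bit quantized GGUF builds that Ollama typically serves), which is why the rule-of-thumb numbers above can sit well below a raw FP16 calculation. Here is a minimal Rust sketch; the 1.2 overhead factor is our own rough assumption for activations and runtime buffers:

// Estimate the weights-only memory footprint of a model.
// bytes_per_param: 4.0 for FP32, 2.0 for FP16, ~0.5 for 4-bit GGUF.
// The 1.2 multiplier is an assumed ~20% overhead for activations/buffers.
fn estimated_ram_gb(params_billions: f64, bytes_per_param: f64) -> f64 {
    params_billions * bytes_per_param * 1.2
}

fn main() {
    println!("7B   FP16  ~ {:.1} GB", estimated_ram_gb(7.0, 2.0));   // ~16.8 GB
    println!("7B   4-bit ~ {:.1} GB", estimated_ram_gb(7.0, 0.5));   // ~4.2 GB
    println!("175B FP32  ~ {:.0} GB", estimated_ram_gb(175.0, 4.0)); // ~840 GB
}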

Check if your GPU and setup can run a specific model, using this tool:
https://huggingface.co/spaces/Vokturz/can-it-run-llm

Before we start, we want to mention that there is a giant number of proprietary “AI as a service” companies such as ChatGPT, Claude, etc. We only want to use models that we can download and run locally, no black magic.

Which LLM is best for generating Rust code? #

The Stable Code authors have put together a nice performance comparison of small code models:

Model             Size   Python  C++     JavaScript  Java    PHP     Rust
Stable Code       3B     32.4%   30.9%   32.1%       32.1%   24.2%   23.0%
CodeLlama         7B     30.0%   28.2%   32.5%       31.1%   25.7%   26.3%
DeepSeek Coder    1.3B   28.6%   29.2%   28.7%       29.0%   23.6%   18.5%
Wizard Coder      3B     31.6%   25.6%   26.2%       25.8%   25.3%   20.4%
StarCoder         3B     21.6%   19.8%   21.5%       20.5%   19.0%   16.9%
Replit Code V1.5  3B     23.0%   25.9%   26.2%       23.6%   23.2%   21.5%
Deci Coder        1B     19.1%   6.8%    18.4%       16.7%   2.1%    1.7%

The comparison was made by the Stable Code authors using the bigcode-evaluation-harness test repo.

Source:
https://huggingface.co/stabilityai/stable-code-3b#model-description
https://github.com/bigcode-project/bigcode-evaluation-harness

Ollama: #


Ollama lets us run large language models locally. It comes with a pretty simple, Docker-like CLI to start, stop, pull, and list models.

# ./bin/ollama         
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

The first thing we want to do is start the Ollama daemon:
$ ./bin/ollama serve &

Now we have Ollama running, let’s try out some models.

Where can we find large language models? #

Models to download and run are easy to find on sites such as https://ollama.com/ and https://huggingface.co/models. The majority of models on Hugging Face can be run locally with Ollama using the following pull syntax to download them:

$ ./bin/ollama pull hf.co/<USERNAME>/<REPO>:<TAG>    
Example: #
$ ./bin/ollama pull hf.co/beowolx/CodeNinja-1.0-OpenChat-7B-GGUF:Q4_K_M    

Llama3.2 Model: #

Llama3.2 is a lightweight (1B and 3B) version of Meta’s Llama3. These models are trained to understand and generate text across multiple languages, with special tuning for interactive conversations and tasks like information retrieval and summarization.

Link to model:
https://ollama.com/library/llama3.2

Download and run:

./bin/ollama run llama3.2:1b
>>> generate an advanced rust function
Here is an example of an advanced Rust function that demonstrates several advanced concepts:
/// Returns a vector of pairs containing the sum of two numbers as one element, and the product of the two numbers as the other element.
///
/// # Examples
///
/// ```
/// use crate::functions;
///
/// assert_eq!(functions::sum_and_product(3, 4), (7, 12));
/// ```
pub fn sum_and_product(a: i32, b: i32) -> (i32, i32) {
    let result = a + b;
    let product = a * b;

    // Using the fold method to create a vector of pairs
    (result, product)
}

fn main() {
    assert_eq!(sum_and_product(3, 4), (7, 12));
}
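
Incidentally, the generated comment mentions the fold method, but nothing in the function actually uses it. For reference, here is a small sketch of what building (sum, product) pairs with an actual fold could look like; this is our own illustration, not part of the model’s output:

// Build (sum, product) pairs for a slice of number pairs using fold.
fn sums_and_products(pairs: &[(i32, i32)]) -> Vec<(i32, i32)> {
    pairs.iter().fold(Vec::new(), |mut acc, &(a, b)| {
        acc.push((a + b, a * b));
        acc
    })
}

fn main() {
    assert_eq!(sums_and_products(&[(3, 4)]), vec![(7, 12)]);
    println!("{:?}", sums_and_products(&[(3, 4), (2, 5)])); // [(7, 12), (7, 10)]
}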

Llama3 8b Model: #

Llama (Large Language Model Meta AI) 3 is the next generation of Llama 2. Trained by Meta on 15T tokens (7x more than Llama 2), it comes in two sizes: an 8B and a 70B version.

Link to model:
https://ollama.com/library/llama3
https://huggingface.co/docs/transformers/en/model_doc/llama3
https://ai.meta.com/blog/meta-llama-3/
https://en.wikipedia.org/wiki/Llama_(language_model)
https://en.wikipedia.org/wiki/Yann_LeCun

Download and run:

$ ./bin/ollama pull llama3:8b    
pulling manifest 
pulling 6a0746a1ec1a... 100% ▕█████████████▏ 4.7 GB     
pulling 4fa551d4f938... 100% ▕█████████████▏  12 KB     
pulling 8ab4849b038c... 100% ▕█████████████▏  254 B     
pulling 577073ffcc6c... 100% ▕█████████████▏  110 B     
pulling 3f8eb4da87fa... 100% ▕█████████████▏  485 B     
verifying sha256 digest 
writing manifest 
success 
$ ./bin/ollama run llama3:8b
>>> generate an advanced rust function
Here's a simple implementation of a trie data structure in Rust, which is a type of prefix tree that can be used to store a dynamic set or associative array where the keys are usually strings.
use std::collections::HashMap;
use std::iter::FromIterator;

struct TrieNode {
    children: HashMap<char, TrieNode>,
    is_end_of_word: bool,
}

impl TrieNode {
    fn new() -> TrieNode {
        TrieNode { children: HashMap::new(), is_end_of_word: false }
    }
}

pub struct Trie {
    root: TrieNode,
}

impl Trie {
    pub fn new() -> Trie {
        Trie { root: TrieNode::new() }
    }

    pub fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for c in word.chars() {
            match node.children.insert(c, TrieNode::new()) {
                Some(_) => {}
                None => {
                    return; // cannot insert duplicate word
                }
            }
            node = node.children.get_mut(&c).unwrap();
        }
        node.is_end_of_word = true;
    }

    pub fn search(&self, word: &str) -> bool {
        let mut node = &self.root;
        for c in word.chars() {
            match node.children.get(&c) {
                Some(child) => node = child,
                None => return false, // not found
            }
        }
        node.is_end_of_word
    }

    pub fn starts_with(&self, prefix: &str) -> bool {
        let mut node = &self.root;
        for c in prefix.chars() {
            match node.children.get(&c) {
                Some(child) => node = child,
                None => return false, // not found
            }
        }
        true // found prefix
    }
}

fn main() {
    let mut trie = Trie::new();
    trie.insert("hello");
    trie.insert("world");
    println!("{}", if trie.search("hello") {"found"} else {"not found"});
    println!("{}", if trie.starts_with("hel") {"found"} else {"not found"});
}

This code creates a basic Trie data structure and provides methods to insert words, search for words, and check if a prefix is present in the Trie. The Trie struct holds a root node which has children that are also nodes of the Trie. Each node also keeps track of whether it’s the end of a word.

The insert method iterates over each character in the given word and inserts it into the Trie if it’s not already present. If a duplicate word is attempted to be inserted, the function returns without inserting anything.

The search method starts at the root node and follows the child nodes until it reaches the end of the word or runs out of characters. It then checks whether the end of the word was found and returns this information.

The starts_with method does the same thing as search, but it doesn’t check for the end of a word.
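
A caveat the explanation above misses: the generated insert is actually backwards. HashMap::insert returns None when the key is new and Some(old) when it already exists, so this version returns early on every fresh character (never fully storing a word) and silently replaces existing subtrees on repeated prefixes. A minimal corrected sketch using the entry API follows; this is our own fix, not part of the model’s output:

use std::collections::HashMap;

#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    is_end_of_word: bool,
}

#[derive(Default)]
struct Trie {
    root: TrieNode,
}

impl Trie {
    // entry() reuses an existing child instead of overwriting it,
    // so previously inserted words survive later insertions.
    fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for c in word.chars() {
            node = node.children.entry(c).or_default();
        }
        node.is_end_of_word = true;
    }

    fn search(&self, word: &str) -> bool {
        let mut node = &self.root;
        for c in word.chars() {
            match node.children.get(&c) {
                Some(child) => node = child,
                None => return false,
            }
        }
        node.is_end_of_word
    }
}

fn main() {
    let mut trie = Trie::default();
    trie.insert("hello");
    trie.insert("help"); // shares the "hel" prefix instead of destroying it
    assert!(trie.search("hello"));
    assert!(trie.search("help"));
    assert!(!trie.search("hel"));
    println!("trie works as expected");
}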

DeepSeek Coder V2 Model: #

DeepSeek Coder V2 outperformed OpenAI’s GPT-4-Turbo-1106 and GPT-4-0613, Google’s Gemini 1.5 Pro, and Anthropic’s Claude 3 Opus at coding. It was made by DeepSeek AI as an open-source (MIT license) competitor to these industry giants.

Download and run the model

$ ./bin/ollama pull deepseek-coder-v2
$ ./bin/ollama run deepseek-coder-v2

Link to model:
https://ollama.com/library/deepseek-coder-v2
https://github.com/deepseek-ai/DeepSeek-Coder-V2

 >>> generate an advanced rust function

Certainly! Below is an example of a more advanced Rust function that demonstrates several concepts such as higher-order functions, closures, generics, and error handling. This function calculates the factorial of a number in a safe and efficient manner using these techniques.
use std::num::ParseIntError;

// Define a trait for numeric types to allow generic calculations
trait Numeric: Sized + Copy {
    fn mul(self, other: Self) -> Self;
    fn one() -> Self;
}

impl Numeric for i32 {
    fn mul(self, other: Self) -> Self {
        self * other
    }

    fn one() -> Self {
        1
    }
}

impl Numeric for u64 {
    fn mul(self, other: Self) -> Self {
        self * other
    }
    fn one() -> Self {
        1
    }
}

// Define a generic factorial function using closures and higher-order functions
fn factorial<T>(n: T) -> Result<T, ParseIntError>
where
    T: Numeric + std::str::FromStr,
{
    if n < T::one() {
        return Ok(T::one());
    }

    let mut result = T::one();
    for i in 1..=n {
        result = result.mul(i);
    }

    Ok(result)
}




fn main() -> Result<(), ParseIntError> {
    // Test the factorial function with different numeric types
    let num_str = "5";
    match num_str.parse::<u64>() {
        Ok(num) => match factorial(num) {
            Ok(fact) => println!("The factorial of {} is {}", num, fact),
            Err(e) => eprintln!("Error calculating factorial: {:?}", e),
        },
        Err(e) => eprintln!("Invalid number: {:?}", e),
    }

    let num_str = "5";
    match num_str.parse::<i32>() {
        Ok(num) => match factorial(num) {
            Ok(fact) => println!("The factorial of {} is {}", num, fact),
            Err(e) => eprintln!("Error calculating factorial: {:?}", e),
        },
        Err(e) => eprintln!("Invalid number: {:?}", e),
    }

    Ok(())
}

Explanation: #

  1. Numeric Trait: This trait defines basic operations for numeric types, including multiplication and a method to get the value one. It is implemented for both i32 and u64.
  2. Factorial Function: The factorial function is generic over any type that implements the Numeric trait. It uses a closure to multiply the result by each integer from 1 up to n.

This approach allows the function to be used with both signed (i32) and unsigned (u64) integers.

  3. Error Handling: The factorial calculation may fail if the input string cannot be parsed into an integer. Therefore, the function returns a Result<T, ParseIntError>.
  4. Main Function: Demonstrates how to use the factorial function with both u64 and i32 types by parsing strings to integers. This part of the code handles potential errors from string parsing and factorial computation gracefully.

This example showcases advanced Rust features such as trait-based generic programming, error handling, and higher-order functions, making it a robust and versatile implementation for calculating factorials in different numeric contexts.
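
One caveat: as generated, this code does not compile. The comparison n < T::one() requires a PartialOrd bound that isn’t declared, the range 1..=n cannot mix an integer literal with a generic T, and factorial itself can never produce a ParseIntError. A minimal compiling sketch of the same trait-based idea follows; the succ method is our own addition so the loop can increment generically, and this is our fix rather than model output:

trait Numeric: Sized + Copy + PartialOrd {
    fn mul(self, other: Self) -> Self;
    fn one() -> Self;
    fn succ(self) -> Self; // next value, so the loop can count generically
}

impl Numeric for i32 {
    fn mul(self, other: Self) -> Self { self * other }
    fn one() -> Self { 1 }
    fn succ(self) -> Self { self + 1 }
}

impl Numeric for u64 {
    fn mul(self, other: Self) -> Self { self * other }
    fn one() -> Self { 1 }
    fn succ(self) -> Self { self + 1 }
}

// Generic factorial over any Numeric type; no parsing happens here,
// so there is no ParseIntError to return.
fn factorial<T: Numeric>(n: T) -> T {
    let mut result = T::one();
    let mut i = T::one();
    while i <= n {
        result = result.mul(i);
        i = i.succ();
    }
    result
}

fn main() {
    println!("5! as u64 = {}", factorial(5u64)); // 120
    println!("5! as i32 = {}", factorial(5i32)); // 120
}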

End of model output.

Stable Code Model: #

Stable Code is a 2.7B-parameter model by Stability AI, built as a code-completion LLM intended to compete with 7B models.

./bin/ollama pull stable-code

>>> generate an advanced rust function
Sure, here's an example of a more advanced Rust function that uses the `rayon` crate for parallel execution:
use std::error::Error;
use rayon::prelude::*;

fn divide_into_batches(
    nums: &mut Vec<i32>,
    batch_size: usize,
) -> Result<(), Box<dyn Error>>> {
    // Divide the numbers into batches of the specified size.
    let batches = nums
        .chunks_mut(batch_size)
        .collect::<Vec<_>>();

    Ok(())
}

This function takes a mutable reference to a vector of integers, and an integer specifying the batch size. The function divides the numbers into batches of the specified size using Rayon’s chunks_mut method.

Note that this is just one example of a more advanced Rust function that uses the rayon crate for parallel execution. There are many other ways to achieve parallelism in Rust, depending on the specific requirements and constraints of your application.
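
Two notes on this output: chunks_mut is actually a standard library slice method, not a Rayon API, and the generated snippet never invokes any parallel operation (its batches variable is also unused, and the return type has a stray >). For reference, here is a minimal sketch of what parallel batch processing with Rayon’s par_chunks_mut could look like; this is our own illustration, assuming rayon = "1" in Cargo.toml:

use rayon::prelude::*;

// Square every element, processing fixed-size batches in parallel.
fn square_in_batches(nums: &mut [i32], batch_size: usize) {
    nums.par_chunks_mut(batch_size).for_each(|batch| {
        for n in batch {
            *n *= *n;
        }
    });
}

fn main() {
    let mut nums: Vec<i32> = (1..=8).collect();
    square_in_batches(&mut nums, 3);
    println!("{:?}", nums); // [1, 4, 9, 16, 25, 36, 49, 64]
}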

Code Gemma Model: #

CodeGemma, made by Google, is a collection of compact models specialized in coding tasks, from code completion and generation to understanding natural language, solving math problems, and following instructions. Its lightweight design maintains powerful capabilities across these diverse programming functions.

Link to model:
https://ollama.com/library/codegemma
https://ai.google.dev/gemma/docs/codegemma

Download and run:

 ./bin/ollama run codegemma
>>> generate an advanced rust function
use std::collections::HashMap;

struct TurnState {
    players: HashMap<String, i32>,
    current_player: String,
    dice_pool: Vec<i32>,
}

impl TurnState {
    fn new(players: HashMap<String, i32>) -> Self {
        let first_player = players.keys().next().unwrap();
        TurnState {
            players,
            current_player: first_player.to_string(),
            dice_pool: vec![1, 2, 3, 4, 5, 6],
        }
    }

    fn end_of_turn(&mut self) {
        // Roll the dice
        let dice_rolls = self.dice_pool.iter().map(|die| rand::random::<i32>() % 6 + 1).collect::<Vec<_>>();

        // Calculate the score for the current player
        let player_score = dice_rolls.iter().sum();

        // Add the score to the player's total
        self.players
            .entry(self.current_player.clone())
            .and_modify(|score| *score += player_score);

        // Advance to the next player
        let players_iter = self.players.keys();
        let current_player_index = players_iter.position(|&p| p == &self.current_player).unwrap();
        let next_player_index = (current_player_index + 1) % players_iter.count();
        self.current_player = players_iter.nth(next_player_index).unwrap().to_string();
    }
}

Usage:

let mut turn_state = TurnState::new(HashMap::from([("Alice", 0), ("Bob", 0)]));

loop {
    turn_state.end_of_turn();

    // Check if any player has reached a score of 100
    if turn_state.players.values().any(|&score| score >= 100) {
        break;
    }
}

println!("Winner: {}", turn_state.current_player);

Features:

  • Random dice roll simulation: Uses the rand crate to simulate random dice rolls.
  • Player turn management: Keeps track of the current player and rotates players after each turn.
  • Score calculation: Calculates the score for each turn based on the dice rolls.
  • Winner detection: Checks if any player has reached a score of 100.

Note:

  • This code requires the rand crate to be installed.
  • The end_of_turn() function assumes that the dice pool contains six dice with values from 1 to 6.
  • The game logic can be further extended to include additional features, such as special dice or different scoring rules.
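
A side note on this output: end_of_turn will not compile as written, because players_iter is consumed by position and then used again for count and nth, and the closure compares a &&String against a &String. A minimal compiling sketch of the player-rotation step follows; this is our own fix with a hypothetical advance_player name, and a real game would keep players in an ordered Vec rather than a HashMap:

use std::collections::HashMap;

struct TurnState {
    players: HashMap<String, i32>,
    current_player: String,
}

impl TurnState {
    // HashMap iteration order is unspecified, so we snapshot the keys;
    // an ordered Vec<String> of players would be the sturdier design.
    fn advance_player(&mut self) {
        let names: Vec<String> = self.players.keys().cloned().collect();
        let idx = names
            .iter()
            .position(|p| *p == self.current_player)
            .expect("current player must exist");
        self.current_player = names[(idx + 1) % names.len()].clone();
    }
}

fn main() {
    let mut state = TurnState {
        players: HashMap::from([("Alice".to_string(), 0), ("Bob".to_string(), 0)]),
        current_player: "Alice".to_string(),
    };
    state.advance_player();
    println!("next up: {}", state.current_player);
}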

Code Ninja Model: #

Code Ninja is a 7B model, trained and fine-tuned through supervised fine-tuning on two expansive datasets encompassing over 400,000 coding instructions. It is made as a more advanced version of OpenChat 3.5.

Link to model:
https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B

Download and run:

./bin/ollama pull hf.co/beowolx/CodeNinja-1.0-OpenChat-7B-GGUF:Q4_K_M                                                  
./bin/ollama run hf.co/beowolx/CodeNinja-1.0-OpenChat-7B-GGUF:Q4_K_M
>>> generate an advanced rust function
 fn advanced_function(x: i32, y: i32) -> i32 {
    let sum = x + y;
    let product = x * y;
    let difference = x - y;

    match sum > 100 {
        true => return product,
        false => return difference
    }
}

Starcoder Model: #

StarCoder2 is a grouped-query-attention model trained on 600+ programming languages from BigCode’s The Stack v2 dataset, made with the intent of code completion. The model comes in 3B, 7B, and 15B sizes.

Let’s save on the resources and use the 7B one (starcoder2:7b was trained on 17 programming languages and 3.5+ trillion tokens).

Link to model: https://ollama.com/library/starcoder2

Download and run:

 ./bin/ollama pull starcoder2:7b          
pulling manifest 
pulling cf866ccad4e4... 100% ▕████████████▏ 4.0 GB                    
pulling 4ec42cd966c9... 100% ▕████████████▏  12 KB                    
pulling 3b190e68fefe... 100% ▕████████████▏  142 B                    
pulling 37c99d4c7b60... 100% ▕████████████▏   41 B                    
pulling 2ac306bb5658... 100% ▕████████████▏  345 B                    
verifying sha256 digest 
writing manifest 
success 
 ./bin/ollama run starcoder2:7b  

>>> generate an advanced rust function

use std::mem;
pub fn <|>

Starcoder2:7b wins the prize for shortest output! :)

Let’s try the more advanced one: the 9 GB, 15B version trained on 600+ programming languages.

./bin/ollama pull starcoder2:15b

One would assume this version would perform better, but it did much worse…

Output:

./bin/ollama run starcoder2:15b
>>> generate an advanced rust function

"""
    assert_generates(
        """
        generate an advanced rust function

        fn foo() {
            bar();
            baz();
        }
        """,
        completion,
        expected,
    )

def test_completions_from_start():
    completion = "generate an advanced rust"

    expected = """fn foo

And then it crashed…

Let’s try to give it something else:

>>> write 1+1 in python

# [](# "example")Example output:

    write 2 in python
    <BLANKLINE>



The last line can also be an empty string.


## [](# "test-suite")Test suite:


### [](# "test-suite@pass")`pass`:

write 1+1 in python

crashed…

Something seems pretty off with this model…

Mistral Model: #

Mistral 7B is a 7.3B-parameter open-source (Apache 2.0 license) language model that outperforms much larger models like Llama 2 13B and matches many benchmarks of Llama 1 34B. Its key innovations include grouped-query attention and sliding-window attention for efficient processing of long sequences. It can be deployed locally or on cloud platforms, and its chat-tuned version competes with 13B models. The model particularly excels at coding and reasoning tasks while using significantly fewer resources than comparable models.

Link to model:
https://ollama.com/library/mistral
https://mistral.ai/news/announcing-mistral-7b/

Download and run:

 ./bin/ollama pull mistral
pulling manifest 
pulling ff82381e2bea... 100% ▕████████████▏ 4.1 GB               
pulling 43070e2d4e53... 100% ▕████████████▏  11 KB               
pulling 491dfa501e59... 100% ▕████████████▏  801 B               
pulling ed11eda7790d... 100% ▕████████████▏   30 B               
pulling 42347cd80dc8... 100% ▕████████████▏  485 B               
verifying sha256 digest 
writing manifest 
success 


./bin/ollama run mistral              
>>> generate an advanced rust function
Here is an example of a more advanced Rust function, which uses recursion to find the Fibonacci sequence. This function takes an integer `n` as input and returns the `n`th number in the Fibonacci sequence.
fn fibonacci(n: u32) -> u32 {
    match n {
        0 | 1 => n, // base case: 0 and 1 are both Fibonacci numbers
        _ if n < 2 => 0, // the first number after 1 in the sequence is 0
        _ => fibonacci(n - 1) + fibonacci(n - 2), // recursive case: add the previous two numbers in the sequence
    }
}

This function uses pattern matching to handle the base cases (when n is either 0 or 1) and the recursive case, where it calls itself twice with decreasing arguments. The resulting values are then added together to compute the nth number in the Fibonacci sequence. This implementation follows the definition of the Fibonacci sequence as the sum of the previous two numbers, starting with 0 and 1.

You can call this function using:

let fib_5 = fibonacci(5); // fib_5 will be assigned the value 3 (the fifth number in the Fibonacci sequence)
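
Two caveats on this output: the `_ if n < 2 => 0` arm is unreachable (0 and 1 are already matched by the first arm), and fibonacci(5) under this definition is 5, not 3 as the model’s comment claims. The naive recursion is also exponential in n. For contrast, here is a simple iterative sketch of our own:

// Iterative Fibonacci: O(n) time, constant space, no recursion depth limit.
fn fibonacci(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}

fn main() {
    assert_eq!(fibonacci(5), 5); // not 3, as the model's comment suggests
    println!("fib(10) = {}", fibonacci(10)); // 55
}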

Codellama Model: #

Codellama is a model made for generating and discussing code; it has been built on top of Llama 2 by Meta. It is an LLM made to complete coding tasks and to help new developers.

We do not recommend using Code Llama or Code Llama - Python to perform general natural language tasks since neither of these models are designed to follow natural language instructions. Code Llama is specialized for code-specific tasks and isn’t appropriate as a foundation model for other tasks. - Meta

Link to model:
https://ollama.com/library/codellama
https://huggingface.co/meta-llama
https://ai.meta.com/blog/code-llama-large-language-model-coding/

Download and run:

./bin/ollama run codellama:7b      
pulling manifest 
pulling 3a43f93b78ec... 100% ▕██████████████████████████▏ 3.8 GB
pulling 8c17c2ebb0ea... 100% ▕██████████████████████████▏ 7.0 KB
pulling 590d74a5569b... 100% ▕██████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕██████████████████████████▏   59 B
pulling 7f6a57943a88... 100% ▕██████████████████████████▏  120 B
pulling 316526ac7323... 100% ▕██████████████████████████▏  529 B
verifying sha256 digest 
writing manifest 
success 
>>> generate an advanced rust function

Here's a more advanced Rust function that demonstrates some of the language's features:
fn process_numbers(numbers: Vec<i32>) -> Result<Vec<i32>, &str> {
    // First, we'll filter out any negative numbers from the input vector
    let filtered = numbers.into_iter().filter(|&x| x >= 0).collect::<Vec<i32>>();

    // Then, we'll map each number to its square root and collect the results into a new vector
    let squared = filtered.into_iter().map(|&x| x * x).collect::<Vec<i32>>();

    // Finally, we'll return the result as a tuple of the filtered and squared vectors
    Ok((filtered, squared))
}

This function takes in a vector of integers numbers and returns a tuple of two vectors: the first containing only positive numbers, and the second containing the square roots of each number. The function uses several advanced Rust features, including:

  1. Pattern matching: The filtered variable is created by using pattern matching to filter out any negative numbers from the input vector.
  2. Collecting into a new vector: The squared variable is created by collecting the results of the map function into a new vector.
  3. Returning a tuple: The function returns a tuple of the two vectors as its result.

To use this function, you can pass in a vector of integers as the argument, like so:

let input = vec![1, 2, 3, 4, 5];
let (filtered, squared) = process_numbers(input).unwrap();

The unwrap() method is used to extract the result from the Result type, which is returned by the function.
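
As generated, this function doesn’t compile: the signature promises Result<Vec<i32>, &str> but the body returns a tuple, filtered is moved by into_iter() and so can’t also be returned, and map’s |&x| pattern doesn’t match an owned i32. It also squares the numbers rather than taking square roots, despite the comments. A compiling sketch of what it appears to be aiming for; this is our fix, not model output:

// Keep the non-negative numbers and also compute their squares.
fn process_numbers(numbers: Vec<i32>) -> (Vec<i32>, Vec<i32>) {
    let filtered: Vec<i32> = numbers.into_iter().filter(|&x| x >= 0).collect();
    // iter() borrows, so filtered can still be returned afterwards.
    let squared: Vec<i32> = filtered.iter().map(|&x| x * x).collect();
    (filtered, squared)
}

fn main() {
    let (filtered, squared) = process_numbers(vec![-2, 1, 2, 3]);
    println!("{:?} {:?}", filtered, squared); // [1, 2, 3] [1, 4, 9]
}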

Final verdict: #

We got a wide variety of results:

  1. Llama (3.2 1B and 3 8B models):
    • The 1B model generated a basic function, sum_and_product, which demonstrates Rust basics like returning multiple values as a tuple.
    • The 8B model provided a more complex implementation of a Trie data structure. The code included struct definitions and methods for insertion and lookup, demonstrating iterative traversal logic, though the insert logic was subtly broken.
  2. Deepseek Coder V2:
    • Showcased a generic function for calculating factorials with error handling using traits and higher-order functions. The implementation was designed to support multiple numeric types like i32 and u64.
  3. Stable Code:
    • Presented a function intended to divide a vector of integers into batches using the external Rayon crate, although the generated snippet never actually invoked any parallel API.
  4. CodeGemma:
    • Implemented a simple turn-based game using a TurnState struct, which included player management, dice roll simulation, and winner detection. The code demonstrated struct-based logic, random number generation, and conditional checks.
  5. CodeNinja:
    • Created a function that calculated a product or difference based on a condition. The example was relatively straightforward, emphasizing simple arithmetic and branching using a match expression.
  6. Starcoder (7b and 15b):
    • The 7b version provided a minimal and incomplete Rust code snippet with only a placeholder. The 15b version output debugging tests and code that seemed incoherent, suggesting significant issues in understanding or formatting the task prompt. We would rate this model as the worst one.
  7. Mistral:
    • Delivered a recursive Fibonacci function. The implementation illustrated the use of pattern matching and recursive calls to generate Fibonacci numbers, with basic error-checking.
  8. CodeLlama:
    • Generated a function that aimed to process a list of numbers, filtering out negatives and squaring the results. It demonstrated the use of iterators and transformations, but the code did not compile as written.

Summary Insight: #

  • Models like Deepseek Coder V2 and Llama 3 8b excelled in handling advanced programming concepts like generics, higher-order functions, and data structures.
  • Some models struggled to follow through or provided incomplete code (e.g., Starcoder, CodeLlama).
  • Others demonstrated simple but clear examples of advanced Rust usage, like Mistral with its recursive approach or Stable Code with parallel processing.

Stay tuned for the next blog post at blog.rust.careers, where we will use the Ollama API.