My Rust Journey - 9 Dec 2024

This is the 9th post about my journey learning the Rust programming language using the Rust Book. Previous posts include:

Chapter 1: Basics of Rust and Cargo

Chapter 3: Mutability and shadowing, variables and constants, scalar and compound data types, functions, control flow with conditional statements and loops

Chapter 4: Ownership, reference and borrowing, and slice type

Chapter 5: Using structs to structure related data

Chapter 6: Enums, Control Flow and Matching

Chapter 7: Packages, Crates and Modules

Chapter 8: Common collections

Chapter 9: Error handling

I am documenting this because I think a non-developer's perspective can be useful for other people interested in learning Rust.

At this stage, you have already installed Rust on your machine, and you are ready to write and run your first Rust programs.

I am using VS Code with the rust-analyzer extension, on an M1 Mac.

The following tutorial covers Chapter 10 of the Rust Book. It is meant as a summary, to be read alongside the book as a complementary source of information.

Generic data types

The code below shows two functions that find the largest number or character in a list.

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];
    
    // let mut largest = &number_list[0];

    // for number in &number_list {
    //     if number > largest {
    //         largest = number;
    //     }
    // }

    let result = largest_i32(&number_list);

    println!("The largest number is {result}.");

    let char_list = vec!['y', 'm', 'a', 'q'];

    let result = largest_char(&char_list);

    println!("The largest char is {result}.");

}

fn largest_i32 (list: &[i32]) -> &i32 {
    let mut largest = &list[0];

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest

}

fn largest_char (list: &[char]) -> &char {
    let mut largest = &list[0];

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest

}

The two functions differ only in their names and types in their signature.

We can replace both with a single function by parameterizing the element type with a generic type parameter, conventionally named T.

fn largest<T: std::cmp::PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest

}

std::cmp::PartialOrd is a trait that enables comparison. Adding it as a bound on T restricts the function to types whose values can be ordered. Without the bound the code does not compile, because the > operator is not defined for every possible type T.
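As a quick check, the generic function can be called with lists of different element types. The sketch below repeats largest so that it is self-contained; the f64 list is my own addition and is not part of the book's example.

```rust
// Generic over any type T whose values can be compared with `>`.
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0]; // panics if the list is empty

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];
    println!("The largest number is {}.", largest(&number_list));

    let char_list = vec!['y', 'm', 'a', 'q'];
    println!("The largest char is {}.", largest(&char_list));

    // f64 implements PartialOrd, so the same function works here too.
    let float_list = vec![1.5, 0.2, 9.75];
    println!("The largest float is {}.", largest(&float_list));
}
```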

Struct and Enum Definitions

The code below uses generics to define a struct.

struct Point<T> {
    x: T,
    y: T,
}

The code will compile as long as we use the same type for both the x and y fields. If we want fields of different types, we need to declare a second generic type parameter.

struct Point<T, U> {
    x: T,
    y: U,
}

The standard library enums Option<T> and Result<T, E> are two examples of generics. The first one has one generic type parameter, while the second has two.
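A minimal sketch of how the two-parameter Point accepts mixed field types, and how the standard library's generic enums look once concrete types are filled in. The helper parse_age is hypothetical, made up only for this illustration.

```rust
struct Point<T, U> {
    x: T,
    y: U,
}

// Hypothetical helper: a Result<T, E> with T = u8 and E = ParseIntError.
fn parse_age(input: &str) -> Result<u8, std::num::ParseIntError> {
    input.trim().parse::<u8>()
}

fn main() {
    // T and U may be the same type or different types.
    let both_integer = Point { x: 5, y: 10 };
    let integer_and_float = Point { x: 5, y: 4.5 };
    println!("{} {}", both_integer.x, both_integer.y);
    println!("{} {}", integer_and_float.x, integer_and_float.y);

    // An Option<T> with T = char.
    let first: Option<char> = "rust".chars().next();
    println!("{first:?}");

    println!("{:?}", parse_age("42"));
    println!("{}", parse_age("not a number").is_err());
}
```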

In method definitions

We can implement a method for the previously defined struct.

impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

This method named x on the struct Point<T> returns a reference to the x field of type T. We have to declare T right after impl so that Rust knows the T in Point<T> is a generic parameter and not a concrete type. We can also choose to implement methods on concrete types: if we write impl Point<f32>, the methods in that block are available only on Point<f32> instances.
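The sketch below puts the two kinds of impl block side by side. The distance_from_origin method follows the example in the Rust Book; the values in main are my own.

```rust
struct Point<T> {
    x: T,
    y: T,
}

// Available for every Point<T>.
impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

// Available only when T is f32.
impl Point<f32> {
    fn distance_from_origin(&self) -> f32 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }
}

fn main() {
    let p = Point { x: 3.0_f32, y: 4.0_f32 };
    println!("x = {}, distance = {}", p.x(), p.distance_from_origin());

    let q = Point { x: 1, y: 2 };
    println!("x = {}", q.x());
    // q.distance_from_origin(); // would not compile: q is a Point<i32>
}
```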

The generic type parameters of a method can differ from those of the struct definition.

struct Point2<X1, Y1> {
    x: X1,
    y: Y1,
}

impl<X1, Y1> Point2<X1, Y1> {
    fn mixup<X2, Y2> (self, other: Point2<X2, Y2>) -> Point2<X1, Y2> {
        Point2 {
            x: self.x,
            y: other.y,
        }
    }
}

See how the generic parameters declared with impl differ from those in the method definition. The generic parameters X1 and Y1 are declared after impl because they belong to the struct definition. The generic parameters X2 and Y2 are declared after fn mixup because they are relevant to the method only.
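A short usage sketch, with values following the book's example: mixup takes x from the first point and y from the second, so the result has type Point2<i32, char>.

```rust
struct Point2<X1, Y1> {
    x: X1,
    y: Y1,
}

impl<X1, Y1> Point2<X1, Y1> {
    // X2 and Y2 exist only for this method; they describe `other`.
    fn mixup<X2, Y2>(self, other: Point2<X2, Y2>) -> Point2<X1, Y2> {
        Point2 {
            x: self.x,
            y: other.y,
        }
    }
}

fn main() {
    let p1 = Point2 { x: 5, y: 10.4 };
    let p2 = Point2 { x: "Hello", y: 'c' };

    // p1 and p2 are moved into mixup and cannot be used afterwards.
    let p3 = p1.mixup(p2);

    println!("p3.x = {}, p3.y = {}", p3.x, p3.y);
}
```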

Code performance with generics

Writing code using generics will not make the program slower than when using concrete types. Rust achieves this via monomorphization: at compile time, the compiler turns generic code into specific code by filling in the concrete types that are actually used.
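To picture what monomorphization does, here is a rough sketch. The function first and the names first_i32 and first_char are invented for illustration; the copies the compiler really generates have mangled names and are never written by hand.

```rust
// What we write: one generic function.
fn first<T>(list: &[T]) -> &T {
    &list[0]
}

// Roughly what the compiler generates: one copy per concrete type used.
fn first_i32(list: &[i32]) -> &i32 {
    &list[0]
}

fn first_char(list: &[char]) -> &char {
    &list[0]
}

fn main() {
    let numbers = vec![34, 50, 25];
    let letters = vec!['y', 'm', 'a'];

    // Each generic call is resolved to a concrete version at compile time,
    // so there is no extra cost at runtime.
    assert_eq!(first(&numbers), first_i32(&numbers));
    assert_eq!(first(&letters), first_char(&letters));

    println!("{} {}", first(&numbers), first(&letters));
}
```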

Traits

A trait defines the functionality of a particular type that can be shared with other types.

Below is an example of defining a trait that consists of the behavior provided by the summarize method.

pub trait Summary {
    fn summarize(&self) -> String;
}

Note how after the method signature we end with a ; instead of providing an implementation. Each type implementing the trait must provide its own behavior.

Implementing a Trait on a Type

We can implement the Summary trait on the NewsArticle and Tweet structs below.

pub struct NewsArticle {
    pub headline: String,
    pub location: String,
    pub author: String,
    pub content: String,
}

impl Summary for NewsArticle {
    fn summarize(&self) -> String {
        format!(
            "{}, by {} ({})",
            self.headline,
            self.author,
            self.location
        )
    }
}

pub struct Tweet {
    pub username: String,
    pub content: String,
    pub reply: bool,
    pub retweet: bool,
}

impl Summary for Tweet {
    fn summarize(&self) -> String {
        format!("{}: {}", self.username, self.content)
    }
}

The NewsArticle implementation uses the struct fields headline, author, and location to create a return value for summarize, while the Tweet implementation uses username and content. The trait is pub so that crates depending on this one can also use it.

We can save the Summary trait and its implementations in a library crate called aggregator, which can display a summary of data stored in, for example, a Tweet instance. We can then add the aggregator crate as a dependency in the Cargo.toml of a binary crate by specifying the correct path.

In the binary crate, before the main function we can use the aggregator crate as follows:

use aggregator::{Summary, Tweet};

Then we can call the summarize method on a Tweet instance as follows:

let tweet = Tweet {
    username: String::from("horse_ebooks"),
    content: String::from(
        "of course, as you probably already know, people",
    ),
    reply: false,
    retweet: false,
};

println!("1 new tweet: {}", tweet.summarize());

Default Implementations

It can sometimes be useful to have a default behavior for some or all methods in a trait instead of writing multiple implementations.

A default implementation can be specified as follows:

pub trait Summary {
    fn summarize(&self) -> String {
        String::from("(Read more...)")
    }
}

To use the default implementation, we write an empty impl block: impl Summary for NewsArticle {}. We do not need to define the summarize method for NewsArticle. We can also define a trait with one method whose implementation is required and another method that has a default implementation.
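A minimal sketch of the empty impl block in action. NewsArticle is trimmed to two fields here to keep the example short, and the headline text is only sample data.

```rust
pub trait Summary {
    // Default implementation, used unless a type overrides it.
    fn summarize(&self) -> String {
        String::from("(Read more...)")
    }
}

pub struct NewsArticle {
    pub headline: String,
    pub author: String,
}

// Empty block: NewsArticle accepts the default summarize.
impl Summary for NewsArticle {}

fn main() {
    let article = NewsArticle {
        headline: String::from("Penguins win the Stanley Cup Championship!"),
        author: String::from("Iceburgh"),
    };

    println!("{} by {}", article.headline, article.author);
    println!("New article available! {}", article.summarize());
}
```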

pub trait Summary {
    fn summarize_author(&self) -> String;

    fn summarize(&self) -> String {
        format!(
            "(Read more from {}...)",
            self.summarize_author()
        )
    }
}

The summarize_author method requires an implementation, while the summarize method has a default implementation that calls summarize_author. To use this version of the Summary trait, we only need to define summarize_author:

impl Summary for Tweet {
    fn summarize_author(&self) -> String {
        format!("@{}", self.username)
    }
}

We can call summarize on instances of the Tweet struct. The default implementation will call the definition for summarize_author that we provided.
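Putting the pieces together in one self-contained sketch (Tweet is trimmed to two fields for brevity):

```rust
pub trait Summary {
    // Required: every implementing type must provide this.
    fn summarize_author(&self) -> String;

    // Default: built on top of the required method.
    fn summarize(&self) -> String {
        format!("(Read more from {}...)", self.summarize_author())
    }
}

pub struct Tweet {
    pub username: String,
    pub content: String,
}

impl Summary for Tweet {
    fn summarize_author(&self) -> String {
        format!("@{}", self.username)
    }
}

fn main() {
    let tweet = Tweet {
        username: String::from("horse_ebooks"),
        content: String::from("of course, as you probably already know, people"),
    };

    println!("{}", tweet.content);
    // prints: 1 new tweet: (Read more from @horse_ebooks...)
    println!("1 new tweet: {}", tweet.summarize());
}
```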

Traits as parameters

We can use traits within functions, as parameters.

pub fn notify(item: &impl Summary) {
    println!("Breaking news! {}", item.summarize())
}

The notify function takes a parameter item whose type is written as &impl Summary, meaning that it accepts a reference to any type that implements the trait. In the function body, we can call on item any method that comes from the Summary trait.

The code above is syntactic sugar for a longer form called a trait bound:

pub fn notify<T: Summary>(item: &T) {}

The trait bound form is particularly useful in more complex situations. With two parameters, for example, it forces both arguments to have the same type:

pub fn notify<T: Summary>(item1: &T, item2: &T) {}
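The difference between the two forms shows up when there are two parameters. In the sketch below, the function names notify_any and notify_same and the trimmed structs are hypothetical, chosen only to contrast the two signatures; the functions return a String instead of printing so the result is easy to inspect.

```rust
trait Summary {
    fn summarize(&self) -> String;
}

struct Tweet {
    username: String,
}

struct NewsArticle {
    headline: String,
}

impl Summary for Tweet {
    fn summarize(&self) -> String {
        format!("@{}", self.username)
    }
}

impl Summary for NewsArticle {
    fn summarize(&self) -> String {
        self.headline.clone()
    }
}

// impl Trait: item1 and item2 may be two different types.
fn notify_any(item1: &impl Summary, item2: &impl Summary) -> String {
    format!("{} | {}", item1.summarize(), item2.summarize())
}

// Trait bound: item1 and item2 must both be the same type T.
fn notify_same<T: Summary>(item1: &T, item2: &T) -> String {
    format!("{} | {}", item1.summarize(), item2.summarize())
}

fn main() {
    let tweet = Tweet { username: String::from("horse_ebooks") };
    let article = NewsArticle { headline: String::from("Penguins win") };

    println!("{}", notify_any(&tweet, &article));
    println!("{}", notify_same(&tweet, &tweet));
    // notify_same(&tweet, &article); // would not compile: mismatched types
}
```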

We can also specify multiple trait bounds using the + syntax as follows:

pub fn notify(item: &(impl Summary + Display)) {}

And also on generic types:

pub fn notify<T: Summary + Display>(item: &T) {}
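A type has to implement both traits to be accepted. Below is a minimal sketch, assuming a Tweet trimmed to two fields and a Display implementation that I made up for the example; notify returns a String here rather than printing.

```rust
use std::fmt::{self, Display};

trait Summary {
    fn summarize(&self) -> String;
}

struct Tweet {
    username: String,
    content: String,
}

impl Summary for Tweet {
    fn summarize(&self) -> String {
        format!("{}: {}", self.username, self.content)
    }
}

// Hypothetical Display implementation: show the tweet as its handle.
impl Display for Tweet {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "@{}", self.username)
    }
}

// T must implement both Summary and Display.
fn notify<T: Summary + Display>(item: &T) -> String {
    // `{}` uses Display, summarize comes from Summary.
    format!("Breaking news from {}! {}", item, item.summarize())
}

fn main() {
    let tweet = Tweet {
        username: String::from("horse_ebooks"),
        content: String::from("hello"),
    };
    println!("{}", notify(&tweet));
}
```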

Each generic parameter has its own trait bounds, so functions with several generic parameters can become hard to read when all the bounds sit between the function name and the parameter list. Below is an example that moves the bounds into a where clause after the function signature.

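use std::fmt::{Debug, Display};

fn some_function<T, U>(t: &T, u: &U) -> i32
where
    T: Display + Clone,
    U: Clone + Debug,
{
    // Body omitted in this summary; only the signature matters here.
    todo!()
}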
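Here is a runnable sketch of a function with a where clause. The function describe and its String return type are my own assumptions, since the book's example only shows the signature.

```rust
use std::fmt::{Debug, Display};

// Hypothetical function: the bounds are the same as in the signature above.
fn describe<T, U>(t: &T, u: &U) -> String
where
    T: Display + Clone,
    U: Clone + Debug,
{
    let t_copy = t.clone(); // allowed because T: Clone
    let u_copy = u.clone(); // allowed because U: Clone

    // `{}` needs Display, `{:?}` needs Debug.
    format!("{t_copy} / {u_copy:?}")
}

fn main() {
    println!("{}", describe(&42, &vec!['a', 'b']));
    println!("{}", describe(&"hi", &Some(1.5)));
}
```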

It is also possible to use the impl Trait syntax in the return position of a function, to return a value of some type that implements a trait without naming the concrete type.

fn returns_summarizable() -> impl Summary + std::fmt::Debug {
    Tweet {
        username: String::from("horse_ebooks"),
        content: String::from(
            "of course, as you probably already know, people"
        ),
        reply: false,
        retweet: false,
    }
}

Note that the returned type must implement Debug, which we ensure by adding #[derive(Debug)] to the Tweet struct. We add the Debug trait to the return type because we want to print the result with println! and the {:?} formatter.
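A self-contained sketch of the whole setup. The caller only knows that the returned value implements Summary and Debug, so those are the only capabilities it can use.

```rust
trait Summary {
    fn summarize(&self) -> String;
}

#[derive(Debug)]
struct Tweet {
    username: String,
    content: String,
    reply: bool,
    retweet: bool,
}

impl Summary for Tweet {
    fn summarize(&self) -> String {
        format!("{}: {}", self.username, self.content)
    }
}

fn returns_summarizable() -> impl Summary + std::fmt::Debug {
    Tweet {
        username: String::from("horse_ebooks"),
        content: String::from("of course, as you probably already know, people"),
        reply: false,
        retweet: false,
    }
}

fn main() {
    let item = returns_summarizable();

    println!("{}", item.summarize()); // available through Summary
    println!("{item:?}"); // available through Debug
    // item.username would not compile: the concrete type is hidden.
}
```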

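Specifying the trait but not the concrete type that implements it is useful in more complex situations. However, impl Trait in return position works only if the function returns a single concrete type. A function that returns either a NewsArticle or a Tweet depending on a condition would not compile.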

Trait bounds can be used to conditionally implement methods.

#[derive(Debug)]
pub struct Pair<T> {
    pub x: T,
    pub y: T,
}

impl<T> Pair<T> {
    pub fn new(x: T, y: T) -> Self {
        Self {x, y}
    }
}

impl<T: core::fmt::Display + PartialOrd> Pair<T> {
    pub fn cmp_display(&self) {
        if self.x >= self.y {
            println!("The largest number is {}.", self.x);
        } else {
            println!("The largest number is {}.", self.y)
        }
    }
}

The code above always implements the new function to return a new instance of Pair<T>. The second impl block only implements the cmp_display method if the inner type T implements Display and PartialOrd traits.
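A usage sketch of the conditional method. The type NoDisplay is hypothetical: it implements neither Display nor PartialOrd, so a pair of them can be created but cannot call cmp_display.

```rust
use std::fmt::Display;

struct Pair<T> {
    x: T,
    y: T,
}

impl<T> Pair<T> {
    fn new(x: T, y: T) -> Self {
        Self { x, y }
    }
}

impl<T: Display + PartialOrd> Pair<T> {
    fn cmp_display(&self) {
        if self.x >= self.y {
            println!("The largest number is {}.", self.x);
        } else {
            println!("The largest number is {}.", self.y);
        }
    }
}

// Hypothetical type with no Display and no PartialOrd.
struct NoDisplay;

fn main() {
    let numbers = Pair::new(3, 7);
    numbers.cmp_display(); // i32 implements Display and PartialOrd

    // new is always available, whatever T is.
    let _opaque = Pair::new(NoDisplay, NoDisplay);
    // _opaque.cmp_display(); // would not compile: bounds not satisfied
}
```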

Lifetimes

Lifetimes ensure that references are valid as long as we need them to be. Every reference in Rust has a lifetime, which is the scope for which that reference is valid. Most of the time lifetimes are inferred; we must annotate them only when the lifetimes of references could be related in a few different ways.

Lifetime Annotations

The name of a lifetime parameter must start with an apostrophe ('). Names are usually lowercase and very short, like generic type names. Below is an example of how to annotate lifetime parameters.

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

We need to declare generic lifetime parameters inside <>, and we want the signature to express that the returned value will be valid as long as both the parameters x and y are valid.

Without lifetime annotations, the compiler cannot tell whether the returned reference is borrowed from x or from y, because it cannot know which arm of the if expression will be executed.

The generic lifetime 'a will get the concrete lifetime that is equal to the smaller of the lifetimes of x and y. The returned reference is therefore guaranteed to be valid only for as long as the shorter-lived of the two arguments.
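The sketch below, adapted from the book's example, shows what "the smaller of the two lifetimes" means in practice: result can be used only while both strings are still alive.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("long string is long");

    {
        let string2 = String::from("xyz");

        // 'a becomes the inner scope, where both strings are valid.
        let result = longest(string1.as_str(), string2.as_str());
        println!("The longest string is {result}");
    } // string2 is dropped here

    // Using `result` after the inner scope would not compile, even though
    // string1 is still alive: the compiler only knows that the result is
    // valid for the shorter of the two lifetimes.
    println!("{string1}");
}
```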

If one of the arguments in a function has no relationship with the return value or other arguments, there is no need to annotate the lifetime for that specific parameter.

When returning a reference from a function, the lifetime parameter of the return type must match the lifetime parameter of one of the parameters. If the returned reference does not refer to one of the parameters, it must refer to a value created within the function. That would be a dangling reference, because the value goes out of scope at the end of the function, so the compiler rejects it.
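Both cases can be sketched in a few lines. The function names always_first and make_owned are hypothetical; the commented-out function follows the book's example of code that does not compile.

```rust
// Only x is related to the return value, so y needs no lifetime annotation.
fn always_first<'a>(x: &'a str, _y: &str) -> &'a str {
    x
}

// This version would NOT compile: `result` is dropped when the function
// returns, so the returned reference would dangle.
//
// fn longest<'a>(_x: &str, _y: &str) -> &'a str {
//     let result = String::from("really long string");
//     result.as_str()
// }

// One fix is to return an owned value instead of a reference, so the
// caller becomes responsible for it.
fn make_owned(_x: &str, _y: &str) -> String {
    String::from("really long string")
}

fn main() {
    println!("{}", always_first("abcd", "xyz"));
    println!("{}", make_owned("abcd", "xyz"));
}
```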

Lifetime Elision Rules

Elision rules are a set of particular cases that the compiler will consider. If your code fits into these cases, you won’t need to specify lifetimes.

Lifetimes on function or method parameters are called input lifetimes, while those on return values are called output lifetimes. The compiler uses three rules:

The compiler assigns a different lifetime parameter to each parameter that is a reference.
If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters.
If there are multiple input lifetime parameters, but one of them is &self or &mut self because this is a method, the lifetime of self is assigned to all output lifetime parameters.
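The rules above can be seen by writing each function twice, once as we would normally write it and once with the lifetimes the compiler fills in. The Article struct and the _explicit functions are hypothetical, made up for this comparison.

```rust
// As written: one reference parameter, so rules 1 and 2 apply.
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(i) => &s[..i],
        None => s,
    }
}

// What the compiler infers for the signature above.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    first_word(s)
}

struct Article {
    title: String,
}

impl Article {
    // Two input lifetimes, one of them on &self, so rule 3 applies:
    // the returned reference gets the lifetime of self.
    fn title_or(&self, _fallback: &str) -> &str {
        &self.title
    }

    // The same signature with the inferred lifetimes spelled out.
    fn title_or_explicit<'a, 'b>(&'a self, _fallback: &'b str) -> &'a str {
        &self.title
    }
}

fn main() {
    println!("{}", first_word("hello world"));
    println!("{}", first_word_explicit("hello world"));

    let article = Article { title: String::from("Rust") };
    println!("{}", article.title_or("untitled"));
    println!("{}", article.title_or_explicit("untitled"));
}
```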

In this chapter, we learned about generics, traits, and lifetimes in Rust. See you in the next post!