Text Processing in Rust

Create handy command-line utilities in Rust.

This article is about text processing in Rust, but it also contains a quick introduction to pattern matching, which can be very handy when working with text.

Strings are a big subject in Rust, as you can tell from the fact that the language has two data types for representing strings as well as macros for formatting them. All of this also shows how capable Rust is at string and text processing.

Apart from covering some theoretical topics, this article shows how to develop some handy yet easy-to-implement command-line utilities that let you work with plain-text files. If you have the time, it'd be great to experiment with the Rust code presented here, and maybe develop your own utilities.

Rust and Text

Rust supports two data types for working with strings: String and str. The String type is for mutable strings that you own, and it has both a length and a capacity. The str type, on the other hand, is for immutable strings that you want to pass around. You'll most likely see str used as &str, that is, as a reference to some UTF-8 data. A &str value is usually called a "string slice" or, even simpler, a "slice". Because of its nature, you can't add data to or remove data from an existing &str. Moreover, if you try to call the capacity() function on a &str variable, you'll get an error message similar to the following:


error[E0599]: no method named `capacity` found for type `&str` in the current scope
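
To make the difference between the two types concrete, here is a minimal sketch. The build_greeting() helper and the strings it uses are made up for illustration and are not part of any program in this article. It builds an owned String, prints its length and capacity, and then borrows part of it as a &str:

```rust
// Hypothetical helper: builds an owned String by appending to a
// pre-allocated buffer.
fn build_greeting() -> String {
    // Reserve room for at least 16 bytes up front.
    let mut owned = String::with_capacity(16);
    owned.push_str("Hello");
    owned.push_str(", Rust");
    owned
}

fn main() {
    let owned = build_greeting();

    // len() is the number of bytes in use; capacity() is the number of
    // bytes currently allocated, which is always at least len().
    println!("{}: len = {}, capacity = {}",
             owned, owned.len(), owned.capacity());

    // A string slice borrows a range of the String's UTF-8 data.
    let slice: &str = &owned[0..5];
    println!("{}: len = {}", slice, slice.len());

    // Uncommenting the next line triggers error E0599,
    // because &str has no capacity() method.
    // println!("{}", slice.capacity());
}
```

The slice has a length but no capacity because it is only a view into memory that the String owns and manages.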

Generally speaking, use &str when you want to pass a string as a function parameter or when you want a read-only view of a string, and use String when you want a mutable string that you own.

The good thing is that a function that accepts &str parameters can also accept references to String values (&String), which Rust converts to &str automatically. (You'll see such an example in the basicOps.rs program presented later in this article.) Additionally, Rust supports the char type, which represents a single Unicode character, as well as string literals, which are strings enclosed in double quotes.
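
As a quick illustration, here is a small sketch of both ideas. It is not the basicOps.rs program, and the first_word() and char_count() functions are hypothetical names chosen for this example. The same &str function is called with a string literal and with a reference to a String, and counting char values shows that the number of characters can differ from the length in bytes:

```rust
// Returns the first whitespace-separated word, or "" if there is none.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

// Counts char values (Unicode scalar values), not bytes.
fn char_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    // A string literal already has the type &str.
    let literal: &str = "text processing";

    // An owned String; \u{ef} and \u{e9} are accented letters that
    // take two bytes each in UTF-8.
    let owned: String = String::from("na\u{ef}ve caf\u{e9}");

    println!("{}", first_word(literal));
    // A &String is accepted where &str is expected.
    println!("{}", first_word(&owned));

    // A char literal uses single quotes.
    let letter: char = '\u{e9}';
    println!("{} chars, {} bytes, contains {}: {}",
             char_count(&owned), owned.len(), letter,
             owned.contains(letter));
}
```

Declaring the parameters as &str rather than &String keeps such functions usable with literals, slices and owned strings alike.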

Finally, Rust supports what is called a byte string. You can define a new byte string as follows: