
Create handy command-line utilities in Rust.
This article is about text processing in Rust, but it also contains a quick introduction to pattern matching, which can be very handy when working with text.
Strings are a big subject in Rust, as is evident from the fact that Rust has two data types for representing strings as well as macros for formatting them. That breadth also shows how capable Rust is at string and text processing.
Apart from covering some theoretical topics, this article shows how to develop some handy yet easy-to-implement command-line utilities that let you work with plain-text files. If you have the time, it'd be great to experiment with the Rust code presented here, and maybe develop your own utilities.
Rust and Text
Rust supports two data types for working with strings: String and str.
The String type is for working with mutable strings that belong to you, and it has both a length and a capacity. The str type, on the other hand, is for working with immutable strings that you want to pass around. You most likely will see a str variable used as &str. Put simply, a str variable is accessed as a reference to some UTF-8 data. A str variable is usually called a "string slice" or, even simpler, a "slice". Due to its nature, you can't add data to or remove data from an existing str variable. Moreover, if you try to call the capacity() function on an &str variable, you'll get an error message similar to the following:
error[E0599]: no method named `capacity` found for type `&str` in the current scope
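The difference between the two types can be illustrated with a minimal sketch. The build_owned helper below is a hypothetical name chosen for this illustration and is not part of the article's programs; it builds an owned String from two string slices, and main() then prints the length and capacity of the result.

```rust
// Hypothetical helper (not from the article): builds an owned, growable
// String out of two borrowed string slices.
fn build_owned(base: &str, extra: &str) -> String {
    // Reserve enough room up front for both pieces.
    let mut owned = String::with_capacity(base.len() + extra.len());
    owned.push_str(base);
    owned.push_str(extra);
    owned
}

fn main() {
    // A string slice: a read-only view of some UTF-8 data.
    let slice: &str = "Hello";

    // A String: owned, mutable, with both a length and a capacity.
    let owned = build_owned(slice, ", Rust!");

    println!("slice = {:?}, len = {}", slice, slice.len());
    println!(
        "owned = {:?}, len = {}, capacity = {}",
        owned,
        owned.len(),
        owned.capacity()
    );

    // Uncommenting the next line makes compilation fail with E0599,
    // because &str has no capacity() method.
    // println!("{}", slice.capacity());
}
```

Note that len() is available on both types, whereas capacity() and push_str() exist only on String.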
Generally speaking, you'll want to use a str when you pass a string as a function parameter or when you need a read-only view of a string, and a String when you need a mutable string that you own.
The good thing is that a function that accepts &str parameters can also accept String values passed by reference (&String), because Rust automatically converts a &String into a &str. (You'll see such an example in the basicOps.rs program presented later in this article.)
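As a small sketch that is separate from basicOps.rs, the following program shows one function being called with both a string literal and a String. The count_words function is a hypothetical name used only for this illustration.

```rust
// Hypothetical function (not from the article): takes a string slice and
// counts the whitespace-separated words in it.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let literal: &str = "one two three";
    let owned: String = String::from("four five");

    // A string literal is already a &str.
    println!("{}", count_words(literal));

    // A &String is converted to a &str automatically at the call site.
    println!("{}", count_words(&owned));

    // The same conversion, written out explicitly.
    println!("{}", count_words(owned.as_str()));
}
```

Declaring the parameter as &str rather than &String keeps the function usable with literals, slices and owned strings alike.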
Additionally, Rust supports the char type, which is for representing single Unicode characters, as well as string literals, which are strings that begin and end with double quotes.
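The following short sketch shows a char and a string literal side by side. It uses the \u{e9} escape (the letter é) so that the result does not depend on how the source file is encoded; the count_chars helper is a hypothetical name introduced for this illustration.

```rust
// Hypothetical helper (not from the article): counts the Unicode
// characters in a string slice, as opposed to its bytes.
fn count_chars(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    // A char literal uses single quotes and holds one Unicode character.
    let letter: char = '\u{e9}';

    // A string literal uses double quotes and has the type &str.
    let literal: &str = "caf\u{e9}";

    println!("{} takes {} byte(s) in UTF-8", letter, letter.len_utf8());
    println!(
        "{:?} has {} bytes and {} chars",
        literal,
        literal.len(),
        count_chars(literal)
    );
}
```

Because strings are stored as UTF-8, len() reports bytes, so the byte count and the character count differ whenever a string contains non-ASCII characters.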
Finally, Rust supports what is called a byte string. You can define a new byte string as follows: