Strings in Elixir
Strings were one of the first things I found confusing about Elixir. I’ve worked professionally in a bunch of different languages: Ruby, JavaScript, Swift, Python, Scala, Clojure, Java. For the most part, strings work the same way in each of them. I mistakenly assumed that strings would work the same way in Elixir. They don’t.
First, there is no String type in Elixir. Strings don’t get their own type, but are represented using other built-in Elixir/Erlang types.
There are two different string representations in Elixir:
- Binary
- Character lists
These two string representations in Elixir are quite different. You need to be cognisant of the string representation that you are using as this affects the operations that you can perform on the string and how you process it.
Strings as binaries
If you create a string using " (double quotes), the string is represented as a UTF-8 encoded binary. Most of the common operations you’ll want to perform on strings live in the String module, which operates on the binary string representation.
This is generally the string representation you want to use.
Let’s create a string and check that it’s a binary
> s = "abc"
"abc"
> is_binary(s)
true
We can call any of the functions from the String module on this binary
> String.capitalize(s)
"Abc"
> String.reverse(s)
"cba"
> String.split(s, "b")
["a", "c"]
We can’t use hd to get the first element of the string
> hd s
** (ArgumentError) errors were found at the given arguments:
* 1st argument: not a nonempty list
:erlang.hd("abc")
because this isn’t a list - it’s a binary.
> i s
Term
"abc"
Data type
BitString
Byte size
3
Description
This is a string: a UTF-8 encoded binary. It's printed surrounded by
"double quotes" because all UTF-8 encoded code points in it are printable.
Raw representation
<<97, 98, 99>>
Reference modules
String, :binary
Implemented protocols
Collectable, IEx.Info, Inspect, List.Chars, String.Chars
We can see here that the raw representation is <<97, 98, 99>> - i.e. it’s a binary made up of the bytes 97, 98 and 99. For ASCII characters like these, each byte is also the character’s code point.
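To see what "UTF-8 encoded binary" means in practice, here is a small sketch with a non-ASCII character. The string "héllo" is my own example (written with the \u00E9 escape so the code point is unambiguous); it isn’t from the session above.

```elixir
# "h\u00E9llo" is "héllo", where é is the single code point U+00E9.
s = "h\u00E9llo"

# String.length/1 counts characters (graphemes), not bytes.
IO.inspect(String.length(s))            # 5

# byte_size/1 counts bytes; é takes two bytes in UTF-8.
IO.inspect(byte_size(s))                # 6

# Ask inspect to show the raw bytes instead of the printable string.
IO.inspect(s, binaries: :as_binaries)   # <<104, 195, 169, 108, 108, 111>>
```

So the numbers in the raw representation are bytes, and they only line up one-to-one with characters when the string is plain ASCII.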
Since hd doesn’t work, we can get the first character of the string using String.first
> String.first(s)
"a"
We can get the integer code point of a character by prefixing it with ?
> ?a
97
> ?b
98
> ?c
99
We can check the code points in the string
> String.codepoints(s)
["a", "b", "c"]
And we can get a list of the integer codes of each character using
> String.to_charlist(s)
'abc'
Note here that we get back a single-quoted string. Although this looks like a string it’s actually a character list.
We can call hd on it
> String.to_charlist(s) |> hd
97
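A quick way to convince yourself that the result really is a plain list of integers is to compare it against one. This is just a sketch using standard Kernel functions:

```elixir
chars = String.to_charlist("abc")

# A charlist is equal to the list of its character codes.
IO.inspect(chars == [97, 98, 99])    # true
IO.inspect(chars == [?a, ?b, ?c])    # true

# It passes the list check and fails the binary check.
IO.inspect(is_list(chars))           # true
IO.inspect(is_binary(chars))         # false
```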
But we can’t use it with a function that expects a binary string
> String.to_charlist(s) |> String.first
** (FunctionClauseError) no function clause matching in String.first/1
The following arguments were given to String.first/1:
# 1
'abc'
Attempted function clauses (showing 1 out of 1):
def first(string) when is_binary(string)
(elixir 1.12.3) lib/string.ex:1876: String.first/1
Strings as character lists
Strings can also be represented as lists of characters. This is where things can get confusing if you’re not expecting it. If you create a string with ' (single quotes), you’ll get a character list. This is a list of the individual character codes.
> l = 'abc'
'abc'
> hd l
97
> l
'abc'
> i l
Term
'abc'
Data type
List
Description
This is a list of integers that is printed as a sequence of characters
delimited by single quotes because all the integers in it represent printable
ASCII characters. Conventionally, a list of Unicode code points is known as a
charlist and a list of ASCII characters is a subset of it.
Raw representation
[97, 98, 99]
Reference modules
List
Implemented protocols
Collectable, Enumerable, IEx.Info, Inspect, List.Chars, String.Chars
So even though we see 'abc' in iex, the underlying representation is a list of character codes: [97, 98, 99]. What’s happening is that when iex sees a list of integers where every integer is the code of a printable character, it prints the characters instead.
If we add an integer that isn’t a printable ASCII character code to the list, we see the underlying integers.
> [123456 | l]
[123456, 97, 98, 99]
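You can test the "is every integer printable?" condition yourself. As far as I can tell, the check that inspect applies is the one exposed as List.ascii_printable?/1, so treat this as a sketch of the rule rather than a statement about iex internals:

```elixir
l = [97, 98, 99]

# Every element is a printable ASCII code, so this list is shown as text.
IO.inspect(List.ascii_printable?(l))              # true

# 123456 is outside the ASCII range, so the whole list is shown as integers.
IO.inspect(List.ascii_printable?([123456 | l]))   # false
```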
So what if you’re actually working with a list of integers?
Elixir will always treat a list of integers as a list of integers. But iex may print it as a string if all the integers are printable character codes. This can be annoying.
You can disable this behaviour with
> IEx.configure(inspect: [charlists: :as_lists])
:ok
> 'abc'
[97, 98, 99]
(You can add this to ~/.iex.exs if you always want to treat lists this way.)
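The same option can be passed to a single inspect call, which is handy outside iex (in logs or scripts, say) where the IEx configuration doesn’t apply. A minimal sketch; the "ids" label is just an illustrative name:

```elixir
l = [97, 98, 99]

# inspect/2 returns a string; the charlists option forces the integer form.
IO.puts(inspect(l, charlists: :as_lists))            # prints [97, 98, 99]

# IO.inspect/2 accepts the same option, plus an optional label.
IO.inspect(l, charlists: :as_lists, label: "ids")    # prints ids: [97, 98, 99]
```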
Converting between binary strings and charlists
As we saw above, you’ll get an error if you pass a character list to a function that expects a binary string, and vice versa.
These are the two functions you need to convert between the two representations.
> List.to_string([97, 98, 99])
"abc"
> String.to_charlist("abc")
'abc' # or [97, 98, 99] depending on your iex config
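If a function might receive either representation, one option is to normalise the input at the boundary with guards. StringDemo and normalize/1 below are hypothetical names I made up for this sketch, not part of Elixir:

```elixir
defmodule StringDemo do
  # Hypothetical helper: accept either representation, always return a binary.
  def normalize(value) when is_binary(value), do: value
  def normalize(value) when is_list(value), do: List.to_string(value)
end

# Both representations can now be handed to String functions.
"abc" |> StringDemo.normalize() |> String.first() |> IO.inspect()          # "a"
[97, 98, 99] |> StringDemo.normalize() |> String.first() |> IO.inspect()   # "a"

# Kernel.to_string/1 also handles both, via the String.Chars protocol.
IO.inspect(to_string([97, 98, 99]))   # "abc"
```

The explicit guards make the two cases visible and raise a FunctionClauseError for anything else, whereas to_string/1 will happily convert integers, atoms and other terms as well.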
Pattern matching on strings
And finally, it’s worth mentioning pattern matching on binary strings. You can use <> to pattern match against a binary string.
"ab" <> final_char = "abc"