Strings, Char Lists, Graphemes and Codepoints.

Elixir strings are nothing but a sequence of bytes. Let’s look at an example:

iex> string = <<104,101,108,108,111>>
"hello"

NOTE: The << >> syntax tells the compiler that the elements inside those symbols are bytes.
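One way to see the bytes behind a string is to concatenate a null byte, which makes the result unprintable and so forces it to be displayed as a raw binary. A minimal sketch, using only Kernel functions:

```elixir
string = <<104, 101, 108, 108, 111>>

IO.inspect(string)              # prints "hello"
IO.inspect(is_binary(string))   # prints true
IO.inspect(byte_size(string))   # prints 5

# Appending a null byte exposes the underlying bytes.
raw = string <> <<0>>
IO.inspect(raw)                 # prints <<104, 101, 108, 108, 111, 0>>
```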

Char Lists

Internally, Elixir strings are represented with a sequence of bytes rather than an array of characters. Elixir also has a char list type (character list). Elixir strings are enclosed with double quotes, while char lists are enclosed with single quotes.

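What’s the difference? Each value in a char list is the Unicode code point of a character (identical to the ASCII value for ASCII characters), whereas in a string the code points are encoded as UTF-8 bytes. Let’s dig in: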

iex> char_list = 'hello'
'hello'

iex> [hd|tl] = char_list
'hello'

iex> {hd, tl}
{104, 'ello'}

iex> Enum.reduce(char_list, "", fn char, acc -> acc <> to_string(char) <> "," end)
"104,101,108,108,111,"

When programming in Elixir, we usually use strings, not char lists. Char list support is mainly included because some Erlang modules require it.
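The difference between code points and bytes becomes visible with a character outside ASCII. The following is a small sketch; the word "hełło" is just an illustrative value, and the `charlists: :as_lists` option is passed so the list is always printed as integers:

```elixir
# ?a returns the code point of the character a.
IO.inspect(?a)                                # prints 97

charlist = String.to_charlist("hełło")
IO.inspect(charlist, charlists: :as_lists)    # prints [104, 101, 322, 322, 111]

# The same text as a string: ł (code point 322) takes two UTF-8 bytes.
bytes = "hełło" <> <<0>>
IO.inspect(bytes)   # prints <<104, 101, 197, 130, 197, 130, 111, 0>>

# Converting back, for example after calling an Erlang function.
back = List.to_string(charlist)
IO.inspect(back)                              # prints "hełło"
```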

Graphemes and Codepoints

Codepoints are simply Unicode characters, each represented by one or more bytes in the UTF-8 encoding. Characters outside the US-ASCII character set always encode as more than one byte. For example, Latin characters with a tilde or accent (á, ñ, è) are typically encoded as two bytes, and characters from Asian languages are often encoded as three or four bytes. A grapheme consists of one or more codepoints that are rendered as a single character.
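To make the byte counts concrete, here is a short sketch that measures a few single-codepoint strings with `byte_size/1`. The sample characters are arbitrary choices, written as escapes so the exact codepoints are unambiguous:

```elixir
sizes =
  for char <- ["a", "\u00F1", "\u65E5", "\u{1F600}"] do
    # byte_size/1 counts UTF-8 bytes, not characters.
    {char, byte_size(char)}
  end

IO.inspect(sizes)
# prints [{"a", 1}, {"ñ", 2}, {"日", 3}, {"😀", 4}]
```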

The String module already provides two functions to obtain them, graphemes/1 and codepoints/1. Let’s look at an example:

iex> string = "\u0061\u0301"
"á"

iex> String.codepoints string
["a", "́"]

iex> String.graphemes string
["á"]
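For the same string, the three ways of measuring it give three different answers. A brief sketch (the variable names are only illustrative):

```elixir
string = "\u0061\u0301"

codepoint_count = string |> String.codepoints() |> length()
grapheme_count = string |> String.graphemes() |> length()

IO.inspect(byte_size(string))   # prints 3 (1 byte for "a", 2 for the combining accent)
IO.inspect(codepoint_count)     # prints 2
IO.inspect(grapheme_count)      # prints 1
```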

String Functions

Let’s review some of the most important and useful functions of the String module. This lesson will only cover a subset of the available functions. To see a complete set of functions visit the official String docs.


length/1

Returns the number of graphemes in the string.

iex> String.length "Hello"
5


replace/3

Returns a new string in which every occurrence of a pattern is replaced with a replacement string.

iex> String.replace("Hello", "e", "a")
"Hallo"
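By default every occurrence of the pattern is replaced. As a sketch of two common variations, the `global: false` option limits the replacement to the first match, and the pattern may also be a regular expression:

```elixir
all = String.replace("Hello", "l", "L")
first_only = String.replace("Hello", "l", "L", global: false)
vowels = String.replace("Hello", ~r/[aeiou]/, "*")

IO.inspect(all)          # prints "HeLLo"
IO.inspect(first_only)   # prints "HeLlo"
IO.inspect(vowels)       # prints "H*ll*"
```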


duplicate/2

Returns a new string consisting of the given string repeated n times.

iex> String.duplicate("Oh my ", 3)
"Oh my Oh my Oh my "


split/2

Returns a list of strings split by a pattern.

iex> String.split("Hello World", " ")
["Hello", "World"]
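A few related forms are worth knowing. The sketch below shows splitting on whitespace with `String.split/1`, limiting the number of parts, and trimming empty strings; the input values are made up for illustration:

```elixir
# With no pattern, split on whitespace and ignore leading/trailing spaces.
words = String.split("  Hello   World ")

# :parts limits how many pieces are returned.
two_parts = String.split("a,b,c", ",", parts: 2)

# :trim drops empty strings from the result.
trimmed = String.split("a,,b", ",", trim: true)

IO.inspect(words)       # prints ["Hello", "World"]
IO.inspect(two_parts)   # prints ["a", "b,c"]
IO.inspect(trimmed)     # prints ["a", "b"]
```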


Exercises

Let’s walk through a simple exercise to demonstrate that we are ready to go with strings!


Anagrams

A and B are considered anagrams if the characters of one can be rearranged to form the other. For example, "listen" and "silent" are anagrams: if we rearrange the characters of string A, we get string B, and vice versa.

So, how could we check whether two strings are anagrams in Elixir? The easiest solution is to sort the graphemes of each string alphabetically and then check whether the two lists are equal. Let’s try that:

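defmodule Anagram do
  # The guard restricts this clause to binaries (strings).
  def anagrams?(a, b) when is_binary(a) and is_binary(b) do
    sort_string(a) == sort_string(b)
  end

  # Lowercases the string and returns its graphemes in sorted order.
  def sort_string(string) do
    string
    |> String.downcase()
    |> String.graphemes()
    |> Enum.sort()
  end
end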

Let’s first look at anagrams?/2. The guard checks whether the parameters we receive are binaries. That’s the way we check whether a parameter is a string in Elixir.

After that, we call a function that sorts each string’s characters alphabetically: it first converts the string to lowercase and then uses String.graphemes, which returns a list of the graphemes in the string. Pretty straightforward, right?

Let’s check the output on iex:

iex> Anagram.anagrams?("Hello", "ohell")
true

iex> Anagram.anagrams?("María", "íMara")
true

iex> Anagram.anagrams?(3, 5)
** (FunctionClauseError) no function clause matching in Anagram.anagrams?/2
    iex:2: Anagram.anagrams?(3, 5)

As you can see, the last call to anagrams? caused a FunctionClauseError. This error tells us that no function clause in our module matches a call with two non-binary arguments, and that’s exactly what we want: to accept two strings and nothing else.
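One caveat: the same visible character can be stored either as a single codepoint or as a base letter plus a combining mark, and comparing graphemes directly treats those as different. A possible refinement, sketched here under the assumption that such inputs should count as equal, is to normalize with String.normalize/2 before sorting. The module name NormalizedAnagram is hypothetical and used only for this example:

```elixir
# NormalizedAnagram is a hypothetical name used only for this sketch.
defmodule NormalizedAnagram do
  def anagrams?(a, b) when is_binary(a) and is_binary(b) do
    sort_string(a) == sort_string(b)
  end

  defp sort_string(string) do
    string
    |> String.normalize(:nfc)   # compose "a" + U+0301 into a single codepoint
    |> String.downcase()
    |> String.graphemes()
    |> Enum.sort()
  end
end

composed = "\u00E1"      # á as one codepoint
decomposed = "a\u0301"   # a followed by a combining acute accent

IO.inspect(composed == decomposed)                              # prints false
IO.inspect(NormalizedAnagram.anagrams?(composed, decomposed))   # prints true
```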
