Benchee

We can’t just guess about which functions are fast and which are slow - we need actual measurements when we’re curious. That’s where benchmarking comes in. In this lesson, we’ll learn about how easy it is to measure the speed of our code.

About Benchee

Erlang does have a function for very basic measurement of a function’s execution time, but it’s not as nice to use as some of the available tools, and it doesn’t give you multiple measurements to get good statistics from. So we’re going to use Benchee. Benchee provides us with a range of statistics with easy comparisons between scenarios, a great feature that lets us test different inputs to the functions we’re benchmarking, and several formatters for displaying our results, as well as the ability to write our own formatter if desired.
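For comparison, here is a minimal sketch of a single hand-rolled measurement. It assumes the Erlang function alluded to above is :timer.tc/1, which the lesson does not name; it runs the given function once and returns the elapsed time in microseconds along with the result.

```elixir
list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

# :timer.tc/1 runs the function exactly once and returns
# {elapsed_microseconds, return_value}. (Assumption: this is the
# Erlang function the text refers to.)
{microseconds, result} = :timer.tc(fn -> Enum.flat_map(list, map_fun) end)

IO.puts("flat_map took #{microseconds} µs and produced #{length(result)} elements")
```

A single run like this gives one number with no warmup, no repetition and no statistics, which is the gap Benchee fills.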

Usage

To add Benchee to your project, add it as a dependency to your mix.exs file:

defp deps do
  [{:benchee, "~> 0.9", only: :dev}]
end

Then we call:

$ mix deps.get
...
$ mix compile

The first command will download and install Benchee. You may be asked to install Hex along with it. The second compiles the Benchee application. Now we’re ready to write our first benchmark!

An important note before we begin: when benchmarking, it is very important not to use iex, since code run there behaves differently and is often much slower than the same code is likely to be in production. So, let’s create a file called benchmark.exs, and in that file we’ll add the following code:

list = Enum.to_list(1..10_000)
map_fun = fn(i) -> [i, i * i] end

Benchee.run(%{
  "flat_map"    => fn -> Enum.flat_map(list, map_fun) end,
  "map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten end
})

Now to run our benchmark, we call:

$ mix run benchmark.exs

And you should see something like the following output in your console:

Operating System: macOS
CPU Information: Intel(R) Core(TM) i5-4260U CPU @ 1.40GHz
Number of Available Cores: 4
Available memory: 8.589934592 GB
Elixir 1.5.1
Erlang 20.0
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 5.00 s
parallel: 1
inputs: none specified
Estimated total run time: 14.00 s


Benchmarking flat_map...
Benchmarking map.flatten...

Name                  ips        average  deviation         median
flat_map           1.03 K        0.97 ms    ±33.00%        0.85 ms
map.flatten        0.56 K        1.80 ms    ±31.26%        1.60 ms

Comparison:
flat_map           1.03 K
map.flatten        0.56 K - 1.85x slower

Of course your system information and results may be different depending on the specifications of the machine you are running your benchmarks on, but this general information should all be there.

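At first glance, the Comparison section shows us that our map.flatten version is 1.85x slower than flat_map - very helpful to know! But let’s look at the other statistics that we get:

ips - this stands for “iterations per second”, which tells us how often the given function can be executed in one second. For this metric, a higher number is better.
average - this is the average execution time of the given function. For this metric, a lower number is better.
deviation - this is the standard deviation, which tells us how much the results vary from one iteration to the next. Here it is given as a percentage of the average.
median - when all measured times are sorted, this is the middle value (or the average of the two middle values when the number of samples is even). It is less sensitive to outliers than the average. For this metric, a lower number is better.

There are also other available statistics, but these four are frequently the most helpful and commonly used for benchmarking, which is why they are displayed in the default formatter. To learn more about the other available metrics, check out the documentation on hexdocs.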
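To make the four default columns (ips, average, deviation, median) concrete, here is a rough sketch that derives them from a handful of raw run times. The module StatsSketch and the sample values are made up for illustration, and the formulas are the textbook ones (with a population standard deviation), which is not necessarily how Benchee computes them internally.

```elixir
defmodule StatsSketch do
  # Hypothetical helper for illustration only; not part of Benchee.
  # `run_times` is a non-empty list of run times in microseconds.
  def summarize(run_times) do
    count = length(run_times)
    average = Enum.sum(run_times) / count

    # Population variance: mean of the squared distances from the average.
    squared_diffs = Enum.map(run_times, fn time -> (time - average) * (time - average) end)
    variance = Enum.sum(squared_diffs) / count

    %{
      # One second is 1_000_000 µs, so this is "iterations per second".
      ips: 1_000_000 / average,
      average: average,
      # Standard deviation expressed as a percentage of the average.
      deviation_percent: :math.sqrt(variance) / average * 100,
      median: median(Enum.sort(run_times), count)
    }
  end

  # Odd count: the middle value. Even count: mean of the two middle values.
  defp median(sorted, count) when rem(count, 2) == 1 do
    Enum.at(sorted, div(count, 2))
  end

  defp median(sorted, count) do
    (Enum.at(sorted, div(count, 2) - 1) + Enum.at(sorted, div(count, 2))) / 2
  end
end

# Made-up sample: three similar runs and one slow outlier.
stats = StatsSketch.summarize([800, 850, 900, 1450])
IO.inspect(stats)
```

With this sample the single slow run pulls the average up to 1000.0 µs, while the median stays at 875.0 µs.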

Configuration

One of the best parts of Benchee is all the available configuration options. We’ll go over the basics first since they don’t require code examples, and then we’ll show how to use one of the best features of Benchee - inputs.

Basics

Benchee takes a wealth of configuration options. In the most common Benchee.run/2 interface, these are passed as the second argument in the form of an optional keyword list:

Benchee.run(%{"example function" => fn -> "hi!" end}, [
  warmup: 4,
  time: 10,
  inputs: nil,
  parallel: 1,
  formatters: [&Benchee.Formatters.Console.output/1],
  print: [
    benchmarking: true,
    configuration: true,
    fast_warning: true
  ],
  console: [
    comparison: true,
    unit_scaling: :best
  ]
])

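The available options are the following (also documented in hexdocs):

warmup - the time in seconds for which a benchmark should run without being measured before real measurements start. This simulates a warm, running system. Defaults to 2.
time - the time in seconds for which each benchmark should be run and measured. Defaults to 5.
inputs - a map with strings naming each input as the keys and the actual inputs as the values. Defaults to nil.
parallel - the number of processes to use to benchmark your functions. This gives you more data in the same time, but the extra load on the system can also interfere with the results. Defaults to 1.
formatters - a list of formatter functions to run on the results. Each function takes the benchmarking suite as its one argument. Defaults to the built-in console formatter.
print - a keyword list that controls what is printed while benchmarking: benchmarking (a line when each job starts), configuration (the configuration summary before the run) and fast_warning (a warning when a function runs too fast to be measured accurately). All default to true.
console - a keyword list of options for the console formatter: comparison (whether to show the “x times slower” comparison, defaults to true) and unit_scaling (the strategy for choosing units for durations and counts, defaults to :best).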

Inputs

It’s very important to benchmark your functions with data that reflects what that function might actually operate on in the real world. Frequently a function can behave very differently on small sets of data versus large sets of data! This is where Benchee’s inputs configuration option comes in. It allows you to test the same function with as many different inputs as you like, and then see the results of the benchmark for each of those inputs.

So, let’s look at our original example again:

list = Enum.to_list(1..10_000)
map_fun = fn(i) -> [i, i * i] end

Benchee.run(%{
  "flat_map"    => fn -> Enum.flat_map(list, map_fun) end,
  "map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten end
})

In that example we’re only using a single list of the integers from 1 to 10,000. Let’s update that to use a few different inputs so we can see what happens with smaller and larger lists. Open that file and change it to look like this:

map_fun = fn(i) -> [i, i * i] end
inputs = %{
  "small list" => Enum.to_list(1..100),
  "medium list" => Enum.to_list(1..10_000),
  "large list" => Enum.to_list(1..1_000_000)
}

Benchee.run(%{
  "flat_map"    => fn(list) -> Enum.flat_map(list, map_fun) end,
  "map.flatten" => fn(list) -> list |> Enum.map(map_fun) |> List.flatten end
}, inputs: inputs)

You’ll notice two differences. First, we now have an inputs map that holds the inputs for our functions, and we’re passing that map as a configuration option to Benchee.run/2.

Second, since our benchmark functions now need to take an argument, we’ve updated them to accept one. So instead of:

fn -> Enum.flat_map(list, map_fun) end

we now have:

fn(list) -> Enum.flat_map(list, map_fun) end

Let’s run this again using:

$ mix run benchmark.exs

Now you should see output in your console like this:

Operating System: macOS
CPU Information: Intel(R) Core(TM) i5-4260U CPU @ 1.40GHz
Number of Available Cores: 4
Available memory: 8.589934592 GB
Elixir 1.5.1
Erlang 20.0
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 5.00 s
parallel: 1
inputs: large list, medium list, small list
Estimated total run time: 2.10 min

Benchmarking with input large list:
Benchmarking flat_map...
Benchmarking map.flatten...

Benchmarking with input medium list:
Benchmarking flat_map...
Benchmarking map.flatten...

Benchmarking with input small list:
Benchmarking flat_map...
Benchmarking map.flatten...


##### With input large list #####
Name                  ips        average  deviation         median
flat_map             6.29      158.93 ms    ±19.87%      160.19 ms
map.flatten          4.80      208.20 ms    ±23.89%      200.11 ms

Comparison:
flat_map             6.29
map.flatten          4.80 - 1.31x slower

##### With input medium list #####
Name                  ips        average  deviation         median
flat_map           1.34 K        0.75 ms    ±28.14%        0.65 ms
map.flatten        0.87 K        1.15 ms    ±57.91%        1.04 ms

Comparison:
flat_map           1.34 K
map.flatten        0.87 K - 1.55x slower

##### With input small list #####
Name                  ips        average  deviation         median
flat_map         122.71 K        8.15 μs   ±378.78%        7.00 μs
map.flatten       86.39 K       11.58 μs   ±680.56%       10.00 μs

Comparison:
flat_map         122.71 K
map.flatten       86.39 K - 1.42x slower

We can now see information for our benchmarks, grouped by input. This simple example doesn’t provide any mind-blowing insights (flat_map wins at every size, by a factor of between 1.31x and 1.55x), but you may be surprised how much performance can vary with input size!
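Conceptually, the inputs option pairs every benchmark function with every input. The sketch below imitates that pairing with a plain comprehension and a single :timer.tc/1 run per pair. It is only an illustration of the idea, not how Benchee is implemented, and a single unwarmed run is not a trustworthy measurement.

```elixir
map_fun = fn i -> [i, i * i] end

inputs = %{
  "small list" => Enum.to_list(1..100),
  "medium list" => Enum.to_list(1..10_000)
}

jobs = %{
  "flat_map" => fn list -> Enum.flat_map(list, map_fun) end,
  "map.flatten" => fn list -> list |> Enum.map(map_fun) |> List.flatten() end
}

# Run each job once against each input and collect the timings.
timings =
  for {input_name, input} <- inputs, {job_name, job} <- jobs do
    {microseconds, _result} = :timer.tc(fn -> job.(input) end)
    {input_name, job_name, microseconds}
  end

Enum.each(timings, fn {input_name, job_name, microseconds} ->
  IO.puts("#{input_name} / #{job_name}: #{microseconds} µs (single run)")
end)
```

Two inputs and two jobs give four timings, in the same way that Benchee prints one results table per input.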

Formatters

The console output that we’ve seen is a helpful beginning for measuring the runtimes of your functions, but it’s not your only option! In this section we’ll look briefly at the three other available formatters, and also touch on what you’d need to do to write your own formatter if you like.

Other formatters

Benchee has a console formatter built in, which is what we’ve seen already, but there are three other officially supported formatters - benchee_csv, benchee_json and benchee_html. Each of them does exactly what you would expect, which is writing the results to the named file formats so you can work with your results further in whichever format you like.

Each of these formatters is a separate package, so to use them you need to add them as dependencies to your mix.exs file like so:

defp deps do
  [
    {:benchee_csv,  "~> 0.6", only: :dev},
    {:benchee_json, "~> 0.3", only: :dev},
    {:benchee_html, "~> 0.3", only: :dev}
  ]
end

While benchee_json and benchee_csv are very simple, benchee_html is full-featured! It can help you produce graphs and charts from your results easily, and you can even export them as PNG images. All three formatters are well documented on their respective GitHub pages, so we won’t cover the details here.

Custom formatters

If the four offered formatters aren’t enough for you, you can also write your own formatter. Writing a formatter is pretty easy. You need to write a function that accepts a %Benchee.Suite{} struct, and from that you can pull whatever information you like. Information about what exactly is in this struct can be found on GitHub or HexDocs. The codebase is very well-documented and easy to read if you’d like to see what sorts of information could be available for writing custom formatters.

For now, here is a quick example of what a custom formatter might look like, to show how easy it is. Let’s say we want an extremely minimal formatter that just prints the average run time for each scenario - this is what that might look like:

defmodule Custom.Formatter do
  def output(suite) do
    suite
    |> format
    |> IO.write

    suite
  end

  defp format(suite) do
    Enum.map_join(suite.scenarios, "\n", fn(scenario) ->
      "Average for #{scenario.job_name}: #{scenario.run_time_statistics.average}"
    end)
  end
end

And then we could run our benchmark like this:

list = Enum.to_list(1..10_000)
map_fun = fn(i) -> [i, i * i] end

Benchee.run(%{
  "flat_map"    => fn -> Enum.flat_map(list, map_fun) end,
  "map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten end
}, formatters: [&Custom.Formatter.output/1])

And when we run the benchmark with our custom formatter, we see:

Operating System: macOS
CPU Information: Intel(R) Core(TM) i5-4260U CPU @ 1.40GHz
Number of Available Cores: 4
Available memory: 8.589934592 GB
Elixir 1.5.1
Erlang 20.0
Benchmark suite executing with the following configuration:
warmup: 2.00 s
time: 5.00 s
parallel: 1
inputs: none specified
Estimated total run time: 14.00 s


Benchmarking flat_map...
Benchmarking map.flatten...
Average for flat_map: 851.8840109326956
Average for map.flatten: 1659.3854339873628
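The averages printed above carry no unit. Comparing them with the earlier console output suggests they are in microseconds, so a slightly friendlier variant could convert them to milliseconds. The sketch below treats microseconds as an assumption, uses a hypothetical module name (Custom.UnitFormatter), and feeds the function a plain map with the same shape as the fields the formatter above reads, rather than a real %Benchee.Suite{}.

```elixir
defmodule Custom.UnitFormatter do
  # Hypothetical variant of the formatter above.
  # Assumes the averages are reported in microseconds.
  def format(suite) do
    Enum.map_join(suite.scenarios, "\n", fn scenario ->
      average_ms = scenario.run_time_statistics.average / 1_000
      "Average for #{scenario.job_name}: #{Float.round(average_ms, 2)} ms"
    end)
  end
end

# Stand-in data shaped like the fields used above; not a real Benchee suite.
fake_suite = %{
  scenarios: [
    %{job_name: "flat_map", run_time_statistics: %{average: 851.8840109326956}},
    %{job_name: "map.flatten", run_time_statistics: %{average: 1659.3854339873628}}
  ]
}

formatted = Custom.UnitFormatter.format(fake_suite)
IO.puts(formatted)
# Average for flat_map: 0.85 ms
# Average for map.flatten: 1.66 ms
```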

