A last year I created an Erlang cheatsheet. I goal was to keep it fairly simple and only include the things that I often forgot. I decided to limit it to a single page so I could print it out and pin it on the wall for reference. I’ve been doing a lot of Elixir recently, but the cheatsheet has still be useful to me and my coworkers. There are a lot of things I often have to lookup with working with supervisors and other OTP behaviors so this month I decided it would be nice to have a seperate cheatsheet for OTP. This new cheatsheet follows in the same vein as my Erlang cheatsheet. It is intended for those experienced with OTP and not beginners.

I’d really like to hear your thoughts on this new OTP cheatsheet. Is there something it is missing? Is there something that can be simplified or removed?

Let me know via email or twitter. If you want to contribute directly feel free to create an issue or pull request on the GitHub repository.

In my last post I wrote about various Fibonacci implementations in Elixir. I timed each implementation generating a list of the first N Fibonacci numbers, and compared the performance characteristics of each implementation. Based on what unwind said on Lobsters in response I decided to revisit Fibonacci implementations in Elixir. I’m going to update the implementatios I used in my last blog post so they use Erlang processes to store previously computed Fibonacci numbers. I’ll then benchmark each implementation against the Fibonacci implementation to see how much processes sped things up.

Rewriting the Algorithms

I needed to have a process for each algorithm to store the computed Fibonacci numbers. I chose to create a single generic GenServer that I could spawn for each algorithm I was testing. Below is the implementation I settled on. It’s fairly straightforward. The only thing special about this GenServer code is that it allows the caller to specify the server they want to use. This is necessary because I will have multiple processes running different instances of this GenServer. One for each implementation I am testing. I need to have a different FibStore server because each algorithm is different so I need to make sure I store and fetch Fibonacci numbers from the correct server.

Next I needed to update the existing functions to use this new GenServer for fetching and storing values. The previous implementations had a fib function that performed the actual computation. To avoid modifying a lot of code I added a new function named do_fib that only calls the existing fib function to do the computation if the Fibonacci number has not already been computed and stored in the GenServer instance. Below are the three new implementations:

My Implementation

In my updated implementation so that the fib clause for generating any number after the first two in the Fibonacci sequence invokes FibStore.do_fib/3. This ensures it will reuse already an computed number if there is one. Otherwise FibStore.do_fib/3 will invoke fib/1 to compute and store the number.

In much the same way I changed the Rosetta Code algorithm. It invokes FibStore.do_fib/3 when computing Fibonacci numbers so it uses pre-computed numbers if exist in the FibStore process. Otherwise it invokes fib/1 to compute the Fibonacci number. Note that unlike my implementation this one must recompute everything when generating a new number in the sequence. For example, when generating the fifth number it would not be able to reuse the 2 and the 3 cached in the FibStore process. I could not make it reuse pre-computed numbers without changing the way the fib/3 function works.

This implementation benefited the most from the FibStore caching of pre-computed numbers. It remains similar to the original implementation, but now it invokes FibStore.do_fib/3 when computing numbers. Due to the recursive nature of the fib/1 function, calls are made to FibStore.do_fib/3 for every number that must be computed. This implementation is able to use the pre-computed numbers as a starting point when computing new numbers. Unlike the Rosetta Code implementation which cannot.

Which function generates a list of Fibonacci numbers the fastest?

My bet was that my implementation would still perform better than the others, but I wasn’t sure which of the others would be the fastest. The Rosetta Code algorithm wasn’t able to leverage the FibStore caching as much as others; but then again it was already a fairly fast algorithm.

Benchmarking

I reused my benchmarking code from the first blog post. See the source file for the benchmarking code.

Since the new algorithms now maintain state their performance on the first run will differ from their performance on all subsequent runs. I decided I would run the benchmarks once to capture the performance on initial runs. Then I would run the benchmarks four more times to measure the performance on subsequent runs just as I did in my first blog post.

Results

Initial Run

I ran the benchmark once to capture the performance when no state. Time is in microseconds.

Number

Rosetta Code

Dave Thomas’

Mine

3

36

83

30

5

22

56

16

10

76

69

31

20

87

111

48

30

144

133

56

45

143

205

87

Four Subsequent Runs

After the first run I ran the benchmark again against each algorithm four times. The average run times are shown in the table below in microseconds.

Number

Rosetta Code

Dave Thomas’

Mine

3

13

22

5

5

17

15

3

10

21

25

5

20

55

40

4

30

64

65

3

45

78

94

4

For comparison here are the average run times of the original algorithms:

List Size

Rosetta Code

Dave Thomas’

Mine

3

4

543

1

5

2

3

0

10

8

11

0

20

5

880

4

30

8

97500

1

45

17

131900822

2

Conclusion

Looking at the data in these tables it’s clear the Dave Thomas algorithm benefited the most from the new process caching of computed numbers. This isn’t surprising due to the increasing number of recursive calls the original algorithm used when computing large numbers. It’s clear from the run times that the time complexity of the original algorithm was exponential. With the process caching in place the time complexity is no longer exponential. I didn’t take the time to figure it out, but I would guess the new Dave Thomas’ algorithm with process caching has linear time complexity.

The Rosetta Code algorithm doesn’t really benefit from the new process caching. The performance characteristics remain the same, and it runs nearly 5 times slower. My algorithm did not benefit from the process caching either. It ran about 4 times slower than the original algorithm. Even though processes are cheap on the Erlang VM, and local messages are fast it’s clear the overhead of caching the data in a separate process is taking a toll on these algorithms. Sending a message is cheap, but eventually the time it takes for the process to be scheduled after receiving a message adds up. When no data is cached these algorithms must send and receive at least 2 messages for every recursion needed to compute the final number. Even when the number has already been computed two messages are needed to fetch the number. The two original algorithms performed well by keep pre-computed numbers in process memory and reusing them when necessary, so this caching only reduces the arithmetic operations needed on subsequent calls. And even on subsequent calls the overhead of sending and receiving two messages is greater than the cost of computing the list of Fibonacci numbers all over again; at least for the first 45 Fibonacci numbers in the sequence.

When it comes to generating a list of the first N Fibonacci numbers my original algorithm still seems to be the fastest out of all the algorithms I’ve tested. My algorithm was designed from the ground up for specifically for generating a list of the first N Fibonacci numbers in the sequence so I think the take away here is that code written for a specific task may outperform code that is more general purpose.

I recently was asked to write a function in Elixir to generate a list Fibonacci numbers. I had forgotten the typical implementation of the algorithm for generating Fibonacci numbers but since my function was returning a list I knew I could just add the two previous numbers in the list to compute the next number. I quickly threw together an simple 8 line function and that took a number and returned a list with that many of first Fibonacci numbers in it. After writing it and testing it out I realized my function was somewhat different from the implementations I had seen. My code looked like this:

With my implementation we have a function called fibonacci_do/1 with three clauses, the first two are for the first and second Fibonacci numbers and the third generates all the rest of the numbers in the sequence by adding the previous two numbers in the list and returning a list with the new number added. The function actually generates a list with the numbers in reverse order, so I defined the fibonacci/1 function to reverse the list. This function can generate the first 100,000 Fibonacci numbers in less that a second. Not too bad I thought.

Common Fibonacci in Elixir

After doing this I went and looked at the other Elixir implementations. Here are two of them:

Now this variation is by far the most common. I’ve seen this implementation in several slide decks at Elixir conferences, and I’ve seen it used as example code many times, so I’m unsure of its origin. Dave Thomas presented this implementation at the first ElixirConf because it mirrored the mathematical formula for generating Fibonacci numbers. In this blog post I’ll call this the Dave Thomas implementation. If you were to naively use this function to generate a list of Fibonacci numbers you’d most likely do something just like the Enum.map/2 call in the Rosetta Code example above:

1

IO.inspectEnum.map(0..10,fni->fib(i)end)

Which function generates a list of Fibonacci numbers the fastest?

Looking at these three algorithms it’s clear they are have some similarities. All of them are recursive functions that start with the base cases for first Fibonacci numbers, 0 and 1. All of them take a single argument, the number in the Fibonacci sequence to generate.

But these algorithms have some key differences as well.

My function calls itself recursively and builds up a list of numbers in the sequence and returns the list directly.

The Rosetta Code function also recursively calls itself once until it has generated a number. Since it only generates a single number it must be executed multiple times to generate a list of numbers in the sequence.

The Dave Thomas algorithm differs from the two others in that it recursively calls itself not once but twice. Just like the Rosetta Code algorithm it also must be executed multiple times to generate a list of numbers in the sequence. At only 4 lines it’s by far the most succinct of the three.

Benchmarking

I opted to do simple benchmarking of these three algorithms. The benchmark script feeds the function a number N and expects the function to return a list containing the first N numbers of the first Fibonacci sequence. Only my algorithm returned a list directly, so for the other two algorithms I created a function that would map over the range and repeatedly call the fibonacci function:

I ran the benchmark against each algorithm four times. The average run times are shown in the table below in microseconds.

List Size

Rosetta Code

Dave Thomas

Mine

3

4

543

1

5

2

3

0

10

8

11

0

20

5

880

4

30

8

97500

1

45

17

131900822

2

I’m not sure why the Dave Thomas algorithm was so slow at computing a list of the first 3 Fibonacci numbers. My guess is that CPU core was busy with something else when that number was benchmarked and it skewed the results.

As you can see these three algorithms perform very differently as the length of the list they must generate grows. Up to around 10 items the Rosetta Code algorithm and Dave Thomas algorithm perform about the same. After 10 items the run time for the Dave Thomas algorithm quickly climbs, for a list of 30 it takes nearly 1/10 a second, and for a list of 45 it takes over two minutes. The Rosetta Code algorithm performs much better. Only taking around 17 microseconds to compute a list of the first 45 Fibonacci numbers. My algorithm appears to have similar performance characteristics to the Rosetta Code algorithm, but with times that averaged less than a quarter the run time of the of the Rosetta Code algorithm. Note that the timing code was rounding to the nearest microsecond, so some computations took less than half a microsecond for my algorithm.

I decided to do a little more benchmarking of my algorithm and the Rosetta code algorithm. Since both algorithms seemed pretty fast I tried using them to generate much larger lists of Fibonacci numbers. The results are shown in the table below. Again, time is in microseconds.

List Size

Rosetta Code

Mine

100

76

5

500

2892

36

1000

15058

78

5000

631680

870

10000

4152261

4767

100000

> 5 minutes (never returned)

1195938

Clearly my algorithm performs better as it doesn’t have to generate each number from scratch each time it computes a new number in the resulting list. As the list it must generate grows in size the Rosetta Code algorithm’s run time grows exponentially. For generating a list of Fibonacci numbers it’s clear my algorithm performs the best.

Conclusion

Clearly these three algorithms have very different performance characteristics. For generating a single Fibonacci number the Rosetta Code function will work fine. My function will also work fine for generating a single Fibonacci number, but will use more memory due to the list that it builds. The Dave Thomas algorithm performs poorly for anything beyond the first 30 Fibonacci numbers and probably shouldn’t be used for anything other than exercises like this.

My algorithm, which was designed to generate a list of Fibonacci numbers, turns out to be the best algorithm for generating a list of Fibonacci numbers. Looping over the Fibonacci functions for the other algorithms greatly degrades their performance. It’s better to design a new Fibonacci function that generates the full list in one recursive call rather than reusing an existing Fibonacci function that only generates one number at a time to build the list.