Stratus3D

A blog on software engineering by Trevor Brown

Sliding Window Data Store in Erlang

In a recent Erlang project I had a need for in-memory storage of events. I didn’t need events to persist for more than an hour. All I needed was a collection of events that had occurred over the last hour that I could query. Collections like these are often referred to as sliding window data stores. Only events within a window of time are kept. As the events age they eventually pass out of the window of time and are discarded. Since old events are discarded when they fall outside of the window, queries against the collection only return events within the sliding window.

###First Attempt

My first attempt at creating a sliding window resulted in a very primitive solution. All it does is create an empty list, and as new events come in they are added to the beginning of the list. Anytime an action is performed, whether adding events, querying events, or retrieving all the events, old events are first dropped from the end of the list with lists:takewhile. When events are queried the entire list is filtered to find the matching events. When we want all the events within the window we simply return the list. Events are stored internally as tuples of three values - the event ID, the time added, and the event itself.

Here is the module source code:

``` erlang sliding_window_list.erl
-module(sliding_window_list).

-behaviour(sliding_window).

-include_lib("include/sliding_window.hrl").

-export([new/1, add/2, events/1, event_info/2, reset/1]).

% API
new(Size) ->
    % Since we are using a plain old list to store events all we need to keep
    % track of is the size of the window (in seconds), the last event ID (starts
    % at 1), and the array of events. We store all these in a tuple and pass it
    % around to each API call.
    {Size, 1, []}.

add({Size, Id, Window}, Event) ->
    % Remove old events as some might still exist in the list
    NewWindow = remove_old_events(Size, Window),
    % Add a new event to the list and return the updated window state and event
    % id.
    {{Size, Id+1, [{Id, sliding_window:timestamp(), Event}|NewWindow]}, Id}.

events({Size, Id, Window}) ->
    % Remove old events as some might still exist in the list
    NewWindow = remove_old_events(Size, Window),
    % Return all events in the window
    {{Size, Id, NewWindow}, NewWindow}.

event_info({Size, Id, Window}, EventId) ->
    % Remove old events as some might still exist in the list
    NewWindow = remove_old_events(Size, Window),
    % Find the event in the list of events
    case lists:filter(fun({CurrentEventId, _Time, _ExistingEvent}) ->
                              CurrentEventId =:= EventId
                      end, NewWindow) of
        [{EventId, EventTimeAdded, Event}] ->
            % If event exists return it along with the time remaining in the
            % window and the time elapsed. Also return the new window state.
            TimeElapsed = timer:now_diff(sliding_window:timestamp(), EventTimeAdded) / ?MEGA,
            TimeRemaining = timer:now_diff(EventTimeAdded, sliding_window:window_end_timestamp(Size)) / ?MEGA,
            {{Size, Id, NewWindow}, {EventId, EventTimeAdded, TimeElapsed, TimeRemaining, Event}};
        [] ->
            % Return none if the event is not found
            {{Size, Id, NewWindow}, none}
    end.

reset({Size, Id, _Window}) ->
    {{Size, Id, []}, ok}.

% Private Functions
remove_old_events(Size, WindowEvents) ->
    % Remove events that are outside the window of time and return the updated
    % list of events.
    WindowEndTime = sliding_window:window_end_timestamp(Size),
    lists:takewhile(fun({_, TimeAdded, _}) ->
                            TimeAdded > WindowEndTime
                    end, WindowEvents).
```

Using this module to store events is straightforward:


``` erlang
% Create a new window to hold events for 40 seconds
1> Window = sliding_window_list:new(40).
{40,1,[]}
% Add an event to the window
2> {Window2, Id1} = sliding_window_list:add(Window, foobar_event).
{{40,2,[{1,{1436,42881,562920},foobar_event}]},1}
% Add another event to the window
3> {Window3, Id2} = sliding_window_list:add(Window2, {some, other, event}).
{{40,3,[{1,{1436,42893,241849},foobar_event}, {2,{1436,42991,830545},{some,other,event}}]},2}
% List all events
4> {Window4, Events} = sliding_window_list:events(Window3).
{{40,3,[{1,{1436,42893,241849},foobar_event}, {2,{1436,42991,830545},{some,other,event}}]}, [{1,{1436,42893,241849},foobar_event}, {2,{1436,42991,830545},{some,other,event}}]}
5> Events. % Events currently stored in the window
[{1,{1436,42893,241849},foobar_event}, {2,{1436,42991,830545},{some,other,event}}]
% Get details on a specific event by ID
5> {Window5, Event} = sliding_window_list:event_info(Window4, Id1).
{{40,3,[{1,{1436,42893,241849},foobar_event}, {2,{1436,42991,830545},{some,other,event}}]}, {1,{1436,43127,793403},24.73498,15.262321,foobar_event}}
% Event details
7> Event.
{1,{1436,43127,793403},24.73498,15.262321,foobar_event}
% After waiting 40 seconds we can repeat the event_info and events calls.
% Now the events have expired and are no longer stored
14> {Window6, EventInfo} = sliding_window_list:event_info(Window5, Id1).
{{40,2,[]},none}
9> {Window7, Events} = sliding_window_list:events(Window6).
{{40,3,[]},[]}
```

####List Implementation Benchmarks

Since I knew this wasn’t going to be an efficient solution I decided to write a couple of functions to benchmark the performance of several operations. There are three main operations that can be performed on a sliding window data store:

  1. Adding events
  2. Fetching information on events (Querying for specific events within the window)
  3. Retrieving all events currently in the window

I wrote three benchmark functions. The first adds 10,000 events to the sliding window. The second fetches information on 1,000 random individual events. The third fetches all events from the window 100 times. Events are generated on the fly and are tuples composed of a unique ID, an atom, and a binary with a length of 45, 90, 135, or 180 bytes.

``` erlang An example event
{load_testing_event, 123, <<"The quick brown fox jumped over the lazy dog.The quick brown fox jumped over the lazy dog.">>}
```

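For reference, here is roughly what the first benchmark could look like. This is a sketch rather than the harness I actually used; the event body and the use of timer:tc are assumptions.

``` erlang
% A rough sketch of the "add 10,000 events" benchmark. Not the original
% harness; assumes the sliding_window_list module shown above.
benchmark_add(WindowSize, Count) ->
    Window = sliding_window_list:new(WindowSize),
    % 90 byte event body built from the 45 byte sentence used above
    Body = binary:copy(<<"The quick brown fox jumped over the lazy dog.">>, 2),
    {Micros, _FinalWindow} =
        timer:tc(fun() ->
                         lists:foldl(fun(N, W) ->
                                             Event = {load_testing_event, N, Body},
                                             {W2, _Id} = sliding_window_list:add(W, Event),
                                             W2
                                     end, Window, lists:seq(1, Count))
                 end),
    % timer:tc/1 reports microseconds; convert to seconds
    Micros / 1000000.
```
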
I ran all these functions three times on my MacBook Pro and took the average of the three runs. For my naive list implementation the results were:

* Adding 10,000 events: 8.726298 seconds
* Fetching 1,000 events: 2.835538 seconds
* Fetching all events 100 times: 0.279025 seconds

Not too bad for just being a list, but still not good enough for a system expected to handle thousands of events. The problem is that old events sit at the end of the list, and because Erlang lists are immutable singly linked lists, removing items from the end requires rebuilding the list that precedes them. Since old events are pruned every time an event is added or retrieved, this constant rebuilding has a significant impact on performance.

After seeing the benchmark results I realized my benchmark functions were completely unrealistic. The data store would likely experience a mix of operations over time: calls to add new events, queries for existing events, and calls to retrieve all events in the window would be randomly mixed together. I decided to write another benchmark function that interleaves all three operations. It adds 100 events, then fetches 10 random events, and then fetches all events once. For the benchmark I ran this function 1,000 times with the window set to store events for only 10 seconds, so events would expire during the test. At the end of the run I counted the events still in the window. Since every event still in the window had been added in the last 10 seconds, I simply divided that count by 10 to compute the number of events per second. While not perfect, this served as my throughput benchmark. The throughput of the list implementation was 700.8 events per second.
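
A sketch of what that interleaved benchmark could look like is below. Again, this is an approximation of the approach described above, not the original code; the event shape and the use of rand:uniform/1 are assumptions.

``` erlang
% Interleaved throughput benchmark sketch: add 100 events, look up 10 random
% events, fetch everything once, repeated Rounds times against a 10 second
% window. Module is one of the sliding window implementations.
benchmark_throughput(Module, Rounds) ->
    Window0 = Module:new(10),
    FinalWindow =
        lists:foldl(
          fun(Round, W0) ->
                  W1 = lists:foldl(
                         fun(N, W) ->
                                 {W2, _Id} = Module:add(W, {event, Round, N}),
                                 W2
                         end, W0, lists:seq(1, 100)),
                  W3 = lists:foldl(
                         fun(_, W) ->
                                 Id = rand:uniform(Round * 100),
                                 {W2, _Info} = Module:event_info(W, Id),
                                 W2
                         end, W1, lists:seq(1, 10)),
                  {W4, _Events} = Module:events(W3),
                  W4
          end, Window0, lists:seq(1, Rounds)),
    % Everything still in the window was added within the last 10 seconds
    {_W, Remaining} = Module:events(FinalWindow),
    length(Remaining) / 10.
```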

I also created an Erlang behaviour named `sliding_window` that defines all the functions each sliding window module must export. All the modules shown in this post implement the `sliding_window` behaviour:

``` erlang sliding_window.erl
-module(sliding_window).

-type event() :: list(tuple()).

-type id() :: any().

-type seconds() :: integer().

-type milliseconds() :: integer().

-type window_ref() :: any().

-callback new(Size :: seconds()) -> window_ref().

-callback add(Window :: window_ref(), Event :: event()) ->
    {window_ref(), Id :: id()}.

-callback events(Window :: window_ref()) ->
    {window_ref(), list({Id :: id(), Event :: event()})}.

-callback event_info(Window :: window_ref(), EventId :: id()) ->
    {window_ref(), {TimeElapsed :: milliseconds(), TimeRemaining :: milliseconds(), event()} | none}.

-callback reset(Window :: window_ref()) ->
    {window_ref(), ok}.
```

###Erlang’s Queue Module

I knew using an ETS table would probably be the best solution to this problem. But before I developed the ETS table implementation I came across a discussion of sliding window data structures on the erlang-questions mailing list from back in 2005, which led me to Erlang’s queue module. The queue module implements a FIFO queue using two lists, and is based on the double-ended queues in Chris Okasaki’s Purely Functional Data Structures. Here is the implementation I came up with that uses the queue module:

``` erlang sliding_window_queue.erl
-module(sliding_window_queue).

-behaviour(sliding_window).

-include_lib("include/sliding_window.hrl").

-export([new/1, add/2, events/1, event_info/2, reset/1]).

% API
new(Size) ->
    % We can use a queue similar to a list. We create a queue and then we can
    % pass it along to each call.
    {Size, 1, queue:new()}.

add({Size, Id, Window}, Event) ->
    % Remove old events as some might still exist in the queue
    NewWindow = remove_old_events(Size, Window),
    % Add a new event to the queue and return the updated window state and event
    % id.
    {{Size, Id+1, queue:in({Id, sliding_window:timestamp(), Event}, NewWindow)}, Id}.

events({Size, Id, Window}) ->
    % Remove old events as some might still exist in the queue
    NewWindow = remove_old_events(Size, Window),
    % Return all events in the window
    {{Size, Id, NewWindow}, queue:to_list(NewWindow)}.

event_info({Size, Id, Window}, EventId) ->
    % Remove old events as some might still exist in the queue
    NewWindow = remove_old_events(Size, Window),
    % Find the event in the queue of events
    FilterResults = queue:filter(fun({CurrentEventId, _Time, _ExistingEvent}) ->
                                         CurrentEventId =:= EventId
                                 end, NewWindow),
    case queue:to_list(FilterResults) of
        [{EventId, EventTimeAdded, Event}] ->
            % If event exists return it along with the time remaining in the
            % window and the time elapsed. Also return the new window state.
            TimeElapsed = timer:now_diff(sliding_window:timestamp(), EventTimeAdded) / ?MEGA,
            TimeRemaining = timer:now_diff(EventTimeAdded, sliding_window:window_end_timestamp(Size)) / ?MEGA,
            {{Size, Id, NewWindow}, {EventId, EventTimeAdded, TimeElapsed, TimeRemaining, Event}};
        [] ->
            % Return none if the event is not found
            {{Size, Id, NewWindow}, none}
    end.

reset({Size, Id, _Window}) ->
    {{Size, Id, queue:new()}, ok}.

% Private Functions
remove_old_events(Size, WindowEvents) ->
    % Remove events that are outside the window of time and return the updated
    % queue of events.
    WindowEndTime = sliding_window:window_end_timestamp(Size),
    remove_old_events_from_queue(WindowEndTime, WindowEvents).

remove_old_events_from_queue(WindowEndTime, WindowEvents) ->
    % Get the oldest event in the queue
    try queue:get(WindowEvents) of
        {_, TimeAdded, _} ->
            case TimeAdded > WindowEndTime of
                true ->
                    % If the oldest event is inside the window of time we can
                    % just return the remaining WindowEvents queue. We know all
                    % events from here on are newer.
                    WindowEvents;
                _ ->
                    % Remove the event from the queue and invoke
                    % remove_old_events_from_queue again to check the next event
                    remove_old_events_from_queue(WindowEndTime, queue:drop(WindowEvents))
            end
    catch
        error:empty ->
            % We are at the end of the queue. Return the queue
            WindowEvents
    end.
```

Note that the functions and macros reused by the queue implementation have been pulled out into the `sliding_window` module and header file. They are the same as the ones used by the `sliding_window_list` module.
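
Those shared helpers aren’t shown in the post, but based on how they are called they presumably look something like the following. This is my reconstruction, not the original code; the exact definitions of `?MEGA`, `timestamp/0`, and `window_end_timestamp/1` are assumptions.

``` erlang
% Reconstructed shared helpers (assumed, not from the original post).
% ?MEGA would live in include/sliding_window.hrl.
-define(MEGA, 1000000).

% The current time as an Erlang timestamp ({MegaSecs, Secs, MicroSecs}).
timestamp() ->
    os:timestamp().

% The timestamp marking the trailing edge of the window; any event added
% before this moment has fallen out of the window.
window_end_timestamp(Size) ->
    {Mega, Secs, Micro} = os:timestamp(),
    Total = (Mega * 1000000) + Secs - Size,
    {Total div 1000000, Total rem 1000000, Micro}.
```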

I was surprised at how fast the queue implementation performed:

* Adding 10,000 events: 0.036307 seconds
* Fetching 1,000 events: 1.21013 seconds
* Fetching all events 100 times: 0.119682 seconds
* Throughput benchmark: 2783.333333 events per second

Overall its throughput was roughly four times that of the list implementation, and adding events was orders of magnitude faster. The improved performance is likely due to the queue being represented as two Erlang lists. Lists are essentially stacks with two efficient operations - push and pop. An item can be pushed onto the top of a list to create a new list, and the first item in a list can be popped off the top (removed) and returned.

The queue is represented underneath as two lists - one list for incoming (new) events and the other for outgoing (old) events. The newest event is the first item in the incoming list and the oldest event is the first item in the outgoing list. What makes the queue so efficient is that everything is done using only the push and pop operations. When an event is added to the queue it is pushed onto the front of the incoming list. When an item is removed from the queue it is popped off the front of the outgoing list. When the outgoing list is empty each of the items in the incoming list is popped off one by one and pushed onto the outgoing list, so the last and oldest item in the incoming list becomes the first item in the outgoing list. Each item that passes through the queue is only pushed twice and popped twice: it is pushed onto the incoming list, then popped off the incoming list and pushed onto the outgoing list, and finally it is popped off the outgoing list. This results in a very efficient FIFO queue.
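
To make the two-list idea concrete, here is a stripped-down sketch of that mechanism. It is purely illustrative; the real queue module is more sophisticated and its internal representation is an implementation detail.

``` erlang
-module(two_list_queue).
-export([new/0, push/2, pop/1]).

% A minimal two-list FIFO, illustrating the idea behind Erlang's queue module.
new() -> {[], []}.

% Push onto the incoming list.
push(Item, {In, Out}) -> {[Item | In], Out}.

% Pop from the outgoing list; when it is empty, reverse the incoming list
% so the oldest item ends up at its front.
pop({[], []}) -> empty;
pop({In, []}) -> pop({[], lists:reverse(In)});
pop({In, [Oldest | Out]}) -> {Oldest, {In, Out}}.
```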

###With An ETS Table

Next I tried implementing the sliding window using an ETS table. ETS stands for Erlang Term Storage. An ETS table is an in-memory key-value lookup table, which makes it very different from the list and queue implementations, which are basically just stacks. While the event storage is different, the code is similar to the other implementations:


``` erlang sliding_window_ets.erl
-module(sliding_window_ets).

-behaviour(sliding_window).

-include_lib("include/sliding_window.hrl").

-define(TABLE_NAME, ?MODULE).

-export([new/1, add/2, events/1, event_info/2, reset/1]).

% API
new(Size) ->
    % We need our ETS table to be an ordered set. Otherwise we would have to
    % inspect every event when checking for old events. With an ordered set we
    % can be sure the oldest event comes first, since event ids only increase.
    EtsOptions = [ordered_set, protected],
    Tid = ets:new(?TABLE_NAME, EtsOptions),
    {Size, 1, Tid}.

add({Size, Id, Window}, Event) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),
    % Add a new event to the table and return the event id.
    ets:insert(Window, {Id, sliding_window:timestamp(), Event}),
    {{Size, Id+1, Window}, Id}.

events({Size, Id, Window}) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),

    % Match all events in the table and map them to remove the surrounding lists
    % for each event. Return all events.
    Events = ets:match(Window, '$1'),
    Result = lists:map(fun([Event]) ->
                               Event
                       end, Events),
    {{Size, Id, Window}, Result}.

event_info({Size, Id, Window}, EventId) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),

    % Find the event in the ETS table
    case ets:lookup(Window, EventId) of
        [{EventId, EventTimeAdded, Event}] ->
            % If event exists return it along with the time remaining in the
            % window and the time elapsed. Also return the new window state.
            TimeElapsed = timer:now_diff(sliding_window:timestamp(), EventTimeAdded) / ?MEGA,
            TimeRemaining = timer:now_diff(EventTimeAdded, sliding_window:window_end_timestamp(Size)) / ?MEGA,
            {{Size, Id, Window}, {EventId, EventTimeAdded, TimeElapsed, TimeRemaining, Event}};
        [] ->
            % Return none if the event is not found
            {{Size, Id, Window}, none}
    end.

reset({Size, Id, Window}) ->
    % Remove all events but keep the table so the window can still be used
    true = ets:delete_all_objects(Window),
    {{Size, Id, Window}, ok}.

% Private Functions
remove_old_events(Size, Window) ->
    % Remove events that are outside the window of time
    WindowEndTime = sliding_window:window_end_timestamp(Size),

    % The first event in the table is the oldest, so we start with it
    Key = ets:first(Window),
    ok = remove_old_events_from_table(WindowEndTime, Window, Key).

remove_old_events_from_table(_WindowEndTime, _Window, '$end_of_table') ->
    ok;
remove_old_events_from_table(WindowEndTime, Window, Key) ->
    % Lookup the event in the ETS table and check the time it was added.
    case ets:lookup(Window, Key) of
        [{_, TimeAdded, _}] = [Event] ->
            case TimeAdded > WindowEndTime of
                true ->
                    % If the oldest event is inside the window of time we can
                    % return. We know all events from here on are newer.
                    ok;
                false ->
                    % Remove the event from the table and invoke
                    % remove_old_events_from_table again to check the next event
                    true = ets:delete_object(Window, Event),
                    NewKey = ets:next(Window, Key),
                    remove_old_events_from_table(WindowEndTime, Window, NewKey)
            end;
        [] ->
            % We are at the end of the set. Return
            ok
    end.
```

####ETS Table Benchmark Results

* Adding 10,000 events: 0.058912 seconds
* Fetching 1,000 events: 0.937192 seconds
* Fetching all events 100 times: 0.108906 seconds
* Throughput benchmark: 2196.666667 events per second

Surprisingly, the ETS table is actually slower than the queue module, though it is still significantly faster than a plain old list. Since we are using an ordered_set table to store the events we know the oldest event is the first item in the table. So we start checking for old events at the beginning of the table and work our way forward. Once we get to an event that is still within the window of time we stop checking, since we know the remaining events are closer to the end of the table and therefore newer. This allows us to skip most of the events that are within the window of time. The slight decrease in performance compared to the queue implementation is likely due to the key lookups required to traverse the items in the table (the ets:first/1 and ets:next/2 calls).

###Mnesia

Since we are doing all this in Erlang, this comparison wouldn’t be complete without testing Erlang’s own database, Mnesia. Mnesia is built on top of ETS and DETS tables, so I didn’t expect it to be any faster than the ETS implementation, but after seeing the ETS results I was curious how it would perform in a situation like this. There are other things that might justify using an Mnesia database to store events. Mnesia can persist events on disk as well as store them in memory, and an Mnesia table can be replicated across multiple Erlang nodes, making fault tolerance possible. Both of these things would be very difficult to do with a plain ETS table or a queue. For the Mnesia sliding window I opted for a simple in-memory table on one node. This makes the implementation similar to the others:

``` erlang sliding_window_mnesia.erl
-module(sliding_window_mnesia).

-behaviour(sliding_window).

-include_lib("stdlib/include/qlc.hrl").
-include_lib("include/sliding_window.hrl").

-define(TABLE_NAME, event).

-record(event, {id, added_time, event}).

-export([new/1, add/2, events/1, event_info/2, reset/1]).

% API
new(Size) ->
    % Create the schema
    mnesia:create_schema([node()]),
    % Start Mnesia
    mnesia:start(),
    Attrs = {attributes, record_info(fields, event)},
    % Only store table in ram on one node
    mnesia:create_table(?TABLE_NAME, [Attrs, {ram_copies, [node()]}]),
    {Size, 1, ?TABLE_NAME}.

add({Size, Id, Window}, Event) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),
    % Add a new event to the table and return the event id.
    {atomic, _} = mnesia:transaction(fun() ->
                                             EventRecord = #event{id=Id,
                                                                  added_time=sliding_window:timestamp(),
                                                                  event=Event},
                                             mnesia:write(EventRecord)
                                     end),
    {{Size, Id+1, Window}, Id}.

events({Size, Id, Window}) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),

    % Match all events in the table and map them to tuples. Return all events.
    Result = do(qlc:q([X || X <- mnesia:table(Window)])),
    TupleResults = lists:map(fun({event, EventId, TimeAdded, Event}) ->
                                     {EventId, TimeAdded, Event}
                             end, Result),
    {{Size, Id, Window}, TupleResults}.

event_info({Size, Id, Window}, EventId) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),

    % Find the event in the Mnesia table
    Result = do(qlc:q([{CurrentEventId, EventTimeAdded, Event} ||
                          #event{id=CurrentEventId,
                                 added_time=EventTimeAdded,
                                 event=Event} <- mnesia:table(Window),
                          CurrentEventId =:= EventId])),
    case Result of
        [{EventId, EventTimeAdded, Event}] ->
            % If event exists return it along with the time remaining in the
            % window and the time elapsed. Also return the new window state.
            TimeElapsed = timer:now_diff(sliding_window:timestamp(), EventTimeAdded) / ?MEGA,
            TimeRemaining = timer:now_diff(EventTimeAdded, sliding_window:window_end_timestamp(Size)) / ?MEGA,
            {{Size, Id, Window}, {EventId, EventTimeAdded, TimeElapsed, TimeRemaining, Event}};
        [] ->
            % Return none if the event is not found
            {{Size, Id, Window}, none}
    end.

reset({Size, Id, Window}) ->
    % Empty the database
    mnesia:clear_table(Window),

    % Stop mnesia
    mnesia:stop(),
    {{Size, Id, Window}, ok}.

% Private Functions
remove_old_events(Size, Window) ->
    % Remove events that are outside the window of time
    WindowEndTime = sliding_window:window_end_timestamp(Size),

    OldEventsIds = do(qlc:q([Id || #event{id=Id, added_time=TimeAdded} <- mnesia:table(Window),
                                   TimeAdded < WindowEndTime])),
    mnesia:transaction(fun() ->
                               lists:map(fun(Id) ->
                                                 mnesia:delete({event, Id})
                                         end, OldEventsIds)
                       end).

do(Q) ->
    F = fun() -> qlc:e(Q) end,
    {atomic, Val} = mnesia:transaction(F),
    Val.
```

####Mnesia Benchmark Results

* Adding 10,000 events: 10.498559 seconds
* Fetching 1,000 events: 4.26676 seconds
* Fetching all events 100 times: 0.442445 seconds
* Throughput benchmark: 563.933333 events per second

The Mnesia table performed far worse than the ETS table. The extreme difference in performance is due to the fact that the Mnesia implementation must query the entire table to find old events, and it does so on every call, which is very expensive. There is a way to use an ordered_set Mnesia table just like in the ETS implementation, but I wasn’t able to get it to work properly. Querying for old events is the primary cause of the poor performance.
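
For anyone who wants to try the ordered_set route, Mnesia does accept a {type, ordered_set} option for ram_copies tables. The sketch below shows how old-event removal might then walk keys in order, the same way the ETS version does. This is a hypothetical sketch I have not benchmarked, not code from the original post.

``` erlang
% Hypothetical sketch: create the table as an ordered_set so the smallest
% (oldest) id is always first, then walk keys in order and stop at the first
% event that is still inside the window.
% mnesia:create_table(event, [{attributes, record_info(fields, event)},
%                             {type, ordered_set},
%                             {ram_copies, [node()]}]).

remove_old_events_ordered(WindowEndTime) ->
    remove_old(WindowEndTime, mnesia:dirty_first(event)).

remove_old(_WindowEndTime, '$end_of_table') ->
    ok;
remove_old(WindowEndTime, Key) ->
    case mnesia:dirty_read(event, Key) of
        [#event{added_time = TimeAdded}] when TimeAdded =< WindowEndTime ->
            % Fetch the next key before deleting the current record
            NextKey = mnesia:dirty_next(event, Key),
            mnesia:dirty_delete(event, Key),
            remove_old(WindowEndTime, NextKey);
        _ ->
            % The first in-window event; everything after it is newer
            ok
    end.
```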

###Benchmark Comparison

Here are the benchmark results for all four sliding window modules:

Values in seconds - lower is better.

| Implementation | Adding 10,000 Events | Fetching 1,000 Events | Fetching All Events 100 Times |
|----------------|----------------------|-----------------------|-------------------------------|
| List           | 8.726298             | 2.835538              | 0.279025                      |
| Queue          | 0.036307             | 1.21013               | 0.119682                      |
| ETS            | 0.058912             | 0.937192              | 0.108906                      |
| Mnesia         | 10.498559            | 4.26676               | 0.442445                      |


And the more realistic throughput benchmark:

| Implementation | Throughput (events per second, higher is better) |
|----------------|--------------------------------------------------|
| List           | 700.8                                            |
| Queue          | 2783.333333                                      |
| ETS            | 2196.666667                                      |
| Mnesia         | 563.933333                                       |


###Conclusion

Erlang provides all the tools needed to quickly build sliding window data stores. The development of all four implementations was relatively straightforward and each was less than 100 lines of code. They all had very different performance characteristics, but overall the one that used Erlang’s queue module performed the best in my tests, with the ETS table implementation a close second. If you need the sliding window to be accessible to other processes on the Erlang node, an ETS table may be a good option as it allows sharing between processes, unlike a list or queue, which lives inside a single process. The sliding window that used Mnesia was the slowest of the four. However, there are still advantages to using Mnesia to store events: you can replicate the table across multiple Erlang nodes, and unlike the other three implementations you can use a disk-backed Mnesia table to persist events. Overall the list implementation was probably the worst, though even it might be fine in situations where you don’t plan on handling more than a couple hundred events at a time.

I hope this information was helpful to you. If you notice anything here that is incorrect, or you see something I missed, please let me know.

The benchmark tests I wrote are far from perfect. I only wrote them to get a feel for the performance of the various sliding window implementations I came up with. In all the benchmarks I ran the window size was less than 2 minutes, and in the throughput benchmark the window size was only 10 seconds. In real scenarios events would likely be kept for at least an hour. If you need a sliding window data store it would be wise to write your own benchmarking functions that exercise these implementations in ways that match your specific use case.

Extensible Rails 4 Form Object Design

Keeping models and controllers simple as Rails applications grow can often be a challenge. Controllers begin handling business logic in addition to view logic. Models grow large with methods responsible for the presentation of the data and the business logic that surrounds the data. With the addition of things like accepts_nested_attributes_for the problem gets even worse. accepts_nested_attributes_for tethers your models together, making one model aware of the attributes of the other. In the view this problem is compounded by the knowledge the form markup contains about the nested structure of the models. Form objects are one of the ways to simplify models and controllers. In this post I am going to show what I have found to be an effective form object design. It will eliminate the need for accepts_nested_attributes_for.

There are many form object implementations out there, including gems like reform, simple_form_object and virtus. While these work well in many cases, I found some of the things they do undesirable. Most of them require some duplication of logic - usually the validation logic must be repeated in the form object even when it is already present in the model. With my solution the validation logic remains in the original models where it belongs. Of course, additional validations can still be added to the form object itself to verify the validity of the objects as a group.

###The Basic Form Object Class

Most of this solution came from Thoughtbot’s Upcase forum, where @derekprior provided a simple form object example. His example had elegant solutions for validation and persistence of multi-model forms, but it fell short in other areas: it didn’t provide a way to initialize the form with existing objects, which is necessary when building edit forms. My form class can be initialized with the objects it updates when it is created, which keeps any controller action that uses the form object very simple.

###An Example

To illustrate how my form object works let’s suppose we have an online store that has a Customer model and an Address model. Customers might not have an address when they sign up. Keeping the data in two separate models makes sense. Store administrators might want to edit all the information associated with a customer in one form. And we must remember that the Address record for a customer may not exist yet. This is where form objects come in. Our form object will manage creating and updating of customer records and their associated address records. First here are the two models:

``` ruby customer.rb
class Customer < ActiveRecord::Base
  has_one :address
end
```

``` ruby address.rb
class Address < ActiveRecord::Base
  belongs_to :customer
end
```

###The Form Object

Next we create a form class. I like to keep mine in app/forms/ but you can put them wherever you like. The form class is by far the most complex of the three classes. Don’t worry though, I will go through it and explain what each section of it does.

``` ruby customer_form.rb
class CustomerForm
  # include ActiveModel so we have all the nice ActiveModel methods available
  include ActiveModel::Model
  # All the models that are a part of our form should be in the attr_accessor.
  # This allows the form to be initialized with existing instances.
  attr_accessor :customer, :address

  def self.customer_attributes
    Customer.column_names.push(Customer.reflections.keys)
  end

  def self.address_attributes
    Address.column_names.push(Address.reflections.keys)
  end

  customer_attributes.each do |attr|
    delegate attr.to_sym, "#{attr}=".to_sym, to: :customer
  end

  address_attributes.each do |attr|
    delegate attr.to_sym, "#{attr}=".to_sym, to: :address
  end

  delegate :id, :persisted?, to: :customer

  validate :validate_children

  def self.model_name
    customer.model_name
  end

  def assign_attributes(params)
    customer_attributes = params.slice(*self.class.customer_attributes)
    customer.assign_attributes(customer_attributes)
    address_attributes = params.slice(*self.class.address_attributes)
    address.assign_attributes(address_attributes)
    setup_associations
  end

  def save
    if valid?
      ActiveRecord::Base.transaction do
        customer.save!
        address.save!
      end
    end
  end

  def customer
    @customer ||= Customer.new
  end

  def address
    @address ||= Address.new
  end

  private

  def setup_associations
    address.customer = customer
  end

  def validate_children
    setup_associations

    if customer.invalid?
      promote_errors(customer.errors)
    end

    if address.invalid?
      promote_errors(address.errors)
    end
  end

  def promote_errors(child_errors)
    child_errors.each do |attribute, message|
      errors.add(attribute, message)
    end
  end
end
```

At the very top of the class we include `ActiveModel::Model`. This provides all the nice `ActiveModel` methods that you have come to expect on your model instances. This also allows us to treat our class more like a model instance in the controller. Next we have the `attr_accessor` call. This is where we specify the names of the models that make up our form. These are the names, not the classes, of those models.

``` ruby
  # include ActiveModel so we have all the nice ActiveModel methods available
  include ActiveModel::Model
  # All the models that are a part of our form should be in the attr_accessor.
  # This allows the form to be initialized with existing instances.
  attr_accessor :customer, :address
```

Next come the attributes and the delegation logic. First we define class methods that return the attributes of each model listed in the attr_accessor call. In our case we need two such methods - one for the Customer model and one for the Address model. Next we take the array of attributes from each model and delegate getter and setter methods for each attribute back to the original model - again, in our case, to :customer and :address. (Note that if the two models contain attributes with the same name only one will be delegated correctly due to the name conflict. If you run into this issue try using delegate’s :prefix option; see the sketch after the excerpt below.) This means that when we assign :first_name on our form object (@form_object.first_name = 'Fred') we are actually setting the :first_name attribute of the Customer instance. This delegation is what allows us to easily aggregate the attributes from multiple models into one class without having to duplicate attribute names. Also note the last line: we delegate :id and :persisted? to :customer, as these are methods that must be present on our form object (though this may not be needed in all cases).

``` ruby
  def self.address_attributes
    Address.column_names.push(Address.reflections.keys)
  end

  def self.customer_attributes
    Customer.column_names.push(Customer.reflections.keys)
  end

  address_attributes.each do |attr|
    delegate attr.to_sym, "#{attr}=".to_sym, to: :address
  end

  customer_attributes.each do |attr|
    delegate attr.to_sym, "#{attr}=".to_sym, to: :customer
  end

  delegate :id, :persisted?, to: :customer
```

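As mentioned above, if Customer and Address shared an attribute name you could disambiguate with a prefix. This is a hypothetical example, not part of the original form object; it assumes both models have a name column:

``` ruby
  # Hypothetical: if both models had a `name` column, prefixing the address
  # delegation would expose it as address_name / address_name= on the form.
  delegate :name, :name=, to: :address, prefix: true
```
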
Next comes the model_name class method and a few other methods. The model_name class method is used by simple_form to determine the appropriate create and update URLs the form should use. Then we have the assign_attributes method, which is responsible for setting the attributes on all the model instances (in our case just Customer and Address). We call assign_attributes on each instance and pass in the appropriate portion of the attributes hash. I chose to slice the attributes hash based on each model’s attributes and pass the resulting hashes to each model. There might be a better way of doing this, but I found this approach readable and relatively simple. After setting the attributes, assign_attributes calls the setup_associations method, which sets up the model associations properly (more on that in a minute). Next is the save method. It invokes valid? before attempting to save the records; if valid? returns true, each model’s save! method is invoked inside a transaction. We also define methods for each of the models contained in the form (again, in our case Customer and Address). These methods are important because they initialize new Customer and Address instances if we didn’t provide our own when initializing the form. Finally, we have a call to validate, which registers our custom :validate_children method (described in the next paragraph).

``` ruby
  validate :validate_children

  def self.model_name
    customer.model_name
  end

  def assign_attributes(params)
    customer_attributes = params.slice(*self.class.customer_attributes)
    customer.assign_attributes(customer_attributes)
    address_attributes = params.slice(*self.class.address_attributes)
    address.assign_attributes(address_attributes)
    setup_associations
  end

  def save
    if valid?
      ActiveRecord::Base.transaction do
        customer.save!
        address.save!
      end
    end
  end

  def customer
    @customer ||= Customer.new
  end

  def address
    @address ||= Address.new
  end
```

Now all we have left are the private methods. setup_associations is responsible for setting up any necessary associations prior to saving or validating the form data. In our case all it does is assign the customer to the customer attribute of the address. Next is validate_children, our validation callback. This method is responsible for all validations. First it invokes setup_associations to ensure the associations are set up, then it checks whether each model in the form is valid, and if one is not, it passes that instance’s errors to the promote_errors method. The promote_errors method takes the errors it is passed and adds each one to the errors object on the CustomerForm instance, which in turn renders the form itself invalid. Furthermore, the errors correspond to delegated attributes on the form object - the same attributes that are rendered as form fields in the view. Since the attributes match, simple_form is able to display errors the way it normally would, with messages above or below the corresponding form field.

``` ruby
  private

  def setup_associations
    address.customer = customer
  end

  def validate_children
    setup_associations

    if customer.invalid?
      promote_errors(customer.errors)
    end

    if address.invalid?
      promote_errors(address.errors)
    end
  end

  def promote_errors(child_errors)
    child_errors.each do |attribute, message|
      errors.add(attribute, message)
    end
  end
```

All of these methods allow us to treat our form object very much like an ActiveRecord instance. And other than class names and associations, we didn’t have to duplicate anything that already existed in our models. Existing model validations are used automatically, which makes for a very DRY form object. All of this makes our controller and view code simpler than what would be required with accepts_nested_attributes_for.

###The Controller and Views

Now our form object isn’t much use unless we actually use it in our controllers. The CustomerController is where we are most likely going to need our CustomerForm. Here is all we need to do to build working new and edit forms:

``` ruby CustomerController.rb
class CustomerController < ApplicationController
  def new
    @customer_form = CustomerForm.new()
  end

  def create
    @customer_form = CustomerForm.new(customer_form_params)

    if @customer_form.save
      redirect_to customer_path(@customer_form.customer)
    else
      render :new
    end
  end

  def edit
    @customer = Customer.find(params[:id])
    @customer_form = CustomerForm.new(customer: @customer, address: @customer.address)
  end

  def update
    @customer = Customer.find(params[:id])
    @customer_form = CustomerForm.new(customer: @customer, address: @customer.address)
    @customer_form.assign_attributes(customer_form_params)

    if @customer_form.save
      redirect_to customer_path(@customer_form.customer)
    else
      render :edit
    end
  end

  def customer_form_params
    params.require(:customer).permit(%w{email password password_confirmation last_name first_name address address_2 city city_type state postal_code country})
  end
end
```

Since our form object handles everything we can treat our form object like we would treat any other model in our controller. The only difference is that when editing we must first retrieve the customer and the address and pass them into the `CustomerForm.new` call. We then call `assign_attributes` with the customer form parameters like normal.

The view is also very simple. All we need is a form partial like the one below, which we can include in our `new.html.erb` and `edit.html.erb` view templates:

``` erb _form.html.erb
<%= simple_form_for @customer_form do |f| %>
  <%= f.input :first_name %>
  <%= f.input :last_name %>
  <%= f.input :email %>
  <%= f.input :password %>
  <%= f.input :password_confirmation %>

  <%= f.input :address %>
  <%= f.input :address_2 %>
  <%= f.input :city %>
  <%= f.input :city_type %>
  <%= f.input :state %>
  <%= f.input :postal_code %>
  <%= f.input :country %>

  <%= f.button :submit %>
<% end %>
```

``` erb new.html.erb and edit.html.erb
<%= render 'form' %>
```

###Summary

Hopefully you can see how simple and extensible this solution is compared to some of the alternatives. We didn’t have to duplicate a single attribute name or validation method, and we could easily add a Profile object to the form without needing to add more than 4 small blocks of code.

I have thought about putting this in a gem. But for now I think that isn’t necessary. If you have any thoughts about my form class I would love to hear them. Feel free to contact me.

###9/4/2015 Update

When I first posted this I accidentally left out the assign_attributes method of the CustomerForm class. Obviously that’s a pretty critical portion of the form class. I have updated the relevant portions of this post with the missing information.

###Resources:

* https://forum.upcase.com/t/form-objects/2267
* http://stackoverflow.com/questions/14001353/displaying-validation-errors-on-delegate-attributes

Variable Hoisting in Golang

I had the opportunity to work with the Go programming language about a month ago. Since I prefer functional programming over object oriented programming I wanted to see if I could create things like closures and first class functions. It turns out all of this is possible in Go. In this post we will write a function that takes a function as one of its arguments and invokes it on every element of an array (much like the map function). Then we will create a closure by defining a function within a function, and use the closure to hoist a variable up into the scope of the parent function. In the end we combine the two functions into some code that computes the sum of all the numbers in an array. The code in this post is meant to showcase Go’s closures and first class functions; there are much easier ways of summing all the numbers in an array. If you just need the sum, write a for loop that adds each of the numbers to a sum variable one by one.
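
For comparison, that plain loop version would look something like this (a trivial sketch, not code from the original post):

``` go
// SimpleSum adds the numbers with an ordinary loop - no closures needed.
func SimpleSum(numbers []int) int {
  sum := 0
  for _, n := range numbers {
    sum += n
  }
  return sum
}
```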

Since the first function takes another function and applies it to each item in an array, we can call it ApplyToArray. The complete function is below:

``` go
func ApplyToArray(fun func(int) error, numbers []int) (err error) {
  for i := 0; i < len(numbers); i++ {
    err := fun(numbers[i])
    if err != nil {
      return err
    }
  }
  return nil
}
```

The first thing to note about this function is the arguments it expects. The first argument must be a function that takes an integer and returns an error (fun func(int) error), and the second is an array of integers. The function that is passed in is bound to the fun variable. The body of the function is simply a for loop that iterates over all the numbers in the numbers array. Inside the for loop the function that was passed in (fun) is invoked on each number. Since the return value is an error we check to see if it is nil. If the error is not nil we stop iteration and return the error; otherwise the loop continues. This allows us to pass any function in and have it invoked with each element in the array, but because Go is statically typed it limits what the array can contain and what types the function passed in must receive and return. The ApplyToArray function above will only accept a function that expects an integer and an array containing integers as arguments. Anything else will result in a compile-time error.

Now that we have our ApplyToArray function complete, let’s build a function that computes the sum of all the values in an array. As you probably noticed, ApplyToArray invokes the given function with every element in the array, but it doesn’t provide a way for that function to carry an intermediate value from one invocation to the next. We can easily get around this by using a closure to “hoist” the running total out of the function we pass to ApplyToArray and into the enclosing function, which we will call Sum. Here is the complete Sum function:

``` go
func Sum(numbers []int) (sum int) {
  sum = 0
  fun := func(number int) (err error) {
    sum = sum + number
    return nil
  }
  ApplyToArray(fun, numbers)
  return sum
}
```

Inside Sum we set the sum variable to 0 and then create a function and assign it to the variable fun. fun expects an integer as its only argument, so it matches the type specified for the first argument of ApplyToArray. Inside fun is where things get more interesting. We take the number passed in, add it to the sum variable, and assign the result back to sum. But the sum variable is not defined inside fun; it is defined outside, in the Sum function. Since sum is not defined inside fun we are actually referencing the sum variable defined in the Sum function’s scope. The fun function we defined inherits the scope of its parent function, and that allows us to make changes to sum from inside fun. This means that if we were to invoke fun with the integer 1 the value of the sum variable inside the Sum function would equal 1 as well. This is variable hoisting.

After defining the fun function in Sum we invoke ApplyToArray with fun and the array of numbers. Once ApplyToArray returns the sum variable contains the sum of all the integers in the numbers array, which we simply return to the caller of the function.

The complete example code using both functions looks like this:

``` go
package main

import "fmt"

func main() {
  // Example input for the sum
  numbers := []int{1, 2, 3, 4, 5}
  sum := Sum(numbers)
  fmt.Println(sum)
}

func Sum(numbers []int) (sum int) {
  sum = 0
  fun := func(number int) (err error) {
    sum = sum + number
    return nil
  }
  ApplyToArray(fun, numbers)
  return sum
}

func ApplyToArray(fun func(int) error, numbers []int) (err error) {
  for i := 0; i < len(numbers); i++ {
    err := fun(numbers[i])
    if err != nil {
      return err
    }
  }
  return nil
}
```

It’s nice to see that things like first class functions are present in a newer compiled language like Go. While I don’t think I will be using Go very often on projects in the future, I enjoyed learning it and it was definitely worth it. Check it out at golang.org.

If you have any questions feel free to email me.