Stratus3D

A blog on software engineering by Trevor Brown

Sliding Window Data Store in Erlang

In a recent Erlang project I had a need for in-memory storage of events. I didn’t need events to persist for more than an hour. All I needed was a collection of events that had occurred over the last hour that I could query. Collections like these are often referred to as sliding window data stores. Only events within a window of time are kept. As the events age they eventually pass out of the window of time and are discarded. Since old events are discarded when they fall outside of the window, queries against the collection only return events within the sliding window.

###First Attempt

My first attempt at creating a sliding window resulted in a very primitive solution. All it does is create an empty list, and as new events come in they are added to the beginning of the list. Anytime an action is performed, whether adding events, querying events, or retrieving all the events, old events are first dropped from the end of the list with lists:takewhile. When events are queried the entire list is filtered to find the matching events. When we want all the events within the window we simply return the list. Events are stored internally as tuples of three values - the event ID, the time added, and the event itself.

Here is the module source code:

``` erlang sliding_window_list.erl
-module(sliding_window_list).

-behaviour(sliding_window).

-include_lib("include/sliding_window.hrl").

-export([new/1, add/2, events/1, event_info/2, reset/1]).

% API
new(Size) ->
    % Since we are using a plain old list to store events all we need to keep
    % track of is the size of the window (in seconds), the last event ID (starts
    % at 1), and the array of events. We store all these in a tuple and pass it
    % around to each API call.
    {Size, 1, []}.

add({Size, Id, Window}, Event) ->
    % Remove old events as some might still exist in the list
    NewWindow = remove_old_events(Size, Window),
    % Add a new event to the list and return the updated window state and event
    % id.
    {{Size, Id+1, [{Id, sliding_window:timestamp(), Event}|NewWindow]}, Id}.

events({Size, Id, Window}) ->
    % Remove old events as some might still exist in the list
    NewWindow = remove_old_events(Size, Window),
    % Return all events in the window
    {{Size, Id, NewWindow}, NewWindow}.

event_info({Size, Id, Window}, EventId) ->
    % Remove old events as some might still exist in the list
    NewWindow = remove_old_events(Size, Window),
    % Find the event in the list of events
    case lists:filter(fun({CurrentEventId, _Time, _ExistingEvent}) ->
                              CurrentEventId =:= EventId
                      end, NewWindow) of
        [{EventId, EventTimeAdded, Event}] ->
            % If event exists return it along with the time remaining in the
            % window and the time elapsed. Also return the new window state.
            TimeElapsed = timer:now_diff(sliding_window:timestamp(), EventTimeAdded) / ?MEGA,
            TimeRemaining = timer:now_diff(EventTimeAdded, sliding_window:window_end_timestamp(Size)) / ?MEGA,
            {{Size, Id, NewWindow}, {EventId, EventTimeAdded, TimeElapsed, TimeRemaining, Event}};
        [] ->
            % Return none if the event is not found
            {{Size, Id, NewWindow}, none}
    end.

reset({Size, Id, _Window}) ->
    {{Size, Id, []}, ok}.

% Private Functions
remove_old_events(Size, WindowEvents) ->
    % Remove events that are outside the window of time and return the updated
    % list of events.
    WindowEndTime = sliding_window:window_end_timestamp(Size),
    lists:takewhile(fun({_, TimeAdded, _}) ->
                            TimeAdded > WindowEndTime
                    end, WindowEvents).
```

Using this module to store events is straightforward:


``` erlang
% Create a new window to hold events for 40 seconds
1> Window = sliding_window_list:new(40).
{40,1,[]}
% Add an event to the window
2> {Window2, Id1} = sliding_window_list:add(Window, foobar_event).
{{40,2,[{1,{1436,42881,562920},foobar_event}]},1}
% Add another event to the window
3> {Window3, Id2} = sliding_window_list:add(Window2, {some, other, event}).
{{40,3,[{1,{1436,42893,241849},foobar_event}, {2,{1436,42991,830545},{some,other,event}}]},2}
% List all events
4> {Window4, Events} = sliding_window_list:events(Window3).
{{40,3,[{1,{1436,42893,241849},foobar_event}, {2,{1436,42991,830545},{some,other,event}}]}, [{1,{1436,42893,241849},foobar_event}, {2,{1436,42991,830545},{some,other,event}}]}
5> Events. % Events currently stored in the window
[{1,{1436,42893,241849},foobar_event}, {2,{1436,42991,830545},{some,other,event}}]
% Get details on a specific event by ID
5> {Window5, Event} = sliding_window_list:event_info(Window4, Id1).
{{40,3,[{1,{1436,42893,241849},foobar_event}, {2,{1436,42991,830545},{some,other,event}}]}, {1,{1436,43127,793403},24.73498,15.262321,foobar_event}}
% Event details
7> Event.
{1,{1436,43127,793403},24.73498,15.262321,foobar_event}
% After waiting 40 seconds we can repeat the event_info and events calls.
% Now the events have expired and are no longer stored
14> {Window6, EventInfo} = sliding_window_list:event_info(Window5, Id1).
{{40,2,[]},none}
9> {Window7, Events} = sliding_window_list:events(Window6).
{{40,3,[]},[]}
```

####List Implementation Benchmarks

Since I knew this wasn’t going to be an efficient solution I decided to write a couple of functions to benchmark the performance of several operations. There are three main operations that can be performed on a sliding window data store:

  1. Adding events
  2. Fetching information on events (Querying for specific events within the window)
  3. Retrieving all events currently in the window

I wrote three benchmark functions. The first adds 10,000 events to the sliding window. The second fetches information on 1,000 random individual events. The third fetches all events from the window 100 times. Events are generated on the fly and are tuples composed of a unique ID, an atom, and a binary with a length of 45, 90, 135, or 180 bytes.

``` erlang An example event
{load_testing_event, 123, <<"The quick brown fox jumped over the lazy dog.The quick brown fox jumped over the lazy dog.">>}
```

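For reference, here is roughly what the first benchmark could look like. This is a sketch rather than the harness I actually used; the event body and the use of timer:tc are assumptions.

``` erlang
% A rough sketch of the "add 10,000 events" benchmark. Not the original
% harness; assumes the sliding_window_list module shown above.
benchmark_add(WindowSize, Count) ->
    Window = sliding_window_list:new(WindowSize),
    % 90 byte event body built from the 45 byte sentence used above
    Body = binary:copy(<<"The quick brown fox jumped over the lazy dog.">>, 2),
    {Micros, _FinalWindow} =
        timer:tc(fun() ->
                         lists:foldl(fun(N, W) ->
                                             Event = {load_testing_event, N, Body},
                                             {W2, _Id} = sliding_window_list:add(W, Event),
                                             W2
                                     end, Window, lists:seq(1, Count))
                 end),
    % timer:tc/1 reports microseconds; convert to seconds
    Micros / 1000000.
```
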
I ran all these functions three times on my MacBook Pro and took the average of the three runs. For my naive list implementation the results were:

* Adding 10,000 events: 8.726298 seconds
* Fetching 1,000 events: 2.835538 seconds
* Fetching all events 100 times: 0.279025 seconds

Not too bad for just being a list, but still not good enough for a system expected to handle thousands of events. The problem is that old events sit at the end of the list, and because Erlang lists are immutable singly linked lists, removing items from the end requires rebuilding the list that precedes them. Since old events are pruned every time an event is added or retrieved, this constant rebuilding has a significant impact on performance.

After seeing the benchmark results I realized my benchmark functions were completely unrealistic. The data store would likely experience a mix of operations over time: calls to add new events, queries for existing events, and calls to retrieve all events in the window would be randomly mixed together. I decided to write another benchmark function that interleaves all three operations. It adds 100 events, then fetches 10 random events, and then fetches all events once. For the benchmark I ran this function 1,000 times with the window set to store events for only 10 seconds, so events would expire during the test. At the end of the run I counted the events still in the window. Since every event still in the window had been added in the last 10 seconds, I simply divided that count by 10 to compute the number of events per second. While not perfect, this served as my throughput benchmark. The throughput of the list implementation was 700.8 events per second.
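
A sketch of what that interleaved benchmark could look like is below. Again, this is an approximation of the approach described above, not the original code; the event shape and the use of rand:uniform/1 are assumptions.

``` erlang
% Interleaved throughput benchmark sketch: add 100 events, look up 10 random
% events, fetch everything once, repeated Rounds times against a 10 second
% window. Module is one of the sliding window implementations.
benchmark_throughput(Module, Rounds) ->
    Window0 = Module:new(10),
    FinalWindow =
        lists:foldl(
          fun(Round, W0) ->
                  W1 = lists:foldl(
                         fun(N, W) ->
                                 {W2, _Id} = Module:add(W, {event, Round, N}),
                                 W2
                         end, W0, lists:seq(1, 100)),
                  W3 = lists:foldl(
                         fun(_, W) ->
                                 Id = rand:uniform(Round * 100),
                                 {W2, _Info} = Module:event_info(W, Id),
                                 W2
                         end, W1, lists:seq(1, 10)),
                  {W4, _Events} = Module:events(W3),
                  W4
          end, Window0, lists:seq(1, Rounds)),
    % Everything still in the window was added within the last 10 seconds
    {_W, Remaining} = Module:events(FinalWindow),
    length(Remaining) / 10.
```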

I also created an Erlang behaviour named `sliding_window` that defines all the functions each sliding window module must export. All the modules shown in this post implement the `sliding_window` behaviour:

``` erlang sliding_window.erl
-module(sliding_window).

-type event() :: list(tuple()).

-type id() :: any().

-type seconds() :: integer().

-type milliseconds() :: integer().

-type window_ref() :: any().

-callback new(Size :: seconds()) -> window_ref().

-callback add(Window :: window_ref(), Event :: event()) ->
    {window_ref(), Id :: id()}.

-callback events(Window :: window_ref()) ->
    {window_ref(), list({Id :: id(), Event :: event()})}.

-callback event_info(Window :: window_ref(), EventId :: id()) ->
    {window_ref(), {TimeElapsed :: milliseconds(), TimeRemaining :: milliseconds(), event()} | none}.

-callback reset(Window :: window_ref()) ->
    {window_ref(), ok}.
```

###Erlang’s Queue Module

I knew using an ETS table would probably be the best solution to this problem. But before I developed the ETS table implementation I came across a discussion of sliding window data structures on the erlang-questions mailing list from back in 2005, which led me to Erlang’s queue module. The queue module implements a FIFO queue using two lists, and is based on the double-ended queues in Chris Okasaki’s Purely Functional Data Structures. Here is the implementation I came up with that uses the queue module:

``` erlang sliding_window_queue.erl
-module(sliding_window_queue).

-behaviour(sliding_window).

-include_lib("include/sliding_window.hrl").

-export([new/1, add/2, events/1, event_info/2, reset/1]).

% API
new(Size) ->
    % We can use a queue similar to a list. We create a queue and then we can
    % pass it along to each call.
    {Size, 1, queue:new()}.

add({Size, Id, Window}, Event) ->
    % Remove old events as some might still exist in the queue
    NewWindow = remove_old_events(Size, Window),
    % Add a new event to the queue and return the updated window state and event
    % id.
    {{Size, Id+1, queue:in({Id, sliding_window:timestamp(), Event}, NewWindow)}, Id}.

events({Size, Id, Window}) ->
    % Remove old events as some might still exist in the queue
    NewWindow = remove_old_events(Size, Window),
    % Return all events in the window
    {{Size, Id, NewWindow}, queue:to_list(NewWindow)}.

event_info({Size, Id, Window}, EventId) ->
    % Remove old events as some might still exist in the queue
    NewWindow = remove_old_events(Size, Window),
    % Find the event in the queue of events
    FilterResults = queue:filter(fun({CurrentEventId, _Time, _ExistingEvent}) ->
                                         CurrentEventId =:= EventId
                                 end, NewWindow),
    case queue:to_list(FilterResults) of
        [{EventId, EventTimeAdded, Event}] ->
            % If event exists return it along with the time remaining in the
            % window and the time elapsed. Also return the new window state.
            TimeElapsed = timer:now_diff(sliding_window:timestamp(), EventTimeAdded) / ?MEGA,
            TimeRemaining = timer:now_diff(EventTimeAdded, sliding_window:window_end_timestamp(Size)) / ?MEGA,
            {{Size, Id, NewWindow}, {EventId, EventTimeAdded, TimeElapsed, TimeRemaining, Event}};
        [] ->
            % Return none if the event is not found
            {{Size, Id, NewWindow}, none}
    end.

reset({Size, Id, _Window}) ->
    {{Size, Id, queue:new()}, ok}.

% Private Functions
remove_old_events(Size, WindowEvents) ->
    % Remove events that are outside the window of time and return the updated
    % queue of events.
    WindowEndTime = sliding_window:window_end_timestamp(Size),
    remove_old_events_from_queue(WindowEndTime, WindowEvents).

remove_old_events_from_queue(WindowEndTime, WindowEvents) ->
    % Get the oldest event in the queue
    try queue:get(WindowEvents) of
        {_, TimeAdded, _} ->
            case TimeAdded > WindowEndTime of
                true ->
                    % If the oldest event is inside the window of time we can
                    % just return the remaining WindowEvents queue. We know all
                    % events from here on are newer.
                    WindowEvents;
                _ ->
                    % Remove the event from the queue and invoke
                    % remove_old_events_from_queue again to check the next event
                    remove_old_events_from_queue(WindowEndTime, queue:drop(WindowEvents))
            end
    catch
        error:empty ->
            % We are at the end of the queue. Return the queue
            WindowEvents
    end.
```

Note that the functions and macros reused by the queue implementation have been pulled out into the `sliding_window` module and header file. They are the same as the ones used by the `sliding_window_list` module.
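
Those shared helpers aren’t shown in the post, but based on how they are called they presumably look something like the following. This is my reconstruction, not the original code; the exact definitions of `?MEGA`, `timestamp/0`, and `window_end_timestamp/1` are assumptions.

``` erlang
% Reconstructed shared helpers (assumed, not from the original post).
% ?MEGA would live in include/sliding_window.hrl.
-define(MEGA, 1000000).

% The current time as an Erlang timestamp ({MegaSecs, Secs, MicroSecs}).
timestamp() ->
    os:timestamp().

% The timestamp marking the trailing edge of the window; any event added
% before this moment has fallen out of the window.
window_end_timestamp(Size) ->
    {Mega, Secs, Micro} = os:timestamp(),
    Total = (Mega * 1000000) + Secs - Size,
    {Total div 1000000, Total rem 1000000, Micro}.
```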

I was surprised at how fast the queue implementation performed:

* Adding 10,000 events: 0.036307 seconds
* Fetching 1,000 events: 1.21013 seconds
* Fetching all events 100 times: 0.119682 seconds
* Throughput benchmark: 2783.333333 events per second

Overall its throughput was roughly four times that of the list implementation, and adding events was orders of magnitude faster. The improved performance is likely due to the queue being represented as two Erlang lists. Lists are essentially stacks with two efficient operations - push and pop. An item can be pushed onto the top of a list to create a new list, and the first item in a list can be popped off the top (removed) and returned.

The queue is represented underneath as two lists - one list for incoming (new) events and the other for outgoing (old) events. The newest event is the first item in the incoming list and the oldest event is the first item in the outgoing list. What makes the queue so efficient is that everything is done using only the push and pop operations. When an event is added to the queue it is pushed onto the front of the incoming list. When an item is removed from the queue it is popped off the front of the outgoing list. When the outgoing list is empty each of the items in the incoming list is popped off one by one and pushed onto the outgoing list, so the last and oldest item in the incoming list becomes the first item in the outgoing list. Each item that passes through the queue is only pushed twice and popped twice: it is pushed onto the incoming list, then popped off the incoming list and pushed onto the outgoing list, and finally it is popped off the outgoing list. This results in a very efficient FIFO queue.
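
To make the two-list idea concrete, here is a stripped-down sketch of that mechanism. It is purely illustrative; the real queue module is more sophisticated and its internal representation is an implementation detail.

``` erlang
-module(two_list_queue).
-export([new/0, push/2, pop/1]).

% A minimal two-list FIFO, illustrating the idea behind Erlang's queue module.
new() -> {[], []}.

% Push onto the incoming list.
push(Item, {In, Out}) -> {[Item | In], Out}.

% Pop from the outgoing list; when it is empty, reverse the incoming list
% so the oldest item ends up at its front.
pop({[], []}) -> empty;
pop({In, []}) -> pop({[], lists:reverse(In)});
pop({In, [Oldest | Out]}) -> {Oldest, {In, Out}}.
```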

###With An ETS Table

Next I tried implementing the sliding window using an ETS table. ETS stands for Erlang Term Storage. An ETS table is an in-memory key-value lookup table, which makes it very different from the list and queue implementations, which are basically just stacks. While the event storage is different, the code is similar to the other implementations:


``` erlang sliding_window_ets.erl
-module(sliding_window_ets).

-behaviour(sliding_window).

-include_lib("include/sliding_window.hrl").

-define(TABLE_NAME, ?MODULE).

-export([new/1, add/2, events/1, event_info/2, reset/1]).

% API
new(Size) ->
    % We need our ETS table to be an ordered set. Otherwise we would have to
    % inspect every event when checking for old events. With an ordered set we
    % can be sure the oldest event comes first, since event ids only increase.
    EtsOptions = [ordered_set, protected],
    Tid = ets:new(?TABLE_NAME, EtsOptions),
    {Size, 1, Tid}.

add({Size, Id, Window}, Event) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),
    % Add a new event to the table and return the event id.
    ets:insert(Window, {Id, sliding_window:timestamp(), Event}),
    {{Size, Id+1, Window}, Id}.

events({Size, Id, Window}) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),

    % Match all events in the table and map them to remove the surrounding lists
    % for each event. Return all events.
    Events = ets:match(Window, '$1'),
    Result = lists:map(fun([Event]) ->
                               Event
                       end, Events),
    {{Size, Id, Window}, Result}.

event_info({Size, Id, Window}, EventId) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),

    % Find the event in the ETS table
    case ets:lookup(Window, EventId) of
        [{EventId, EventTimeAdded, Event}] ->
            % If event exists return it along with the time remaining in the
            % window and the time elapsed. Also return the new window state.
            TimeElapsed = timer:now_diff(sliding_window:timestamp(), EventTimeAdded) / ?MEGA,
            TimeRemaining = timer:now_diff(EventTimeAdded, sliding_window:window_end_timestamp(Size)) / ?MEGA,
            {{Size, Id, Window}, {EventId, EventTimeAdded, TimeElapsed, TimeRemaining, Event}};
        [] ->
            % Return none if the event is not found
            {{Size, Id, Window}, none}
    end.

reset({Size, Id, Window}) ->
    % Remove all events but keep the table so the window can still be used
    true = ets:delete_all_objects(Window),
    {{Size, Id, Window}, ok}.

% Private Functions
remove_old_events(Size, Window) ->
    % Remove events that are outside the window of time
    WindowEndTime = sliding_window:window_end_timestamp(Size),

    % The first event in the table is the oldest, so we start with it
    Key = ets:first(Window),
    ok = remove_old_events_from_table(WindowEndTime, Window, Key).

remove_old_events_from_table(_WindowEndTime, _Window, '$end_of_table') ->
    ok;
remove_old_events_from_table(WindowEndTime, Window, Key) ->
    % Lookup the event in the ETS table and check the time it was added.
    case ets:lookup(Window, Key) of
        [{_, TimeAdded, _}] = [Event] ->
            case TimeAdded > WindowEndTime of
                true ->
                    % If the oldest event is inside the window of time we can
                    % return. We know all events from here on are newer.
                    ok;
                false ->
                    % Remove the event from the table and invoke
                    % remove_old_events_from_table again to check the next event
                    true = ets:delete_object(Window, Event),
                    NewKey = ets:next(Window, Key),
                    remove_old_events_from_table(WindowEndTime, Window, NewKey)
            end;
        [] ->
            % We are at the end of the set. Return
            ok
    end.
```

####ETS Table Benchmark Results

* Adding 10,000 events: 0.058912 seconds
* Fetching 1,000 events: 0.937192 seconds
* Fetching all events 100 times: 0.108906 seconds
* Throughput benchmark: 2196.666667 events per second

Surprisingly, the ETS table is actually slower than the queue module, though it is still significantly faster than a plain old list. Since we are using an ordered_set table to store the events we know the oldest event is the first item in the table. So we start checking for old events at the beginning of the table and work our way forward. Once we get to an event that is still within the window of time we stop checking, since we know the remaining events are closer to the end of the table and therefore newer. This allows us to skip most of the events that are within the window of time. The slight decrease in performance compared to the queue implementation is likely due to the key lookups required to traverse the items in the table (the ets:first/1 and ets:next/2 calls).

###Mnesia

Since we are doing all this in Erlang, this comparison wouldn’t be complete without testing Erlang’s own database, Mnesia. Mnesia is built on top of ETS and DETS tables, so I didn’t expect it to be any faster than the ETS implementation, but after seeing the ETS results I was curious how it would perform in a situation like this. There are other things that might justify using an Mnesia database to store events. Mnesia can persist events on disk as well as store them in memory, and an Mnesia table can be replicated across multiple Erlang nodes, making fault tolerance possible. Both of these things would be very difficult to do with a plain ETS table or a queue. For the Mnesia sliding window I opted for a simple in-memory table on one node. This makes the implementation similar to the others:

``` erlang sliding_window_mnesia.erl
-module(sliding_window_mnesia).

-behaviour(sliding_window).

-include_lib("stdlib/include/qlc.hrl").
-include_lib("include/sliding_window.hrl").

-define(TABLE_NAME, event).

-record(event, {id, added_time, event}).

-export([new/1, add/2, events/1, event_info/2, reset/1]).

% API
new(Size) ->
    % Create the schema
    mnesia:create_schema([node()]),
    % Start Mnesia
    mnesia:start(),
    Attrs = {attributes, record_info(fields, event)},
    % Only store table in ram on one node
    mnesia:create_table(?TABLE_NAME, [Attrs, {ram_copies, [node()]}]),
    {Size, 1, ?TABLE_NAME}.

add({Size, Id, Window}, Event) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),
    % Add a new event to the table and return the event id.
    {atomic, _} = mnesia:transaction(fun() ->
                                             EventRecord = #event{id=Id,
                                                                  added_time=sliding_window:timestamp(),
                                                                  event=Event},
                                             mnesia:write(EventRecord)
                                     end),
    {{Size, Id+1, Window}, Id}.

events({Size, Id, Window}) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),

    % Match all events in the table and map them to tuples. Return all events.
    Result = do(qlc:q([X || X <- mnesia:table(Window)])),
    TupleResults = lists:map(fun({event, EventId, TimeAdded, Event}) ->
                                     {EventId, TimeAdded, Event}
                             end, Result),
    {{Size, Id, Window}, TupleResults}.

event_info({Size, Id, Window}, EventId) ->
    % Remove old events as some might still exist in the table
    remove_old_events(Size, Window),

    % Find the event in the Mnesia table
    Result = do(qlc:q([{CurrentEventId, EventTimeAdded, Event} ||
                          #event{id=CurrentEventId,
                                 added_time=EventTimeAdded,
                                 event=Event} <- mnesia:table(Window),
                          CurrentEventId =:= EventId])),
    case Result of
        [{EventId, EventTimeAdded, Event}] ->
            % If event exists return it along with the time remaining in the
            % window and the time elapsed. Also return the new window state.
            TimeElapsed = timer:now_diff(sliding_window:timestamp(), EventTimeAdded) / ?MEGA,
            TimeRemaining = timer:now_diff(EventTimeAdded, sliding_window:window_end_timestamp(Size)) / ?MEGA,
            {{Size, Id, Window}, {EventId, EventTimeAdded, TimeElapsed, TimeRemaining, Event}};
        [] ->
            % Return none if the event is not found
            {{Size, Id, Window}, none}
    end.

reset({Size, Id, Window}) ->
    % Empty the database
    mnesia:clear_table(Window),

    % Stop mnesia
    mnesia:stop(),
    {{Size, Id, Window}, ok}.

% Private Functions
remove_old_events(Size, Window) ->
    % Remove events that are outside the window of time
    WindowEndTime = sliding_window:window_end_timestamp(Size),

    OldEventsIds = do(qlc:q([Id || #event{id=Id, added_time=TimeAdded} <- mnesia:table(Window),
                                   TimeAdded < WindowEndTime])),
    mnesia:transaction(fun() ->
                               lists:map(fun(Id) ->
                                                 mnesia:delete({event, Id})
                                         end, OldEventsIds)
                       end).

do(Q) ->
    F = fun() -> qlc:e(Q) end,
    {atomic, Val} = mnesia:transaction(F),
    Val.
```

####Mnesia Benchmark Results

* Adding 10,000 events: 10.498559 seconds
* Fetching 1,000 events: 4.26676 seconds
* Fetching all events 100 times: 0.442445 seconds
* Throughput benchmark: 563.933333 events per second

The Mnesia table performed far worse than the ETS table. The extreme difference in performance is due to the fact that the Mnesia implementation must query the entire table to find old events, and it does so on every call, which is very expensive. There is a way to use an ordered_set Mnesia table just like in the ETS implementation, but I wasn’t able to get it to work properly. Querying for old events is the primary cause of the poor performance.
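
For anyone who wants to try the ordered_set route, Mnesia does accept a {type, ordered_set} option for ram_copies tables. The sketch below shows how old-event removal might then walk keys in order, the same way the ETS version does. This is a hypothetical sketch I have not benchmarked, not code from the original post.

``` erlang
% Hypothetical sketch: create the table as an ordered_set so the smallest
% (oldest) id is always first, then walk keys in order and stop at the first
% event that is still inside the window.
% mnesia:create_table(event, [{attributes, record_info(fields, event)},
%                             {type, ordered_set},
%                             {ram_copies, [node()]}]).

remove_old_events_ordered(WindowEndTime) ->
    remove_old(WindowEndTime, mnesia:dirty_first(event)).

remove_old(_WindowEndTime, '$end_of_table') ->
    ok;
remove_old(WindowEndTime, Key) ->
    case mnesia:dirty_read(event, Key) of
        [#event{added_time = TimeAdded}] when TimeAdded =< WindowEndTime ->
            % Fetch the next key before deleting the current record
            NextKey = mnesia:dirty_next(event, Key),
            mnesia:dirty_delete(event, Key),
            remove_old(WindowEndTime, NextKey);
        _ ->
            % The first in-window event; everything after it is newer
            ok
    end.
```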

###Benchmark Comparison

Here are the benchmark results for all four sliding window modules:

Values in seconds - lower is better.

| Implementation | Adding 10,000 Events | Fetching 1,000 Events | Fetching All Events 100 Times |
|----------------|----------------------|-----------------------|-------------------------------|
| List           | 8.726298             | 2.835538              | 0.279025                      |
| Queue          | 0.036307             | 1.21013               | 0.119682                      |
| ETS            | 0.058912             | 0.937192              | 0.108906                      |
| Mnesia         | 10.498559            | 4.26676               | 0.442445                      |


And the more realistic throughput benchmark:

| Implementation | Throughput (events per second, higher is better) |
|----------------|--------------------------------------------------|
| List           | 700.8                                            |
| Queue          | 2783.333333                                      |
| ETS            | 2196.666667                                      |
| Mnesia         | 563.933333                                       |


###Conclusion

Erlang provides all the tools needed to quickly build sliding window data stores. The development of all four implementations was relatively straightforward and each was less than 100 lines of code. They all had very different performance characteristics, but overall the one that used Erlang’s queue module performed the best in my tests, with the ETS table implementation a close second. If you need the sliding window to be accessible to other processes on the Erlang node, an ETS table may be a good option as it allows sharing between processes, unlike a list or queue, which lives inside a single process. The sliding window that used Mnesia was the slowest of the four. However, there are still advantages to using Mnesia to store events: you can replicate the table across multiple Erlang nodes, and unlike the other three implementations you can use a disk-backed Mnesia table to persist events. Overall the list implementation was probably the worst, though even it might be fine in situations where you don’t plan on handling more than a couple hundred events at a time.

I hope this information was helpful to you. If you notice anything here that is incorrect, or you see something I missed, please let me know.

The benchmark tests I wrote are far from perfect. I only wrote them to get a feel for the performance of the various sliding window implementations I came up with. In all the benchmarks I ran the window size was less than 2 minutes, and in the throughput benchmark the window size was only 10 seconds. In real scenarios events would likely be kept for at least an hour. If you need a sliding window data store it would be wise to write your own benchmarking functions that exercise these implementations in ways that match your specific use case.

Extensible Rails 4 Form Object Design

Keeping models and controllers simple as Rails applications grow can often be a challenge. Controllers begin handling business logic in addition to view logic. Models grow large with methods responsible for the presentation of the data and the business logic that surrounds the data. With the addition of things like accepts_nested_attributes_for the problem gets even worse. accepts_nested_attributes_for tethers your models together, making one model aware of the attributes of the other. In the view this problem is compounded by the knowledge the form markup contains about the nested structure of the models. Form objects are one of the ways to simplify models and controllers. In this post I am going to show what I have found to be an effective form object design. It will eliminate the need for accepts_nested_attributes_for.

There are many form object implementations out there, including gems like reform, simple_form_object and virtus. While these work well in many cases, I found some of the things they do undesirable. Most of them require some duplication of logic - usually the validation logic must be repeated in the form object even when it is already present in the model. With my solution the validation logic remains in the original models where it belongs. Of course, additional validations can still be added to the form object itself to verify the validity of the objects as a group.

###The Basic Form Object Class

Most of this solution came from Thoughtbot’s Upcase forum, where @derekprior provided a simple form object example. His example had elegant solutions for validation and persistence of multi-model forms, but it fell short in other areas: it didn’t provide a way to initialize the form with existing objects, which is necessary when building edit forms. My form class can be initialized with the objects it updates when it is created, which keeps any controller action that uses the form object very simple.

###An Example

To illustrate how my form object works let’s suppose we have an online store that has a Customer model and an Address model. Customers might not have an address when they sign up. Keeping the data in two separate models makes sense. Store administrators might want to edit all the information associated with a customer in one form. And we must remember that the Address record for a customer may not exist yet. This is where form objects come in. Our form object will manage creating and updating of customer records and their associated address records. First here are the two models:

``` ruby customer.rb
class Customer < ActiveRecord::Base
  has_one :address
end
```

``` ruby address.rb
class Address < ActiveRecord::Base
  belongs_to :customer
end
```

###The Form Object

Next we create a form class. I like to keep mine in app/forms/ but you can put them wherever you like. The form class is by far the most complex of the three classes. Don’t worry though, I will go through it and explain what each section of it does.

``` ruby customer_form.rb
class CustomerForm
  # include ActiveModel so we have all the nice ActiveModel methods available
  include ActiveModel::Model
  # All the models that are a part of our form should be in the attr_accessor.
  # This allows the form to be initialized with existing instances.
  attr_accessor :customer, :address

  def self.customer_attributes
    Customer.column_names.push(Customer.reflections.keys)
  end

  def self.address_attributes
    Address.column_names.push(Address.reflections.keys)
  end

  customer_attributes.each do |attr|
    delegate attr.to_sym, "#{attr}=".to_sym, to: :customer
  end

  address_attributes.each do |attr|
    delegate attr.to_sym, "#{attr}=".to_sym, to: :address
  end

  delegate :id, :persisted?, to: :customer

  validate :validate_children

  def self.model_name
    customer.model_name
  end

  def assign_attributes(params)
    customer_attributes = params.slice(*self.class.customer_attributes)
    customer.assign_attributes(customer_attributes)
    address_attributes = params.slice(*self.class.address_attributes)
    address.assign_attributes(address_attributes)
    setup_associations
  end

  def save
    if valid?
      ActiveRecord::Base.transaction do
        customer.save!
        address.save!
      end
    end
  end

  def customer
    @customer ||= Customer.new
  end

  def address
    @address ||= Address.new
  end

  private

  def setup_associations
    address.customer = customer
  end

  def validate_children
    setup_associations

    if customer.invalid?
      promote_errors(customer.errors)
    end

    if address.invalid?
      promote_errors(address.errors)
    end
  end

  def promote_errors(child_errors)
    child_errors.each do |attribute, message|
      errors.add(attribute, message)
    end
  end
end
```

At the very top of the class we include `ActiveModel::Model`. This provides all the nice `ActiveModel` methods that you have come to expect on your model instances. This also allows us to treat our class more like a model instance in the controller. Next we have the `attr_accessor` call. This is where we specify the names of the models that make up our form. These are the names, not the classes, of those models.

``` ruby
  # include ActiveModel so we have all the nice ActiveModel methods available
  include ActiveModel::Model
  # All the models that are a part of our form should be in the attr_accessor.
  # This allows the form to be initialized with existing instances.
  attr_accessor :customer, :address
```

Next come the attributes and the delegation logic. First we define class methods that return the attributes of each model listed in the attr_accessor call. In our case we need two such methods - one for the Customer model and one for the Address model. Next we take the array of attributes from each model and delegate getter and setter methods for each attribute back to the original model - again, in our case, to :customer and :address. (Note that if the two models contain attributes with the same name only one will be delegated correctly due to the name conflict. If you run into this issue try using delegate’s :prefix option; see the sketch after the excerpt below.) This means that when we assign :first_name on our form object (@form_object.first_name = 'Fred') we are actually setting the :first_name attribute of the Customer instance. This delegation is what allows us to easily aggregate the attributes from multiple models into one class without having to duplicate attribute names. Also note the last line: we delegate :id and :persisted? to :customer, as these are methods that must be present on our form object (though this may not be needed in all cases).

``` ruby
  def self.address_attributes
    Address.column_names.push(Address.reflections.keys)
  end

  def self.customer_attributes
    Customer.column_names.push(Customer.reflections.keys)
  end

  address_attributes.each do |attr|
    delegate attr.to_sym, "#{attr}=".to_sym, to: :address
  end

  customer_attributes.each do |attr|
    delegate attr.to_sym, "#{attr}=".to_sym, to: :customer
  end

  delegate :id, :persisted?, to: :customer
```

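As mentioned above, if Customer and Address shared an attribute name you could disambiguate with a prefix. This is a hypothetical example, not part of the original form object; it assumes both models have a name column:

``` ruby
  # Hypothetical: if both models had a `name` column, prefixing the address
  # delegation would expose it as address_name / address_name= on the form.
  delegate :name, :name=, to: :address, prefix: true
```
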
Next comes the model_name class method and a few other methods. The model_name class method is used by simple_form to determine the appropriate create and update URLs the form should use. Then we have the assign_attributes method, which is responsible for setting the attributes on all the model instances (in our case just Customer and Address). We call assign_attributes on each instance and pass in the appropriate portion of the attributes hash. I chose to slice the attributes hash based on each model’s attributes and pass the resulting hashes to each model. There might be a better way of doing this, but I found this approach readable and relatively simple. After setting the attributes, assign_attributes calls the setup_associations method, which sets up the model associations properly (more on that in a minute). Next is the save method. It invokes valid? before attempting to save the records; if valid? returns true, each model’s save! method is invoked inside a transaction. We also define methods for each of the models contained in the form (again, in our case Customer and Address). These methods are important because they initialize new Customer and Address instances if we didn’t provide our own when initializing the form. Finally, we have a call to validate, which registers our custom :validate_children method (described in the next paragraph).

``` ruby
  validate :validate_children

  def self.model_name
    customer.model_name
  end

  def assign_attributes(params)
    customer_attributes = params.slice(*self.class.customer_attributes)
    customer.assign_attributes(customer_attributes)
    address_attributes = params.slice(*self.class.address_attributes)
    address.assign_attributes(address_attributes)
    setup_associations
  end

  def save
    if valid?
      ActiveRecord::Base.transaction do
        customer.save!
        address.save!
      end
    end
  end

  def customer
    @customer ||= Customer.new
  end

  def address
    @address ||= Address.new
  end
```

Now all we have left are the private methods. setup_associations is responsible for setting up any necessary associations prior to saving or validating the form data. In our case all it does is assign the customer to the customer attribute of the address. Next is validate_children, our validation callback. This method is responsible for all validations. First it invokes setup_associations to ensure the associations are set up, then it checks whether each model in the form is valid, and if one is not, it passes that instance’s errors to the promote_errors method. The promote_errors method takes the errors it is passed and adds each one to the errors object on the CustomerForm instance, which in turn renders the form itself invalid. Furthermore, the errors correspond to delegated attributes on the form object - the same attributes that are rendered as form fields in the view. Since the attributes match, simple_form is able to display errors the way it normally would, with messages above or below the corresponding form field.

``` ruby
  private

  def setup_associations
    address.customer = customer
  end

  def validate_children
    setup_associations

    if customer.invalid?
      promote_errors(customer.errors)
    end

    if address.invalid?
      promote_errors(address.errors)
    end
  end

  def promote_errors(child_errors)
    child_errors.each do |attribute, message|
      errors.add(attribute, message)
    end
  end
```

All of these methods allow us to treat our form object very much like an ActiveRecord instance. And other than class names and associations, we didn’t have to duplicate anything that already existed in our models. Existing model validations are used automatically, which makes for a very DRY form object. All of this makes our controller and view code simpler than what would be required with accepts_nested_attributes_for.

###The Controller and Views

Now our form object isn’t much use unless we actually use it in our controllers. The CustomerController is where we are most likely going to need our CustomerForm. Here is all we need to do to build working new and edit forms:

``` ruby CustomerController.rb
class CustomerController < ApplicationController
  def new
    @customer_form = CustomerForm.new()
  end

  def create
    @customer_form = CustomerForm.new(customer_form_params)

    if @customer_form.save
      redirect_to customer_path(@customer_form.customer)
    else
      render :new
    end
  end

  def edit
    @customer = Customer.find(params[:id])
    @customer_form = CustomerForm.new(customer: @customer, address: @customer.address)
  end

  def update
    @customer = Customer.find(params[:id])
    @customer_form = CustomerForm.new(customer: @customer, address: @customer.address)
    @customer_form.assign_attributes(customer_form_params)

    if @customer_form.save
      redirect_to customer_path(@customer_form.customer)
    else
      render :edit
    end
  end

  def customer_form_params
    params.require(:customer).permit(%w{email password password_confirmation last_name first_name address address_2 city city_type state postal_code country})
  end
end
```

Since our form object handles everything we can treat our form object like we would treat any other model in our controller. The only difference is that when editing we must first retrieve the customer and the address and pass them into the `CustomerForm.new` call. We then call `assign_attributes` with the customer form parameters like normal.

The view is also very simple. All we need is a form partial like the one below, which we can include in our `new.html.erb` and `edit.html.erb` view templates:

``` erb _form.html.erb
<%= simple_form_for @customer_form do |f| %>
  <%= f.input :first_name %>
  <%= f.input :last_name %>
  <%= f.input :email %>
  <%= f.input :password %>
  <%= f.input :password_confirmation %>

  <%= f.input :address %>
  <%= f.input :address_2 %>
  <%= f.input :city %>
  <%= f.input :city_type %>
  <%= f.input :state %>
  <%= f.input :postal_code %>
  <%= f.input :country %>

  <%= f.button :submit %>
<% end %>
```

``` erb new.html.erb and edit.html.erb
<%= render 'form' %>
```

###Summary

Hopefully you can see how simple and extensible this solution is compared to some of the alternatives. We didn’t have to duplicate a single attribute name or validation method, and we could easily add a Profile object to the form without needing to add more than 4 small blocks of code.

I have thought about putting this in a gem. But for now I think that isn’t necessary. If you have any thoughts about my form class I would love to hear them. Feel free to contact me.

###9/4/2015 Update

When I first posted this I accidentally left out the assign_attributes method of the CustomerForm class. Obviously that’s a pretty critical portion of the form class. I have updated the relevant portions of this post with the missing information.

###Resources:

* https://forum.upcase.com/t/form-objects/2267
* http://stackoverflow.com/questions/14001353/displaying-validation-errors-on-delegate-attributes

Variable Hoisting in Golang

I had the opportunity to work with the Go programming language about a month ago. Since I prefer functional programming over object oriented programming I wanted to see if I could create things like closures and first class functions. It turns out all of this is possible in Go. In this post we will write a function that takes a function as one of its arguments and invokes it on every element of an array (much like the map function). Then we will create a closure by defining a function within a function, and use the closure to hoist a variable up into the scope of the parent function. In the end we combine the two functions into some code that computes the sum of all the numbers in an array. The code in this post is meant to showcase Go’s closures and first class functions; there are much easier ways of summing all the numbers in an array. If you just need the sum, write a for loop that adds each of the numbers to a sum variable one by one.
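
For comparison, that plain loop version would look something like this (a trivial sketch, not code from the original post):

``` go
// SimpleSum adds the numbers with an ordinary loop - no closures needed.
func SimpleSum(numbers []int) int {
  sum := 0
  for _, n := range numbers {
    sum += n
  }
  return sum
}
```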

Since the first function takes another function and applies it to each item in an array, we can call it ApplyToArray. The complete function is below:

``` go
func ApplyToArray(fun func(int) error, numbers []int) (err error) {
  for i := 0; i < len(numbers); i++ {
    err := fun(numbers[i])
    if err != nil {
      return err
    }
  }
  return nil
}
```

The first thing to note about this function is the arguments it expects. The first argument must be a function that takes an integer and returns an error (fun func(int) error), and the second is an array of integers. The function that is passed in is bound to the fun variable. The body of the function is simply a for loop that iterates over all the numbers in the numbers array. Inside the for loop the function that was passed in (fun) is invoked on each number. Since the return value is an error we check to see if it is nil. If the error is not nil we stop iteration and return the error; otherwise the loop continues. This allows us to pass any function in and have it invoked with each element in the array, but because Go is statically typed it limits what the array can contain and what types the function passed in must receive and return. The ApplyToArray function above will only accept a function that expects an integer and an array containing integers as arguments. Anything else will result in a compile-time error.

Now that we have our ApplyToArray function complete, let’s build a function that computes the sum of all the values in an array. As you probably noticed, ApplyToArray invokes the given function with every element in the array, but it doesn’t provide a way for that function to carry an intermediate value from one invocation to the next. We can easily get around this by using a closure to “hoist” the running total out of the function we pass to ApplyToArray and into the enclosing function, which we will call Sum. Here is the complete Sum function:

``` go
func Sum(numbers []int) (sum int) {
  sum = 0
  fun := func(number int) (err error) {
    sum = sum + number
    return nil
  }
  ApplyToArray(fun, numbers)
  return sum
}
```

Inside Sum we set the sum variable to 0 and then create a function and assign it to the variable fun. fun expects an integer as its only argument, so it matches the type specified for the first argument of ApplyToArray. Inside fun is where things get more interesting. We take the number passed in, add it to the sum variable, and assign the result back to sum. But the sum variable is not defined inside fun; it is defined outside, in the Sum function. Since sum is not defined inside fun we are actually referencing the sum variable defined in the Sum function’s scope. The fun function we defined inherits the scope of its parent function, and that allows us to make changes to sum from inside fun. This means that if we were to invoke fun with the integer 1 the value of the sum variable inside the Sum function would equal 1 as well. This is variable hoisting.

After defining the fun function in Sum we invoke ApplyToArray with fun and the array of numbers. Once ApplyToArray returns the sum variable contains the sum of all the integers in the numbers array, which we simply return to the caller of the function.

The complete example code using both functions looks like this:

``` go
package main

import "fmt"

func main() {
  // Example input for the sum
  numbers := []int{1, 2, 3, 4, 5}
  sum := Sum(numbers)
  fmt.Println(sum)
}

func Sum(numbers []int) (sum int) {
  sum = 0
  fun := func(number int) (err error) {
    sum = sum + number
    return nil
  }
  ApplyToArray(fun, numbers)
  return sum
}

func ApplyToArray(fun func(int) error, numbers []int) (err error) {
  for i := 0; i < len(numbers); i++ {
    err := fun(numbers[i])
    if err != nil {
      return err
    }
  }
  return nil
}
```

It’s nice to see that things like first class functions are present in a newer compiled language like Go. While I don’t think I will be using Go very often on projects in the future, I enjoyed learning it and it was definitely worth it. Check it out at golang.org.

If you have any questions feel free to email me.