Today I was working with some pipe-delimited data in Ruby and stumbled across a quirk in Ruby’s String#split method that I thought I would share. My code needed to take this data and parse each line into an array of strings.
Each line of the data had a varying number of pipe-separated fields, and sometimes fields were missing (empty):
I figured I would simply split the string based on the pipe character using something like this:
'Some|Fields|of random|data|||3'.split('|') #=> ["Some", "Fields", "of random", "data", "", "", "3"]
This seems to work fine. Blank fields come across as empty strings. I used the length of the array to determine the number of fields contained in the string. In this case there are seven fields of data, the 5th and 6th of which are empty. I thought this would work, but I quickly ran into an issue:
'Some|Fields|of random|data|||'.split('|') #=> ["Some", "Fields", "of random", "data"]
Now I had a problem. The string of data contains seven fields, but the last three are empty. After splitting the string on the pipe character we get an array of only four items (fields). My program needed to know that there were three empty fields at the end. I did some research, and it turns out that Ruby’s String#split method takes an optional second parameter, an integer limit. If it is omitted (or 0), all trailing null fields are dropped and not added to the array. If it is a positive integer, at most that many fields are returned, and the rest of the string is left unsplit in the last field.
For example, split('|', 1) would return an array with the entire string as the only element. But if the second argument is a negative integer, there is no limit on the number of fields, and trailing null fields are added to the array as empty strings. That allows us to do this:
'Some|Fields|of random|data|||'.split('|', -1) #=> ["Some", "Fields", "of random", "data", "", "", ""]
Perfect! Now even with trailing empty fields in our data we are still able to tell how many fields are present.
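To see the different limit values side by side, here is a small sketch that runs the same string through String#split with no limit, 0, 1, a small positive limit, and -1. The variable names are just mine for illustration.

```ruby
line = 'Some|Fields|of random|data|||'

no_limit = line.split('|')      # limit omitted: trailing empty fields are dropped
zero     = line.split('|', 0)   # 0 behaves the same as omitting the limit
one      = line.split('|', 1)   # 1: the whole string comes back as the only element
three    = line.split('|', 3)   # at most 3 fields; the remainder stays unsplit
negative = line.split('|', -1)  # negative: no limit, trailing empty fields are kept

p no_limit  #=> ["Some", "Fields", "of random", "data"]
p zero      #=> ["Some", "Fields", "of random", "data"]
p one       #=> ["Some|Fields|of random|data|||"]
p three     #=> ["Some", "Fields", "of random|data|||"]
p negative  #=> ["Some", "Fields", "of random", "data", "", "", ""]
```

Only the negative limit gives back all seven fields, which is why it is the one I ended up using.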
Ruby’s String#split method makes parsing strings of data very easy. In my case all I needed to do was split the data on every newline, then take the array of strings returned and split each one on the pipe character.
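As a rough sketch of that two-step approach (not my exact code), the snippet below splits a block of text on newlines and then splits each line on the pipe character with a limit of -1. The sample data and the names data and rows are made up for the example.

```ruby
# Hypothetical input: two pipe-delimited records, one per line.
data = "Some|Fields|of random|data|||3\nSome|Fields|of random|data|||\n"

# Split into lines first, then split each line into fields.
# The -1 limit keeps trailing empty fields so every row reports all its fields.
rows = data.split("\n").map { |line| line.split('|', -1) }

rows.each { |fields| puts "#{fields.length} fields: #{fields.inspect}" }
# 7 fields: ["Some", "Fields", "of random", "data", "", "", "3"]
# 7 fields: ["Some", "Fields", "of random", "data", "", "", ""]
```

Note that the outer split on "\n" uses no limit, so the empty string after the final newline is dropped, which is what we want for lines.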
Hope you found this useful!