Fun With Ruby Hash Defaults

I recently reviewed a pull request in which a coworker initialized a Hash with an array as its default value and appended items to the arrays returned by hash keys as the program ran. When appending items to a Ruby array, the shovel operator (<<) is usually preferred over plus-equals (+=). The shovel operator is much faster because it appends a value to the existing array, whereas plus-equals creates a new array every time (explored in more detail in this post by Alec Jacobson). However, using the shovel operator to populate our hash led to some “interesting” results:

my_hash = Hash.new([])
my_hash[:a]
# => []
my_hash[:b]
# => []
# so far so good...
my_hash[:a] << 1
my_hash[:a] << 2
my_hash[:a]
# => [1, 2]
# awesome
my_hash[:b]
# => [1, 2]
# less awesome

What the heck is going on here? When our hash can’t find a key, it does not return just any array; it returns the array, the exact object we passed to Hash.new as the default. The shovel operator mutates that default object in place and never assigns anything to the key, so every key that has not been explicitly assigned keeps returning the same, now modified, array.
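A quick way to confirm this is to compare object identity and to check what the hash actually stores. The snippet below is only a sketch; `shared_default?` is a hypothetical helper name introduced for illustration, not something from the pull request.

```ruby
# Hypothetical helper: true when two lookups return the very same object.
def shared_default?(hash, key_a, key_b)
  hash[key_a].equal?(hash[key_b])
end

my_hash = Hash.new([])
my_hash[:a] << 1

p shared_default?(my_hash, :a, :b) # true: both lookups hit the default object
p my_hash.keys                     # []: nothing was ever stored in the hash
p my_hash.default                  # [1]: the default itself was mutated
```

Note that the hash stays empty the whole time: the shovel only ever touched the default object.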

How should we overcome this? We could just resort to plus-equals:

my_hash = Hash.new([])
my_hash[:a]
# => []
my_hash[:a].object_id
# => 70278248129880
my_hash[:a] += [2]
my_hash[:a].object_id
# => 70278260560380
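The object id changes because `hash[key] += values` is shorthand for `hash[key] = hash[key] + values`: it builds a new array and assigns it to the key, leaving the default alone. A small sketch of that behavior follows; `append_with_plus_equals` is a hypothetical helper name used only for illustration.

```ruby
# Hypothetical helper: plus-equals expands to hash[key] = hash[key] + values,
# so a brand-new array is built and stored under the key.
def append_with_plus_equals(hash, key, values)
  hash[key] += values
end

my_hash = Hash.new([])
append_with_plus_equals(my_hash, :a, [2])

p my_hash.keys    # [:a]: the key is now really stored
p my_hash[:a]     # [2]
p my_hash.default # []: the shared default is untouched
p my_hash[:b]     # []
```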

This works, but as previously mentioned, it comes with a performance penalty because every append allocates a new array. The better option is to pass a block to Hash.new:

my_hash = Hash.new { |hash, key| hash[key] = [] }
my_hash[:a]
# => []
my_hash[:a] << 1 << 2
my_hash[:a]
# => [1, 2]
my_hash[:b]
# => []

This approach gives us a new array object for every key so we can shovel to our heart’s content.
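One thing worth knowing about the block form, shown in the sketch below: because the block assigns `hash[key] = []`, merely reading a missing key stores it in the hash. If that matters, `fetch` with an explicit fallback reads without running the block. The `array_hash` method is a hypothetical name used only for this illustration.

```ruby
# Hypothetical factory: every missing key gets its own fresh array.
def array_hash
  Hash.new { |hash, key| hash[key] = [] }
end

my_hash = array_hash
my_hash[:a] << 1

p my_hash[:a].equal?(my_hash[:b]) # false: each key has a distinct array
p my_hash.keys                    # [:a, :b]: reading :b stored it as well

# fetch with a fallback does not run the default block, so nothing is stored.
p my_hash.fetch(:c, [])           # []
p my_hash.key?(:c)                # false
```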