Sorting on time alone might be insufficient to guarantee a repeatable ordering

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the algorithms category.

Last Updated: 2024-11-21

In my invoice generation code, I discovered a lack of idempotency: when I re-ran it on already-processed data, it tried to assign different (but nearby) invoice numbers to the same orders from a CSV file.

The code was:

orders = csv.map { |row|
  # Build an order from each CSV row (other fields omitted; assumes a :date header)
  { date: row[:date] }
}

orders.sort_by { |order| [order[:date]] }

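When I inspected the date field, I noticed that it was granular only to the second, and sometimes multiple orders happened within the same second. Orders with identical dates compare as equal, and Ruby's sort_by does not guarantee a stable result for equal keys, so their relative order could change from run to run. This caused indeterminacy, as seen below.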

items = [{date: 1, thing: 1}, {date: 1, thing: 2}, {date: 1, thing: 3}, {date: 1, thing: 4}]

items.shuffle.sort_by {|d| d[:date]}
=> [{:date=>1, :thing=>2}, {:date=>1, :thing=>1}, {:date=>1, :thing=>3}, {:date=>1, :thing=>4}]

items.shuffle.sort_by {|d| d[:date]}
# !! Different ordering
=> [{:date=>1, :thing=>2}, {:date=>1, :thing=>1}, {:date=>1, :thing=>4}, {:date=>1, :thing=>3}]

This indeterminacy was a problem for invoice generation and also for general debugging, since I would get everything set up to tackle one bug and then another one would randomly show up. It's much nicer to debug any linear processing code with data that has a stable order.
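One way to get a repeatable ordering is to add a tie-breaker to the sort key. The sketch below is only an illustration, not the original invoice code: the :id field, the helper names, and the starting invoice number of 1000 are all assumptions, and it relies on :id being unique per order.

```ruby
# Hypothetical orders: :id is an assumed unique field used only to break ties.
orders = [
  { id: 3, date: 1 },
  { id: 1, date: 1 },
  { id: 4, date: 2 },
  { id: 2, date: 1 },
]

# Sort by date first, then by the unique id, so no two orders compare equal.
def repeatable_order(orders)
  orders.sort_by { |order| [order[:date], order[:id]] }
end

# Assign invoice numbers by position in the sorted list.
# The starting number 1000 is an arbitrary value chosen for this example.
def invoice_numbers(orders, start: 1000)
  repeatable_order(orders)
    .each_with_index
    .map { |order, index| [order[:id], start + index] }
    .to_h
end

first_run  = invoice_numbers(orders.shuffle)
second_run = invoice_numbers(orders.shuffle)
puts first_run == second_run
```

Because the composite key is unique for every order, the input order no longer matters, so re-running on shuffled data assigns each order the same invoice number.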

Lesson