This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the algorithms category.
Last Updated: 2024-11-21
In my invoice generation code, I discovered a lack of idempotency. When I re-ran it on already-processed data, it tried to assign different (but nearby) invoice numbers to the same orders from a CSV file.
The code was:
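orders = csv.map { |row|
  # Build an order from each CSV row (other fields omitted here)
  { date: row[:date] }
}
# The date is the only sort key, so orders with equal dates have no defined relative order
orders.sort_by { |order| [order[:date]] }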
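When I inspected the date field, I noticed that it was granular only to the second, and sometimes multiple orders happened within that same second. Because the date was the only sort key, and Ruby's sort_by is not guaranteed to be stable, orders with equal dates could come back in a different relative order on each run. This caused indeterminacy, as seen below.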
items = [{date: 1, thing: 1}, {date: 1, thing: 2}, {date: 1, thing: 3}, {date: 1, thing: 4}]
items.shuffle.sort_by {|d| d[:date]}
=> [{:date=>1, :thing=>2}, {:date=>1, :thing=>1}, {:date=>1, :thing=>3}, {:date=>1, :thing=>4}]
items.shuffle.sort_by {|d| d[:date]}
# !! Different ordering
=> [{:date=>1, :thing=>2}, {:date=>1, :thing=>1}, {:date=>1, :thing=>4}, {:date=>1, :thing=>3}]
This indeterminacy was a problem for invoice generation and also for general debugging, since I would get everything set up to tackle one bug and then another one would randomly show up. It's much nicer to debug any linear processing code with data that has a stable order.
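One way to remove this kind of indeterminacy is to add a tie-breaker to the sort key. The sketch below is a minimal illustration, not necessarily the exact change made in the invoice code: it assumes every order carries some unique, unchanging field, and uses `:thing` from the example above as a stand-in for it. The helper name `deterministic_sort` is made up for this example.

```ruby
# Hypothetical sample data: every order shares the same second-granular date.
# :thing stands in for any unique, immutable field (for example an order id).
items = [
  { date: 1, thing: 1 },
  { date: 1, thing: 2 },
  { date: 1, thing: 3 },
  { date: 1, thing: 4 }
]

# Sort by date first, then by the unique field to break ties.
# Ruby compares arrays element by element, so [date, thing] acts as a composite key.
def deterministic_sort(orders)
  orders.sort_by { |order| [order[:date], order[:thing]] }
end

first_run  = deterministic_sort(items.shuffle)
second_run = deterministic_sort(items.shuffle)

# Both runs yield the same ordering, no matter how the input was shuffled.
puts first_run == second_run
```

If no unique field is available, the position of the row in the CSV can serve as the tie-breaker, for example `orders.sort_by.with_index { |order, i| [order[:date], i] }`. That only helps if the file itself is always read in the same order.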