Be mindful of race conditions between long running cron processes

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the dumb-mistakes-and-gotchas category.

Last Updated: 2024-11-21

I accidentally paid some of my sellers twice once month at Oxbridge Notes causing an accounting nightmare.

How did this happen? There are multiple failures, but the most relevant one here is that I had a cron-like process call my script every 10 minutes, however this script had insufficient filters on time to ensure single exeuction.

# Executed once every 10 mins
namespace :accounting do
  desc 'Generate seller invoices for the previous month and pay them'
  task pay_monthly_commissions: :environment do
    if Time.current.day == 1
      PayMonthlyCommissionsService.build.call(1.month.ago.beginning_of_month) 
    end
  end
end

This filter on Time.current.day == 1 is too coarse. Sure it prevent execution on all days of the month except for the 1st. But on the 1st, it causes the payments loop to be run - AGAIN - ten minutes later and ten minutes later again etc.

Now I did have some checks deeper within the invoicing system to prevent double execution, but the issue was that this mechanism only went into action after the pay_monthly_commissions finished.

I never encountered any double payment issue for the first 10 years of my project... but one month, due to slow API responses, the script took longer than 10 minutes to finish... and that's when the problems started due to a race condition with a second invocation of this script.

The full fix had two components

  1. The most important bit was more granular locking of invoices to prevent double payment (not depicted here).
  2. From a scheduling point of view, I also ensured execution was within the first ten minutes of the hour, so as to prevent the situation where an 11-minute running-time would cause parallel execution (note that this solution only works up to an execution time of 1 hour so isn't infinitely scalabe.)
if Time.current.day == 1 && (Time.current.min  <= 10)
  ...
end