Multiprocessing and servers

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the concurrency category.

Last Updated: 2024-11-23

Be careful that initialization code does not run N times, once for each worker

I had some code that downloaded 1000 markdown files from a GitHub repo at boot and cached them in RAM. I noticed, however, that this was happening once in each of my worker processes, not only wasting resources, but causing conflicting behavior.

The solution was to branch on whether the process was a server or a worker. In Rails this entailed the following:

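config.after_initialize do
  # Rails::Server is only defined when the process was booted via the
  # `rails server` command, so other processes skip the download.
  CodeDiary.load_articles_into_memory if defined?(::Rails::Server)
end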

In Django with gunicorn, I needed a custom solution: worker 1 touched a file, and every worker checked whether that file had already been touched before deciding whether to run the initialization (such that workers 2-12 did not activate it).
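
A minimal sketch of that touch-file idea, written in Ruby to match the Rails snippet above rather than reproducing the Django code. The helper name claim_boot_task? and the marker file name are hypothetical. The sketch assumes all workers share a filesystem and that any stale marker is deleted before the workers start (for example at deploy time), otherwise no worker would run the task after a restart.

```ruby
require "tmpdir"

# Hypothetical helper: returns true only for the first process that manages
# to create the marker file, and false for every later caller.
def claim_boot_task?(marker_path)
  # CREAT | EXCL makes "check and create" a single atomic step, so two
  # workers booting at the same moment cannot both win.
  File.open(marker_path, File::WRONLY | File::CREAT | File::EXCL) do |f|
    f.write(Process.pid.to_s) # record which worker claimed the task
  end
  true
rescue Errno::EEXIST
  # Another worker already created the marker.
  false
end

# Demo only: real workers would all point at one fixed, shared path.
marker = File.join(Dir.mktmpdir, "articles_loaded.marker")
puts claim_boot_task?(marker) # first caller
puts claim_boot_task?(marker) # any later caller
```

Creating the file with the exclusive flag, rather than checking for existence and then touching it, avoids the race where two workers both see "no file yet" and both run the download.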

With multiple dynos on Heroku, I also set an ENV var per dyno and used this to prevent excess work.
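
A rough sketch of the ENV var check, under assumptions: the variable name RUN_BOOT_TASKS and the helper boot_task_enabled? are made up for illustration, and how the variable gets a different value per dyno (for instance by setting it in the command for one process type) is left to the deployment setup.

```ruby
# Hypothetical helper: the boot task only runs where the (assumed)
# RUN_BOOT_TASKS variable is explicitly set to "true".
def boot_task_enabled?(env = ENV)
  # Default to "false" so a dyno without the variable does no extra work.
  env.fetch("RUN_BOOT_TASKS", "false") == "true"
end

if boot_task_enabled?
  puts "running one-off initialization on this dyno"
else
  puts "skipping initialization on this dyno"
end
```

Defaulting to "off" means a newly added dyno that is missing the variable skips the work instead of duplicating it.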