Use parallelism to speed up maintenance mode deploys

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the web-development category.

Last Updated: 2024-11-21

I had a difficult deploy that involved changing how binary files, the core of my product, are stored. I decided to take production off-line for the deploy, in order to prevent entries in in-between states from occurring.

Basically my migration was a giant loop through 8 types of a upload, followed by persisting the file using the new system.

upload_types = [
  # Format: table_name, upload_file_type
  %w[notes_files data],
  %w[notes_files backup_data],
  %w[notes_files sample],
  %w[sellers doc],
  %w[taxons icon],
  %w[temporary_s3_files download],
  %w[tutors photo],
  %w[zip_files zip]
]

upload_types.each do |table_name, paperclip_attachment_name|
   save_in_the_new_way(..)
end

As I ran this script I realized it was very slow - unacceptably slow given that production was offline. Moreover, I realized that it was a mistake to plan this only for one thread/process. It would be better to have 8 (or more) workers running though this process simultaneously, especially since it was IO-bound code.

One easy way would be to parameterize, such that each run accepts an index in upload_types and runs with that. Another could be to have competing scripts working backwards on the same collection at the same time. Or a queue system.