This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the web-development category.
Last Updated: 2024-11-21
I had a difficult deploy that involved changing how binary files, the core of my product, are stored. I decided to take production offline for the deploy in order to prevent records from being left in in-between states.
Basically, my migration was a giant loop through 8 types of upload, persisting each file using the new system.
upload_types = [
# Format: table_name, paperclip_attachment_name
%w[notes_files data],
%w[notes_files backup_data],
%w[notes_files sample],
%w[sellers doc],
%w[taxons icon],
%w[temporary_s3_files download],
%w[tutors photo],
%w[zip_files zip]
]
upload_types.each do |table_name, paperclip_attachment_name|
save_in_the_new_way(..)
end
As I ran this script I realized it was very slow: unacceptably slow given that production was offline. Moreover, I realized it was a mistake to plan this for only one thread/process. It would have been better to have 8 (or more) workers running through the process simultaneously, especially since the code was IO-bound.
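In hindsight, even within a single process, Ruby threads would have overlapped most of that IO wait. A minimal sketch, reusing the upload_types array above and assuming (this is my assumption, since the call was elided) that save_in_the_new_way takes the table name and attachment name as arguments:

# One thread per upload type. Since the work is IO-bound (fetching and
# re-persisting files), the threads mostly wait on IO and so overlap
# well, even under MRI's GVL.
# Assumption: save_in_the_new_way accepts these two names as arguments.
threads = upload_types.map do |table_name, paperclip_attachment_name|
  Thread.new { save_in_the_new_way(table_name, paperclip_attachment_name) }
end
threads.each(&:join) # block until every worker has finished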
To split the work across separate runs, one easy way would be to parameterize the script so that each invocation accepts an index into upload_types and handles only that entry (sketched below). Another would be to have competing scripts working backwards through the same collection at the same time. A third option would be a queue system.
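The parameterized approach might look something like this (the script name, argument handling, and save_in_the_new_way signature are my assumptions, not what I actually ran):

# Hypothetical migrate_uploads.rb: each invocation migrates exactly one
# entry of upload_types, so eight shells (or a process supervisor) can
# run them all at once:
#
#   ruby migrate_uploads.rb 0
#   ruby migrate_uploads.rb 1
#   ...
#   ruby migrate_uploads.rb 7
index = Integer(ARGV.fetch(0))
table_name, paperclip_attachment_name = upload_types.fetch(index)
save_in_the_new_way(table_name, paperclip_attachment_name)

A queue system would likely balance the load better, since some upload types will have far more records than others and a queue keeps every worker busy until the whole backlog is drained.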