This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the bash category.
Last Updated: 2024-11-21
How would you parallelize a slow operation in a bash loop?
I started off with the following:
while read -r key; do
aws s3api put-object-acl --bucket mybucket --key "$key" --grant-full-control 'emailaddress="someone@example.com"'
done < s3-keys.txt
This was too slow so I wanted a way to parallelize. I came up with the
combination of backgrounding the slow command with ampersand, limiting the total
number of processes, and waiting for all subprocesses to finish with the wait
command once a certain number were running.
N=50 # set number of processes
while read -r key; do
echo "Processing: $key"
# The double parentheses construct permits arithmetic evaluation, and we do
# some modulo math to determine whether to call `wait`
((i=i%N)); ((i++==0)) && wait
aws s3api put-object-acl --bucket mybucket --key "$key" --grant-full-control 'emailaddress="someone@example.com"' &
echo "Done\n"
done < s3-keys.txt