Be sure to empty cyclical data buckets before reusing them

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the dumb-mistakes-and-gotchas category.

Last Updated: 2024-11-23

I had the following code to show how many visitors were currently on a website as part of some social-proof marketing stuff. Basically a little overlay would say "21 other people are currently browsing".

def track_hit(request)
  # Add ip to a set so that I don't count the same IP twice
  redis.sadd(current_visitors_key(now), request.ip)
end

def current_visitors_key(time)
  "Traffic::current_visitors::#{time.hour}/#{time.min}"
end

def current_visitor_count
  previous_time = now - 1.minute
  # scard — gets the cardinality of the set — i.e. how many members it has
  [redis.scard(
    current_visitors_key(previous_time)
    # The 1 is present below because, at the very least, whoever is seeing the pop-up is there
  ).to_i, 1].max
  end
end

A few days after deploying, I noticed two issues: the figures were too high and my redis memory usage was inexplicably growing. Why? Because I never cleared out yesterday's IP addresses from the cache keys, yet these same keys got cycled around again the next day, at the same hour and minute, accumulating ever more IP addresses.

What I should have done was have a background job to empty the keys daily.

class ResetTrafficKeys < ApplicationJob
  def perform
     # Loops through keys and empties the set. To be run daily
    Traffic.new.reset_traffic_keys
  end
end

Lesson

If you gather data into cyclical time buckets (e.g. times of the day), be sure to empty them before adding data on the next cycle.