Redis and Sidekiq are popular tools used by Ruby applications to handle features like caching, queues, and background processing. Capabilities like these are important for growing web applications, and as such they require monitoring to ensure they are running smoothly.

There are several solutions on the market that you can use to monitor your Redis instance and check for potential problems. But did you know that Redis and Sidekiq already provide useful APIs you can use to roll your own monitoring solution?

For our simple monitoring system we will focus on three main areas:

  1. Memory usage
  2. Queue health
  3. Largest keys

A problem with any of these areas could result in your background processing getting clogged. Worse, your instance could run out of memory and crash, causing data loss.

Memory Usage

When you add a new key, Redis first checks its current memory usage. If usage goes beyond the maxmemory setting in the Redis configuration, Redis triggers what is known as the eviction policy: a predefined algorithm that decides which existing data, if any, to remove so that the key you want to add can be saved.

Eviction policies

From the official documentation we can set one of these eviction policies:

  • noeviction: New values aren’t saved when the memory limit is reached. When a database uses replication, this applies to the primary database
  • allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys
  • allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys
  • volatile-lru: Removes least recently used keys with the expire field set to true
  • volatile-lfu: Removes least frequently used keys with the expire field set to true
  • allkeys-random: Randomly removes keys to make space for the new data added
  • volatile-random: Randomly removes keys with the expire field set to true
  • volatile-ttl: Removes keys with the expire field set to true and the shortest remaining time-to-live (TTL) value

How your application uses Redis (as a cache, as a background queue, as a messaging system) should determine which eviction policy works best. If you don’t want any of your keys to be yanked when memory runs out, though, then the noeviction policy makes sense. The other policies can result in data loss, as they remove existing keys automatically.
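For reference, both settings live in redis.conf (they can also be changed at runtime with CONFIG SET). As an illustrative example matching the 50 MB limit shown in the output further below, the relevant directives would look like this:

# redis.conf: cap memory at 50 MB and refuse writes instead of evicting
maxmemory 50mb
maxmemory-policy noeviction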

However, there is an important caveat with the noeviction policy. Once memory usage actually reaches the maxmemory setting, Redis will return an error any time your application tries to add a new key. If your application does not handle this error, the data that should have been stored in that key will be lost.
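If you stick with noeviction, a minimal sketch of guarding a write might look like this (the key name and the fallback behavior are placeholders for illustration):

require "redis"

redis = Redis.new(host: "localhost", port: 6379)

begin
  # "some:key" is a hypothetical key used for illustration.
  redis.set("some:key", "some value")
rescue Redis::CommandError => e
  # Under noeviction, Redis rejects writes once maxmemory is reached,
  # with an error message starting with "OOM command not allowed".
  raise unless e.message.start_with?("OOM")
  # Handle the failed write here: log it, alert, or fall back to another store.
end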

Regardless of the eviction policy that you choose for your application, it is important to monitor memory usage to prevent loss of data. Redis provides a way to get the current memory usage using the info command.

require "redis"

redis = Redis.new(host: 'localhost', port: 6379)
redis.info

This returns several statistics, a subset of which looks like this:

{
  "redis_version" => "6.2.14",
  "used_memory" => "16687624",
  "used_memory_human" => "15.91M",
  "used_memory_rss" => "27144192",
  "used_memory_rss_human" => "25.89M",
  "used_memory_peak" => "31275128",
  "used_memory_peak_human" => "29.83M",
  "used_memory_peak_perc" => "53.36%",
  "used_memory_overhead" => "2398998",
  "used_memory_startup" => "583528",
  "used_memory_dataset" => "14288626",
  "used_memory_dataset_perc" => "88.73%",
  "total_system_memory" => "16456404992",
  "total_system_memory_human" => "15.33G",
  "maxmemory" => "52428800",
  "maxmemory_human" => "50.00M",
  "maxmemory_policy" => "noeviction"
}

Memory stats

The data returned by this command includes server details, clients, CPU, memory, command statistics, and error statistics. For memory usage monitoring, we will focus on memory-related fields such as:

  • total_system_memory – The total amount of memory that the Redis host has
  • maxmemory – The value of the maxmemory configuration directive
  • used_memory – Total number of bytes allocated by Redis using its allocator
  • used_memory_peak – Peak memory consumed by Redis (in bytes)

Each of these has a human-readable counterpart, e.g.
total_system_memory (16456404992) has a corresponding total_system_memory_human (15.33G) value.

One important thing to consider is that it is possible for Redis to consume more memory than the maxmemory setting. The maxmemory setting determines the threshold at which the eviction policy kicks in, and is based on the memory consumed by Redis keys. However, since there is additional memory overhead during processing, the total consumed memory can exceed the maxmemory setting. As of this writing, this is being discussed in this GitHub issue thread.

Thus it is useful to monitor memory usage (used_memory) as a percentage of two metrics, as sketched in the code after this list:

  • Current memory usage as a percentage of maxmemory, in order to preemptively check the instance before and during the application of the eviction policy.
  • Current memory usage as a percentage of total_system_memory, to check whether Redis usage is nearing the system limit and prevent an out-of-memory (OOM) condition. This is especially important if maxmemory is not enabled (set to 0).
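Here is a minimal sketch of both checks, built on the same redis.info call (the 80% alert threshold is an arbitrary example to tune for your needs):

require "redis"

redis = Redis.new(host: "localhost", port: 6379)
info = redis.info

used_memory = info["used_memory"].to_f
maxmemory = info["maxmemory"].to_f
total_system_memory = info["total_system_memory"].to_f

# Percentage of maxmemory; skipped when maxmemory is disabled (0).
if maxmemory.positive?
  pct_of_max = (used_memory / maxmemory) * 100
  puts "used_memory is #{pct_of_max.round(2)}% of maxmemory"
  puts "WARNING: nearing the eviction threshold" if pct_of_max > 80
end

# Percentage of total system memory, to catch a looming OOM condition.
pct_of_system = (used_memory / total_system_memory) * 100
puts "used_memory is #{pct_of_system.round(2)}% of system memory"
puts "WARNING: nearing the system memory limit" if pct_of_system > 80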

Sidekiq Queue Health

Sidekiq jobs run within a queue (the default queue is named default). To make sure there are no processing delays, we need to ensure that jobs don’t get clogged up in their queues. For monitoring queue health, it is important to have visibility into the following:

  • Number of jobs for each queue
  • Latency per queue

If a large number of jobs are enqueued in a particular queue, this can cause delays in processing, not just in that queue itself, but potentially in other queues processed by the same workers, especially if the clogged queue has the same or higher priority. Additionally, even if the number of jobs in a queue looks reasonable, delays can still occur if individual jobs take a long time to finish.

Sidekiq’s web UI already provides an endpoint that returns some basic information about queues. To fetch this data, send a GET request to /sidekiq/stats. It returns a JSON payload that looks something like this:

{
  "sidekiq": {
    "processed": 75000000,
    "failed": 68000,
    "busy": 1,
    "processes": 2,
    "enqueued": 181,
    "scheduled": 43,
    "retries": 34,
    "dead": 438,
    "default_latency": 0
  },
  "redis": {
    "redis_version": "6.2.14",
    "uptime_in_days": "14",
    "connected_clients": "21",
    "used_memory_human": "18.44M",
    "used_memory_peak_human": "80.15M"
  },
  "server_utc_time": "10:49:51 UTC"
}
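A minimal sketch of fetching this payload from Ruby, assuming Sidekiq::Web is mounted at /sidekiq on localhost:3000 and is reachable without authentication (adjust for your setup):

require "net/http"
require "json"

# Hypothetical URL; adjust host, port, and mount point for your app.
uri = URI("http://localhost:3000/sidekiq/stats")
stats = JSON.parse(Net::HTTP.get(uri))

puts "Enqueued jobs: #{stats.dig('sidekiq', 'enqueued')}"
puts "Default queue latency: #{stats.dig('sidekiq', 'default_latency')}s"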

This gives the total number of enqueued jobs across all queues, but for latency it only returns default_latency, which is the latency of the default queue. Unless your application is simple, chances are the default queue isn’t being used at all. Usually, more specific queues are created to separate high-priority jobs from lower-priority ones, to group jobs by feature, and so on.

Sidekiq stats

To retrieve more details on the queues, we can use a class that Sidekiq provides called Sidekiq::Stats. To return the current number of jobs for each queue:

require "sidekiq/api"

Sidekiq::Stats.new.queues

which returns a hash where each key is a queue name and each value is the number of jobs enqueued:

{
  "low_priority" => 441,
  "medium_priority" => 100,
  "high_priority" => 36
}

In order to get the latency for each queue, we need to use the Sidekiq::Queue class instead and iterate through each queue to get the latency value:

Sidekiq::Queue.all.map do |queue|
  [queue.name, queue.latency]
end.to_h

This returns a similar hash where the value is the latency. This value measures how long jobs have been waiting in the queue before being processed by a worker (in seconds):

{
  "low_priority" => 796.2188236713409,
  "medium_priority" => 89.05578780174255,
  "high_priority" => 5.320014587405
}

Note that if a queue has no jobs, its latency is 0.
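Combining both metrics gives a simple per-queue health check. In the sketch below, the size and latency thresholds are arbitrary examples that you would tune per queue and workload:

require "sidekiq/api"

# Hypothetical thresholds; tune these for your queues.
MAX_QUEUE_SIZE = 500
MAX_LATENCY_SECONDS = 300

Sidekiq::Queue.all.each do |queue|
  if queue.size > MAX_QUEUE_SIZE
    puts "ALERT: #{queue.name} has #{queue.size} jobs enqueued"
  end

  if queue.latency > MAX_LATENCY_SECONDS
    puts "ALERT: #{queue.name} latency is #{queue.latency.round} seconds"
  end
end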

Largest Keys

Another metric to look out for is the presence of large keys. These keys hold a large amount of data, and when there are many of them, they consume much-needed memory. This can result in your application hitting the maxmemory limit and triggering the Redis eviction policy.

Redis monitoring services also report the largest keys. Using Redis’ API, we can obtain this list from within the application, which lets us build custom monitoring, alerting, and mitigation services as well.

The key to finding the largest keys is the MEMORY USAGE Redis command, as described in the documentation. This command returns the number of bytes used by a specific key.

Given a Redis connection, first we get all the keys:

redis = Redis.new(host: 'localhost', port: 6379)

keys = redis.keys('*')
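One caveat: KEYS is an O(N) command that blocks the server while it scans the whole keyspace, so on a busy production instance it is safer to iterate incrementally with SCAN. With redis-rb this is a one-line swap, as a sketch:

# SCAN walks the keyspace incrementally instead of blocking the server.
keys = redis.scan_each(match: '*').to_a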

And then using the MEMORY USAGE command, get the usage for each key using map:

key_sizes = keys.map { |key| [key, redis.memory('usage', key)] }

To be able to see the largest keys, we need to sort them by the number of bytes:

largest_keys = key_sizes.sort_by { |_, size| -size.to_i }

For an application with thousands of keys (which is common), this data is going to be a bit unwieldy. We are only interested in the first few largest keys to dig deeper into, so we can take the top 50 to start our analysis:

first_50_largest_keys = largest_keys.take(50)

first_50_largest_keys.each do |key, size|
  puts "Key: #{key}, Size: #{size} bytes"
end

From here, you will have visibility into which keys consume most of the memory and can begin to optimize them. If you find that it is no longer possible to reduce their memory footprint, that could be a signal that you need to increase the available memory for your Redis instance to avoid losing data.
