Slightly off Rails

Slightly off Rails

I use Rails both professionally and personally for more than a decade now. Through the years, my usage of Rails also evolved together with new concepts and trends in web development. Here are some of the personal changes I am applying moving forwards on how I work with the framework.

Authentication using JWT

JWT.io
https://jwt.io/

JSON Web Tokens (JWT) provides a way to effectively secure API endpoints. As web applications move more to the front-end with the popularity of Javascript frameworks like React and Angular, JWT ensures that the connection between the front-end and the back-end remains secure.

However, I don’t believe that we should reinvent the wheel, especially when it comes to authentication. Thus I will still continue to use the devise gem. This library has been a staple in Rails applications for many years now and as a result has already gone through a lot of patching and bug fixes.

There’s also a gem called devise-jwt that makes it easy to integrate devise authentication using JWT, and I am using this library for present and future projects.

What this gem does is that after a user signs in, it automatically adds a JWT token to the Authorization header. This token can then be used by the front-end code to submit requests back to the server. The server checks the token and validates if the user has access to the endpoint or resource. Using the gem, this is being done automatically, and that user is available through the current_user variable.

You can still use the (default) session authentication as well, so you can have both a web application and a front-end client application (like a mobile app) using the same authentication mechanism.

dry-rb

Rails is an opinionated framework. This means it has its own way of doing things, and by following them it can make your work simpler and faster. These methods utilize the full features of the framework and lets the framework do the heavy lifting for you. Some people do not like this approach as it makes things less transparent and “magical”.

While Rails is opinionated, it is not restrictive. You can ignore built-in features or build your own without having the need to abandon the framework altogether.

https://dry-rb.org/

There are several libraries that enable you to work outside the Rails conventions. Some of the most popular ones are Trailblazer and dry-rb. Moving forward, I intend to use dry-rb more as I found that their modular approach to libraries is a great way to implement projects.

Here are some of the concepts that I found useful in dry-rb:

Transactions and Operations

Using the transaction concept allows us to think of a feature in terms of steps. This in turn enables us to break down a feature into smaller parts. It also allows for easier debugging and refactoring of code as we can create steps that essentially just perform one specific task of the whole.

As an example, let’s say we want to create an event and send an email to its participants. In a transaction, we can break this down into three steps: checking the event form parameters, creating the record, and then sending the emails.

require "dry/transaction"

class CreateEvent
include Dry::Transaction

step :validate_parameters
step :create_event
step :send_email, with: 'event.operations.send_email'

private

def validate_parameters(params)
# returns Success(valid_params) or Failure(errors)
end

def create_event(params)
# returns Success(event)
end
end

You can see that the validate_parameters and create_event methods are simply private methods of the transaction class. The send_email method however is not present in the class itself but is invoked as an external operation. For this example, we can have an operation class called Event::Operations::SendEmail.

Using operation classes allows us to create reusable functions. These operations can be called on any other transaction (or any other code really). In addition to being reusable and modular, it is also easy to test as each operation performs only one specific function.

Monads

Notice that the return values of the methods are Success() and Failure(). These are called monads and they help structure your responses so that methods communicate with each other more effectively. The value I get from this is it forces me to think of failure scenarios in the code. I can then handle success and failure conditions effectively without resorting to complex logic or nested if-else conditions.

Validations

dry-rb also has a library called dry-validation. Using this library you can define your own custom schema and validation rules. In Rails, the default schema and validation is usually tied to your database models. dry-validation provides the same feature that you can use on any code in your application.

It also enables you to define types in each of the fields in your schema. This can simplify your code as you are guaranteed that each field has the correct type. This also reduces potential bugs in your application.

Here is an example found in their official documentation. It describes how to use dry-validation in setting up the schema and the validation for a new User record:

class NewUserContract < Dry::Validation::Contract
params do
required(:email).filled(:string)
required(:age).value(:integer)
end

rule(:email) do
unless /\A[\w+-.][email protected][a-z\d-]+(.[a-z\d-]+)*.[a-z]+\z/i.match?(value)
key.failure('has invalid format')
end
end

rule(:age) do
key.failure('must be greater than 18') if value < 18
end
end read more

Rubyist’s intro to Python web development

I have been using Ruby professionally for more than a decade now. Until recently, I haven’t explored much outside of the Ruby and Rails community. That changed however after I completed a course in Foundations of Data Science. This made me curious about Python and how to build applications using it.

Python and Ruby have many similarities. Both are interpreted, high-level programming languages. Python also has support for Object-Oriented Programming and Functional Programming. In terms of syntax, they have a similar look and feel, aside from some fundamental differences such as Python being indent-driven.

You may find this article to be very similar to the Ruby on Rails guide I posted years ago. This is not accidental since my goal is to introduce Python web application development to someone who is already familiar in the Ruby space.

Installing Python

The very first step is to install Python itself in your computer. I recommend using pyenv to manage your Python versions. pyenv is a Python version manager, like rbenv. In fact, pyenv is a fork of rbenv and is re-purposed for Python. To install pyenv:

curl https://pyenv.run | bash

After installing, update your login shell configuration by adding the following, e.g. in ~/.zshrc

export PATH="/home/user/.pyenv/bin:$PATH" eval "$(pyenv init -)" eval "$(pyenv virtualenv-init -)"

Now we are ready to use pyenv to install a specific Python version. But first, we need to install some dependencies so we can compile Python from source:

sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
xz-utils tk-dev libffi-dev liblzma-dev python-openssl

Then, we can easily install a Python version, like 3.6.8 in this example:

pyenv install 3.6.8

If you are having trouble installing Python, it could be related to the OpenSSL version installed in your machine.

On Debian stretch (and Ubuntu bionic), libssl-dev is OpenSSL 1.1.x,
but support for that was only added in Python 2.7.13, 3.5.3 and 3.6.0.
To install earlier versions, you need to replace libssl-dev with
libssl1.0-dev. This is being tracked in
https://github.com/pyenv/pyenv/issues/945.

Once Python has been installed, you can opt to set the version (e.g 3.6.8) as your global version. This makes the Python executable available to all terminal sessions:

pyenv global 3.6.8

Package Management

pip is Python’s package manager, like rubygems or npm. If you installed Python, pip should also be available for you. If for some reason it is not installed, you can install it using the guide here.

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py $ python get-pip.py

Application-specific Packages

In Ruby/Rails we use the awesome library Bundler to handle application package management. We manage the packages using a Gemfile, and it gets converted into Gemfile.lock.

pipenv is similar to bundler, but its functionality extends beyond package management in the application. In this article we will use it similar to bundler so it will handle all the application package dependencies. To install pipenv, just use pip!

pip install pipenv

To specify the application packages, pipenv uses a Pipfile. An example is given below:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
requests = "*"
flask = "*"
python-dotenv = "*"
flask-sqlalchemy = "*"
flask-migrate = "*"

[requires]
python_version = "3.6" read more

Rubocop + vim

Rubocop + vim

Code linters such as Rubocop ensure consistent, clean code throughout your application. If all developers are using the same linter configuration, then you can be sure that any code that you encounter is organized in the same way regardless of the author.

One challenge in using linters is that it should have immediate feedback, or else it may disrupt the coding flow of the user. For example, you can run the linter manually after you make your changes, but as this is a manual process it can be easily forgotten.

Commit Hooks

To solve this, we can use pre-commit hooks (such as in git) to automatically run the linter script whenever we try to commit our changes. In this manner, it is an automated process and so will not be skipped in development.

For example, here is a git pre-commit hook for Rubocop (from
https://gist.github.com/mpeteuil/6147292 and
https://github.com/grosser/dotfiles/pull/7 )

#!/usr/bin/env ruby
#
# put this file into your path and use `<file> install` to add a new hook
# or use it as a binary to check changed files

require 'shellwords'

if ARGV == ["install"]
exec "ln", "-sf", __FILE__, ".git/hooks/pre-commit"
else
raise unless ARGV == []
end

changed = `git status --porcelain`.
split("\n").
map { |l| l.split(" ", 2) }.
select { |status, _| ["A", "AM", "M"].include?(status) }.
map { |_, file| file.delete('"') }

exit if changed.empty?

result = `bundle exec rubocop --force-exclusion #{changed.shelljoin}`
puts result unless $?.success?
exit $?.exitstatus read more

DelayedJob Survival Guide

DelayedJob Survival Guide

One day at work I noticed that emails were taking much longer to be sent out from our app. I narrowed down the problem to our background queue which is responsible for sending out the emails. The solution prompted me to write this DelayedJob “survival guide”to help others who may encounter this issue in the future.

Asynchronous processing/background processing is an important part of a web application. This ensures that running code is not blocking the rest of the process if that code does not need to run synchronously. Common examples are sending emails or code that depends on a third-party API or service.

There are many solutions for this, such as Redis-backed programs like Sidekiq or Resque. There are also database-backed programs like DelayedJob. The advantage of using a database-backed solution is its simplicity: you don’t need an external dependency (such as Redis) to run it. Instead, you can use your existing database to manage your background processing.

This simplicity also has a disadvantage: you are now constrained by your database and database issues can directly affect your background processing.

The Problem

We had a new feature which required processing old data in the system. This uses the background queue as it takes a few seconds to process each individual task. Eventually these tasks accumulated, resulting in more than half a million jobs in the DelayedJob queue.

As I noticed that the queue is not getting processed as fast as I expected, I looked at the database logs. In the MySQL slow query logs, I noticed that almost all entries look like this:
UPDATE delayed_jobs
SET `delayed_jobs`.`locked_at` = '2018-06-05 11:48:28',
`delayed_jobs`.`locked_by` = 'delayed_job.2 host:ip-10-203-174-216 pid:3226'
WHERE ((run_at <= '2018-06-05 11:48:28'
AND (locked_at IS NULL OR locked_at < '2018-06-05 07:48:28') OR locked_by = 'delayed_job.2 host:ip-10-203-174-216 pid:3226')
AND failed_at IS NULL)
ORDER BY priority ASC, run_at ASC
LIMIT 1;
DelayedJob updates the locking information (timestamp and PID) when processing jobs. However, this UPDATE call in the database does not use the index in the table, at least for old MySQL versions (5.6 or below). As the number of entries in the queue increases, this UPDATE call becomes much slower.

This is the problem with database-backed asynchronous queues: the database is used both as a state manager and the queue storage/retrieval, resulting in locking.

Emergency Processing

Since the queue processing is becoming really slow, some critical tasks were not being performed. Thus we needed to run some jobs manually (using the Ruby/Rails console). We can invoke a DelayedJob worker manually using this command:

Delayed::Worker.new.run(delayed_job_object)

However, we may want to run all tasks in a given queue, let’s say the important_queue. We can query the database for all tasks under the queue and invoke the worker manually for each:

Delayed::Job.where(queue: "important_queue").find_each do |dj|
  Delayed::Worker.new.run(dj)
end

In this manner we were able to quickly resolve some critical tasks that needed to be run immediately. However, this is not a scalable solution as everything is done manually. This also won’t solve the problem of having hundreds of thousands of tasks in the backlog.

Queue “Storage”

Searching the internet, I found that there were others who also encountered this problem. Their solution was documented here and here. The main gist of the solution is to temporarily remove most (or all) of the items in the delayed_jobs table into a separate table to “unclog” the background queue.

In this example, we will create a new table called delayed_jobs_storage with the same columns as the original delayed_jobs table. The examples also assume we are using MySQL as our database:

CREATE TABLE delayed_jobs_storage LIKE delayed_jobs;

Once the “storage” table has been created, we can now move the jobs into that new table. In this example, we will limit the query to only move jobs that are under the huge_queue queue.

INSERT INTO delayed_jobs_storage (SELECT * FROM delayed_jobs WHERE queue='huge_queue');

Then we remove the jobs that we moved from the original delayed_jobs table:

DELETE FROM delayed_jobs WHERE queue='huge_queue';

At this point, the background processing speed returns to normal as the size of the table is now greatly reduced. The next step is to gradually move back some jobs from the delayed_jobs_storage table into the delayed_jobs table so they are processed.

This involves some trial and error as we want to determine the optimal number of jobs that we can transfer. We want it so that we can move the largest amount of jobs without slowing down the queue. In my experiment, I determined that we can transfer up to around 100k jobs back to the queue without impacting the performance.

To move the first 100k jobs back into the delayed_jobs table:

INSERT INTO delayed_jobs (SELECT * FROM delayed_jobs_storage ORDER BY id ASC LIMIT 100000);

Then we need to remove those jobs from our “storage” table:

DELETE FROM delayed_jobs_storage ORDER BY id ASC LIMIT 100000;

We wait until all the jobs have been processed and the queue goes back to its minimal state. After which we repeat the process again until all of the jobs stored in delayed_jobs_storage have been moved back to the delayed_jobs table.

Afterthoughts

While this workaround will get you out of a bind when your backround queue is clogged, it is not a long-term solution. As much as possible we want to avoid this scenario happening in the first place!

Here are some ideas that you can implement:

  • Analyze each background job to see areas of optimization. If the code that is running in a job is not optimized, it will run slower and will consume more resources. Check your database queries and your code performance to make sure they are running as fast as possible. For example, add table indexes and remove N+1 queries.
  • Reorganize how you add jobs to the background queue. Sometimes we just add tasks to the queue without thinking about how it impacts the rest of the jobs. Can you make your code add less to the queue by removing redundancy? Does combining smaller jobs into a larger job make sense? Are longer-running jobs of lower priority than faster ones?
  • Consider moving to a Redis-based solution such as Sidekiq. This will make sure that your dependency to your main database is eliminated and allows you to use a separate (and more efficient) storage of your background jobs.

Photo by James Pond on Unsplash read more