Evrim.

Invalidating caches when using many-to-many associations in Rails

When declaring a one-to-many association, invalidating caches in Rails is mostly as simple as using the touch parameter1:

class Article < ApplicationRecord
  belongs_to :category, touch: true
end

Now, whenever an article is updated, the timestamp of its category will also be updated.

This works great with a simple belongs_to association, but if we want to use a many-to-many mapping (has_and_belongs_to_many or has_many on both ends of the association), we can’t use the touch method, as it is not supported. We need another method of propogating changes from an object to its associated objects.

Updating timestamps of associated objects in a many-to-many mapping

Let’s say we have articles which can have multiple categories (and vice-versa), and we use a join model named Categorization:

class Article < ApplicationRecord
  has_many :categorizations
  has_many :categories, through: :categorizations
end

class Categorization < ApplicationRecord
  belongs_to :article
  belongs_to :category
end

class Category < ApplicationRecord
  has_many :categorizations
  has_many :articles, through: :categorizations
end

Now, let’s say whenever an article changes, we want to update the timestamp on its categories. As I already mentioned, we cannot do this:

class Article < ApplicationRecord
  has_many :categorizations

  # This does NOT work!
  has_many :categories, through: :categorizations, touch: true
end

So, we will have to manually write the method that updates the timestamp on associated objects, and then make sure it is called.

Let’s start with writing a method that would update the timestamp on all referenced categories:

class Article < ApplicationRecord
  # ...

  private

    def touch_categories
      categories.find_each(&:touch)
    end
end

When this method is called, we are calling the touch method on all the categories that are referenced.

Why not use each instead of find_each? When you call each, all the records are loaded into the memory, which might cause issues if you have many associated objects. find_each on the other hand loads these objects in batches.

Keep in mind that, this method will still load all the associated categories, and then call touch on them individually. Here are the SQL queries that would be produced by an article that has two categories:

Category Load (0.8ms)  SELECT  "categories".* FROM "categories" INNER JOIN "categorizations" ON "categories"."id" = "categorizations"."category_id" WHERE "categorizations"."article_id" = $1 ORDER BY "categories"."id" ASC LIMIT $2  [["article_id", 28], ["LIMIT", 1000]]

  (0.1ms)  BEGIN
Category Update (0.4ms)  UPDATE "categories" SET "updated_at" = $1 WHERE "categories"."id" = $2  [["updated_at", "2018-11-30 10:06:31.097594"], ["id", 2]]
  (1.0ms)  COMMIT

 (0.1ms)  BEGIN
Category Update (0.4ms)  UPDATE "categories" SET "updated_at" = $1 WHERE "categories"."id" = $2  [["updated_at", "2018-11-30 10:06:31.100250"], ["id", 5]]
 (0.3ms)  COMMIT

You can see that Rails first loads the two categories, and then updates their timestamps one by one.

There is another method we can use that would create a more performant SQL query:

class Article < ApplicationRecord
  # ...

  after_save :touch_categories

  private

    def touch_categories
      categories.update_all(updated_at: Time.now)
    end
end

And this produces the following SQL query:

Category Update All (54.9ms)  UPDATE "categories" SET "updated_at" = '2018-11-30 10:05:22.302699' WHERE "categories"."id" IN (SELECT "categories"."id" FROM "categories" INNER JOIN "categorizations" ON "categories"."id" = "categorizations"."category_id" WHERE "categorizations"."article_id" = $1 ORDER BY "categories"."title" ASC)  [["article_id", 28]]

Here, we updated the timestamp of all categories in one swoop. This is a simpler and relatively more performant query.

Now that we have our method, how do we make sure this is called whenever an article is updated? Luckily, this is a perfect use-case for Rails’ after_save callback:

class Article < ApplicationRecord
  # ...

  after_save :touch_categories

  private

    def touch_categories
      categories.update_all(updated_at: Time.now)
    end
end

So far so good, whenever an article is updated, its categories will be updated as well.

But what happens when an article is added to a category? Notice that, when this happens, the only thing that changes is a Categorization object. Our articles don’t have a database column referencing any categories, and vice-versa. The only thing that creates the association between articles and categories is the Categorization model. This means, we need to update our categories when a Categorization object that references it is updated. Here is how:

class Categorization < ApplicationRecord
  belongs_to :article
  belongs_to :category, touch: true
end

How about has_and_belongs_to_many?

What if we were using a has_and_belongs_to_many association, in which case we wouldn’t have a Categorization model? For this, Rails provides after_add and after_remove parameters:

class Category < ApplicationRecord
  # ...

  has_and_belongs_to_many :articles,
                          after_add: :touch_updated_at,
                          after_remove: :touch_updated_at

  private

    def touch_updated_at
      self.touch if persisted?
    end
end

In our touch_updated_at method, we are first checking if our record is persisted, then we are touching it. The reason for that is if we try to touch a new object (one that is not yet persisted in the database), Rails gives an error.

There is one small gotcha though. after_add and after_remove parameters only work if you add a new object to the association directly through the object that provides these parameters. It sounds a bit confusing, so I’ll just show what I mean in code:

# This will call the `after_add` callback.
category.articles << some_article

# This won't call the `after_add` callback.
Resource.new(categories: [some_category])

This is all I have on how you can invalidate caches when you are using many-to-many associations. Good luck!

  1. I’m using the term “invalidating caches” in a very loose sense here. Updating timestamps on an object doesn’t directly invalidate a cache, but as Rails by default looks at the timestamp of an object to check if it’s stale, updating timestamps generally “invalidates caches,” albeit indirectly. 


Sign up to my newsletter

Get notified when I publish something new.

Powered by ConvertKit