Menu Icon
< back to main
 • 4 min read

ActiveRecord’s Ghost Queries — a sneak attack on your DB performance

ORMs are awesomely dangerous to your DB health. One that has caught my attention lately is Ruby on Rail's ActiveRecord. Let's explore how…

ActiveRecord’s Ghost Queries — a sneak attack on your DB performance
Cameron Clifford
Cameron Clifford

Cameron Clifford works as a senior engineering manager for Cresta where he helps the team deploy secure code. With over a decade's experience in the cybersecurity sector, Cameron offers a plethora of expertise for cybersecurity best practices.

Join our community
learn moreArrow Right
Join our community
learn moreArrow Right

ORMs are awesomely dangerous to your DB health. One that has caught my attention lately is Ruby on Rail’s ActiveRecord. Let’s explore how ActiveRecord represents models and communicates with databases through a simple example. Here is a table representing people.

Person
-------------
person_id (primary_key)
family_id (foreign_key references family.family_id)
school_id (foreign_key references school.school_id)
name
age

Representing this table as a ruby object with ActiveRecord is as simple as:

class Person < ActiveRecord::Base  
  belongs_to :family
  belongs_to :school
end

Notice the belongs_to attribute. That is ActiveRecord magic for describing the foreign key relationship of family to person.

Now if a developer wants to gather information about people; commonly, this would happen:

people = Person.all
for person in people do
  family_name = person.family.name
  school_state = person.school.state
end

This is where ActiveRecord works its Ghost Queries. These are Ghost Queries because the ActiveRecord paradigm guides developers into calling these helper methods without the caution of performance. When the developer uses person.family they are actually calling a method that issues a SQL select statement instead of just pulling an attribute from memory of the prefetched record. Unlike other languages, when calling a method in ruby, developers are encouraged to leave off the parens. Transforming what would be person.family() in many languages to person.family. This distinction is important because without the parens a developer may believe that family is an attribute on the person and not a foreign relationship that requires a lookup call.

Who Gives?

So… why even bother to write about this? In most every Rails app, code patterns like the above can be found; however, many instances go unnoticed because of the small amount of data flowing through these code paths.

But imagine the performance costs at scale:

If the line from above, Person.all gathered 1000 records then the code would make 2000 queries (1000 from person.family.name and 1000 from person.school.state). Referencing the example from above:

people = Person.all # gets 1000 records
for person in people do
  family_name = person.family.name # will run 1000 times
  school_state = person.school.state # will run 1000 times
end

person.family.name runs this select:

select * from family where family_id = <person.family_id>;

Granted some of the queries would be cached (if two people came from the same family or school). This is still 2001 queries, even if they aren’t all unique. This is where your efficiency radar should be deafening.

Think if we used the ActiveRecord paradigm a bit smarter. We could use the belongs_to relationship where it makes sense — like when looking up one or two records. Say we have a thousand records though, let’s not exercise our database for no reason. Let’s be lazy and make the fewest amount of queries:

people = Person.all

# Gather all family ids
family_ids = people.map { |person| person.family_id }

# Make one query for all the families
Family.find(family_ids)

To put a few numbers into the mix on why this matters at scale here is some Postgres analysis.

SELECT family.* FROM family WHERE family.family_id = 149;

Planning time: 0.112 ms
Execution time: 0.088 ms

Running that query with 1000 different family_id‘s we would have a run time of 1000 * .120ms = 120ms

Even though in queries aren’t the fastest this is still 30 times faster:

SELECT  family.* FROM family WHERE family_id in (1, 2, 3, ...# all the way to 1000);

Planning time: 2.661 ms
Execution time: 1.464 ms

Total run time of this is about 4.1ms versus the 120ms alternative. Multiply the scale and the performance difference yielded by the two approaches is glaring.

Don’t be a victim, use your tools properly

ActiveRecord is a powerful tool and as such is packed with ways to shoot yourself in the foot. Make sure you know how ActiveRecord is querying your database. Chances are if you have a Rails app with some data you are probably a victim of Ghost Queries.

Pentester GuidesCobalt Core

Related Stories

Cybersecurity Statistics for 2021
Cybersecurity Statistics for 2021
What's new in ransomware, social engineering, and many other security threats
Read moreArrow Right
The State of Pentesting 2021: Common Vulnerabilities, Findings, and Why Teams Struggle With Remediation
The State of Pentesting 2021: Common Vulnerabilities, Findings, and Why Teams Struggle With Remediation
Each year, we publish The State of Pentesting report to provide a detailed overview of vulnerabilities and identify the trends and hazards that impact the cybersecurity community.
Read moreArrow Right
How to Build Resilience in Cybersecurity: 4 Lessons Learned From Military Experience
How to Build Resilience in Cybersecurity: 4 Lessons Learned From Military Experience
What better group to turn to for advice than security leaders who have worked on the front lines of risk and uncertainty?
Read moreArrow Right
New Ebook: Beginner’s Guide to Compliance-Driven Pentesting
New Ebook: Beginner’s Guide to Compliance-Driven Pentesting
Find out more about the role of pentesting in your company’s compliance effort.
Read moreArrow Right

Never miss a story

Stay updated about Cobalt news as it happens