Log in

No account? Create an account
30 January 2016 @ 09:38 am
Stupid ActiveRecord Tricks  
Per Denise's request...

So, I have this Service. One of the endpoints loads basically /all the data/ in the database, constructs a big json object, and send it over the wire.
As of Thursday morning, it was taking 179 seconds to generate that json on my laptop.
As of Thursday evening, it was taking between 4 and 5 seconds.

In /theory/, if you just /look/ at the code, and look at how things were being loaded, you might think it should have been that fast to start with. I was doing everything right, right? Here's what we were doing (massively simplified and slightly redacted)...

class Plan
  scope :with_components, ->{
                                  schedules: :tiers,
                                  statements: [

class ReportController
  def index                          
    plans = Plan.not_deleted.with_components.order(:id)
    render json: PlanReportDecorator.decorate_list(plans).as_json

Simple enough. Load all the plans, include all the associations that we're going to use to generate the json, and generate the json. Boom.

Thing is, when you actually generate the json, inside those decorators (and there are several levels of decorators here, but only two matter), it was loading more data from the database. And it would do that, loading counts or sums or individual records, about a dozen queries each, for every statement (dozens), for every plan (hundreds).

All those individual queries, taking fractions of a second each, add up quickly.

But WHY is it making all these queries? We eager-loaded EVERYTHING WE NEED.

Ok, so. Yes, we did. And ActiveRecord is really pretty clever as far as it goes, but sometimes it doesn't go far enough.

Let's take a specific example. Statements can be approved? or not. They are approved if there exists an Approval that has not been revoked. Simple enough accessor:

def approved?

And since we eager-loaded the approvals for all statements, this should be crazy fast, right?

Nope. Because we're accessing the not_unapproved scope on our approvals association, it makes a new db query. In theory, when the scope is known to be a simple where(), there's no /real/ reason that AR couldn't translate that into a .select{} on the loaded data, but it doesn't. So, new query. How do we fix that? If we /have/ all the approvals, don't load them again. Just look at what we have. If we do NOT have them, then do this scoped query, as it's the fastest option.

def approved?
  if @approved.nil?
    @approved = approvals.loaded? ? !!approvals.find(&:not_unapproved?) : approvals.not_unapproved.exists?

Memoize the result in @approved. If we haven't done so, it'll be nil. We can't use ||=, because that will still run when the value is false, we only want to set it when it's undetermined. Check if we have the association loaded. If so, use .find/.detect, using an accessor on the Approvals that implements the same logic as the scope. If we don't have them loaded, use the scoped query, and use .exists? instead of .any?, because it might possibly be ever so faintly faster. (on pg, it's SELECT 1 AS one FROM instead of SELECT count(*) FROM)

Changing things like that cut my load time from three minutes to about 50 seconds. The remainder was calculated sums on other associated records, displaying payment totals and such. It was causing three database queries per statement. Rather than loading those models each time, I added a json column to the database and stored the computed values at the time the statement is processed, and wrote a little caching wrapper I can tie to any method on the model to turn it into a write-once accessor.

def self.cached_detail_methods ; @cached_detail_methods ||= [] ; end
def self.cache_detail(*methods)
  methods = methods.flatten.map(&:to_s)
  methods.each do |method|
    class_eval <<-"EOF"
      def #{method}_with_detail_cache
        return self.detail_cache[#{method.inspect}] if self.detail_cache.has_key? #{method.inspect}
        self.detail_cache[#{method.inspect}] = #{method}_without_detail_cache
    alias_method_chain method, :detail_cache
  @cached_detail_methods ||= []
  @cached_detail_methods |= methods

Then added a before_save callback that will clear and repopulate the cache anytime we modify the statement (generally when it's being reprocessed).

And that change got rid of the last 45 seconds. At this point, there are less than a dozen database queries, all at the very start, and the json generation itself takes hardly any time at all.
sprockets, sockets, grommets & gasketsrandomdreams on January 30th, 2016 07:46 pm (UTC)
Totally makes sense when you walk through it but I don't know that I ever would have figured out why. That's one reason I'm reluctant to use more modern languages that have tons of stuff in libraries and calls. I don't actually know what they're doing and can't keep all that stuff in my head.
Martin Tithoniumtithonium on January 30th, 2016 07:58 pm (UTC)
Yeah, even by rails standards I'd consider this arcane. This is the first time I'd ever really looked at this endpoint, despite having owned it for the past nine months or so. All I'd done before is convert it from jbuilder views to decorator classes, without really analyzing what it was /doing/.