iNat started crashing periodically this weekend, and after numerous attempts to figure out what was wrong, I finally found the culprit with some new-to-me tools: gdb and gdb.rb. For each crash load average was huge (like 10), and it was clearly the rack processes that were eating all the CPU (memory and swap were fine). Restarting nginx and/or killing the rack processes brought things back up, so clearly it was those hung processes that were killing things. Strangely, the rails logs seemed to be humming along normally (probably b/c not all procs were hung). There was some uptick in traffic around each crash, but not like a DOS-level uptick. After ruling out memory leak, DOS, and Rackspace issues (they claimed everything was fine), I figured there was an app-level issue, but none of the changes immediately preceding the crashes looked like they could be causing infinite loops or the like, sooooo I was at a loss.

I talked to n8 and he got me thinking about tools that look directly at the current state of the server processes, strace, gdb, and the like. gdb, which lets you look at what a live process is doing (if it was written in c, c++, or a number of other languages) ended up being the most useful, when combined with gdb.rb, which actually lets you execute ruby commands within the context of a live ruby process. I suck at interpreting C code from the interpreter, but with ruby I’m home. Inspection went something like this:

# install deps on Ubuntu and gdb.rb gem
sudo apt-get install gdb python-dev ncurses-dev
gem install gdb.rb
 
# wait for crash to happen again... and wait... and wait...
 
# CRASH! Attach to hung process by PID (use rvmsudo if you use rvm)
rvmsudo gdb.rb PID
 
# in gdb get a ruby stacktrace with file names and line numbers
# here I'm filtering by files that are actually in my app dir
(gdb) ruby eval caller.select{|l| l =~ /app\//}

That outed some infinite recursion from a totally unexpected place that I was able to fix in a few minutes. Unfortunately running ruby eval in gdb occasionally crashed the process, I think when the proc received a signal while gdb was attached, but stubborn repetition eventually got me my trace. Also, not all of gdb.rb’s commands seemed to be available, like ruby trace and ruby threads. I basically only had ruby objects and ruby eval, not sure why, but ruby eval was all I needed.

Anyway, there are tons of posts about using gdb to inspect ruby procs out there, but almost all of them are really old, and it took me a while to find the right tool for this job, so maybe this post will help someone. The best overview I found was this post from Big Nerd Ranch. This gist by tmm1 (author of gdb.rb) also collects a bunch of useful information about a hung process by combining output from multiple tools.

Recently started using RGeo a bit (I know, way late, but hey, it’s pretty awesome!), and ran into a production problem when using contains?

RGeo::Error::UnsupportedOperation: Method Geometry#contains? not defined.

RGeo depends on some kind of underlying geometry library, usually GEOS. In Ubuntu, the headers and source files for GEOS don’t get installed by default when you install geos, so the solution is to install them, as it is with so many of these issues:

sudo aptitude install geos-dev
bundle exec gem uninstall rgeo
bundle update rgeo

Update from April24, 2014

In Ubuntu 13.10 (saucy) package names are a bit different. Not sure if both of these are required, but the ++ one certainly is:

sudo aptitude install libgeos-dev
sudo aptitude install libgeos++-dev

bundler cheat

Dependency management that barely solves more problems than it creates.

# reinstall a single gem (why can't this be gem reinstall gemname?!)
bunde exec gem uninstall gemname
bunde update gemname