Not a new problem, but one I thought would be easier to solve with open source software, so I’m documenting my solution. With some research and experimentation, I adapted this script into something that takes a collection of images of text (e.g. pages from a book or a paper) and converts them into a searchable PDF. You will need to install some other packages, and my instructions here assume you’re using Homebrew on a Mac, but the script should be adaptable to any platform that can run Tesseract, ImageMagick, and Ghostscript.

I will say that it’s WAY slower than the built-in OCR on some scanner/printers I’ve seen. Not sure why.
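The script I adapted isn’t reproduced here, but the core pipeline can be sketched in Ruby: OCR each page image with tesseract into a one-page PDF that carries an invisible text layer, then concatenate the pages with Ghostscript. This is a minimal sketch under a few assumptions: a tesseract new enough to support the pdf output config (3.03 or later), page images already in reading order, and no ImageMagick preprocessing step. The ocr_commands method is hypothetical; it only builds the command strings (shell-escaped, so spaces in file names are safe) so you can inspect them before running anything.

```ruby
require "shellwords"

# Hypothetical helper: build (but don't run) the shell commands for an
# images -> searchable PDF pipeline. Assumes tesseract >= 3.03 (for the
# "pdf" output config) and Ghostscript's gs are on the PATH.
def ocr_commands(images, output_pdf)
  page_pdfs = []
  commands = images.map do |image|
    # tesseract takes an output *base* name and appends ".pdf" itself
    base = File.join(File.dirname(image), File.basename(image, ".*"))
    page_pdfs << "#{base}.pdf"
    ["tesseract", image, base, "pdf"].shelljoin
  end
  # Ghostscript's pdfwrite device concatenates the per-page PDFs in order
  commands << ["gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite",
               "-sOutputFile=#{output_pdf}", *page_pdfs].shelljoin
  commands
end
```

Running them is then something like ocr_commands(Dir.glob("scans/*.png").sort, "book.pdf").each { |cmd| system(cmd) or abort("failed: #{cmd}") }. Note that sort is lexical, so zero-pad your page numbers (page01, page02, etc.) or page10 will land before page2.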

Just tried to set up CocoaPods and ran into this error:

checking for -std=c99 option to compiler... yes
checking for CoreFoundation... no
checking for main() in -lCoreFoundation... no
CoreFoundation is needed to build the Xcodeproj C extension.

Dislike. I finally found an answer that worked for me and didn’t resort to circumventing my Homebrew/rvm setup, though it did involve installing a new Ruby:

brew link autoconf
rvm install ruby-2.0.0-p353 --with-gcc=clang --verify-downloads 1
rvm use ruby-2.0.0-p353
gem install cocoapods

windshaft on ubuntu

Just set up Windshaft on Ubuntu for the first time, and since there were a few hiccups I figured I’d give a rough outline of my installation process:

# Use mapnik 2.2 (
# I was getting some errors with interactivity requests on an older version of
# mapnik
sudo add-apt-repository ppa:mapnik/v2.2.0
sudo apt-get update
sudo apt-get install libmapnik libmapnik-dev mapnik-utils python-mapnik
# Uninstall system gyp, it was causing problems for me when compiling mapnik
# node extensions, e.g. gyp 'module' object has no attribute 'script_main'
sudo aptitude remove gyp
# Use nvm ( instead of the package nodejs,
# which didn't seem to work for me, though that may have been tangled up with
# the mapnik 2.2 issue. Either way, nvm should install a working version of
# nodejs w/ npm.
curl | sh
nvm install 0.10.26 # or whatever works
# install windshaft with npm
npm install windshaft

I’m also using nginx and Passenger to serve the web app. I’ll assume you know how to install those two things and skip to my nginx conf:

http {
    passenger_root /path/to/passenger;
    passenger_ruby /path/to/ruby;
    passenger_nodejs /home/inaturalist/.nvm/v0.10.26/bin/node;
    server {
      listen 80;
      passenger_enabled on;
      passenger_app_root /path/to/app;
      passenger_document_root /path/to/app/public;
      error_log /var/log/nginx/your-error.log;
      access_log /var/log/nginx/your-access.log;
    }
}

One thing that threw me for a while is that Passenger won’t work with older versions of Node (see, so make sure you’re using 0.10 or higher.

Also note that console.log in Node will write to /var/log/nginx/error.log, not to one of your server’s custom log files.

ridiculous rails boot times

I recently ran aptitude safe-upgrade on Ubuntu, which updated a bunch of stuff, including the Linux headers and Postgres, and now my Rails boot time is 3x longer. I still don’t know why, which is frustrating, but I did learn about a few things along the way.

The first is Bumbler, a tool for inspecting gem load times, among other things.
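Bumbler does this accounting for every gem in your bundle. As a rough illustration of the underlying idea (timing each library’s first require), here is a plain-Ruby sketch; it makes no claim about how Bumbler itself works, and the time_requires helper is hypothetical.

```ruby
# Hypothetical helper: time each library's first require and return
# [[name, seconds], ...] sorted slowest first.
def time_requires(libs)
  libs.map do |lib|
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    require lib # a no-op (returns false) if lib is already loaded
    [lib, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
  end.sort_by { |_, seconds| -seconds }
end

if $PROGRAM_NAME == __FILE__
  time_requires(%w[json securerandom]).each do |lib, seconds|
    printf("%8.2f ms  %s\n", seconds * 1000, lib)
  end
end
```

Because a second require of the same file is nearly free, the numbers only mean something the first time a library loads, and time spent loading a shared dependency is charged to whichever library pulled it in first.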

The second is Passenger’s passenger_start_timeout setting, which is how I’m addressing my problem without really addressing my problem.

The third is that Rackspace now has a “Performance” VPS product that seems to be both faster and cheaper than their old VPSs. Unfortunately, transitioning is non-trivial: you can’t do it from 1st-gen cloud servers, and if you want to create a Performance Cloud Server from a Next Gen image, you can only do it from a 1GB Next Gen server.

# reproject into WGS84 lat/lon
gdalwarp -t_srs EPSG:4326 -dstnodata 0 input.tif output.tif
# expand to RGBA in a VRT, then tile it for Google Earth (-k generates KML)
gdal_translate -of vrt -expand rgba output.tif output.vrt
gdal2tiles.py -p geodetic -k output.vrt

This mostly works, but the nodata from the original GeoTIFF doesn’t get preserved as a PNG alpha channel in the KMZ tiles. Still need to figure that out.

visualizing a rails schema

I didn’t really find a perfect solution, i.e. one that would let me optionally specify a model or a set of models and show all the attributes and relationships for only those models. railroady comes pretty close, though, especially when using Graphviz output and OmniGraffle’s layout engines. rails-erd made a decent full-model diagram too.


  gem "rails-erd"
  gem "railroady"


railroady -M --hide-through -i \
  -s app/models/*observation*,app/models/photo.rb,app/models/sound.rb,app/models/taxon.rb \


Hierarchical layouts seemed to work best, with some tweaking.

iNat observation model and some associated models.

I’m sure this is all irrelevant if you use CocoaPods, but I don’t (yet). My approach to installing SSZipArchive was to add it as a git submodule and drag the folder containing SSZipArchive.h and minizip into my project, adding it by reference. This made RestKit compilation barf, mostly with a lot of parse issues in RKURL.h like “Expected identifier or '('”, which is ridiculous b/c nothing changed in RestKit.

I eventually found the solution: change the file type of the minizip .c files to “Objective-C Source.”


Now my project builds and everything works normally. Why this solution works is beyond my extremely limited knowledge of C and Xcode. These files look like C and/or C++, so why does telling Xcode to treat them like Objective-C even work?

I have a bunch of iOS apps that use the same Google API identity with different client IDs for sign-in through Google. I just made another one but kept running into this problem where auth would work, but when the browser redirect occurred, I’d get a message reading “Cannot Open Page: Safari cannot open the page because the address is invalid.”


That didn’t seem right b/c it was redirecting to an internal URL based on the bundle ID I specified when generating the client ID, and the bundle ID for the app was right.


Turns out I’d forgotten that you can specify a URL identifier and URL scheme in the *-Info.plist file. Changing those to match the bundle ID fixed the problem… after hours of madness that shortened my life expectancy by several years.
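Since this one bit me, here’s a rough Ruby sketch for sanity-checking an Info.plist: it pulls out CFBundleIdentifier and every string under CFBundleURLTypes / CFBundleURLSchemes so you can see whether the bundle ID is among the URL schemes. Assumptions: the plist is in XML (not binary) format, and the values are literal strings; if your plist uses build-setting variables like ${PRODUCT_NAME}, Xcode expands those at build time and this naive comparison won’t see the expanded value. It uses REXML, which ships with Ruby; the helper names are hypothetical.

```ruby
require "rexml/document"

# In a plist <dict>, each <key> is followed by its value element.
# Return the value element for +name+, or nil if the key is absent.
def dict_value(dict, name)
  children = dict.elements.to_a
  idx = children.index { |el| el.name == "key" && el.text == name }
  idx && children[idx + 1]
end

# Top-level <dict> of an XML plist.
def plist_root(xml)
  REXML::Document.new(xml).root.elements["dict"]
end

def bundle_id(xml)
  dict_value(plist_root(xml), "CFBundleIdentifier")&.text
end

# Every string under CFBundleURLTypes -> (each dict) -> CFBundleURLSchemes.
def url_schemes(xml)
  types = dict_value(plist_root(xml), "CFBundleURLTypes")
  return [] unless types
  types.elements.to_a("dict").flat_map do |type|
    schemes = dict_value(type, "CFBundleURLSchemes")
    schemes ? schemes.elements.to_a("string").map(&:text) : []
  end
end

if $PROGRAM_NAME == __FILE__ && ARGV[0]
  xml = File.read(ARGV[0])
  id = bundle_id(xml)
  if url_schemes(xml).include?(id)
    puts "OK: #{id} is registered as a URL scheme"
  else
    puts "MISMATCH: #{id} not in #{url_schemes(xml).inspect}"
  end
end
```

Save it under any name and run it as ruby check_plist.rb MyApp/MyApp-Info.plist.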

AREL where() to sql

Sometimes you just want the SQL for a WHERE clause:

 > Post.send(:sanitize_sql, :title => "foo", :body => "bar")
 => "\"posts\".\"title\" = 'foo' AND \"posts\".\"body\" = 'bar'"

iNat started crashing periodically this weekend, and after numerous attempts to figure out what was wrong, I finally found the culprit with some new-to-me tools: gdb and gdb.rb. During each crash the load average was huge (like 10), and it was clearly the rack processes that were eating all the CPU (memory and swap were fine). Restarting nginx and/or killing the rack processes brought things back up, so clearly those hung processes were the problem. Strangely, the Rails logs seemed to be humming along normally (probably b/c not all procs were hung). There was some uptick in traffic around each crash, but not a DoS-level uptick. After ruling out a memory leak, a DoS, and Rackspace issues (they claimed everything was fine), I figured there was an app-level issue, but none of the changes immediately preceding the crashes looked like they could be causing infinite loops or the like, sooooo I was at a loss.

I talked to n8 and he got me thinking about tools that look directly at the current state of the server processes: strace, gdb, and the like. gdb, which lets you look at what a live process is doing (if it was written in C, C++, or a number of other languages), ended up being the most useful when combined with gdb.rb, which lets you execute Ruby code within the context of a live Ruby process. I suck at reading the interpreter’s C-level backtraces, but with Ruby I’m at home. Inspection went something like this:

# install deps on Ubuntu and gdb.rb gem
sudo apt-get install gdb python-dev ncurses-dev
gem install gdb.rb
# wait for crash to happen again... and wait... and wait...
# CRASH! Attach to hung process by PID (use rvmsudo if you use rvm)
rvmsudo gdb.rb PID
# in gdb get a ruby stacktrace with file names and line numbers
# here I'm filtering by files that are actually in my app dir
(gdb) ruby eval caller.select{|l| l =~ /app\//}

That exposed some infinite recursion in a totally unexpected place, which I was able to fix in a few minutes. Unfortunately, running ruby eval in gdb occasionally crashed the process (I think when the proc received a signal while gdb was attached), but stubborn repetition eventually got me my trace. Also, not all of gdb.rb’s commands seemed to be available, like ruby trace and ruby threads. I basically only had ruby objects and ruby eval; not sure why, but ruby eval was all I needed.
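The block passed to ruby eval is ordinary Ruby, so you can rehearse the filter in irb before attaching to a live process. Here is a minimal sketch with a hypothetical app_frames helper and a made-up backtrace; inside gdb.rb you’d hand it the live backtrace instead. Since gems can have their own app/ directories, a tighter pattern (e.g. one built from Rails.root) cuts down on false matches.

```ruby
# Hypothetical helper: keep only backtrace frames from our own code.
# The default pattern matches any path containing "app/", like the
# ruby eval filter above; pass a tighter Regexp to exclude gems.
def app_frames(backtrace, pattern = %r{app/})
  backtrace.select { |frame| frame =~ pattern }
end

# Made-up frames in the "file:line:in `method'" format Ruby uses
trace = [
  "/gems/activerecord/lib/active_record/base.rb:10:in `find'",
  "/srv/inat/app/models/taxon.rb:42:in `ancestors'",
  "/srv/inat/app/controllers/taxa_controller.rb:7:in `show'"
]
puts app_frames(trace)
```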

Anyway, there are tons of posts about using gdb to inspect ruby procs out there, but almost all of them are really old, and it took me a while to find the right tool for this job, so maybe this post will help someone. The best overview I found was this post from Big Nerd Ranch. This gist by tmm1 (author of gdb.rb) also collects a bunch of useful information about a hung process by combining output from multiple tools.