Parsing Rails logs with Beaver
Rails’s production logs contain a wealth of information, but I have found them to be rather parsimonious with it. Aside from manually reading through them, I have never found a good tool for parsing them into programmatically usable data (speed analyzers excepted).
“Google Analytics, you moron!” Honestly, I’ve never used it, because 1) it seems like a beast to use effectively, 2) I’m not sure it answers the questions I’m asking, and 3) the information I want is already sitting on my server, if I could only get at it.
“Apache/Nginx logs, you moron!” No, they only contain a subset of the information in the Rails logs.
I decided I should be able to express what I wanted in a declarative fashion, using a simple DSL. I came up with this:
hit :bad_login, :path => '/login', :method => :post, :status => 401 do
puts "Bad login with #{params[:username]} from #{ip}"
end
After I figured out how to actually write a DSL in Ruby, it turned out to be pretty trivial. Trivial enough that I’m wondering why it didn’t exist years ago and gain tremendous popularity. Please let me know if I’m missing anything.
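The core trick really is small. Below is a minimal sketch of how a `hit`-style DSL can work in plain Ruby. It is not Beaver's source, and it assumes each log entry has already been parsed into a hash of fields. `hit` just records the criteria and the block. `instance_exec` then runs the block with the matching entry as `self`, which is why `params` and `ip` work as bare words. LogDSL, LogEntry and run are made-up names for illustration.

```ruby
# A toy log-matching DSL in the spirit of the `hit` example above.
# LogDSL, LogEntry and #run are hypothetical names, not Beaver's actual API.
class LogEntry
  attr_reader :params, :ip

  def initialize(fields)
    @fields = fields
    @params = fields.fetch(:params, {})
    @ip     = fields[:ip]
  end

  # True when every criterion (e.g. :status => 401) equals the parsed field.
  def matches?(criteria)
    criteria.all? { |key, value| @fields[key] == value }
  end
end

class LogDSL
  def initialize
    @rules = []
  end

  # Record a named rule: the criteria to match and the block to run on a hit.
  def hit(name, criteria = {}, &block)
    @rules << [name, criteria, block]
  end

  # Check every rule against every entry; return the names of rules that fired.
  def run(entries)
    fired = []
    entries.each do |entry|
      @rules.each do |name, criteria, block|
        next unless entry.matches?(criteria)
        fired << name
        # instance_exec evaluates the block with the entry as self,
        # so `params` and `ip` resolve to the entry's readers.
        entry.instance_exec(&block) if block
      end
    end
    fired
  end
end

dsl = LogDSL.new
dsl.hit :bad_login, :path => '/login', :method => :post, :status => 401 do
  puts "Bad login with #{params[:username]} from #{ip}"
end

entries = [
  LogEntry.new(:path => '/login', :method => :post, :status => 401,
               :params => { :username => 'bob' }, :ip => '10.0.0.1'),
  LogEntry.new(:path => '/login', :method => :post, :status => 302,
               :params => { :username => 'amy' }, :ip => '10.0.0.2')
]
fired = dsl.run(entries) # prints "Bad login with bob from 10.0.0.1"
```

Only the 401 entry fires the rule. The successful login (302) is skipped because its status doesn't match.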
Beaver
I’m calling the resulting project Beaver. It’s still what I’d call experimental, but I’m using it regularly without any real problems. I’d love to hear some feedback from the Rails community about how others are making good use of their log files, because I can’t find any chatter about it.
Writing an Ajax long polling server in Ruby, Part 2
In Part 1 I described one way of writing and deploying a long-polling server in Ruby. Of course that’s useless by itself, so here we’ll hook it up to a browser client via Ajax.
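We’ll touch on
- CORS (Cross-Origin Resource Sharing)
- Javascript and browser issues
- Short-polling fallback
- Why write your own?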
CORS (Cross-Origin Resource Sharing)
Unfortunately you can’t just fire an Ajax request at polling.myapp.com and go. Actually it’s not unfortunate, because it’s an important part of browser security known as the Same Origin Policy. For our purposes, it states that a script loaded from myapp.com can’t make an Ajax request to any other origin, and that includes subdomains like polling.myapp.com.
Obviously that’s quite limiting, so the Sort of People Who Think Up These Kinds of Things thought up CORS. Here’s an article about it and here’s another one.
In short, if your client is IE 8+, Firefox 3.5+, Safari 4+, or a recent Chromium derivative, then you can use CORS with little or no extra work on the client side (Microsoft’s browser excepted, as you’ll see below). There’s a little server-side work, but we already covered that. Remember under the ‘My App’ section of Part 1 where I added the Access-Control-Allow-Origin header? That’s it. Set that header to the origin you want to allow, scheme included (e.g. ‘http://myapp.com’), because browsers compare it exactly against the page’s origin. To allow access from anywhere, just use *. (Unfortunately *.myapp.com isn’t supported, which irks me.)
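If you do need several subdomains, one common workaround is to inspect the request’s Origin header yourself and echo it back when it matches. Here is a rough Ruby sketch. The allowed_origin_for helper and the ALLOWED_ORIGIN pattern are assumptions for illustration, and myapp.com stands in for your own domain.

```ruby
# Hypothetical helper: emulate "*.myapp.com" by checking the Origin header
# ourselves. Allows myapp.com and any subdomain, over http or https,
# with an optional port.
ALLOWED_ORIGIN = %r{\Ahttps?://([a-z0-9-]+\.)*myapp\.com(:\d+)?\z}i

# Returns the value to send as Access-Control-Allow-Origin, or nil to
# send no CORS header at all (the browser will then block the response).
def allowed_origin_for(origin)
  return nil unless origin && ALLOWED_ORIGIN =~ origin
  origin
end
```

In a Sinatra before filter you might look up `allowed_origin_for(request.env['HTTP_ORIGIN'])` and set the header only when it returns a value. Because the header then varies per request, it’s also worth sending `Vary: Origin` so caches don’t hand one origin’s answer to another.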
Javascript and browser issues
Below is some example Javascript using jQuery. In theory it’s quite straightforward, but there’s always that one thing that ruins your day. (Yes, it’s what you’re expecting.) While version 8+ of Microsoft’s “browser-like program” claims to support CORS, it does not do so through the XMLHttpRequest object like every other browser on the freaking planet. They had to go and create a whole new object, XDomainRequest, just for cross-domain requests. Why? I suspect someone couldn’t find a puppy to kick that morning.
With that unpleasantness in mind, here are some codes:
// Assume we're in a sane, modern browser
// (`walls` is assumed to be a jQuery collection of your wall elements, defined elsewhere)
$(function() {
  try {
    // Test the waters with a "ping" to the root of the polling app.
    // Include the scheme: without it, jQuery treats the URL as a relative path on myapp.com.
    $.ajax({
      url: 'http://polling.myapp.com/',
      type: 'GET',
      dataType: 'json',
      success: function(data, textStatus, jXHR) {
        // Ping went through, initiate long polling
        if ( data.ack && data.ack == 'huzzah!' ) long_poll_walls(walls)
        // Unexpected reply, fall back to simple polling every 20 sec
        else setInterval(function() { simple_poll_walls(walls) }, 20000)
      },
      error: function(jXHR, textStatus, errorThrown) {
        // Ping failed, fall back to simple polling every 20 sec
        setInterval(function() { simple_poll_walls(walls) }, 20000)
      }
    })
  // Guess we aren't
  } catch (e) {
    // It's Microsoft's browser-like program!
    // (jQuery.browser was removed in jQuery 1.9; on newer versions, check window.XDomainRequest alone.)
    if ( jQuery.browser.msie && window.XDomainRequest ) ie_long_poll_walls(walls)
    // Fall back to simple polling every 20 sec
    else setInterval(function() { simple_poll_walls(walls) }, 20000)
  }
})
// For normal browsers
function long_poll_walls(walls) {
  // `var` keeps d local instead of leaking a global
  var d = {last_post_id: {}, session_id: 'some token to prove who you are'}
  // Record the last post under each wall.
  $(walls).each(function() {
    var wall_id = $(this).attr('data-id')
    d.last_post_id[wall_id] = $('.post:last', this).attr('data-id') || 0
  })
  $.ajax({
    url: 'http://polling.myapp.com/',
    async: true,
    timeout: 180000, // 3 minutes
    type: 'POST', // Using POST because it makes the polling app a little harder to abuse
    data: d,
    dataType: 'json',
    success: function(data, textStatus, jXHR) {
      var success = wall_polling_callback(walls, data, textStatus, jXHR)
      if ( success ) long_poll_walls(walls)
      else setTimeout(function() { long_poll_walls(walls) }, 5500)
    },
    error: function(jXHR, textStatus, errorThrown) { long_poll_walls(walls) } // Assume it just timed out and continue
  })
}
// For IE
function ie_long_poll_walls(walls) {
  // Build params to send with request, URL-encoding the values
  var params = ''
  $(walls).each(function() {
    var wall_id = $(this).attr('data-id')
    var last_post_id = $('.post:last', this).attr('data-id') || 0
    params += 'last_post_id[' + wall_id + ']=' + encodeURIComponent(last_post_id) + '&'
  })
  params += 'session_id=' + encodeURIComponent('your happy session token')
  // Set up request
  var xdr = new XDomainRequest()
  xdr.open('POST', 'http://polling.myapp.com/')
  xdr.timeout = 180000 // 3 minutes, like the jQuery version
  xdr.onload = function() {
    var success = wall_polling_callback(walls, JSON.parse(xdr.responseText), 'success', xdr)
    if ( success ) ie_long_poll_walls(walls)
    else setTimeout(function() { ie_long_poll_walls(walls) }, 5500)
  }
  // Like the jQuery version, assume failures are timeouts and keep polling
  xdr.onerror = xdr.ontimeout = function() { ie_long_poll_walls(walls) }
  xdr.send(params)
}
Short-polling fallback
Older browsers without CORS support will need a simpler short polling scheme, every n seconds. For simplicity’s sake, I just make this a part of my Rails app.
function simple_poll_walls(walls) {
// Just a normal Ajax request to your app with callbacks.
// You can figure it out.
}
Why write your own?
There are several good Ruby long-polling servers out there (e.g. Goliath), so why would you want to write your own? For starters there’s a small sense of pride. But mostly, it’s the best way to understand what’s going on. Sure, Goliath may be superior to my attempt in many respects. But really, isn’t this more fun?
Writing an Ajax long polling server in Ruby, Part 1
I spent weeks researching ways to build an Ajax chat server for a Rails app. The info is out there, but very fragmented. Nowhere did I find a single resource that explained the whole picture, each piece I needed, and why. Hopefully this can be that resource for you. I do not claim to be an expert, but I did build one, and it works very well. I guess all I claim is, “here’s what I did, hope it gives you some ideas.” Please point out errors and make suggestions. YMMV. I assume you have a good grasp of Ruby, Javascript, and Web-related technologies in general.
Because it’s so lengthy, I’ve broken this up into two parts. Part 1 deals with the server side, while Part 2 deals with tying it into the client side.
Up front I’ll tell you that you’ll need a separate app (ideally on a subdomain) to handle the long polling. It probably should not be Rails, and it should not run on Apache. What we’ll end up with is an Async Sinatra app running on Thin, reverse-proxied through Nginx.
We’ll touch on
- WebSockets will one day save us all
- Short vs. long polling
- Nginx
- Thin
- EventMachine
- Sinatra and async_sinatra
- My App
WebSockets will one day save us all
WebSockets are the future, providing full-duplex asynchronous push communications between Web browser and Web server, as well as food, shelter and love to everyone on Earth. Unfortunately the future isn’t here yet. Ajax long-polling is a cobbled-together hack until WebSockets and flying cars arrive. When they do, I recommend em-websocket.
Short vs. long polling
Regular, or what I’ll call “short,” polling is the youngest kid on Christmas morning pestering, “Can I open my presents now? Can I open my presents now?? Can I open my presents now???!!?” The webserver parents get so overwhelmed that they eventually shut down and stop responding. Long polling is the uninterested teenager who, with headphones on, requests “Tell me when we’re going to open presents.” That one question just hangs there until the parents are ready.
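To make the teenager concrete, here is a stdlib-only Ruby sketch of the waiting side of long polling. It is not the server we’ll actually build, which uses EventMachine. A request asks once and blocks on a ConditionVariable until new data is published or a timeout expires. A short poller would instead get an immediate “no” and ask again. LongPollBox is a hypothetical name, and it models a single client’s mailbox.

```ruby
# One client's "mailbox", illustrating long polling with Ruby's stdlib.
# LongPollBox is a hypothetical name for illustration only.
class LongPollBox
  def initialize
    @lock     = Mutex.new
    @arrived  = ConditionVariable.new
    @messages = []
  end

  # Called by whatever produces new data (e.g. a new chat post).
  def publish(message)
    @lock.synchronize do
      @messages << message
      @arrived.broadcast # wake any request that is waiting
    end
  end

  # Called once per long-poll request. Blocks until a message arrives or
  # `timeout` seconds pass, then returns whatever is pending ([] on timeout).
  def wait_for_messages(timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @lock.synchronize do
      while @messages.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        break if remaining <= 0
        @arrived.wait(@lock, remaining) # releases the lock while waiting
      end
      pending = @messages.dup
      @messages.clear
      pending
    end
  end
end

box = LongPollBox.new
asker = Thread.new { box.wait_for_messages(5) } # ask once, then just wait
sleep 0.1
box.publish('Presents!')                        # the parents are ready
answer = asker.value                            # ["Presents!"]
silence = box.wait_for_messages(0.2)            # nobody answers: [] after ~0.2s
```

Note that the waiting request costs nothing while it sleeps on the condition variable. It is woken exactly when there is something to say, or when the timeout expires.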
Nginx
Apache’s a great Web server. It’s a wunderkind of the Open Source world, probably only rivaled in success by GNU/Linux. But it is not an asynchronous server. It’s a process-based server, and long polling will bring it to its knees as Apache forks process after process for the onslaught of long-running requests.
Apache can do almost anything imaginable; Nginx does the six things you need, and 20x faster. Nginx handles reverse-proxying and load-balancing through TCP or Unix sockets, URL rewrites, SSL, gzipping, and easy Cache-Control headers: everything most Web apps need. And of course it’s a great general-purpose Webserver. Here’s a good intro to the basics.
But the killer feature here is that it can easily handle thousands of concurrent requests while using only a few megabytes of memory. Your Nginx virtual host would look something like this. Brilliantly simple, isn’t it?
upstream polling-app {
    server unix:/path/to/thin.sock;
    # Sockets are faster than your TCP/IP stack. Use them if you can!
    #server 127.0.0.1:3000;
}
server {
    listen 80; ## listen for ipv4
    server_name polling.myapp.com;
    access_log /var/log/nginx/polling.access.log;
    error_log /var/log/nginx/polling.error.log;
    location / {
        root /path/to/root/not/sure/it/matters/because/no/files/are/served;
        proxy_pass http://polling-app;
        # Long-polling requests can stay open for about a minute, which is
        # right at Nginx's 60s default, so give them some headroom.
        proxy_read_timeout 90s;
    }
}
Personally, I’ve dropped Apache and switched everything over to Nginx. But if you’re not comfortable doing that, I’d recommend running Nginx in front on 80/443. Your polling app would look like the above example. For everything else, you can switch Apache to use port 8080 (or whatever), and have n Nginx virtual hosts reverse-proxying to your n Apache virtual hosts over port 8080.
Thin
Mongrel? Passenger? Unicorn, Rainbows! or Zbatery? Thin is one of those guys. It’s a Rack app server, running your app’s code behind your Webserver. (Heck, it can even act as a full Webserver with SSL support!) Thin handles requests asynchronously with EventMachine. It can handle a lot at once, which is why it and Nginx are a great pair for this. (Rainbows! and Zbatery might work as drop-ins, but I have more experience with Thin.) Thin’s also a Ruby gem, making it wicked-super-easy to install. In fact it’s probably the least complicated piece of this whole thing. Just install it, write a small config file for your polling app, and you’re done.
Here’s a brief tutorial I wrote on configuring your Thin apps and getting them to start automatically when your system boots up.
Thin can be bound to either a TCP port or a Unix socket. Since Nginx can reverse-proxy to a socket, and sockets are generally faster than climbing through the TCP/IP stack, I’d recommend using them if possible. You can find details (look for the --socket option) in thin -h.
EventMachine
EventMachine is what makes all of this possible; a working understanding is critical. The main page of their docs is very good, so give it a read.
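require 'eventmachine'
EventMachine.run do
  puts "There's a job to do!"
  # The slow part; EventMachine.defer runs it on a background thread
  job = lambda do
    i = 0
    while i < 10000
      i += 1
    end
    i
  end
  # Runs back on the reactor thread with the job's return value
  callback = lambda do |num|
    puts "Job done; it counted to #{num}!"
    EventMachine.stop # Shut down the reactor so the script exits
  end
  puts "Starting job..."
  EventMachine.defer job, callback
  puts "Job started!"
  puts "Let's do other stuff while that's running!"
  puts "Other stuff..."
end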
That’s a dumb example, but it should get the point across. While it’s counting to 10,000, you can do other stuff. Whenever it’s done, it will print out the result. Read over the docs for a whole lot more info.
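If it helps to see the moving parts without installing anything, here is a toy stdlib-only imitation of the defer-then-callback pattern. It is not how EventMachine works internally. EM runs deferred jobs on its own thread pool and schedules callbacks back onto the reactor thread. Here, a single background thread does the job, and the main thread plays the reactor by popping the finished result off a Queue and running the callback. toy_defer is a hypothetical helper.

```ruby
# A toy version of the defer pattern using only Ruby's stdlib.
# toy_defer is a hypothetical helper, not EventMachine's implementation.
def toy_defer(job, callback, finished)
  Thread.new do
    result = job.call              # slow work happens off the main thread
    finished << [callback, result] # hand the callback back to the main loop
  end
end

finished = Queue.new
log = []

job = lambda do
  i = 0
  i += 1 while i < 10_000
  i
end
callback = lambda { |num| log << "Job done; it counted to #{num}!" }

log << 'Starting job...'
toy_defer(job, callback, finished)
log << 'Job started!'
log << 'Other stuff...'

# The "reactor": block until a finished job shows up, then run its callback here.
cb, result = finished.pop
cb.call(result)
puts log
```

Because the callback runs on the main thread, it can safely touch state the main thread owns (here, `log`) without a lock. That is the same reason EM runs deferred callbacks on the reactor thread.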
Sinatra and async_sinatra
Sinatra will be the meat (or tofu, if that’s your thing) of our polling server. It’s a micro-framework written in Ruby. Comparing it to Rails you might say it handles routes and controllers, but everything else is up to you or optional Rack Middleware. It has a great intro and docs, so I’ll let you peruse those at your leisure. But because I’m such a good chap, here’s a small example:
require 'sinatra'
# Defines a GET action at "/hello"
get '/hello' do
  sleep 10
  'Hello!'
end
# Defines a POST action at "/bienvendidos"
post '/bienvendidos' do
  'Bienvendidos!'
end
Notice sleep 10. Pretend that’s instead a very important, intensive operation that takes about 10 seconds. If you GET /hello and then immediately POST to /bienvendidos, your bienvendidos request will have to wait on /hello to finish (on a single, non-threaded app server, anyway). Put a pin in that.
Async Sinatra is a small yet powerful gem allowing Sinatra to dip down into Thin’s EventMachine-driven innards and deliver responses asynchronously. In short, this means we can easily handle a whole bunch of long-running connections at once. Converting the above example, we would have:
# Defines an asynchronous GET action at "/hello"
aget '/hello' do
  big_job = lambda { sleep 10 }
  # Async routes send their response by calling body
  result = lambda { |_| body 'Hello!' }
  EM.defer big_job, result
end
# Defines an asynchronous POST action at "/bienvendidos"
apost '/bienvendidos' do
  body 'Bienvendidos!'
end
Pull that pin out. If you try the same test here, /bienvendidos will return right away while /hello works in the background. As you may have guessed, EM is just a handy alias for EventMachine.
My App
Now that you have all the pieces, I’ll show you how I put them together. To understand where my code is coming from, and where yours may want to differ, a brief explanation of what I’m polling and how it’s being used is in order. The larger purpose of the Rails app is unimportant, but one requirement was a real-time, persistent group chat/message board/notification area which I called “Walls.” Groups of users have access to certain Walls. Messages posted to these Walls are stored in the database and can be reviewed at any time. But when users are signed in, they should be able to communicate in real time (long-polling).
For efficiency, I have only one job hitting the database, once per second. This job stores a hash like {1 => 57, 2 => 67, 3 => 355}, where 1, 2 and 3 are Wall ids and 57, 67 and 355 are the latest message ids from those Walls. For even more efficiency, this job only runs when any clients are connected. We’ll call this The Global Hash.
Each browser connection (polling request) sends a similar hash containing the latest message ids it has. We’ll call this The Local Hash. While the client’s connected, every 0.5 sec, the polling request sees if The Global Hash has any newer message ids than The Local Hash. If so, it grabs those messages from the db and returns them to the browser.
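The cheap comparison can be sketched as below. walls_with_new_posts is a hypothetical stand-in for whatever AppPoller does internally. It assumes both hashes already map Integer wall ids to Integer message ids. Params arrive from the browser as strings, so convert them first.

```ruby
# Hypothetical stand-in for the cheap Global Hash vs. Local Hash check.
# global: wall id => newest message id in the database (refreshed once a second)
# local:  wall id => newest message id this browser already has
def walls_with_new_posts(global, local)
  local.select { |wall_id, last_seen| global.fetch(wall_id, 0) > last_seen }.keys
end

global_hash = { 1 => 57, 2 => 67, 3 => 355 }
local_hash  = { 1 => 57, 3 => 350 }

stale = walls_with_new_posts(global_hash, local_hash)
# stale == [3]: only wall 3 has anything newer, so only wall 3 needs a db query.
# An empty result means the request just sleeps 0.5 sec and checks again.
```

The point of the design is that this check is a handful of integer comparisons in memory. Only a non-empty result costs a trip to the database.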
In the code below, you’ll notice the AppPoller class does most of the heavy lifting. That code is very application-specific and probably wouldn’t do you much good. With that in mind, I’m only showing you the Sinatra code, which should be more than enough to give you some ideas.
config.ru
require './app'
run Pollster
app.rb
require 'rubygems'
require 'json' # for Hash#to_json in the response callback
require 'sinatra/async'
require './config/boot' # Requires files with custom classes like AppPoller, sets up db connection, etc.
# When the reactor starts...
EM.next_tick do
  # Run this every 1 second
  EM.add_periodic_timer(1) do
    # If anyone's connected, poll the database for new messages.
    # Take the last message id from each wall and store it in a hash like {1 => 56, 2 => 77}
    AppPoller.poll! if AppPoller.has_clients?
  end
end
class Pollster < Sinatra::Base
  register Sinatra::Async
  # Define a route helper for the OPTIONS HTTP verb.
  # Browsers (should) send an OPTIONS "pre-flight" request to get Access-Control-Allow-* info.
  def self.http_options(path, opts={}, &block)
    route 'OPTIONS', path, opts, &block
  end
  # Ideally these headers would be sent only from http_options below. But not all
  # browsers send OPTIONS pre-flight checks correctly, so we'll just send them with
  # every response. I'll discuss what some of them mean in Part 2.
  before do
    # Must match the page's origin exactly, scheme included. If you need multiple domains, just use '*'
    response.headers['Access-Control-Allow-Origin'] = 'http://myapp.com'
    response.headers['Access-Control-Allow-Methods'] = 'GET, POST, OPTIONS'
    response.headers['Access-Control-Allow-Headers'] = 'X-CSRF-Token' # This is a Rails header, you may not need it
  end
  # We need something to respond to OPTIONS, even if it doesn't do anything
  http_options '/' do
    halt 200
  end
  # The root path will serve as a kind of "ping" for our clients.
  # We'll respond to everything with JSON.
  aget '/' do
    response.headers['Content-Type'] = 'application/json'
    body '{"ack": "huzzah!"}'
  end
  # Technically we should use GET, but POST makes it less susceptible to abuse
  apost '/' do
    response.headers['Content-Type'] = 'application/json'
    # Find the user. This is left as an exercise for you.
    user = AppPoller.get_user(params[:session_id])
    unless user
      # Pass the error body to halt so it goes out together with the 400 status
      halt 400, '{"errors": ["Invalid user!"]}'
    end
    # Find/parse the last post ids.
    # This is a hash like {1 => 56, 2 => 77} where 1 and 2 are Wall ids,
    # and 56 and 77 are the latest message ids this user has for those
    # walls.
    # user.resolve_last_post_ids is for security, stripping out any
    # walls the user isn't supposed to have access to. Another exercise for you.
    last_post_id = user.resolve_last_post_ids(params[:last_post_id])
    unless last_post_id.any?
      halt 400, '{"errors": ["Invalid parameters!"]}'
    end
    # This is the job that will keep checking for new messages for this
    # user's walls. EM.defer runs it on EventMachine's thread pool
    # (EM.threadpool_size, 20 threads by default), so each waiting client holds a thread.
    pollster = proc do
      AppPoller.add_client user
      time, new_posts = 0, false
      # After a minute, most browsers or proxies will have severed the connection,
      # and we don't want this job running forever.
      until time > 60
        # This just compares the user's latest post ids to the global hash, so it's very cheap.
        new_posts = AppPoller.posts_since?(last_post_id)
        break if new_posts
        sleep 0.5
        time += 0.5
      end
      # If there were new posts, grab them from the database
      new_posts ? AppPoller.posts_since(last_post_id) : []
    end
    # This job takes the new posts (if any), converts them to JSON,
    # and sends the response.
    callback = proc do |new_posts|
      AppPoller.drop_client user
      walls = {:walls => {}}
      new_posts.each do |post|
        walls[:walls][post.wall_id] ||= {:posts => []}
        walls[:walls][post.wall_id][:posts] << post.to_hash
      end
      body walls.to_json
    end
    # Begin asynchronous work
    EM.defer(pollster, callback)
  end
end
Stay tuned for Part 2!
How to deploy a multi-threaded Rails app
Multi-threaded Rails apps have to be the best kept secret in the Rails community. “What do you mean? It’s been discussed at length for years!” Yes, the benefits, drawbacks, and limitations have been. But, JRuby aside, I have never ever seen someone saying “This is how you do it.”
It’s easy to find instructions for turning threaded mode on in Rails. Just uncomment config.threadsafe! in config/environments/production.rb. But that’s only half the story. Normally, Rails has a mutex around request handling, allowing only one request at a time. threadsafe! removes that, telling Rails it may handle requests concurrently as they come in. But your app server (Passenger, Mongrel, etc.) is what needs to send your app those concurrent requests in threads. Most never will.
Are you sure?
Go ahead, turn threadsafe! on in one of your apps. Then create an action called foo.
...
def foo
  n = params[:n].to_i
  sleep n
  render :text => "I should have taken #{n} seconds!"
end
...
Open a browser window and pass it ?n=15. Quickly open another window and pass it ?n=2. Window 1 will render in 15 seconds. With config.threadsafe! on, you’d expect Window 2 to finish well before Window 1. But no, Window 2 takes nearly 17 seconds, because it’s waiting on the first request, which is taking 15 seconds. You may have told Rails it’s safe to be threaded, but your app server isn’t threading.
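The same effect can be reproduced in plain Ruby. This is a rough stdlib-only simulation, not anything from Rails itself. A global Mutex stands in for the request lock, Thread.new plays an app server that gives each request its own thread, and the sleep times are scaled down. With the lock held, the short request waits even though it has its own thread. Without the lock, the two requests overlap. A server that hands your app one request at a time behaves like the locked case no matter what threadsafe! says. REQUEST_LOCK, handle_request and short_request_finish_time are hypothetical names.

```ruby
# A stdlib-only simulation of the foo experiment; no Rails involved.
# REQUEST_LOCK stands in for the per-request lock that threadsafe! removes.
REQUEST_LOCK = Mutex.new

def now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# One fake "request": just sleep, optionally holding the global lock.
def handle_request(seconds, locked)
  if locked
    REQUEST_LOCK.synchronize { sleep seconds }
  else
    sleep seconds
  end
end

# Fire a long request, then a short one shortly after, each on its own thread.
# Returns when the short one finished, measured from when the long one started.
def short_request_finish_time(locked)
  start = now
  long  = Thread.new { handle_request(1.0, locked) } # window 1 (?n=15, scaled down)
  sleep 0.2                                          # "quickly" open window 2
  short = Thread.new { handle_request(0.2, locked); now - start }
  finished = short.value
  long.join
  finished
end

serialized = short_request_finish_time(true)  # about 1.2s: waits behind the long one
concurrent = short_request_finish_time(false) # about 0.4s: runs alongside it
puts format('with lock: %.1fs, without: %.1fs', serialized, concurrent)
```
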
An aside
I will note, as have many others, that Rails thread-safety means little if your gems are not thread-safe or are doing lots of blocking. You’ll have to deal with that on your own. My understanding is that the mysql2 gem used in Rails 3 took care of this for the average app.
Most Rails deployments can’t go multi-threaded, no matter what
Passenger and Unicorn definitely can’t. Most Mongrel deployments can’t or won’t (not sure which). These are all process-based request servers, like Apache, where a process handles only one request at a time. If you want to handle n concurrent requests, you need n copies of your app running. Rainbows! and Zbatery I believe have event-based concurrency, but that is not the same as multi-threading. So what does that leave us?
I got Thin
The only Rack server I’ve found that mentions multi-threading is Thin. It uses EventMachine to handle concurrent requests. Yay! Everything solved, right? Not quite. As is the case with Rainbows! and Zbatery, event-based processing is not the same as multi-threading, and buys config.threadsafe!’d Rails apps nothing. But Thin does have a threaded mode, which can be enabled by passing “--threaded” on the command line or by setting “threaded: true” in your YAML config file. That’s it! Try the above test now, and Window 2 will finish long before Window 1.
Follow this simple workaround for a bug if you’re using Ruby 1.9.2. Otherwise all your requests could take almost a minute to complete!
You also may want to read my post on Thin config and management.
Is Thin the only way?
I hope not. If so, that means Rails has an incredibly powerful feature that everyone’s excited about but that, at the same time, no one really cares about. Thin is the only multi-threaded Rack server on which I can find any meaningful discussion or documentation (and even that is scant). If you know of another, please let me and everyone else know!
Look into Nginx
To really take advantage of all these multi-threading and asynchronous goings-on, you should look into dropping Apache and switching to Nginx. In fact if you’re trying to run as efficiently as possible on a VPS, I insist you look into it! There’s plenty out there, but I’d start with a good comparison of the two.
