
Starling: A Ruby Persistent Queue Server That Speaks Memcached

By Peter Cooper / July 22, 2008


It's been around for a while now, but Starling is a "light-weight persistent queue server that speaks the MemCache protocol." Starling makes it ridiculously easy to set up a network-accessible queue (or many queues) for, say, asynchronous job processing between multiple processes and machines. It was developed by Twitter to handle the heavy amount of queueing necessary to keep their service ticking over. Starling is proven in production, with not only Twitter using it in anger, but FiveRuns too. FiveRuns have even created their own fork that, they say, is significantly faster.
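To make the "queue that speaks memcached" idea concrete, here is a minimal sketch of the semantics as I understand them: a memcached-style set appends an item to a named queue, and a get removes and returns the oldest item. The ToyQueueServer class and the "emails" queue name are hypothetical stand-ins for illustration only. This is an in-memory toy, not Starling's implementation, and it has no persistence or networking.

```ruby
# Toy in-memory stand-in for a Starling-style queue (hypothetical class,
# not part of Starling). It mimics the memcached verbs only.
class ToyQueueServer
  def initialize
    # Each queue name maps to its own FIFO array, created on first use.
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  # Memcached-style "set": append the value to the end of the named queue.
  def set(queue_name, value)
    @queues[queue_name].push(value)
    true
  end

  # Memcached-style "get": remove and return the oldest item,
  # or nil when the queue is empty. The read is destructive.
  def get(queue_name)
    @queues[queue_name].shift
  end

  # Number of items currently waiting in the named queue.
  def size(queue_name)
    @queues[queue_name].size
  end
end

queue = ToyQueueServer.new
queue.set("emails", { to: "a@example.com" })
queue.set("emails", { to: "b@example.com" })

first  = queue.get("emails") # oldest item comes out first
second = queue.get("emails")
third  = queue.get("emails") # queue is now empty, so this is nil
```

With a real Starling server you would presumably point an ordinary memcached client at it and call set and get in the same way, with producers and workers living in separate processes or on separate machines.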

Why the sudden interest in Starling? Well, Glenn Gillen has written an excellent introductory guide to setting up Starling over at RubyPond.com. He walks through the process of using Starling (and Workling, a Rails plugin to make using Starling easier) from installation, through to actually adding things to the queue and processing them.

An interesting alternative to Starling is also presented within the comments on Glenn's post - RudeQ. RudeQ uses the same API as Starling but is ActiveRecord / database based, meaning there's no extra process to monitor. I suspect it's nowhere near as fast, but if you'd rather avoid the headache of monitoring another persistent process or don't have the option of having a persistent process at all (shared hosting, perhaps) it's worth checking out.


Comments

  1. Mike Perham says:

    We (FiveRuns) do have our own fork of Starling, but it is based 99.9% on the work of others to make it faster than 0.9.3. See the Readme for credit where credit is due!

    As for the database-based queue, we wrote something very similar to RudeQ and delayed_job. It scaled horribly and was about 10x slower than Starling. The database is not a good queue.

  2. Aman Gupta says:

    Would you mind commenting on the specific changes that were made to increase performance?

  3. Luke Redpath says:

    I highly recommend Beanstalkd as an alternative lightweight (albeit non-persistent) queue. It's been rock solid for us at Reevoo; its lack of persistence has not proved to be an issue, with a current uptime of 185 days (i.e. since we first started it up in production), and it's very fast too.

    http://labs.reevoo.com/plugins/beanstalk-messaging

  4. Luke Redpath says:

    Oh, and for stats freaks: we currently run one instance of beanstalkd per queue (we've been using it since it added support for multiple queues per instance, but I actually prefer the redundancy in only having one queue per instance) with each instance using between 0.3 and 1.5 MB of memory on average and CPU time rarely going above 0.3%.

    Our busiest queue has processed over 48 million messages since we started using it 3 months ago (after replacing our buggy, slow ActiveMQ setup) and continues to process hundreds of thousands of messages a day.

  5. steve says:

    Is this the queue server that caused a lot of downtime at Twitter?

  6. Peter Cooper says:

    I was waiting for someone to pick up on Twitter's downtime ;-) I suspect that the queue server is not to blame. Indeed, I believe Starling was developed after the first significant bout of outages in order to make Twitter more scalable? Perhaps someone could confirm that, though...

  7. Alex Payne says:

    Peter: I'm an engineer at Twitter, and I can confirm that Starling was written after our first major traffic growth spurt in Spring 2007. Our scalability issues have more frequently had to do with the database than with Starling, but it has limits and oddities. Particularly, we see its performance come to a crawl at around 400,000 items in a queue. Replaying its on-disk journal (for example, after a crash or reboot) can also be extremely slow.

    Generally, we don't expect to use Starling in perpetuity at Twitter. At the time it was developed, the Ruby libraries to interface with mature, robust queueing systems were underdeveloped. Now that so many Rubyists have taken an interest in queues, those libraries have improved. Given that, I'd take a good look at ActiveMQ, RabbitMQ, and so forth before deploying Starling. Starling is only suitable for applications that require zero transactional support from their queuing system (i.e., all reads are destructive).

    Another of our engineers has an implementation of Starling in Scala, called Scarling, that's more suited to certain operational scenarios: http://github.com/robey/scarling/. We may deploy Scarling, but we expect to move to a robust queuing system within the next several months.

  8. Peter Cooper says:

    Thanks for the incredibly useful (and authoritative!) info, Alex. Scarling sounds interesting (I own ScalaInside.com, and was going to jump into doing some Scala but somehow got put off). I haven't played with RabbitMQ yet, but being Erlang based, I get the impression it'll be the ultimate "winner" in this area.

  9. steve says:

    Thanks for the explanation, Alex. It is an excellent insider tip.

  10. alexis says:

    Hello ruby people, thanks for mentioning RabbitMQ in this context.

    There is a shiny new ruby client that you might like to play with. No need to know any erlang code, but you still get all the scalability and stability of the platform. Plus you can turn on persistence, do pub-sub, etc - these are all config options.

    The ruby client is described here: http://hopper.squarespace.com/blog/2008/7/22/simple-amqp-library-for-ruby.html

    Please give it a try and contact us if you have questions!

    BTW - messaging geeks may also enjoy this - click through to see a presentation explaining how a top-grade messaging system like this can be written in fewer than 5k lines of code: http://www.lshift.net/blog/2008/07/01/slides-from-our-erlang-exchange-talk

    cheers

    alexis

  11. Matthew Rudy says:

    Yeah.
    RudeQ/RudeQueue is clearly a lot slower than Starling.
    I certainly wouldn't suggest it as a replacement for Twitter.

    But for boring day-to-day passing normal amounts of data to an asynchronous process, it certainly works.

    We use it at JobsGoPublic.com to queue up and asynchronously process updates to Job listings, and it handles that very well.

    I did some benchmarks vs Starling
    http://github.com/matthewrudy/rudeq/tree/master/performance/benchmark_vs_starling.rb

    Maybe it's 5 times slower than starling.
    But on my dev machine it still handled more than 100 a second.
    (albeit without simulated contention, but...)

    I challenge most Rails sites to need more queueing operations than that.

    Will take a look at RabbitMQ no doubt!

  12. dubek says:

    Just wanted to add a few more links to the party here. First, these posts also deal with the queuing issues:

    http://nubyonrails.com/articles/about-this-blog-beanstalk-messaging-queue

    http://blog.thinkrelevance.com/2008/6/1/small-things-loosely-joined-written-fast (updated code samples: http://blog.thinkrelevance.com/2008/6/3/updated-code-sample-from-small-things-talk )

    Besides Starling, RabbitMQ, RudeQ and beanstalkd already mentioned in the post+comments, there are also:

    Apache ActiveMQ
    http://activemq.apache.org/
    (needs Java J2EE, and speaks many protocols, including the simple STOMP protocol which has a ruby client / command-line)

    Sparrow
    http://code.google.com/p/sparrow/
    (persistent in files/SQLite, speaks memcached)

    RQ (by Ara T. Howard)
    http://www.codeforpeople.com/lib/ruby/rq/
    (based on NFS?)

    StompServer
    http://stompserver.rubyforge.org/
    (speaks STOMP, of course)

    Seems like a mess. The first criterion for deciding, as Luke Redpath mentioned in a comment above, is whether you need queue persistence. If not, simpler solutions such as beanstalkd are probably best. In addition, I think that RudeQ is the only one that doesn't require a constant running process, so if it's an issue (like Peter said) you might wanna choose it.

  13. Dan Mayer says:

    We did a ruby queue systems benchmarking post a while ago. It is a little out of date, as apparently both starling and beanstalk had faster versions that aren't packed in gems (then again I don't know if I can count non-battle-tested versions). In this case Beanstalk won out a bit. Starling was very close though... We have been using beanstalk for Devver and been very happy with it.

    http://devver.net/blog/2008/06/ruby-messaging-shootout/

    Dan
