
How to Sanitize HTML and CSS in Ruby

By Peter Cooper / January 1, 2009


If you've developed an application that displays user-supplied text in a Web browser, there's always a chance that a user has entered some crazy HTML (or even CSS) that will break your site's layout. While it's easy to remove all HTML from a piece of text, you might want to let users format their content with a limited subset of HTML, in which case you need to sanitize the user-supplied HTML and CSS. Luckily, two Ruby libraries have been released in the last couple of days to sanitize HTML and CSS respectively.

HTML

Sanitize (or its GitHub repo) by Ryan Grove is a new HTML sanitization library for Ruby. Install the sanitize gem and it's crazily simple from there:

require 'rubygems'
require 'sanitize'

html = %{<strong><a href="http://foo.com/">foo</a></strong><img src="http://foo.com/bar.jpg" alt="" />}

Sanitize.clean(html) # => 'foo'

As Ryan explains in his blog post, Sanitize removes all HTML by default, but you can specify options to allow certain elements, attributes, protocols, and so forth. Read his post to get the full scoop. Sanitize also closes tags that are left open. Excellent!
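
As a rough sketch of what those options look like, the snippet below passes a whitelist hash to Sanitize. The :elements, :attributes, and :protocols keys are Sanitize's own config options, but the comment_whitelist and sanitize_comment names are made up for illustration, and the choice of tags is only an assumption about what a comment form might allow. Newer releases of the gem renamed Sanitize.clean to Sanitize.fragment, so the helper calls whichever one is available.

```ruby
require 'rubygems'

# Hypothetical whitelist for comment text: a few inline tags plus links.
def comment_whitelist
  {
    :elements   => ['a', 'strong', 'em'],
    :attributes => { 'a' => ['href'] },
    :protocols  => { 'a' => { 'href' => ['http', 'https'] } }
  }
end

# Hypothetical helper. Uses Sanitize.fragment on newer releases of the
# gem and falls back to Sanitize.clean on older ones.
def sanitize_comment(html)
  require 'sanitize'
  if Sanitize.respond_to?(:fragment)
    Sanitize.fragment(html, comment_whitelist)
  else
    Sanitize.clean(html, comment_whitelist)
  end
end

html = %{<strong><a href="http://foo.com/">foo</a></strong><img src="http://foo.com/bar.jpg" alt="" />}

begin
  puts sanitize_comment(html)
rescue LoadError
  puts 'Install the sanitize gem to run this example.'
end
```

With this whitelist, the strong and a tags from the earlier example should survive while the img tag is stripped.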

CSS

Allowing users to specify custom CSS can be... interesting (see MySpace) but potentially damaging if it gets sent to third parties. Browsers can, in many circumstances, execute JavaScript in CSS or otherwise be given nefarious CSS to parse. Courtenay Gasking's css_file_sanitize (or its GitHub repo) helps prevent some of these issues by sanitizing the CSS provided. It's still in its earliest stages, and the README contains no documentation (but if you see his test file, you'll get the idea). Courtenay is open to feedback, patches, etc.
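
To give a feel for what "nefarious CSS" means, here is a deliberately naive sketch that flags a few constructs that have historically let stylesheets run script or pull in outside code. This is not how css_file_sanitize works (its API is undocumented, so nothing here is taken from it), the method names are made up, and a pattern blacklist like this is easy to bypass with CSS escapes or comments, so treat it as an illustration rather than a defence.

```ruby
# Hypothetical list of patterns, chosen for illustration only.
def risky_css_patterns
  [
    /expression\s*\(/i,  # IE dynamic properties
    /javascript\s*:/i,   # script URLs inside url(...)
    /behavior\s*:/i,     # IE HTC behaviors
    /-moz-binding/i,     # old Firefox XBL bindings
    /@import/i           # pulls in external stylesheets
  ]
end

# Returns true if the CSS text matches any of the patterns above.
def risky_css?(css)
  risky_css_patterns.any? { |pattern| css =~ pattern }
end

puts risky_css?('body { width: expression(alert(1)); }')
puts risky_css?('p { color: red; font-weight: bold }')
```

A real sanitizer would parse the CSS and whitelist known-safe properties and values instead of hunting for known-bad strings.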

Comments

  1. Eddie May says:

    Happy New Year!

  2. Jonathan Soeder says:

    I was in the process of converting an HTML sanitizer I wrote in perl to ruby and had to shelve it. This is noice.

  3. Markus Jais says:

    Happy New Year. This is a very useful small library. Thanks for showing.
    I just played a little with it and it works like a gem :-)

  4. Eric Davis says:

    The CSS sanitizer might come in handy. I just had a customer come up with an idea of allowing custom CSS per user.

  5. Pete Forde says:

    I prefer a lovingly massaged version of the sanitize.rb library, originally by Jacques Distler. I've taken the liberty of adding a few convenient string mixins:

    http://pastie.org/351431

    This library takes the slightly unconventional approach of parsing the input using the html5 gem. Genius!

    I've tested it against most of the hacks on this page of XSS vectors. I consider it a litmus test for declaring your shit "secure":

    http://ha.ckers.org/xss.html

    Your mileage will vary. I like this because it doesn't try to process every string; you call it explicitly on what you need.

  6. rick says:

    The rails sanitize helper (from my old whitelist_helper plugin) was written using that same hackers page. The nice thing about this lib is that it uses Hpricot and is probably a lot faster than any ruby based html parser.
