Essays

Rebuild mail-notification to support SSL under Ubuntu/Debian

Sunday, September 7th, 2008  

Because of a four-year-old disagreement on the legal interpretation of the GPL and OpenSSL licenses, Debian is shipping a neutered and useless version of mail-notification without SSL support. Ubuntu hasn’t resolved the issue, so they’re shipping the same broken package too. People arguing about why they can’t fix bugs bore me. So here is a script to download the source packages and rebuild them with SSL enabled. It even bumps the version number so that the package manager doesn’t try to overwrite your working package with the broken one in the repository. It also keeps track of the (on my system) thirty-seven megabytes of build dependency packages that it installs, and removes them once the package is installed.


Announcing: The Periodic Table of the Europeans

Saturday, August 30th, 2008  

Take heed, chemists! The forty-nine countries of Europe have finally been organized in their very own Periodic Table of the Europeans!

I came up with this early in my most recent trip, somewhere in Turkey, and finally got the chance to make it. Read all about it and see a bigger version here, or get it on a poster, or on light and dark t-shirts.

SOAP and REST, explained

Friday, August 22nd, 2008  

SOAP, and REST, explained

I wish XML were becoming obsolete faster

Monday, August 18th, 2008  

XML sucks at (almost) everything it’s used for. Google has open-sourced Protocol Buffers, a typed, backwards-compatible, compact, binary data-interchange format. Combined with YAML for configuration and data-persistence files that need to be human readable, there’s even less reason to use XML for any data serialization.

XML, in the form of (x)HTML, seems ok for markup, and the XML-based (x)HTML templating in Genshi is the best templating language I’ve ever used (and I’ve used XSLT, Mako, Myghty, PHP, and a few others). I wonder if the reason that HTML (and XML) templating is so difficult, and templating language code is often so ugly, is that XML is actually a poor solution for markup too. HTML is obviously here to stay, but it would be an interesting thought experiment to design a successor markup language that is not strictly hierarchical, more human-readable, and designed with templating in mind.

What reading Tufte won’t teach you: Interface design guidelines

Monday, August 11th, 2008  

Edward Tufte’s books do a beautiful job of illustrating how to present huge amounts of information clearly and simply. Well presented information is critical to good interface design, but it’s not the whole story. Guidelines on how to present complex functionality clearly and simply are harder to find.

I’ve just spent two months carrying a terrible, ancient cellular phone and a mediocre non-Apple music player around the planet, and interacting almost exclusively with Windows XP terminals at internet cafes and hostels. As my frustration with these poor interfaces grew, I started a rough list of interface design guidelines. Here they are:

Read on for explanations and examples of good and bad design related to each one of these rules. There is a great deal of overlap between some of them, and that’s OK — they’re just guidelines. (Perhaps I, or someone else, will someday condense them into eight or nine fundamental principles of good interface design.)

The application interface should be fast and non-blocking. If it cannot be fast or non-blocking, it should appear fast and non-blocking by being immediately responsive. Many old-school web applications had disclaimers next to submit buttons saying “Click this button only once.” Submitting data over the network wasn’t fast back in the days of 56kbps modems, and it’s still not fast enough today. Most web applications now deal with this by disabling the submit button (good) or by changing it to a progress spinner (better) with JavaScript the first time it’s clicked. The application isn’t fast or non-blocking, but it imitates immediate responsiveness.

The application interface should be consistent. The phone I traveled with had the standard five buttons between the screen and the numeric keypad. When navigating through and controlling various applications, sometimes the left top button meant “select”, and sometimes it was the right top button. And sometimes the center menu button also meant “select,” but not always. This lack of consistency made it just plain impossible to develop a motor memory for “select” on the phone — I constantly had to think about how to select an item or confirm an action. The cross-platform widget toolkit wxWidgets ensures that dialog boxes it creates match the standard order of OK, Cancel, Yes, and No buttons on whichever platform the application is running on.

Don’t interrupt users in the middle of common, nondestructive tasks. The basic, core functionality of the application should be free from confirmations, interruptions, dialog boxes, configuration questions, multiple steps, wizards, and other garbage. Get out of the way and let users do what they need to do. Windows XP’s information bubbles that pop up out of the system tray, on top of other windows, are particularly egregious violators of this rule. Dialog boxes in Windows and on Linux also break this rule, as an error in a background application can interrupt the user mid-task in another application. Mac OS X’s design that puts dialogs into “sheets” attached to the parent window ensures that a dialog box never interrupts a task in a different application. If one application, or the operating system, violates this rule, it can interrupt users while they are using a different application and therefore affect the usability of the entire system. Websites for DTF publications like the New Yorker violate this rule when they split their articles up into multiple pages — maybe this is motivated by some weird desire to mimic turning of a page, but more likely it’s to track readers and increase ad views. Don’t do it. Reload ads and track readers with JavaScript if necessary.

Avoid notifying users of success. In general, an application should allow users to assume that everything is successful unless they hear otherwise. If a delayed or background process or command has completed, and notification of its success will help users to continue their work, then that notification must be radically different than failure notification. Windows’ information bubbles are a serious violator of this rule too. When plugging a new device into a Windows machine, it will often emit three or four info bubbles, with a sequence of messages like this: “New hardware connected,” “CanonSony XJ-4000 PowerCyberShot found,” and then “Your new hardware is ready to use.” OS X gets this right; plug in a camera and iPhoto launches, ready to import the photos. It doesn’t say, three times, that a camera has been connected. iPhoto launching is an implicit indication of a successful connection with the camera. Lots of web applications are doing the right thing by putting success notifications in the page, in little unobtrusive boxes, and putting error messages in red, in a different place on the page. Using dialog boxes for errors, confirmations, and informative messages, as most applications did for years, just trained users to always click “OK.”

Avoid giving users information that they cannot use. Users still must read, think about, and decide that the information is useless. If the information is useless to begin with, why risk confusing them? Why slow them down to read it? Every time I plug my USB CF card reader into a Windows machine, it gives me an info bubble that says “This device could function faster if it were plugged into a USB 2.0 port.” And it says this even if the computer has no USB 2.0 ports at all. What are users supposed to do with this information? Run out and buy a USB 2.0 card? Or a new computer? How many non-technical users actually know what “USB 2.0” means, and can correctly decide to discard this information? The information bubbles in the previous example fail here too. Users don’t need to know that new hardware has been connected, or what model of camera they just connected, because they are the ones who plugged the camera in. When people are working together on a project, they don’t call each other every thirty seconds to tell the rest of the team that they successfully typed another word into the report or entered another figure into the spreadsheet.

Rare, destructive actions should be harder to complete than nondestructive ones, but always possible. Closing a file without saving or emptying the trash are examples of destructive actions. If most of the actions in an application are destructive, consider building an action history with an “undo” command or a back button into it. Make as many actions as possible nondestructive. And don’t just skip implementing destructive actions — building a web application without a “delete account” command is criminal. For a long time, one of the t-shirt sites that I’d used didn’t have a way to delete a shirt — just a blurb saying to email customer support with the shirt ID. In the rare case where users do want to perform a destructive action, they are positive that they want to perform it. If it’s missing, the application seems ten times more unfinished and underpowered.

Give the user the chance to ask for forgiveness rather than forcing them to confirm a (destructive) action. Gmail and other web applications are pioneering this one.  Rather than asking something like “Are you sure you want to delete this conversation?” they provide a success notification “The conversation has been deleted” with an “undo” button next to it.  The insight here is that, although the application must provide a way to immediately abort a destructive action like this, 99% of the time, users actually intended to perform the destructive action. That should be the easy, one-click case, and aborting the destructive action should be the rarer, two-click case.

If the application pesters users with a confirmation dialog for destructive actions, users memorize a multi-step destructive command: click delete, then click OK — and when they accidentally delete the wrong thing, they miss the chance to abort. Many, many applications are guilty of this.
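Here is a minimal Python sketch of the delete-then-undo pattern. The `Mailbox` class and its method names are made up for illustration and don't come from Gmail or any real API. The delete happens immediately, with no confirmation, and hands back a token that a later "undo" click can use to restore the item.

```python
import itertools


class Mailbox:
    """Hypothetical in-memory store illustrating delete-with-undo."""

    def __init__(self, conversations):
        self.conversations = dict(conversations)  # id -> subject
        self._trash = {}                          # undo token -> (id, subject)
        self._tokens = itertools.count(1)

    def delete(self, conv_id):
        """Delete immediately, with no confirmation; return an undo token."""
        subject = self.conversations.pop(conv_id)
        token = next(self._tokens)
        self._trash[token] = (conv_id, subject)
        return token

    def undo(self, token):
        """Restore the conversation that was deleted under this token."""
        conv_id, subject = self._trash.pop(token)
        self.conversations[conv_id] = subject


inbox = Mailbox({1: "Lunch?", 2: "Quarterly report"})
token = inbox.delete(2)   # one click: the common, intended case
inbox.undo(token)         # second click: the rare "oops" case
```

A real application would also have to decide how long an undo token stays valid before the deletion becomes permanent; this sketch keeps deleted items around forever.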

Deal with application failure gracefully. Don’t lock users out or lose state in the event of an application failure. Users witnessing an application failure are in the most stressful and worried mental state they will ever be in while using the application. The interface for alerting users about an application failure and recovering from it should be the smoothest, simplest, most comforting part of the interface.

Preserve state, mode, and user input for as long as it is relevant, until the user saves or discards it. Never make users answer the same question or enter the same information twice. Was there an error when saving? Show the form again with everything the user entered. Did the user switch the telephone keypad to Title Case? Stay on title case when the word isn’t in the predictive text dictionary and the user has to spell it. Take advantage of the fact that computers are better at remembering than anything else.

Provide multiple, complete navigation paradigms. Keyboard and mouse control, back and forward buttons, search and choose, scroll and jump, broad and deep, fast and slow. Digital bedside clocks and watches are particularly bad violators of this. Often they provide, to set the time, just “up” and “down” buttons, or “fast up” and “slow up” buttons. With these two-button interfaces, users must hold down one of the buttons and watch the time change. While performing boring, slow tasks like this it’s easy for humans (your users) to get distracted, miss the target time, and have to go all the way around again, or back in the other direction, slowly. A speed-sensitive knob, like on analog watches, or even just ten numeric buttons, would be a far superior navigation interface.

The iPhone lets the user scroll slowly through their address book, or click on a letter and jump ahead in the alphabet. The speed of the scroll when dropping an item into a long list in a scrolled window should depend on how far from the edge the item is being dropped. Would an e-commerce site succeed with just browse-by-category and no search? The phone I carried had one button to cycle to the next word in the group of words offered by the predictive text system, but no way to go back to the previous word. Press next too many times? Sorry, you have to cycle through all nine words all over again.

Design the interface before starting to code. Even just a sketch will help — what commands and what functionality is going to be accessible? When and where? What will need extra heuristics? What will need custom widgets? What are the trouble spots? And don’t just copy what some other application has done. Even great interfaces have problems — copy what’s good and improve what isn’t. Didn’t design the interface before starting to code? Stop now and design it.

If the application violates one of these rules because its design makes implementation of a better interface too complex or too difficult, then the application needs to be refactored until it supports a better interface. This one is sometimes the hardest to swallow — how could an application with a mathematically perfect algorithm and beautifully coded implementation of it need to be re-engineered? If the excuse for not implementing a powerful new feature is a back-end that can’t support it, then that back-end, no matter how awesome it is, is not good enough, and rewriting is the only option. A better UI is one of the most powerful new features that can be added to an application, so if it requires a redesign and a rewrite, so be it.

Officer Google, the frothy-mouthed robot trademark cop

Monday, July 28th, 2008  

I just sent this to Google AdWords tech support. It’s self-explanatory. I’ll post the response from Officer Darth Google (if there is any).

Yesterday, I edited and changed only the URL in Google ads I was running for a t-shirt I made on RedBubble. The text of my ad remained unchanged. Yet suddenly Google has decided that my ads, whose text has not changed, are now in violation of a supposed trademark on the word “angels.” I fail to see how changing the URL in my ads suddenly makes the ad content infringing.

An “angel” is a Judeo-Christian mythological creature that predates U.S. trademark law, and in fact the entire nation, by at least two thousand years. It appears in a book called The Bible which you may have heard of.

The only corporation that I can think of which might have a trademark on anything having to do with angels is the Anaheim Angels baseball franchise. My ads are for t-shirts which say “The angels have the phone box” and are wholly and completely unrelated to Anaheim Angels, baseball, or in fact the entire continent of North America. If you follow the link and look at the shirts, you will see there is nothing related to the team or the sport on the t-shirts. Heck, they’re not even related to the great sport of cricket.

The phrase on my t-shirts is from, and targeted at, the fan community around a British television show called Doctor Who. I fail to see how the word “angel” could be infringing on anyone’s trademark.

I wonder, does Robbie Williams’ song “Angels,” or the play and movie “Angels in America” by Tony Kushner, infringe on this same supposed trademark? What about the lyric “I see angels in the architecture” from the song “You Can Call Me Al” by Paul Simon? What about the street named “Angel Kanchev” in downtown Sofia, the capital of the great nation of Bulgaria?

I’m no expert in trademark law, but I’m reasonably sure that it would be my neck or other body part on the line, not Google’s, if the supposed holder of this trademark on “angel” (possibly it is The Vatican?) decided to sue. Thanks for looking out for me, Google, but that’s a risk I’m willing to take. I wonder if they also have a trademark on the word “phone,” “box,” or maybe on “have” or “the.”

I’m not changing the text of my ad. That would be like requiring a toilet paper company to remove the word “paper” from their ad because Paper, Inc. held a trademark. And my t-shirts are a lot cooler than toilet paper anyway. So you can either re-approve my perfectly reasonable ad, or I just won’t run Google ads for my t-shirt, and you won’t get any money from me. Your call. I suppose you’re not really hurting for money over there at Google anyway. Let me know what you decide, and thanks for listening.

Why RedBubble kicks ass

Sunday, May 18th, 2008  

I’m always looking for clever t-shirt ideas, and ever since reading Alvin Toffler’s The Third Wave back in 1997, I’ve wanted to make my own custom t-shirts. I’ve made CafePress, Spreadshirt, and Zazzle stores for my own designs, plus a CafePress store for the Neighborhood Project, a Spreadshirt store for Mosuki, and a Zazzle store for my Burning Man camp. All three of these websites suck, in various ways.

RedBubble is the new kid on the custom t-shirt design block, and it kicks these three competitors to the curb[1]. I’ve moved all my designs there. To explain what they’re doing right, first I’ll explain what these three competitors are doing wrong.

CafePress is dog slow and riddled with quirks. Browsing through my private designs, a few of the product previews show up as broken images. Background images and UI graphics re-load on every page change, slowly. Designing a new product is a complex, multi-step process. First you choose blank apparel or household items and add them to your shop. Then dig through pages and pages of FAQs to find the exact DPI and pixel dimensions for the particular item. Then fire up your image editor and resize your graphic to match. Upload the image to your “media basket.” Then go find your blank item, and add the graphic to it. If you add a graphic of the wrong size — like putting the 200DPI version of your graphic designed for a coffee cup onto a 300 DPI t-shirt instead — there’s no warning and no visual feedback. You, or your customer, will just get a badly pixelated product. And worse, if you upload an image that’s too large — it will be badly down-sampled, and look almost as bad as an image that was too small. They added support for dark apparel at least as far back as 2006 — yet their product previews still don’t look right. This is basic image manipulation, not rocket science.
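The missing warning comes down to one division. Here is a rough Python sketch, assuming a hypothetical `check_resolution` helper and made-up print-area numbers (these are not CafePress's real specifications): divide the image width in pixels by the print width in inches, and compare the result against the DPI the product expects.

```python
def effective_dpi(image_width_px, print_width_in):
    """Pixels available per printed inch for a given print width."""
    return image_width_px / print_width_in


def check_resolution(image_width_px, print_width_in, target_dpi, tolerance=0.1):
    """Return a warning string if the image is too small or too large, else None."""
    dpi = effective_dpi(image_width_px, print_width_in)
    if dpi < target_dpi * (1 - tolerance):
        return "too small: %.0f DPI, will look pixelated at %d DPI" % (dpi, target_dpi)
    if dpi > target_dpi * (1 + tolerance):
        return "too large: %.0f DPI, will be down-sampled to %d DPI" % (dpi, target_dpi)
    return None


# An image prepared at 200 DPI, placed on a 10-inch-wide area that expects 300 DPI:
print(check_resolution(2000, 10, 300))
```

The ten percent tolerance is an arbitrary choice for the sketch; the point is only that a site has everything it needs to give this feedback at upload time.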

Spreadshirt has two different printing techniques. The better-quality one, “flock print,” can’t print designs that are too detailed, so they require you to wait several days for each design to be approved by a human at Spreadshirt. Like CafePress, creating a shirt involves first uploading a design, and then adding it to a product. Their shirt designer is a pure Flash widget in a pop-up window which takes over a minute to load on my fast net connection and has all the standard Flash problems: high CPU usage, no keyboard navigation, no scroll wheel support, etc., etc. It has a bunch of controls and widgets, some of which I have never needed and others which I don’t understand. I ordered this shirt from Spreadshirt, after their customer service confirmed that the black in the image, around the diamond with rounded corners, would not be printed. The shirt had a salmon color in place of red (that’s #ff0000 red) and the design’s edges had been sloppily cut, totally ignoring the rounded corners. It was so hilariously bad that I didn’t bother asking for a replacement. Each shop even has a bunch of settings and fields, including “title” and “shop name” (only one of which actually appears inside your <title> tag; the other appears to be ignored), two different “description” fields (one of which also appears to be ignored), and the cryptic “Product choice-Display category type.”

Zazzle replicates the multi-step product creation process of CafePress and Spreadshirt. Their product design UI is, thankfully, AJAX and not Flash, and you can upload an image to your “gallery” inside the product design process, although it takes three clicks and about two minutes of waiting for the UI to load before you actually get to the HTML file upload widget. Like Spreadshirt, their product designer has a suite of widgets to position and transform your image, and it takes several clicks to get the final product up on your store. I’ve ordered three shirts from Zazzle – the first I sent back because it was printed at such low quality I assumed their printer was running out of juice. I was wrong; when I received the replacement, it was just a tad bit better. The second was a retro design; I was planning on the poor printing adding to the retro charm, and it did.

So what makes RedBubble so much better?

RedBubble’s print quality is superior. If all these sites are screwing it up so bad, full color, digital printing on fabric must be a really difficult problem, right? If it is, RedBubble has solved it. I bought two designs on RedBubble that were too good to pass up, despite the poor print quality I’d come to expect from design-your-own t-shirt sites. And guess what? The shirts look great — you have to look really close to see that they’re not actually silk-screened in five different colors.

Maybe designing t-shirts on the web is just a complicated, difficult process? Nope. RedBubble’s t-shirt design process is extremely simple and quick. You select a 2400×3200 image to upload, and click save, and you have a t-shirt ready for sale. What about positioning, scaling, adding text, and compositing multiple images, like you can do on these other sites? RedBubble doesn’t provide any on-site UI to help you do these things. And they shouldn’t. People who design t-shirts — especially the good designers — are using Photoshop, Illustrator (or GIMP & Inkscape) and their ilk, to begin with. Those programs are going to do a much better job at tweaking your image than some Flash or AJAX web app coded by that nerdy intern from last summer with a 250×250 product preview window and a bunch of buttons with icons your users haven’t seen before. Rather than maintaining their own inferior design and preview widget, RedBubble gets out of the user’s way.

In RedBubble’s shirt design process, you can also pick the default shirt type, the available colors, and the default colors, and add a title, description, and tags, but all those items are optional. You can design a t-shirt in three clicks.

The theme here is that RedBubble’s superiority is distinguished as much by things that it does better as by those it doesn’t waste time with. There’s no “store,” and no concept of multiple stores on a single account, just a bare-bones profile. There’s no site-wide marketplace in addition to your store. There’s no way to customize your product list’s colors, logos, or background. Custom layout is not necessary, since (unlike its competitors) RedBubble’s default site colors are clean, simple, and don’t detract from your designs.

RedBubble doesn’t let you make hats, sweatshirts, panties, aprons, mousepads, buttons, magnets, coffee mugs, dog t-shirts[2], baby aprons, pet bowls, or light switch covers. Just t-shirts, posters, prints, and calendars. Their interface, and the underlying code, are vastly simpler because of this: there are no “choose product” or “add products to store” steps. And I bet t-shirts, posters, prints, and calendars make up a very large percentage, like 70%, of CafePress’, Spreadshirt’s, and Zazzle’s revenue. Doing less gets RedBubble to market quicker, gives them a simpler product, and makes them a more agile competitor.

RedBubble has also built a community, and channels to keep users on the site and bring them back. You can give people positive feedback for their work by “favoriting” it or by “watching” them. You get summary emails with new work by people you’re watching, comments related to your work, and so on, drawing users back to the site over time. I was overwhelmed when I got even one comment on the first design I posted. I’ve now got thirteen comments on sixteen designs, compared to just two comments on my entire Zazzle store. And the comments, favorites, tags, and watchlists mean there are more users and t-shirts on every page to click on, making their site almost annoyingly sticky.

There’s only one thing that RedBubble is missing. They need to let you print on the front and back of a shirt. And I bet they’re working on that.

RedBubble is following all of those pithy little maxims for building a successful website:

  • Keep it simple — your product, your message, and your interface.
  • Do one thing and do it well.
  • Get out of the user’s way.
  • Build a community and keep it happy.
  • Always have something cool for the user to click on and look at.

So long, CafePress! Farewell, Spreadshirt! See ya later, Zazzle! Me and my custom t-shirt designs will be hanging out over here on RedBubble from now on.

  1. Threadless and Oddica, although good sites with very good printing, live in a different neighborhood because they both only print a small set of submitted designs. And my designs are generally too weird to win any beauty contests. 
  2. What kind of person buys a custom dog t-shirt, anyway? 

The third flavor of focus-follows-mouse

Saturday, April 26th, 2008  

Steve Yegge’s excellent Settling the OS X focus-follows-mouse debate explains why OS X’s application-centric paradigm, with its application-global menu bar, doesn’t work so well with focus following the mouse but no automatic window raising. Background windows, attached to background applications, can’t, and aren’t expected to, listen for modifier+key events, because the application’s menu isn’t active.

The lack of focus-follows-mouse on OS X is one of the biggest reasons that I stick with Linux and Xorg on my main machines. Whenever I use one of my Macs for an extended period of time, I feel like a marathon runner who’s had to trade in his sleek running shoes for a pair of swimmer’s flippers. If there were a third-party tool to provide focus-follows-mouse on OS X that worked properly, I’d install it in a heartbeat.[1]

Yegge also points out that the auto-raise flavor of focus-follows-mouse is a taste only an epileptic could love. Set the auto-raise delay too low, and moving your mouse across big windows towards a smaller target window, or moving it too slowly, causes a cascade of ugly, annoying window raises and destroys your carefully crafted window tabbing order. Set the auto-raise delay too high, and you’re waiting too long for windows to focus once you’ve got the mouse there. In my experience, there’s no delay setting that works — every setting is too high, too low, or both.

But there’s another flavor of focus-follows-mouse, that, as far as I know, is only available via a third-party plug-in to the semi-abandoned and deeply buggy Sawfish window manager. It’s a flavor that is evocative of ripe nectarines and raspberries on a summer afternoon. And it’s so good it’s kept me using Sawfish despite its abandonedness and bugginess.

It’s called stop-focus, and it works like this:

  1. While the mouse is moving, don’t raise windows.
  2. Once the mouse has stopped, raise the window it stopped on.

“Stopped” is defined as below a certain configurable velocity (ten pixels per second works for me), and windows are only raised after a short (200ms) delay. Stop-focus lacks the cascading window-raising behavior of auto-raise. Moving the mouse slowly across several large interim windows doesn’t raise them or screw up window tabbing order. Starting to move the mouse to another window, and then stopping and moving it back to the currently focused window, which you and I do more often than we care to admit, doesn’t cause any window raises at all.
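The two rules are small enough to sketch in a few lines of Python. This is not the Sawfish plug-in's code, just an illustration under my own assumptions: the `StopFocus` class, its `sample` method, and the idea of feeding it periodic pointer samples (timestamp in seconds, position in pixels, and whichever window is under the pointer) are all hypothetical.

```python
import math


class StopFocus:
    """Decide when to raise a window, given periodic pointer samples."""

    def __init__(self, max_speed=10.0, delay=0.2):
        self.max_speed = max_speed   # pixels/second below which the mouse counts as stopped
        self.delay = delay           # seconds the mouse must stay stopped before a raise
        self.last = None             # previous sample: (t, x, y)
        self.stopped_since = None    # time at which the pointer first counted as stopped
        self.raised = None           # window raised most recently

    def sample(self, t, x, y, window):
        """Feed one pointer sample; return the window to raise, or None."""
        if self.last is not None:
            t0, x0, y0 = self.last
            # Guard against two samples sharing a timestamp.
            speed = math.hypot(x - x0, y - y0) / max(t - t0, 1e-9)
            if speed >= self.max_speed:
                self.stopped_since = None      # rule 1: still moving, never raise
            elif self.stopped_since is None:
                self.stopped_since = t         # the pointer has just stopped
        self.last = (t, x, y)

        if (self.stopped_since is not None
                and t - self.stopped_since >= self.delay
                and window != self.raised):
            self.raised = window               # rule 2: stopped long enough, raise
            return window
        return None


focus = StopFocus()
for t, x, window in [(0.0, 0, "A"), (0.1, 100, "A"), (0.2, 200, "B"),
                     (0.3, 200, "B"), (0.6, 200, "B")]:
    to_raise = focus.sample(t, x, 0, window)
```

Because a raise only happens when the window under a stopped pointer differs from the one raised last, sweeping across interim windows or wandering off and coming back to the current window produces no raises, which matches the behavior described above.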

So, while focus-follows-mouse without raise-on-focus (Yegge’s preferred autofocus) may not be feasible on OS X right now, there is a variant on focus-follows-mouse, with sane rules about when to raise the window under the pointer, that might make all us old focus-follows-mouse Unix relics happy.

If I were an OS X hacker, I’d probably just go code this up right now. But I’m not, because, among other things, OS X’s mandatory click-to-focus bugs me too much. Chicken, meet egg; Egg, meet chicken.

  1. One of the other big reasons I don’t want to switch to OS X is that, for reasons I won’t get into here, the global menu bar really bugs me. Funny that these two gripes of mine turn out to be intimately connected. 

Ruby’s not ready: comments, corrections, and clarifications

Thursday, April 10th, 2008  

Some good discussion on this one. It’s nice to see Ruby people saying things like this (5th message from the top, from Song Ma):

Interesting. But what I am thinking about is not the attitude of the author, but the points he was trying to make. The deep review and discussion will benefit the language insights.

Or this one (from Trans, on the same forum):

Why is everyone getting so worked up? It’s a critique. Biased it may be, but that in itself does not make it worthless. In fact, it can be very constructive b/c it uncovers “attack points” with the language. With each point we can ask ourselves objectively is this a misconception or a fair point? In either case we have an opportunity, to address misconceptions in our Ruby evangelizing blogs and to work to improve Ruby where a point has merit.

Bias can work both ways. But I think the Ruby community can rise above it, and Ruby will be all the better for it.

And from Peter Cooper at Ruby Inside:

As it is, I think he’s missing the point a lot of the time (he tends to think Python’s better because he likes its conventions more than Ruby’s – not a compelling argument), but it’s an interesting read none the less. Anything that keeps our minds open to the fact that Ruby != perfection is worth a look.

And a comment on the same post:

Let’s take his best points and incorporate them into future versions of Ruby.

Sounds like a plan.

I saw a few counterarguments like this:

Everything he’s saying is well known.

Just because a problem is well known inside a community doesn’t make it any less of a problem.

Everybody who mentioned documentation, even those who disagreed strongly with the rest of my post, agreed that Ruby’s documentation is seriously lacking. In fact, a lot of the mistakes in my original post are due to me not being able to easily find an explanation of something on the various Ruby doc sites. Which leads me to…


Ruby’s not ready

Monday, April 7th, 2008  

Introduction

A few weeks ago, I learned Ruby and Ruby on Rails to compare them head-to-head against Python and Pylons, in preparation for a new project. When I began, I knew nothing about Ruby or Ruby on Rails. I have tried to be as objective as possible: before beginning this project, I wrote in email on March 5th:

I promise we’ll be as objective as humanly possible; if Ruby and Ruby on Rails truly is better, we’ll happily use RoR and never look back. I want to know that I’m using the absolute best tool for the job.

Since then, I have reimplemented one complex nine-hundred-line Python library, PottyMouth, in Ruby. Another team member has also reimplemented parts of the Pylons web application Spydentify in Ruby on Rails.

The best tool for the job is Python & Pylons. While Rails and Pylons are similar, shortcomings in Ruby compared to Python make Python & Pylons the clear choice. I make three basic arguments against using Ruby:

  • The language and its implementation are incomplete and immature. Immature implementations breed performance issues. A project loses time when it must implement missing or incomplete functionality.
  • The language is inconsistent and needlessly complex. Inconsistency and complexity confuse people, and confusion breeds bugs.
  • The documentation is incomplete. Incomplete documentation breeds bugs as you might misuse a feature. And a project slows down while you read the language or library source code, or ask the community for help with undocumented features.

I believe Ruby would fare poorly against other languages, not just Python, on these angles as well.

Why have Ruby and Ruby on Rails gained so much traction, despite these issues? Aside from the Rails hype, it’s because they are not insurmountable issues. It is possible to build a large application in Ruby; many people have. But any programmer building a large application in Ruby will have to deal with the issues listed here at some point. These are all issues that do not appear right away. A project doesn’t face them until a website reaches maturity, develops lots of features, fields traffic from lots of users, or until a project hires programmers who aren’t Ruby experts or experienced enough to anticipate these issues.

My point is simply that Python (and other languages) allow you to handle most of these issues more elegantly, or avoid them completely.

Subscribe here if you’d like to be notified of any follow-up posts (for an article this long, I’m sure there will be a few), or if you’d like to read my critiques, positive and negative, of other things technological.

Contents

  1. Unicode and encodings
  2. Regular Expressions
  3. Documentation
  4. Migration to Ruby 1.9/2.0
  5. Performance
  6. Scoping
    1. One nice thing about Ruby’s scoping
  7. There’s more than one way to do it
    1. String conversion
    2. print, p and puts
    3. Ranges and slices
    4. require and load
    5. Raising exceptions, throwing symbols
    6. do and then are extraneous
    7. length and size, update and merge
  8. Object model
  9. Faking keyword arguments
  10. Libraries
    1. SAP support
    2. DateTime support
  11. Debugging
  12. Rails & Pylons
  13. Cool things about Ruby
  14. Conclusion
  15. Further reading

Unicode and encodings

If you’re already familiar with Ruby’s problems with Unicode, feel free to skip this section. Ruby did not have any support for Unicode character strings when it was originally released in 1996. This is only slightly silly for a language that was invented after Unicode 1.0 was released in 1992. It is inexcusably shortsighted that Ruby has not added Unicode objects over the last twelve years.

A third-party Ruby library for conversion ties into the Unix iconv program, allowing conversion between two different encodings. However, converted strings are still sequences of bytes. This means that most of the string methods (slice, reverse, size, index, downcase, upcase, strip) and indexing into the string with [] notation do not work correctly on non-ASCII encoded strings. You can get the desired results out of these methods by first accessing the .chars attribute of non-ASCII strings. This is less desirable because the programmer must remember to use .chars whenever he or she is working with non-ASCII strings.

A better solution would be to support first-class Unicode objects, as strings of Unicode characters, natively in the language.

There is a third-party Unicode support library that replaces Ruby’s String class and adds Unicode support, but it is acknowledged to be hackish, potentially dangerous, and makes Ruby somewhat slower.

Unicode support may or may not be forthcoming in Ruby 2.0. There are certainly members of the community advocating it.

This means, among other things, that there is no built-in support in Rails’ HTML generation [1] for converting Unicode characters to HTML entities. This page details how to hack around this problem; but this is something that should be automatic and built-in, not hacked around.

Python’s built-in Unicode and encodings support, which is a first-class, native Unicode object and a full suite of built-in encodings, was introduced in Python 1.6 in 2000. It has evolved into an extremely reliable, secure and versatile Unicode implementation. It is also extremely simple to use.

Python supports all of the encodings that Ruby supports via iconv, and a number that it doesn’t, including Quoted-Printable, the encoding used for the vast majority of email messages, and MBCS, the encoding used by Windows FAT32 and NTFS file-systems.

Because Python’s built-in Unicode support is so robust, the vast majority of Python libraries convert to Unicode when accepting input, and convert to the proper encoding when producing output. Multi-language support and correct encoding handling are usually non-issues when building a Python application. For example, non-ASCII input to, and output from, a Pylons web application Just Works™.
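To make the character-versus-byte distinction concrete, here is a minimal sketch. It assumes Python 3 (where every str is a Unicode string) rather than the Python 2 unicode type discussed above, and the sample text is made up for illustration.

```python
import quopri

# Hypothetical sample text: U+00EF is i-diaeresis, U+00E9 is e-acute.
text = "na\u00efve caf\u00e9"

# String operations work on characters, not bytes.
char_count = len(text)
reversed_text = text[::-1]
shouted = text.upper()

# Encoding is an explicit step that produces bytes.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
byte_count = len(utf8_bytes)

# Decoding turns bytes back into the same characters.
round_trip = utf8_bytes.decode("utf-8")

# Quoted-Printable lives in the standard library's quopri module.
qp_bytes = quopri.encodestring(utf8_bytes)
qp_round_trip = quopri.decodestring(qp_bytes).decode("utf-8")

print(char_count, byte_count, len(latin1_bytes))
```

The ten characters take twelve bytes in UTF-8 and ten in Latin-1, yet len(), slicing and upper() never need to know which encoding the data arrived in.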

Regular Expressions

For a language that borrows so heavily from Perl, the regular expression support in Ruby is pretty disappointing. Regular expressions might not seem like a very important part of a language, but it’s an interesting litmus test because Python, Perl, and JavaScript all support essentially the same regular expression syntax. Ruby’s regular expressions, however, were so broken that I switched to Ruby 1.9 to finish porting PottyMouth.

Ruby’s Regexp::MULTILINE flag doesn’t behave the way multiline does in other languages. In other languages, the multiline flag is off by default, and when enabled, it makes ^ and $ match right after, and right before, every newline (whether . matches newlines is controlled by a separate flag):

In Perl:

if ( "foo\nbar\nbaz" =~ /^bar/m ) { print "yes\n"; } else { print "no\n";}
yes
if ( "foo\nbar\nbaz" =~ /^bar/ ) { print "yes\n"; } else { print "no\n"; }
no

In Python:

>>> import re
# This matches
>>> re.search('^baz', "foo\nbar\nbaz", re.MULTILINE)
<_sre.SRE_Match object at 0xb7c4bf38>

# This does not match
>>> re.search('^baz', "foo\nbar\nbaz")

However, in Ruby, the Regexp::MULTILINE flag appears to only affect the interpretation of ., not ^ and $, making it more like Python’s re.DOTALL or Perl’s /s switch.

In Ruby:

irb(main):001:0> /^baz/.match("foo\nbar\nbaz")
=> #<MatchData:0xb7cd42b0>
irb(main):002:0> /^baz/m.match("foo\nbar\nbaz")
=> #<MatchData:0xb7cdb740>

irb(main):003:0> Regexp.new('^baz').match("foo\nbar\nbaz")
=> #<MatchData:0xb7cc7df8>
irb(main):004:0> Regexp.new('^baz', Regexp::MULTILINE).match("foo\nbar\nbaz")
=> #<MatchData:0xb7cb59dc>

There is no documentation whatsoever of the actual semantics of Regexp::MULTILINE, so it’s not clear whether this is an accident, a bug, or an intentional departure from the standard. Either way, it makes the language more difficult to learn and less predictable to use.
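For comparison, the split between the two flags on the Python side can be sketched like this. It is a small illustration assuming Python 3 and the same sample string used above; the variable names are mine.

```python
import re

text = "foo\nbar\nbaz"

# Without flags, ^ only matches at the very start of the string.
no_flag = re.search(r"^baz", text)

# re.MULTILINE lets ^ match right after every newline.
multiline = re.search(r"^baz", text, re.MULTILINE)

# re.DOTALL changes only what . matches; ^ is unaffected.
dotall_anchor = re.search(r"^baz", text, re.DOTALL)

# By default . does not match a newline...
dot_plain = re.search(r"foo.bar", text)
# ...re.DOTALL makes it do so...
dot_dotall = re.search(r"foo.bar", text, re.DOTALL)
# ...and re.MULTILINE has no effect on . at all.
dot_multiline = re.search(r"foo.bar", text, re.MULTILINE)

print(bool(no_flag), bool(multiline), bool(dot_dotall))
```

Each flag controls exactly one thing, which is what makes the Ruby flag name surprising to someone arriving from Python or Perl.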

There’s also no documentation whatsoever of the actual semantics of Regexp::EXTENDED. The eregex.rb file in the Ruby source just adds support for & and | logical operators, and the only documentation is the message “This is just a proof of concept toy.” As best I can tell, regular expressions in Ruby always behave like extended regular expressions, supporting ?, +, | and \N, regardless of whether you use the extended flag or not. What does the extended flag actually do? I don’t know.

Ruby’s regular expressions also match only ASCII and a small set of encodings, including UTF-8 and the Japanese encodings EUC and SJIS. Want to write a regular expression that matches UTF-16, Latin-1, or raw Unicode? You’ll have to use the third-party Oniguruma package or a different programming language. You can’t use pure Ruby.

Lastly, positive and negative look-behind aren’t supported in Ruby 1.8. I only noticed this because the code I was porting used negative look-behind expressions. Ruby 1.9 adds look-behind. The options for Ruby 1.8 users are to install 1.9 or the third-party Oniguruma package, which also supports many more encodings (but still not raw Unicode).

Both Python and Perl support positive and negative look-behind and Unicode regular expressions natively.
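As a rough illustration of what that looks like in practice, here is a sketch assuming Python 3. The sample strings are invented, and the escaped-asterisk case is only a guess at the kind of pattern a markup library might need.

```python
import re

# Hypothetical markup: the first asterisk is escaped, the second is not.
markup = r"a \* b * c"

# Negative look-behind: an asterisk NOT preceded by a backslash.
unescaped = [m.start() for m in re.finditer(r"(?<!\\)\*", markup)]

# Positive look-behind: digits that ARE preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", "cost: $42, qty: 7")

# \w matches non-ASCII letters in a Unicode string without extra setup.
words = re.findall(r"\w+", "na\u00efve caf\u00e9")

print(unescaped, after_dollar, words)
```

Only the second asterisk (at index 7) and the number after the dollar sign match, and both accented words come back whole.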

In general, it’s a bad sign when a third party reimplements a large chunk of functionality in an existing piece of software. It means that the existing functionality was just plain not good enough. And, for open source projects, it means that the existing project was unable, or unwilling, to solve the problem, or let others contribute patches to solve the problem, within the project. The fact that this happened for both Ruby’s encoding and regular expression support is disturbing.

Documentation

The Standard Library Documentation for Ruby is woefully incomplete. For example:

  • There is no documentation whatsoever for:
    • the digest library, which contains the SHA1 and MD5 check-sum tools. These tools are critical for generating secure cookies and storing user passwords securely. Without documentation, you have to go read the Ruby source code to know that your application is secure.
    • Racc, a LALR(1) parser generator for Ruby
  • The documentation for gdbm only includes a list of constants it defines. No methods, descriptions, or anything else.
  • The documentation for the syslog module is useless. It lists one method, close. A useful syslog library would have to have at least open and write functionality.
  • The link to the tcltklib module documentation returns an error page.
  • The Profiler documentation is extremely limited. It looks like it would be possible to use the profiler, but there’s no information about how it works, which is critical when you are profiling an application.
  • As noted above, the regular expression documentation doesn’t cover MULTILINE or EXTENDED.

In general, the majority of modules listed have no description page. None of the pages specifically state which version of Ruby they were written for.

Ruby’s development and documentation writing appear to be two disconnected endeavors, and the documentation is acknowledged to be incomplete. In fact, there is no single rallying point for Ruby material. The official Ruby documentation page lists a variety of documentation, tutorials, examples, etc. spread across many different websites with varying levels of completeness and relevancy to the current version of Ruby. There is no single tutorial or language overview which is complete for the current version of Ruby 1.8.x (over four years old).

This doesn’t inspire confidence. Are there any libraries that aren’t listed at all? And how many of the existing libraries have documentation that is incomplete, out-of-date, or incorrect?

By contrast, Python’s standard library documentation is complete, versioned and dated.

Migration to Ruby 1.9/2.0

It’s not clear what’s happening with regard to Ruby 1.9 and/or 2.0. Ruby 1.9 has been under development since (at least) 2006. (It may have been under development longer than that; the lack of any official documentation about it makes it hard to know for sure. This podcast claims it’s been around longer than Perl 6, which would make Ruby 2.0 almost as old as Ruby itself.) An experimental/development version, 1.9.0, was released in December 2007. I tested against the version of Ruby 1.9 in Ubuntu 7.10: 1.9.0+20070830-2ubuntu1.

Quite a few things that are allegedly new in Ruby 1.9 actually exist in Ruby 1.8, and it’s not clear whether they’ve been back-ported or whether their behavior has only subtly changed.

For example, return value unpacking, % string formatting, and newlines inside the ternary operator are supposedly new in 1.9, but work exactly the same in 1.8. Other things that are supposed to be introduced in Ruby 1.9, like multiple splats, don’t (yet) work at all in 1.9.

Other improvements in Ruby 1.9 include literal hash syntax, block-local variables, and, as already noted, better encoding and regular expression support.

Some documents indicate that Ruby 2.0 is going to be different in ways that will break existing Ruby 1.x programs severely. They also contain disturbing statements like this one about the new garbage collector: “It will be (mostly) thread safe.” Being (mostly) thread safe is like being mostly pregnant. You either are, or you aren’t.

There are two explanations for this lack of a clear plan for Ruby 2.0. Either Ruby 2.0 is so far off that no such document would be useful yet, or nobody in the Ruby community has thought about these issues yet. Both of these would be bad signs. Either way, it’s a total mystery how difficult it will be to move to Ruby 2.0, or when that move might have to happen.

Python, on the other hand, has been in the 2.x series for a long time. Planning for Python 2.0 began while Python 1.5 was the current version. In September 2000, as Python 1.6 was released, there was a complete outline available of what to expect from Python 2.0. Python 2.0 was released in October 2000. Programs written eight years ago for Python 2.0 will still run, unmodified, under Python 2.5. Many Python 1.x programs will also run under 2.5.

Python 2.6 and Python 3.0 are slated for release this summer. The Python 3.0 process has been going on for about a year. There is a clear outline of exactly what’s changing between 2 and 3, and guidelines for how to write Python code that will run equally well under 2.6 and 3.0. The Python developers are also providing a conversion program that will automatically translate between 2.6 and 3.0 code, and warn programmers about code it was not able to translate.

Python proves that a programming language can evolve safely, easily, and largely free of hassles. Future versions should not be a potential wild-card (or worse, a complete clusterfuck, as with PHP).

For a piece of software that’s going to be the core of your business for as long as you are in business — hopefully many, many years — why choose anything other than a language with a migration process like Python’s?

Performance

It’s difficult to precisely evaluate the difference in execution time between different languages. However, The Computer Language Benchmark Game gives a pretty strong indication that Ruby is slow. On its tests, Python is 3×-4× as fast as Ruby. Ruby is slower than TCL, a language that is twenty years old. Ruby is about the same speed as JavaScript (in Mozilla’s SpiderMonkey interpreter). The only thing slower than Ruby is Prolog.

The notorious Rails is a Ghetto article outlines performance problems in Ruby and Rails that, disturbingly, went unaddressed for long periods of time. In the worst one, the author reported serious performance issues to the Rails community, which largely ignored the problem or denied its existence. Meanwhile, the problem had been identified and patched by someone else, but the Ruby core developers ignored the patch for a year.

In another incident in the same article, the original Rails author admits that the original Rails code required about four hundred restarts a day, or six to seven restarts per thread per day. Four hundred restarts a day means four hundred chances for a database transaction to fail, four hundred chances for a verification email to be sent by the system without the corresponding data being stored in the database, four hundred chances for the user’s browser to not receive all the data it needs to correctly render a page or display data.

Even for a project for which performance is not the primary concern, these trends should be cause for concern. Serious performance issues mean buying more RAM, and upgrading servers sooner.

Scoping

Ruby’s scoping rules are complex:

  1. Files, modules, classes, defs and blocks create new scopes.
  2. Local variables have no sigil and begin with a lowercase letter. They are available only in the scope they are defined in.
  3. “Constants” have no sigil and begin with an uppercase letter. They are available in the scope they are defined in and in all enclosed scopes.
    1. “Constants” are not constant; they can be reassigned whenever you like, just like everything else.
  4. Globals begin with the $ sigil and are global.
  5. Instance attributes begin with the @ sigil and are, by default, protected, or available only inside the class.
    1. Instance attributes can be made available outside the class with attr_accessor or attr_reader.
  6. Class attributes begin with the @@ sigil and are available only inside the class.
    1. Unlike instance attributes, class attributes cannot be accessed outside the class with attr_accessor or friends.
  7. Methods are, by default, public.
  8. Methods can be made private or protected with the private or protected keywords.
    1. protected doesn’t mean what you think it means. Both private and protected methods are available within the class and within all containing subclasses.

The terminology used to refer to non-constant “constants” is extremely unfortunate.

Why isn’t the full range of public/protected/private scopes available to attributes as well as methods? Why is a totally different convention used to scope attributes? Why can’t class attributes be accessed outside the class like instance attributes?

What is the point of the subtle, weird difference between protected and private? What problem does it solve? Why don’t protected and private work the way they do in Java and PHP?

Clear, consistent, simple scoping makes it easy to keep track of what variables are available where. Complex scoping rules mean there’s more to remember, there are more mistakes to make and more ways to get confused. Mistakes and confusion cause bugs.

Python uses just a naming convention to convey whether a variable should be thought of as private or protected. Both Python and Ruby can be monkeypatched to modify private and protected attributes or methods, so it’s best to think of private and protected as purely advisory in either language. Experienced Pythonistas learn that someobj.__private__ is a red flag; the fact that you must always monkeypatch to do this in Ruby might provide an additional disincentive to doing it, but it also makes it easier to do it by accident.
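Here is a minimal sketch of that naming convention, assuming Python 3. The Account class and its attributes are hypothetical, invented only to show the mechanics.

```python
class Account:
    """Hypothetical class used only to illustrate the naming convention."""

    def __init__(self, balance):
        # Single leading underscore: "internal, please leave alone".
        self._balance = balance
        # Double leading underscore: the name is mangled to _Account__pin.
        self.__pin = "0000"

    def balance(self):
        return self._balance


acct = Account(100)

# Nothing stops outside code from touching an "internal" attribute.
acct._balance = 5

# The double-underscore name is not visible under its plain spelling...
has_plain_pin = hasattr(acct, "__pin")

# ...but it is still reachable through the mangled name.
mangled_pin = acct._Account__pin

print(acct.balance(), has_plain_pin, mangled_pin)
```

Both forms are advisory: the underscore tells the reader to keep out, and the mangled name mostly exists to avoid accidental clashes in subclasses.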

One nice thing about Ruby’s scoping

There’s one place where Ruby’s scope behavior is better than Python’s. Default argument values in function definitions are (re)evaluated each time a method is called in Ruby. In Python, default argument values get evaluated in the containing scope when the function is defined. This can get you into trouble in Python, if a default value is a mutable type. If you’re modifying the value, it’ll persist across subsequent calls to foo:

def foo(arg=[]):

In Python, you end up having to do this:

def foo(arg=None):
    if arg is None:
        arg = []
    # some code

This is definitely less clear than in Ruby, where you can simply say what you mean:

def foo(arg=[])
  # some code
end
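The Python pitfall described above is easy to reproduce. This is a small sketch assuming Python 3 (the behavior is the same in Python 2); the function names are made up for the demonstration.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` mutates the same object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # The None sentinel defers creating the list until call time.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)   # same list as `first`, now holding both items

third = append_fresh(1)
fourth = append_fresh(2)    # a new list each time

print(first, second, third, fourth)
```

The first pair of calls share one list, while the second pair each get their own, which is the behavior most people expect from a default value.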

There’s more than one way to do it

We can thank Larry Wall and Perl for There’s more than one way to do it. TMTOWTDI is bad, because to really know a language, you must know each of several ways to do similar, but different things, and each synonym. If there is only one way to do it, you only have to remember that one way, instead of many. Programmers spend more time reading code than writing it, and often, they’re reading other people’s code, so they can’t get away with remembering only their favorite way to do it. The more you have to remember, the more likely you are to forget, make a mistake, or have to stop to check the documentation. Mistakes breed bugs, and checking the documentation takes time. While not nearly as bad as Perl on this front, Ruby commits some serious TMTOWTDI.

String conversion

Some Ruby objects have an extra stringification method, .to_str, as well as the standard .to_s. .to_s is an explicit cast, used whenever you need a string representation of an object. .to_str is an implicit cast, which gets called when you are using a string-like object in a context that requires a string. (This illustrates a philosophical difference between Python on the one hand and Ruby and Perl on the other; Python never does context-sensitive implicit conversion.)

The naming of these methods is atrocious — they are radically semantically different, yet the name of one is an abbreviation of the name of another. What happens if you write code that critically relies on this distinction, go work in another language for six months, and then get called in to fix a critical production bug in that code? Would you remember which is which, and what the difference was, exactly? I wouldn’t. And the presence of both has confused people other than me. .to_str should be named something like .stringcontext.

What is the use case for to_s’s concatenation of arrays and hashes? It just runs keys, values, and items together in a string, making it impossible to tell whether it was a number, a string, a hash or an array that you just stringified:


irb(main):001:0> h = {1=>2}
=> {1=>2}
irb(main):002:0> a = [1,2]
=> [1, 2]
irb(main):003:0> h.to_s
=> "12"
irb(main):004:0> a.to_s
=> "12"
irb(main):005:0> a.to_s == h.to_s
=> true

When is this useful? It’s not human-readable, and it’s not computer-readable. It’s just mangled garbage. It’s even worse when you call to_s on more complex data structures:


irb(main):001:0> c = [1, :two, {:foo=>['bar', 'baz'], :key=>'val'}, [7,8], 9]
=> [1, :two, {:key=>"val", :foo=>["bar", "baz"]}, [7, 8], 9]
irb(main):002:0> c.to_s
=> "1twokeyvalfoobarbaz789"

As a side note, Python has two stringification methods as well, str() and repr(). str() provides a string representation, and repr() provides a string that can be eval()-ed or pasted into a Python interpreter. Ruby appears to have no equivalent of repr(), aside from p, which leads to the next topic….
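The difference between the two Python functions can be sketched as follows, assuming Python 3. The data structure is a simplified version of the one in the Ruby example, and ast.literal_eval stands in for eval() as a safer way to read the repr back.

```python
import ast

data = [1, "two", {"key": "val"}, [7, 8], 9]

# repr() keeps the structure: brackets, braces and quotes all survive.
as_repr = repr(data)

# The repr of built-in containers can be parsed back into an equal object.
restored = ast.literal_eval(as_repr)

# For a plain string the two differ: str() is the text, repr() is a literal.
plain = str("two")
quoted = repr("two")

print(as_repr)
print(plain, quoted)
```

Unlike the run-together output shown above, the stringified list can still be told apart from a hash or a number.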

print, p and puts

What is the difference between print, p, and puts? This isn’t documented. p is a sort of poor-man’s repr(), printing each argument in a form that could be pasted into Ruby source code on a separate line. Strangely, there’s no way to capture the output of p and store that string for later. print prints each of its arguments without any space between them. puts prints each of its arguments, or each item in each collection argument, on a separate line. Why does Ruby need all three? Are you going to be able to remember which one behaves each way, and use the right one at the right time?

Not only does Python get away with one, but the behavior of print in Python is exactly what I’ve wanted out of print, or printf, in every programming language I’ve ever used: print the str()-ification of every argument, separated by spaces, with a newline at the end. If you want all the arguments concatenated, or on separate lines, you can join on empty string, or "\n". If you don’t want a trailing newline, use a trailing comma.
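For readers on newer interpreters, here is a sketch of the same behavior using the print() function. This assumes Python 3, where print is a function and the trailing-comma trick is replaced by the end parameter; the output is captured in a StringIO buffer only so it can be inspected.

```python
import io

buf = io.StringIO()

# Default: str() of each argument, separated by spaces, newline at the end.
print(1, "two", [3, 4], file=buf)

# Concatenated: use an empty separator.
print("a", "b", "c", sep="", file=buf)

# One per line: use a newline separator.
print("x", "y", sep="\n", file=buf)

# No trailing newline: override `end`.
print("no newline", end="", file=buf)

output = buf.getvalue()
print(repr(output))
```

One function with two optional parameters covers what takes print, p and puts in Ruby.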

The behavior of puts is even weirder; it seems to stop descending into collections at some point. Note that the hash item is just stringified, but the items in the array inside the array are printed on separate lines:


irb(main):001:0> c = [1, :two, {:foo=>['bar', 'baz'], :key=>'val'}, [7,8], 9]
=> [1, :two, {:key=>"val", :foo=>["bar", "baz"]}, [7, 8], 9]
irb(main):002:0> puts c
1
two
keyvalfoobarbaz
7
8
9
=> nil

Ranges and slices

Ruby has two range operators, .. and .... Why? The only difference is that the shorter one, with two periods, returns a longer range, and the longer one returns a shorter range, not including the endpoint. Can this possibly get any more confusing? [2] The language should have one, not two. It is not hard to add or subtract one when you want a range with, or without, its endpoints.

Python’s range() and xrange() built-ins, and its slice syntax, are less confusing, as they never include the endpoint. They are also more powerful, because they allow a third “step” argument. Want every other, or every third, element in a list, or a range that steps by 2 or 3? Try l[::2] or l[::3], range(start, stop, 2) or range(start, stop, 3). Want the list in reverse? [::-1] Want a range in reverse? range(stop, start, -1). Python’s syntax is simpler, easier to remember, and more powerful to boot.
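The slice and range forms mentioned above, spelled out as a short sketch. It assumes Python 3, where range() is lazy like the old xrange(), so the ranges are wrapped in list() to show their contents.

```python
letters = list("abcdefg")

every_other = letters[::2]       # step of 2
every_third = letters[::3]       # step of 3
reversed_letters = letters[::-1]  # negative step walks backwards

half_open = list(range(3))        # the endpoint is never included
stepped = list(range(0, 10, 3))
countdown = list(range(5, 0, -1))

print(every_other, every_third, reversed_letters)
print(half_open, stepped, countdown)
```

Slices and ranges share the same start, stop, step vocabulary, so there is only one rule about endpoints to remember.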

Ruby’s range operators are also used to see if a value is in a particular range, like this:

irb(main):001:0> (0..2**1000) === 2**999
=> true

This has a disturbingly clever ring to it. The range operator doesn’t actually walk through every element in the 2**1000 element range and compare it to 2**999; if it did, this code wouldn’t execute instantaneously. It’s doing something like this underneath: 2**999 >= 0 and 2**999 <= 2**1000 (the two-dot range includes its endpoint). The only reason to use a range operator like this, when it’s about as much typing to just say what you mean directly, is when you have a range that you’re passing around as a variable.

In Python, the corresponding idiom is 0 <= 2**999 <= 2**1000, but the chained comparison syntax doesn’t work in Ruby, so you have to write 0 <= 2**999 and 2**999 <= 2**1000 in Ruby. Python’s xrange() can also be passed around like a variable, but you test for membership with value in myrange instead of ===.

Now imagine you’re someone who hasn’t seen Ruby before, or who has been working in some other language for months, who is now tasked with fixing a critical bug which relies on this strange, non-obvious idiom. Are you going to know, or remember, that === combined with .. has special semantics? Compare that to how difficult a time you will have understanding lower < value < upper, or value in myrange, in Python code. Simplicity and straightforward syntax have a significant long-term benefit.
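Both Python spellings can be sketched like this. It assumes Python 3, whose range() answers membership for integers by arithmetic rather than iteration; the variable names are mine, and the upper bound is bumped by one to mirror the inclusive endpoint of Ruby’s two-dot range.

```python
lower, upper = 0, 2**1000
value = 2**999

# Chained comparison: reads the way the math is written.
chained = lower <= value <= upper

# A range object can be stored and passed around like any other value.
bounds = range(lower, upper + 1)

# Membership is computed arithmetically, so this returns immediately.
in_range = value in bounds
outside = (upper + 1) in bounds

print(chained, in_range, outside)
```

Neither form needs a special operator: one is an ordinary comparison, the other is the same `in` used for lists and dictionaries.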

require and load

Ruby has two ways to handle code in other files: require and load. The difference is that require loads the code only once per application, and load loads it each time the interpreter sees load. Yet again, there’s more to remember to be fluent in Ruby. This distinction is of dubious value; if you have code that you want to run more than once, put it in a method and call the method. Don’t make the interpreter re-load, re-parse, and re-run the file.

And there’s more. The Ruby interpreter requires/loads the file corresponding to the path specified by a string. Unlike Python, Ruby has no concept of a set of paths to search for modules by name, so you see recipes like this to establish a file’s location and find modules installed on a system.

And because Ruby loads strings as paths instead of modules by name, you can trick the interpreter into accidentally requiring a file twice. Oops! Python provides reload() if you need to reload a module, but by default, it only loads modules once per application.

And finally, since require and load just pull the contents of another file into your local namespace, there’s no simple way to pull in just a single class or variable from a module. And there’s no way to ensure that classes in the file you’re importing don’t clobber classes in the file you’re importing it into. Want to attack lots of Ruby applications? Just write a helpful library with obfuscated code that overrides a common class, in something like HTTP or cookie authentication code, and adds a back door.
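For contrast, here is a sketch of how the Python side of this works. It assumes Python 3, where reloading lives in importlib.reload() rather than the Python 2 built-in, and it uses the standard json and collections modules purely as stand-ins.

```python
import importlib
import sys

import json
import json as json_again          # a second import of the same module
from collections import OrderedDict  # pull in a single name only

# Modules are found by name along a list of search paths.
search_path_is_list = isinstance(sys.path, list)

# Imports are cached by module name, so both names refer to one object.
cached = sys.modules["json"]
same_module = (json_again is json) and (cached is json)

# Reloading is an explicit, separate request.
reloaded = importlib.reload(json)


def dumps(obj):
    """A local function that happens to share a name with json.dumps."""
    return "mine"


# The module keeps its own namespace; the local name does not clobber it.
local_result = dumps([1])
module_result = json.dumps([1])

print(same_module, local_result, module_result, OrderedDict.__name__)
```

Because each module is its own namespace, importing a library cannot silently replace a class defined in the importing file unless the code asks for that explicitly.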

Raising exceptions, throwing symbols

Exception handling in Ruby is handled with raise/begin/rescue. Python uses raise/try/except, and Java & JavaScript use throw/try/catch to perform essentially the same exception handling. But Ruby also has throw/catch, which is unrelated to exception handling. It is normally used as a way to achieve labeled break.

Now, labeled break is a feature that I’d very much like to see in Python, but this feature in Ruby is essentially goto — and it’s even more powerful than goto in C, since it is not confined to single functions. Rather than debating the merits of goto, I’ll just ask this: does Ruby have to use terms that are commonly associated with exception handling, for a feature that is totally unrelated to exception handling?

do and then are extraneous

Ruby’s while and if statements can optionally have do and then keywords following them:

while condition do
  # some code
end

if condition then
  # some code
end

This is just one more extra variation that Ruby programmers have to remember to be able to read other people’s code.

length and size, update and merge

What is the difference between the length and size methods on String, Array, and Hash? There is none. Hashes have update and merge methods. What’s the difference? None.

These are particularly atrocious synonyms, because the English words they are based on aren’t synonymous. What if you have a class representing a geometric object, and you want length and size to return different measurements? What if you have a class representing a wiki page or source code repository, and you want update and merge to perform radically different operations? Someone else reading your code, having been trained that these two methods are synonymous in Ruby, might forget that the methods aren’t synonymous in this particular code.

Object model

Ruby doesn’t require self to be explicitly passed in to methods. Python has explicit self, and for good reason.

Rather than using self to get at class and instance attributes, Ruby uses @ and @@. You can get at self, to pass it to a method in another object, by calling self. And you can get at the superclass’s method of the same name by calling super. Arguably, if you need to get at a different method on the superclass, rather than that different method on self, then your object’s inheritance is broken. This is different from Python, but still fine.

You can delegate to another method on that class by simply calling that method. And here’s the problem with Ruby’s object model: because you don’t need to use @ to access methods, it’s too easy to accidentally shadow a method with a local variable.

There’s at least one case that requires self as an explicit receiver: when calling an attribute writer. Otherwise you’re just shadowing the attribute writer method locally. There may well be other rare cases that require self as an explicit receiver too. This seems dangerous; in Python, self is always required. In Ruby, you almost always don’t need self, except in the rare case where you do. This feels like an accident, or an overly clever solution. Clever solutions make me suspicious, and inconsistency breeds bugs. Simple solutions, like Python’s strict reliance on explicit self, make me confident I’m writing reliable code.
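A minimal sketch of why explicit self removes the ambiguity, assuming Python 3. The Counter class is hypothetical and exists only for this comparison.

```python
class Counter:
    """Hypothetical class showing local names versus attributes."""

    def __init__(self):
        self.count = 0

    def reset_wrong(self):
        # No `self.` prefix, so this is unmistakably a local variable.
        count = 10
        return count

    def reset_right(self):
        # With `self.` this is unmistakably the instance attribute.
        self.count = 10
        return self.count


c = Counter()
local_value = c.reset_wrong()
after_wrong = c.count       # attribute untouched
c.reset_right()
after_right = c.count       # attribute updated

print(local_value, after_wrong, after_right)
```

A bare name is always a local or global and a self-prefixed name is always an attribute, so a reader never has to guess which one an assignment touches.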

Faking keyword arguments

Ruby doesn’t support keyword arguments. The common idiom to “fake” keyword arguments lacks the expressiveness and versatility of Python.

In Python, you can have a function definition like this:

def HTMLTag(tagname, parent=None, *children, **attributes):

And you can call this function in many different ways:

HTMLTag("br")
HTMLTag("div", parent=bodytag)
HTMLTag("div", bodytag, p1, p2, p3, width="100%")
HTMLTag("a", p1, href="http://google.com", *["google"])
HTMLTag("a", p1, "google", href="http://google.com",)
HTMLTag("hr", width=77, parent=div, height=4, color="#000")
HTMLTag("hr", **{'parent':1, 'width':77, 'height':4, 'class':'ruler'})

Keyword arguments with default values may not seem like a very critical feature to be missing. But it’s one of the most powerful idioms in Python, because there are a lot of cases where arguments act like configuration, modifying a function’s behavior. If you can leave these modifiers off in the common case, code is faster to write and easier to read; you don’t have to remember the common modifier values; and you’re less likely to use the wrong modifier.
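To show how the different call styles bind, here is a runnable sketch of a function with the same signature. It assumes Python 3; html_tag and its dictionary return value are hypothetical stand-ins, and plain strings take the place of the tag objects used in the calls above.

```python
def html_tag(tagname, parent=None, *children, **attributes):
    # Hypothetical body: just report how the arguments were bound.
    return {
        "tag": tagname,
        "parent": parent,
        "children": list(children),
        "attributes": attributes,
    }


br = html_tag("br")
div = html_tag("div", parent="body")
full = html_tag("div", "body", "p1", "p2", "p3", width="100%")
hr = html_tag("hr", width=77, parent="div", height=4, color="#000")
ruler = html_tag("hr", **{"parent": "div", "width": 77, "class": "ruler"})

for result in (br, div, full, hr, ruler):
    print(result)
```

Named parameters are matched by name wherever they appear, extra positional values land in children, and every keyword that is not a parameter name lands in attributes, including ones like class that could not be written as a bare keyword.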

Ruby does support the * expansion and collection of Arrays, similar to Python. And it does support default values for optional arguments:

def tallandskinny(height=100, width=1)
  print height, " tall ", width, " wide"
end

tallandskinny()
# prints "100 tall 1 wide"
tallandskinny(1)
# prints "1 tall 1 wide"
tallandskinny(1, 100)
# prints "1 tall 100 wide"

But the optional arguments can’t be passed in as keywords in a different order. Ruby collects any key-value pairs in an argument list into a Hash, but that hash takes the place of a single argument position; it has nothing to do with the parameter names in the method definition. Here, the hash gets assigned to height and then stringified:

tallandskinny(:width=>100, :height=>1)
# prints "width100height1 tall 1 wide"

To duplicate Python’s keyword argument behavior, you have to write something significantly more complicated:

def tallandskinny(kwargs={})
  defaults = {:width=>1, :height=>100}

  kwargs = defaults.update(kwargs)
  print kwargs[:height], " tall ", kwargs[:width], " wide"
end
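For comparison, here is a sketch of the same function in Python. It returns the string instead of printing it, purely so the result is easy to check; that choice is mine, not part of the Ruby original:

```python
def tallandskinny(height=100, width=1):
    # Defaults live in the signature; no hash merging is needed.
    return "%s tall %s wide" % (height, width)

default_size = tallandskinny()                # both defaults
swapped = tallandskinny(width=100, height=1)  # keywords in any order
only_width = tallandskinny(width=5)           # skip height, set width
```

The last call is the one the positional-defaults version can’t express: overriding width while leaving height at its default.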

People on the #ruby-lang IRC channel were quick to point me to snippets of code like this and say “Ruby can fake Python-style keyword arguments easily.” And they’re right, you can fake it. But users (programmers, in this case) shouldn’t have to resort to tricks to get a piece of software (a programming language, in this case) to work the way they want it to work. If users are doing this, it means the software has failed to provide the features its users need.

It looks like keyword arguments are at least under consideration for a future version of Ruby.

Libraries

SAP support

SAP support is critical to the application that I’ll be working on. Ruby’s SAP support is alpha, version 0.06, and hasn’t been updated in over a year. Python’s SAP support is at version 1.0, has been around for four years, and comes with documentation written by an SAP developer.

DateTime support

Ruby supports Date and DateTime objects natively, but there’s no built-in duration or timedelta support. There’s only a third-party Duration library written by the Rails people, no doubt to support the SQL duration type. That’s unacceptable, because if a duration/timedelta were built in, subtracting two dates could return one. Instead, Ruby returns a Rational:

require 'date'
irb(main):002:0> puts Date.new(2008, 03, 29) - DateTime.new(2008, 3, 28, 22, 8)
7/90

It’s not helpful to know that the time delta between 2008-3-28 22:08 and 2008-3-29 is 7/90ths (of a day). What would be helpful is to know that it’s 1:52:00, like in Python:

>>> from datetime import datetime
>>> dur = datetime(2008, 3, 29) - datetime(2008, 3, 28, 22, 8)
>>> print dur, type(dur)
1:52:00 <type 'datetime.timedelta'>
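A short sketch of why a real duration type is handy: it compares and adds like any other value, and the fraction-of-a-day form can still be derived from it when needed. The variable names are only for illustration:

```python
from datetime import datetime, timedelta
from fractions import Fraction

dur = datetime(2008, 3, 29) - datetime(2008, 3, 28, 22, 8)

# A timedelta compares and adds like any other value.
is_expected = (dur == timedelta(hours=1, minutes=52))
back_again = datetime(2008, 3, 28, 22, 8) + dur

# The same interval as a fraction of a day, which is the form Ruby reports.
fraction_of_day = Fraction(int(dur.total_seconds()), 24 * 60 * 60)
```

Here fraction_of_day works out to 7/90, the same number Ruby printed, but it is derived from the duration rather than being the only thing available.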

Debugging

Ruby tracebacks don’t print the line of code on which the error occurred. Compare these two tracebacks, each in programs that divide by zero three function calls deep:

hack.rb:10:in `/': divided by 0 (ZeroDivisionError)
        from hack.rb:10:in `baz'
        from hack.rb:6:in `bar'
        from hack.rb:2:in `foo'
        from hack.rb:13

Traceback (most recent call last):
  File "hack.py", line 10, in <module>
    foo(0)
  File "hack.py", line 2, in foo
    bar(arg)
  File "hack.py", line 5, in bar
    baz(arg)
  File "hack.py", line 8, in baz
    8/arg
ZeroDivisionError: integer division or modulo by zero

Often you can see exactly what’s going wrong just from a Python traceback, because you can see the line of code that was a problem. Debugging Ruby is slower and more difficult because it doesn’t provide this information.
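The hack.py script itself isn’t shown, so the following is a reconstruction, laid out so its line numbers match the traceback (2, 5, 8, and 10). The run_hack helper is my own addition: it writes the script to a temporary file and runs it in a fresh interpreter so the traceback can be captured as text.

```python
import os
import subprocess
import sys
import tempfile

# Reconstruction of hack.py (an assumption: the original script is not shown).
HACK_PY = """def foo(arg):
    bar(arg)

def bar(arg):
    baz(arg)

def baz(arg):
    8/arg

foo(0)
"""

def run_hack():
    """Run the script in a fresh interpreter and return what it writes to stderr."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hack.py")
        with open(path, "w") as f:
            f.write(HACK_PY)
        result = subprocess.run(
            [sys.executable, path],
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        return result.stderr

stderr_text = run_hack()
```

On Python 3 the final line reads "ZeroDivisionError: division by zero" rather than the Python 2 wording above, and recent versions may add marker lines under the offending expression, but the source line 8/arg still appears in the traceback.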

By the way, Perl’s even worse than Ruby at providing useful information when there’s an error:

Illegal division by zero at hack.pl line 10.

Rails & Pylons

The Ruby on Rails and Pylons web frameworks are more or less comparable. A good chunk of Rails’ core has been ported to the Python webhelpers package, which is used by Pylons (and other Python web frameworks). There don’t seem to be any major features in one web framework and not the other. Pylons has in-browser debugging (off by default in production code) and, since it relies on existing, and pluggable, templating, ORM, and other modules, may be slightly more mature and flexible. Rails’ DB migration is more mature than SQLAlchemy’s.

Cool things about Ruby

Ruby’s block arguments have interesting potential, especially if you were writing a heavily thread-based or event-based application. Of course, a traditional MVC web application doesn’t really need threads or events (unless you’re writing an HTTP server too). Most places where I’ve used, or seen examples of, block arguments in Ruby are places where I would have used a list comprehension in Python. In other words, block arguments are far more powerful than their common use case.
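As an illustration of that common use case, here are two list comprehensions with the Ruby block-based equivalents in comments. The word list is made up for the example:

```python
words = ["ruby", "python", "perl", "smalltalk"]

# Ruby: words.select { |w| w.length > 4 }.map { |w| w.upcase }
long_upper = [w.upper() for w in words if len(w) > 4]

# Ruby: words.map { |w| w.length }
lengths = [len(w) for w in words]
```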

Metaprogramming with Ruby clearly takes less code than in Python. I don’t think there’s anything that Ruby does that Python cannot, or vice versa, with regard to metaprogramming. I’ve needed real metaprogramming in Python extremely rarely, and I don’t know if I’d use the metaprogramming in Ruby any more frequently. The examples of metaprogramming with Ruby that I’ve seen (The Poignant Guide’s chapter, or the way ActiveRecord works) would have been doable, in Python at least, by inheriting from a base class and using class attributes on the derived class as configuration variables. So, whatever win Ruby gets from easier metaprogramming is minor.
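A rough sketch of that base-class approach follows. Record, Person, table_name, fields, and select_sql are all hypothetical names invented for this example; this is not how ActiveRecord or any real Python ORM is implemented.

```python
class Record(object):
    # Configuration that derived classes override as class attributes.
    table_name = None
    fields = ()

    def __init__(self, **values):
        # Reject keywords that the derived class did not declare.
        unknown = set(values) - set(self.fields)
        if unknown:
            raise TypeError("unknown fields: %s" % ", ".join(sorted(unknown)))
        # Every declared field becomes an instance attribute.
        for name in self.fields:
            setattr(self, name, values.get(name))

    @classmethod
    def select_sql(cls):
        # Behavior in the base class is driven by the derived class's config.
        return "SELECT %s FROM %s" % (", ".join(cls.fields), cls.table_name)


class Person(Record):
    table_name = "people"
    fields = ("name", "email")


ann = Person(name="Ann", email="ann@example.com")
query = Person.select_sql()
```

The derived class contains nothing but configuration; all of the behavior lives in the base class and reads those class attributes at call time.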

Conclusion

Ruby has standard libraries that are so poor the community has provided drop-in replacements. The documentation about the current and future versions of the language is extremely lacking. The core implementation of the language is not competitive with other interpreted languages. And the language itself is full of idiosyncrasies and inconsistencies that are neither useful nor conducive to cleaner, simpler code. The language is not without promise or potential, but in its current state there is no reason to choose it over a mature, robust language like Python.

I’d like to thank Jeremy Avnet, Steve Hazel, Greg Hazel, and Ross Cohen for their comments and corrections on drafts of this article. Nonetheless, all inflammatory opinions and any inaccuracies are my responsibility. Subscribe here to read any follow-ups to this article.

Read the follow-up: Ruby’s not ready: comments, corrections, and clarifications

  1. Even the Rails documentation at noobkit.org, the official documentation for the official Rails IRC channel, can’t seem to get Unicode support working (scroll down to “3. Go to localhost:3000/ and get ‘Welcome aboard: You’re riding the Rails!’”). 
  2. Perhaps Ruby could also use ...., which would return an even shorter range, not including the beginning or end points.