Essays

The next big thing, part 3: Taking the relational out of relational databases

Monday, April 6th, 2009  

Part of an ongoing series.

The relational database is an extremely powerful tool. But sometimes data isn’t very relational, and sometimes transactional and relational integrity is not as important as it is for, say, a bank. This is one reason why so many sites can get away with MySQL backed by MyISAM tables: they’re fine if you’re read-heavy and data integrity is not mission-critical.

Some new projects have sprung up which provide key-value stores or simpler kinds of databases without all the overhead and inflexibility of a relational database.

On the other hand, sometimes data is way more interrelated than a traditional relational database is prepared to handle. Sometimes different kinds of items (i.e. rows) in a database can be related to many other kinds of items in that database, and sometimes end users can create not just new items or new relationships, but new kinds of relationships between items. This type of database is called a graph database, and there are also projects pushing the boundaries of relational in this completely opposite direction.

Pretty much everywhere I interviewed back in February 2008 was either building their own graph database, working on an existing one, or repurposing a relational database (or, in one case, a search backend) to kinda, sorta behave like one. The W3C, not one to be left behind when there’s a specification to be written, is even working on a SQL-inspired query language intended to search them[1].

Most applications have some combination of totally un-relational data that can go in a key-value store, some strictly relational data that belongs in a SQL database, and some flexible, highly relational data that belongs in a graph database.
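As a rough illustration of those three shapes of data, here is a minimal Python sketch using only the standard library. Every name in it (the session key, the accounts table, the relationship types) is made up for the example. A dict stands in for a key-value store and a set of triples stands in for a graph database; neither is meant to show how any real product works.

```python
import sqlite3

# 1. Un-relational data: a plain dict standing in for a key-value store.
sessions = {"session:42": {"user": "alice", "theme": "dark"}}

# 2. Strictly relational data: a fixed schema enforced by a SQL database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, balance INTEGER NOT NULL)"
)
db.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("alice", 100))
balance = db.execute(
    "SELECT balance FROM accounts WHERE owner = ?", ("alice",)
).fetchone()[0]

# 3. Flexible, highly relational data: (subject, relationship, object) triples.
edges = set()


def relate(subject, relationship, obj):
    """Record an edge; any string is accepted as a relationship type."""
    edges.add((subject, relationship, obj))


def related(subject, relationship):
    """Return every object linked from `subject` by `relationship`."""
    return sorted(o for s, r, o in edges if s == subject and r == relationship)


relate("alice", "friend_of", "bob")
relate("alice", "attended", "event:7")
# A brand-new kind of relationship, invented at runtime with no schema change.
relate("alice", "carpooled_with", "bob")

print(sessions["session:42"]["theme"], balance, related("alice", "carpooled_with"))
```

The point of the third part is that adding "carpooled_with" needed no new table or column, which is the flexibility the relational schema in the second part deliberately gives up.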

What will happen when these alternative databases start giving traditional relational databases a run for their money? Well, sharding, caching, and normalization all start to sound a lot more complex when the data is in a few different kinds of databases, but then again, maybe optimization won’t be as necessary if a single SQL database isn’t doing all the heavy lifting. Object-relational mappers (and the web frameworks that use them) might need to talk to, and abstract away from, different kinds of databases[2].

And the different types of data won’t always be easily separated along table boundaries. Maybe these different types of databases will talk to each other, or maybe they will mature into über-databases that understand lots of different types of data relationships.

But the monolithic, strictly relational, master SQL database is eventually going to go the way of Cobol[3].

  1. Of course, if it’s anything like other technologies designed by the W3C, it’s a steaming pile.
  2. Some can already handle talking to multiple SQL databases, and of course there’s two-phase commit.
  3. Or Kobol.

The next big thing, part 2: Taking the web out of web applications

Monday, March 30th, 2009  

Part of an ongoing series.

A web application is just a stateless[1] application that responds to various requests by performing actions and providing resources. There’s no fundamental reason an application must only communicate over HTTP. Web applications are going to start adding alternative methods of interaction, and I think the first common one will be email.

Perhaps an example will best illustrate this:

Like many web forums, posts to Mosuki’s discussion forums get sent out by email. But, unlike any other web forums I know of, they also behave like mailing lists. All the emails have a reply-to header with an email address that identifies the message, the recipient, and the action to be taken if that email address is used. In this case, the contents of a reply email are posted to the forum exactly as if a reply had been posted via the website.

In other words, the action “post a message” can be accessed via a web page and a browser or via a reply-to header and your mail client.
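Here is one way such a reply-to address could be put together, as a minimal Python sketch. This is not Mosuki’s actual scheme: the address layout, the secret, the domain, and both function names are assumptions made up for illustration. The HMAC tag is there so that a recipient can’t edit the address to point at a different message or user.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side key
DOMAIN = "forum.example.com"            # hypothetical mail domain


def _sign(payload: str) -> str:
    """Return a short HMAC tag covering the action, message, and recipient."""
    digest = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def make_reply_to(action: str, message_id: int, user_id: int) -> str:
    """Build a reply-to address; `action` is assumed to contain no dots."""
    payload = f"{action}.{message_id}.{user_id}"
    return f"{payload}.{_sign(payload)}@{DOMAIN}"


def parse_reply_to(address: str):
    """Return (action, message_id, user_id), or None if the address is invalid."""
    local, _, domain = address.partition("@")
    if domain != DOMAIN:
        return None
    payload, _, tag = local.rpartition(".")
    # Constant-time comparison of the received tag against the expected one.
    if not hmac.compare_digest(tag.encode("utf-8"), _sign(payload).encode("utf-8")):
        return None
    try:
        action, message_id, user_id = payload.split(".")
        return action, int(message_id), int(user_id)
    except ValueError:
        return None


address = make_reply_to("post_reply", 1234, 56)
print(address)
print(parse_reply_to(address))
```

A signed address only proves that the server generated it for that recipient. It does not prove who sent the reply, so it is a starting point rather than full authentication.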

There are other examples of this separation between input/output channels and the application logic. The most obvious is Twitter, which of course can be interacted with via HTTP or SMS[2]. And the Son of Sam project intends to let you “use modern concepts like handlers, requests, responses, state machines” to interact with email.

Confirm a Facebook friend request, RSVP to an Evite, revert a Wikipedia edit, or reassign a bug report, just by replying to an email or sending an SMS.

There are a number of technical issues inherent in a system like this. An application’s framework has to handle multiple input channels, and massage email bodies, HTTP requests, and other input into a least-common-denominator “request.” Authenticating a user via email, an intrinsically forgeable medium, and protecting against spam are non-trivial challenges. And a suite of templates suddenly gets a lot more complex when it has to provide views for multiple types of interfaces[3].
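The “least common denominator request” idea might look something like the following Python sketch. The Request class, its fields, the two adapter functions, and the convention that the action is the local part of the To address are all assumptions for illustration, not any particular framework’s API.

```python
from dataclasses import dataclass
from email import message_from_string
from urllib.parse import parse_qs


@dataclass
class Request:
    channel: str  # "http" or "email"
    sender: str   # whoever the channel claims sent it (not yet verified)
    action: str
    body: str


def from_http(path: str, form_body: str, session_user: str) -> Request:
    """Adapt a form POST, e.g. POST /post_reply with body 'text=Hello'."""
    fields = parse_qs(form_body)
    text = fields.get("text", [""])[0]
    return Request("http", session_user, path.strip("/"), text)


def from_email(raw: str) -> Request:
    """Adapt a plain-text email; the action is the local part of the To address."""
    msg = message_from_string(raw)
    action = (msg.get("To") or "").split("@")[0]
    return Request("email", msg.get("From") or "", action, msg.get_payload().strip())


def handle(request: Request) -> str:
    """Application logic sees only a Request, never the channel's raw format."""
    if request.action == "post_reply":
        return f"{request.sender} posted: {request.body}"
    return "unknown action"


web = from_http("/post_reply", "text=Hello+from+web", "alice")
mail = from_email(
    "From: alice@example.com\n"
    "To: post_reply@forum.example.com\n"
    "Subject: Re: hi\n"
    "\n"
    "Hello from mail\n"
)
print(handle(web))
print(handle(mail))
```

Note that the sketch sidesteps the hard parts named above: the From header is taken at face value, and there is no handling for multipart or HTML mail.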

This blurring of the line between email, HTTP, SMS, and other communications is not new, strictly speaking. But I think it will become commonplace and even expected. Rather than writing a modern (MVC, stateless, REST-ful, &c.) web application, people will be writing modern (MVC, stateless, REST-ful, blah, blah, blah) applications that have web interfaces, email interfaces, and whatever other interfaces they need.

Stay tuned for the next installment of The next big thing: Taking the relational out of relational databases.

  1. More or less stateless, that is, authentication tokens like cookies notwithstanding. 
  2. As well as more standalone apps than you can shake a stick at. 
  3. Generating text and HTML responses for email that look good and work well in the top 75% of desktop and web email clients is a lot harder than testing a site’s HTML in Firefox, IE, Safari and Opera. 

Why your all-graphic website sucks

Friday, March 13th, 2009  

Using only graphics to build a website is 1996’s version of using Flash to build an entire site. Why?

  1. Your users can’t copy and paste the text. You know, if you were, for example, promoting an event and someone wanted to copy the event description into an email or onto an events website.
  2. They can’t scale the text up — even though Firefox’s page zoom will scale the text-images up, they won’t get easier to read, just uglier.
  3. Inline search doesn’t work.
  4. Screenreaders and webcrawlers are out of luck.
  5. And the page takes forever to load. What’s that? Load times don’t matter so much anymore, now that most people are on DSL? Try loading this page on your phone, over EDGE. Blazing fast.

The photo credits are text, not images. The author of this page can’t plead ignorance of how to put text into a web page.

An all-image website doesn’t get in the way of proper scrolling, UI widgets, and functioning URLs (although the URL to this one seems a bit redundant). So building an entire site out of Flash is dumber than using images for all your text. That’s really saying something.

P.S. At least their images are properly transparent PNGs.

P.P.S. At least they didn’t lay the page out using <table> tags. <div> and <span> FTW!

P.P.P.S. This post should not be misinterpreted as denigrating the venerable Cacophony Society or the Brides of March. All denigration is directed solely at their web design. Any failure on the part of the reader to not take this post seriously is not my responsibility.

The Fifth Bottleneck

Wednesday, March 11th, 2009  

CodingHorror points out that the game of “find the bottleneck” that is computer performance optimization is always looking for a bottleneck in CPU, disk, network, or memory.

But there’s a fifth bottleneck — a fifth resource most applications wait on. The user.

If an interface is too difficult to understand, or if an action takes too many clicks or keystrokes, the application will be stuck waiting on the user. If an interface is really bad, the application will sit idle while the user is searching for “how to do X in ProApp 8.0,” or reading the manual, or asking their friends for help, instead of working. And the ultimate interface failure, when a user decides to stop using an application, means, from the point of view of performance, that it will never complete — it’s blocked forever.

Sure, a bad interface won’t slow down a computer. But it does slow the user down. And that’s why programmers care about performance – because we humans want to complete our tasks faster, not because we want computers to complete their tasks faster.

What isn’t new in Ruby 1.9.1 (or in Python)

Saturday, January 31st, 2009  

Like Josh Haberman, I was excited to see the changelog for Ruby 1.9, but immediately disappointed by its vagueness and terseness.

This list is meaningless to anyone who isn’t already familiar with the changes that have been happening in Ruby 1.9.x.

For someone like me who tried an older version of Ruby, there’s nothing to read that will tell me whether it’s worth checking out again.

Take this example, from the changelog:

  • IO operations
    • Many methods used to act byte-wise but now some of those act character-wise. You can use alternate byte-wise methods.

That’s terrifying. If I’m switching to a new version, I need to know exactly which methods have changed and which ones haven’t. Saying that “some” have changed is almost less helpful than saying nothing at all.
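To see why the byte-wise versus character-wise distinction matters, here is a small Python 3 sketch. It is Python rather than Ruby, so it says nothing about which Ruby methods actually changed; it only shows how the same read call gives different results depending on which unit it counts in.

```python
import io

text = "na\u00efve"          # "naive" with a diaeresis on the i: 5 characters
data = text.encode("utf-8")  # 6 bytes, because that one character takes two

byte_stream = io.BytesIO(data)
char_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")

first_three_bytes = byte_stream.read(3)  # stops halfway through the 2-byte character
first_three_chars = char_stream.read(3)  # stops after the whole character

print(len(text), len(data))
print(first_three_bytes, first_three_chars)

# The byte-wise result is not even valid UTF-8 on its own.
try:
    first_three_bytes.decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False
print(truncated_ok)
```

Code written against one behavior silently breaks under the other, which is why a changelog needs to name the affected methods.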

Here’s hoping that “improved documentation” will make it into a future Ruby 1.9.x release.

In the same blog post, Haberman makes some inaccurate assertions about Python’s encoding support:

Python has taken an “everything is Unicode” approach to character encoding — it doesn’t support any other encodings. Ruby on the other hand supports arbitrary encodings, for both Ruby source files and for data that Ruby programs deal with.

Incorrect. For the last five and a half years, since Python 2.3, source code in any encoding has been supported, and Python 3.0 expects UTF-8 by default. And of course, Python supports exactly the same wide range of encodings for data. Python’s approach can best be described as “Unicode (via UTF-8) is default.”
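A short sketch of both halves of that claim, written for Python 3. The first line is a PEP 263 source-encoding declaration (it could name another encoding, as long as the file is really saved in it), and the rest round-trips one string through several of the standard codecs. The sample string and the helper name are arbitrary choices for the example.

```python
# -*- coding: utf-8 -*-
# The line above is a PEP 263 declaration; it must be on line 1 or 2 of a file.
# The literal below uses escapes so the sketch works whatever the file encoding.

text = "caf\u00e9 \u20ac5"  # "cafe" with an acute accent, a space, euro sign, "5"


def round_trip(value, codec):
    """Encode text to bytes with the named codec, then decode it back."""
    raw = value.encode(codec)
    return raw, raw.decode(codec)


sizes = {}
for codec in ("utf-8", "utf-16", "cp1252", "iso8859-15"):
    raw, back = round_trip(text, codec)
    assert back == text
    sizes[codec] = len(raw)
print(sizes)

# Not every encoding can represent every character: Latin-1 has no euro sign.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)
```

Text is Unicode inside the program, and an encoding is chosen only at the boundary where it becomes bytes.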

Eight Python warts

Thursday, January 15th, 2009  

I love Python, but a few things still bug me about it. I’ve bashed on several other technologies; here’s some Python bashing. In no particular order:

Update: This has started a pretty good discussion on Reddit. Many people correctly guessed that I’m using singleton in the mathematical sense, not in the sense of the programming pattern. The comments from Cairnarvon and tghw are particularly worth reading.

How to avoid sounding like a total asshole when talking about the internet, even though you actually have no idea how it works

Monday, December 1st, 2008  

Alternate title: I know it’s time to get out of the house when I start exercising my license to practice linguistics.

The mainstream media and other people who are not internet-savvy use the verbs log in, log on, click on, download and upload in radically different ways than internet-savvy people do.

The common, internet-savvy definitions for log in/on and click on are:

log in, log on: to identify yourself and gain access to a resource on a computer by providing authenticating information.

click on: to press the mouse button down and then release it, while the mouse pointer is over a visible link, menu item, or icon or other resource on the computer screen.

The non-internet-savvy use appears to simply be shorthand for the verb go or visit. For example, an announcer on the evening news might say:[1]

For more details on this story, log on to our website at www….

No identifying information is needed to allow access to the news’ website; the intended meaning is simply to visit. The particle constructions click on and click on to are also occasionally used with this same intended meaning:

For more details on this story, click on to our website at www….

For more details on this story, click on our website at www….

The news announcer wouldn’t be reading out the website address unless the listener needed to type it in first. Since the address isn’t going to be visible on the listener’s screen until they type it in, there’s nothing to click on. And once the listener is finished typing it in, they’ll be automatically taken to the site when they push enter. And in no modern browser is the website address ever clickable. So this usage of click on has the meaning of go or visit as well.

The internet-savvy uses of download and upload are deictic, like the verbs send and receive, or come and go. That is, their meaning is dependent on the location of the agent performing the action:

download: to cause electronic data, files, or other information to move towards the agent, generally over a network.

upload: to cause electronic data, files, or other information to move away from the agent, generally over a network.

For example, consider two people, Alice and Bob. Alice is working from home and Bob is at the office. If Alice is going to transfer a file from her computer at home to a computer at the office, where Bob is, they would both use upload, because the data is moving away from the agent, Alice:

Bob: Can you upload today’s TPS reports?

Alice: Sure. I’ll let you know when they’re done uploading.

Download in place of upload would mean that the TPS reports[2] were on the computer at the office, and Alice was transferring them to her computer at home:

Bob: Can you download today’s TPS reports?

Alice: Sure. I’ll let you know when they’re done downloading.

Similarly, if Bob were the one initiating the action of obtaining the information from Alice’s computer, he would use download, and if he were sending the information to Alice’s computer, he would use upload.[3]
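For the programmers in the audience, the internet-savvy rule can be written down as a small Python function. The function name and the location labels are invented for illustration, and the "copy" and "transfer" fallbacks are my own filler for cases the definitions above don’t cover.

```python
def transfer_verb(agent_at: str, source: str, destination: str) -> str:
    """Pick the verb from the point of view of the agent doing the transfer."""
    if source == destination:
        return "copy"      # nothing crosses the network (assumed fallback)
    if destination == agent_at:
        return "download"  # data moves towards the agent
    if source == agent_at:
        return "upload"    # data moves away from the agent
    return "transfer"      # agent is at neither end (assumed fallback)


# Alice, at home, sends the TPS reports to the office.
print(transfer_verb("home", "home", "office"))
# Alice, at home, fetches the TPS reports from the office.
print(transfer_verb("home", "office", "home"))
# Bob, at the office, fetches them from Alice's computer at home.
print(transfer_verb("office", "home", "office"))
```

The same movement of data from home to the office gets a different verb depending on who initiates it, which is what makes the verbs deictic.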

The non-internet-savvy usage of download simply means transfer. It has no deictic component and encompasses the net-savvy meaning of upload and download, as well as that of simple copying:

Once this file is finished downloading to my Hotmail I’ll download it to you in an email.

I downloaded the photos from the CD to my computer and now they won’t load.

Unsurprisingly, upload does not seem to be present in the non-internet-savvy dialect; possibly because download encompasses its meaning entirely.

Although there are probably more differently-used terms, these four usage differences are more than enough for internet-savvy speakers to identify non-internet-savvy speakers.  Generally the internet-savvy speakers then make the same kind of assumptions that adults make about children who mimic and misuse words they’ve just learned — specifically, that the speaker has no idea what the words he or she is using actually mean, and therefore no idea what he or she is talking about.

Sounding internet-savvy is easy, then, even if you’re not. Use visit instead of log in/log on when the user doesn’t actually have to identify themselves to use a website or other resource. Don’t use click on unless there’s actually something to click on. And think about where the data is going and who’s making it go there before choosing between upload and download. Before you know it, people will start asking you for help debugging their IPv6 firewall rules and recompiling their embedded Linux kernels.[4]

  1. That’s right, Fox News: read up and find out how real live internet users talk about it. 
  2. The type of report is irrelevant to these examples. 
  3. Unless, of course, Bob or Alice were downloading something illegal with BitTorrent or other peer-to-peer software, in which case they would say something like: d00d eye m d0wnlo4d1n9 t|-|is t0rr3nt @ th3 k-rad sp33d of 7.3KB/s!!!1!!1!  
  4. Don’t actually help them do these things, though. You might break something. 

Worst website ever

Tuesday, November 25th, 2008  

Zzzphone.com. Excessive use of Flash. Unsolicited, over-compressed, auto-starting audio (on the specs page; there’s no surer way to get users to leave a site and never come back than automatically assaulting them with unexpected audio when the page loads). Random fonts and sizes, and graphics containing text. Best part: click on the shopping cart link at the bottom of the page (obviously without ordering anything first), and this comes up:

Step 3 of 6? Am I allowed to skip steps like that? Buy another nothing? Can I get a discount on that? (via Max.)

Oh, and in case anyone out there needed reminding that you can design complex, interactive, highly graphical sites in HTML without Flash, check out Shiftn’s Obesity System Influence Diagram.

Bad journalism

Wednesday, October 29th, 2008  

Consider the following two quotes from this BusinessWeek article by Susan Berfield. From the first paragraph:

He has Asperger’s syndrome,

On the second page:

Cohen never sought a formal diagnosis

If it’s not clear why this is bad journalism, replace “Asperger’s” with schizophrenia, bipolar disorder, or any other psychological or even physical condition. How can the author of a story about a person living with something like Asperger’s be trusted when the journalist admits that the person has never been diagnosed? What other key details in this story come from single sources and make it into print unchallenged and unverified? How many other stories in BusinessWeek suffer from the same sloppy reporting?

When I worked on the bi-weekly underground student newspaper at UC Santa Cruz, our editors and faculty advisors in the journalism department would have flayed us alive for committing such an oversight.

Full disclosure: I also used to work for Bram at BitTorrent, but, unlike Susan Berfield and the BusinessWeek editors, I don’t consider myself qualified to comment on any medical diagnoses.

Helping fix Firefox and adapting to it at the same time

Thursday, October 23rd, 2008  

Hank Williams over at Why does everything suck? wonders how to best report an obscure and weird browser bug.  He’s got the right idea: save a static copy of the HTML, JavaScript, and CSS, and then strip out everything unrelated until you have the minimal chunk of code required to duplicate the bug. You can speed this up by making an educated guess at which code is related, and doing a binary search from there. Include the minimal test case with the bug report, and the Firefox (or WebKit) team will love you for finding a test case for such a wacky, rare bug.
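The strip-and-retest loop can be sketched in a few lines of Python. This is a simplified take on the idea, not a tool the Firefox or WebKit teams use: the function name, the sample page, and the triggers_bug check are all hypothetical. In real life triggers_bug means "load the trimmed page in the browser and look", which is slow, so starting with big chunks saves a lot of reloads.

```python
def minimize(lines, triggers_bug):
    """Shrink `lines` for as long as `triggers_bug(lines)` stays true."""
    if not triggers_bug(lines):
        raise ValueError("the full page must reproduce the bug")
    chunk = max(len(lines) // 2, 1)  # start by trying to drop half the page
    while True:
        removed_any = False
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and triggers_bug(candidate):
                lines = candidate    # that chunk was unrelated; drop it
                removed_any = True
            else:
                i += chunk           # that chunk is needed; keep it and move on
        if chunk > 1:
            chunk //= 2              # retry with smaller chunks
        elif not removed_any:
            return lines             # no single line can be removed any more


# Hypothetical page where the bug needs both the float and the overflow rule.
page = [
    "<html>",
    "<style>",
    "div { float: left }",
    "p { color: red }",
    "span { overflow: hidden }",
    "</style>",
    "<body>",
    "</html>",
]


def triggers_bug(lines):
    has_float = any("float" in line for line in lines)
    has_overflow = any("overflow" in line for line in lines)
    return has_float and has_overflow


print(minimize(page, triggers_bug))
```

A real reduction would work on whole elements or rules rather than raw lines, since removing half a tag produces a different page rather than a smaller one.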

I’ve done this a few times for Mosuki, and each time, I discovered a workaround before I got to reporting the bug. Each time, the bug was either the result of an interaction between components I thought were unrelated, or I found some other method of achieving the same result that didn’t trigger the bug. (Of course, it was usually an Internet Explorer bug that I was fixing, so there was no way to report the bugs.)