Dabble DB

The Dabble Blog

Archives: May 2005

« April 2005 · July 2005 »

Which Incremental Paths?

Incrementality. It’s been springing up all over the place in our posts, a trend I’m certain will continue. This being said, I think it’s useful to throw in a measure of caution: like all tools, incrementality is no silver bullet.

As a developer building software systems, I’m skeptical about any design approach that isn’t incremental. However, this is very different from being happy with any incremental path to a solution. Not all incremental paths are created equal. In fact, I’d claim that one of the more important skills possessed by expert designers in any field is the ability to distinguish good incremental paths from bad ones. How does one do this? Of course, there isn’t going to be a simple, comprehensive answer to this question, but I believe one necessary component is a strong awareness of the major decisions at play: in particular, it is very important to know which ones will be difficult to change later, the reasons why you might want to change them, and how likely it is that these reasons will come into play.

Let’s look at a dangerous example of this in the data management space: deciding on the amount of structure you’d like to impose on your data. By “structure” here I don’t mean presentation structure; rather, I mean breaking the data down into smaller parts, with the same kind of data always broken down into the same kind of parts: for example, the items in a to-do list, or the first, middle and last names of a person. Deciding on structure is particularly dangerous because totally valid short-term concerns can conflict with and potentially overwhelm totally valid long-term concerns. In the short term, there are some strong forces pushing against the imposition of structure. Imposing it certainly requires more thought (what exactly are the boundaries between different data components?). If you have any legacy data that needs to be incorporated, especially un- or semi-structured data, this complicates things all the more. Being a good, lazy incremental citizen, you feel like you should probably just impose a minimum amount of structure now, and worry about increasing it later. Unfortunately, this is an extremely difficult decision to change further down the road. Once you are managing even a moderate amount of data, automating migration to a more structured format, which typically means trying to parse arbitrary text, is extremely difficult, if not impossible; if it isn’t, the data probably wasn’t that unstructured to begin with. Usually, you are left to either expend a huge amount of manual labor or just take a pass on the benefits of structure.

Clearly, just “going incremental” isn’t enough here; some further help is required. For one thing, it’s helpful to be able to quickly and fairly painlessly distinguish cases where structure isn’t particularly useful from those where structure might be useful. I usually find that this has to do with the overall “scope” of the data: How many people will look at it? How long will it be of interest? Will people want to look at this data in multiple ways? Strictly speaking, it is largely an affirmative answer to this last question that calls for structure, but increased numbers of people looking at data over an increased period of time tend to make the desire for multiple views more likely. So, if structure isn’t necessary, great: paste the data into an email to your Gmail account and it will be available and searchable for as long as you desire.

Even in cases where it seems structure might be useful, I don’t believe things have to be so bad, provided data management systems don’t make schema/structure definition too big a deal. There is usually more than one way to be incremental. Maybe you need structure, but that doesn’t mean you need to go off and do a requirements analysis and build a complete schema. If the system allows you to simply add a few useful slots for your data, stick some data in them, and repeat as necessary, you can get both the benefits of structure and incrementality.
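To make the "add a few slots, stick some data in them, and repeat" idea concrete, here is a minimal sketch in Python. The `Table` class and its `add_field` and `add_row` methods are hypothetical names invented for this illustration; they do not describe any particular product. The point is only that adding a field later costs almost nothing when the data is already held in slots.

```python
class Table:
    """A toy table whose structure grows one field at a time (illustrative only)."""

    def __init__(self):
        self.fields = []  # field names, in the order they were added
        self.rows = []    # each row is a dict keyed by field name

    def add_field(self, name, default=None):
        """Add a new slot; existing rows get the default value for it."""
        if name in self.fields:
            raise ValueError(f"field already exists: {name}")
        self.fields.append(name)
        for row in self.rows:
            row[name] = default

    def add_row(self, **values):
        """Add a row; any field not supplied is stored as None."""
        unknown = set(values) - set(self.fields)
        if unknown:
            raise KeyError(f"unknown fields: {sorted(unknown)}")
        row = {name: values.get(name) for name in self.fields}
        self.rows.append(row)
        return row


# Start with a single slot, then add another once it turns out to be useful.
todo = Table()
todo.add_field("task")
todo.add_row(task="write blog post")
todo.add_field("due")
todo.add_row(task="ship preview", due="Friday")
```

The earlier row simply gains an empty `due` slot; nothing has to be parsed back out of free text.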

What we don’t want data management systems doing is encouraging users to just throw in their data in an unstructured manner, with a vague promise that they can simply add structure later; doing so will require an incredible amount of work. Stepping back to the broader question of different kinds of incremental paths in general, those of us building data management systems need to try to subtly guide our users down the good, productive incremental routes, and nudge them away from the dangerous ones. Even expert designers appreciate time-tested patterns.

Equals-Sign Moments

It’s always interesting to watch what the guys over at 37signals are up to. They’re on the verge of releasing their new Backpack application, and one of their previews explores a set of ideas that I’ve also been finding myself increasingly intrigued by.

The specific feature that I’ve been thinking about, and that Backpack seems to leverage nicely, is the use of email as an input method. This seems like an odd idea: isn’t it a lot clunkier to send an email to an application, even a web application, than to use some custom form within the application itself? And yet, even without something like Backpack available, I find myself using email as a quasi-database all the time: to make lists, send myself reminders, record important snippets of information, even log hours spent on a project. Why? First of all, because the email client is the one window that I will always have open and easily accessible, whatever else I might be doing. Second, email is a great redundant data source: as well as going to whoever (or whatever) I send it to, a copy of the message ends up on my local machine, another on my servers, maybe yet another on Google’s servers, and so on. If I need to, chances are I can find it later from pretty much anywhere. But probably most importantly, email is, for many of us, the main personal input stream: if some new piece of information comes up that I have to record or take action on, chances are pretty good it came to me through email. So what more natural way to respond than right from the email client?

Because of all of this, it’s become somewhat common for blogs and wikis to accept emails as unstructured data. What Backpack does that’s interesting is accept them as structured data. Given enough context (from the subject line, say) and a few simple conventions, it’s possible to extract pretty reasonable structure from the text of an email message. Backpack scans (much as Word does, actually) for text that looks like a list of items and makes it into a structured to-do list. A contact manager might use something like SBook5’s parsing to build a structured address book record. A generic database application could look for one of its field names at the start of a line and take the rest of the line as the value. It’s crucial that the application also provide more direct ways to do the same thing, but there’s something immensely satisfying about firing off a quick email and having it show up as “real” data rather than just text.
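As a rough sketch of the two conventions just described, the Python below pulls list items and field values out of a plain-text message body. It is an illustration under assumptions, not Backpack's or SBook5's actual parsing: the function names are made up, list items are assumed to start with "-", "*" or a number, and a colon is assumed to separate a field name from its value.

```python
import re

# Assumed convention: a list item starts with "-", "*", "1." or "1)".
ITEM = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+(.*\S)\s*$")


def extract_list_items(body):
    """Return the text of every line that looks like a list item."""
    items = []
    for line in body.splitlines():
        match = ITEM.match(line)
        if match:
            items.append(match.group(1))
    return items


def parse_fields(body, field_names):
    """Split a body into known 'Field: value' pairs and leftover note lines."""
    lookup = {name.lower(): name for name in field_names}
    record, notes = {}, []
    for line in body.splitlines():
        head, sep, rest = line.partition(":")
        key = head.strip().lower()
        if sep and key in lookup:
            record[lookup[key]] = rest.strip()
        elif line.strip():
            notes.append(line.strip())  # unrecognized lines stay as plain text
    return record, notes


items = extract_list_items("Groceries:\n- milk\n* eggs\n2. bread\nthanks!")
record, notes = parse_fields(
    "Project: Dabble\nHours: 3.5\nFixed the import bug",
    ["Project", "Hours"],
)
```

Anything that does not match a convention is kept as ordinary text rather than rejected, so a sloppy email still lands somewhere useful.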

The more general idea here is the use of freeform text as a way to reduce the complexity but increase the power of a user interface. With a good parser, for example, a freeform date input can be much nicer to use than any calendar widget. Or look at Google. A single text input lets you put in not just search keywords, but also stock ticker symbols, unit conversions, tracking numbers, and no doubt all kinds of other things that I don’t know about. The key here, as always, is incrementality. As a new user to Google, I can just type some keywords into the search field and get back reasonable results. Only when my needs get more sophisticated do I have to even be aware of the wealth of other features available. Hiding these options in a textual syntax, rather than building complex forms specifically for them, may seem like a regression to the command line, but it’s also a great way to reduce clutter and let your user experience scale smoothly.
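A freeform date input can start very small. The sketch below is one possible shape for such a parser, written in Python; the function name `parse_date`, the handful of phrases it accepts, and the rule that a weekday name means its next occurrence after today are all assumptions made for the example. A real parser would handle far more.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def parse_date(text, today):
    """Turn a few freeform phrases into a date, or return None if unrecognized."""
    s = text.strip().lower()
    if s == "today":
        return today
    if s == "tomorrow":
        return today + timedelta(days=1)
    match = re.fullmatch(r"in (\d+) days?", s)
    if match:
        return today + timedelta(days=int(match.group(1)))
    if s in WEEKDAYS:
        # Assumed rule: the next such weekday strictly after today (1 to 7 days ahead).
        delta = (WEEKDAYS.index(s) - today.weekday() - 1) % 7 + 1
        return today + timedelta(days=delta)
    try:
        return date.fromisoformat(s)  # fall back to YYYY-MM-DD
    except ValueError:
        return None


reference = date(2005, 5, 2)  # a Monday
due = parse_date("friday", reference)
```

A new user can type "tomorrow" and never learn the rest; the extra phrases are only there for people who go looking for them.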

And since this blog is somewhat spreadsheet-obsessed, it’s worth pointing out that Excel does the same thing. There’s no special mode or dialog you have to access when you progress from entering data into cells to creating formulas; you just have to figure out that a formula starts with the “=” sign. The moment where a spreadsheet user first learns to do this is a critical and somewhat magical one; what other equals-sign moments can we build into our software?
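The mechanics behind an equals-sign moment are simple enough to sketch. The Python below is a toy illustration, not a description of how Excel works internally: the `cell_value` function, the dictionary of cells, and the restriction to `+`, `-`, `*` and `/` are assumptions made for the example. The same input box holds plain data until the text happens to begin with "=".

```python
import ast
import operator

# Only basic arithmetic is supported in this sketch.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, cells):
    """Evaluate a parsed formula made of numbers, cell names and + - * /."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return cell_value(cells, node.id)  # a bare name refers to another cell
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left = _evaluate(node.left, cells)
        right = _evaluate(node.right, cells)
        return OPS[type(node.op)](left, right)
    raise ValueError("unsupported formula")


def cell_value(cells, name):
    """Return a cell's value: a formula result, a number, or plain text."""
    raw = cells[name]
    if raw.startswith("="):
        # No cycle detection here; a real spreadsheet would need it.
        return _evaluate(ast.parse(raw[1:], mode="eval").body, cells)
    try:
        return float(raw)
    except ValueError:
        return raw


cells = {"A1": "2", "A2": "3.5", "A3": "=A1+A2*2", "A4": "hello"}
total = cell_value(cells, "A3")
```

Nothing about entering "2" or "hello" changes once formulas exist, which is what lets the feature stay out of a beginner's way.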
