Tag Archives: Alphabetizing

Followup, Of Sorts

Many Twitter responses to yesterday’s post about sorting problems:

In the end, it’s probably best to remind ourselves that the whole concept of “order,” as we perceive it, is merely a layer of artificial augmented reality that our brains came up with. If we realized that everything that’s ever happened and ever will happen is random, and that there’s no plan for anything in the universe, we’d probably all just freak right the hell out and forget to eat those sugars and proteins that our brains like so much.

These Tweets and others jarred a memory loose. I actually did solve my comic book database’s sorting problem, by creating multiple “Title” fields to serve multiple purposes. The record for an issue of “Avengers West Coast” had:

  1. A field containing the title as it should appear when printed (“Avengers West Coast”)
  2. A field containing the title as it should appear in a super-condensed multi-column report (“AVENWC”)
  3. A field used exclusively for sorting purposes (“West Coast Avengers”)

Yes, this was another book that Marvel renamed midway through its run. But by sorting on the third field, Issue #47 of Avengers West Coast Volume 2 naturally followed issue #46 of West Coast Avengers Volume 2 without any trouble.
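For the curious, here is roughly what that three-field idea looks like in a modern language. This is a minimal Python sketch and not a reconstruction of the original Turbo Pascal: the `Issue` record, its field names, the `shelf_order` function, the “WCA” and “AVEN” short codes, and the stray “Avengers” record are all made up for illustration. Only the three “Avengers West Coast” values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    display_title: str  # the title as it should appear when printed
    short_title: str    # the condensed title for multi-column reports
    sort_title: str     # used only for sorting, never shown
    volume: int
    number: int


def shelf_order(issues):
    """Order issues by sort title, then volume, then issue number."""
    return sorted(
        issues,
        key=lambda i: (i.sort_title.casefold(), i.volume, i.number),
    )


issues = [
    Issue("Avengers West Coast", "AVENWC", "West Coast Avengers", 2, 47),
    # "WCA" is a hypothetical short code, invented for this example.
    Issue("West Coast Avengers", "WCA", "West Coast Avengers", 2, 46),
    # A hypothetical unrelated record, to show what a naive sort would do.
    Issue("Avengers", "AVEN", "Avengers", 1, 300),
]

for issue in shelf_order(issues):
    print(f"{issue.display_title} #{issue.number}")
```

Sorting on the printed title would file “Avengers West Coast” #47 right after “Avengers” and nowhere near “West Coast Avengers” #46. Sorting on the third field keeps the run together, and nothing has to be computed or aliased at sort time.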

I was quite proud to have figured that out. It also informs my philosophy that sometimes, we delay the discovery of a great solution to a problem by insisting that it be simple and elegant. In Game Boy terms: we’re playing Tetris, and we’re so determined to set up a move that clears four rows at once that we pass up dull, but effective, drops of one row or two. I remember the moment late at night, in front of my DOS machine (hey, it was the late Eighties), when I realized that all of this extra Turbo Pascal code (hey, it was the late Eighties) I’d been writing to compress title strings and alias one title onto another was just a big waste of my time and the CPU’s.

This is making me nostalgic for the days when I had some sort of huge programming project to work on…code that I build, maintain, and enhance for years and years, just for my own use and my own pleasure. My first big project was an Apple II operating system. My second was the comic book database. My last one was the CMS and desktop app that ran my blog from the mid-Nineties until 2007 or so. Since then, all I’ve done are little one-off AppleScript, Ruby and Perl scripts to help me finish simple, repetitive tasks quickly.

Maybe this is my way of nudging myself to start to learn Swift in earnest, beyond the usual “Jackdaws love my big sphinx of quartz”-style test app I build when I need to test a language or a development environment.

iOS app development has always intimidated me, though. It feels like I shouldn’t look back after a year’s worth of effort and only then realize that I could have made a million dollars if I’d spent that time writing a different kind of app.

A Less-Definite Article

“The.” Such a meaningless word. Such a cause of trouble for those of us who rely on the alphabet.

Take a look at my iTunes library. What’s the name of the band that originally recorded “Hey Jude”?

You say you know. I’m telling you that you don’t. I always have to take a guess at it…if I’m looking for it in my iTunes library. Thanks to the plurality of early iTunes users who submitted CD track listings to the CDDB while stoned, the Beatles catalogue is split between three different bands:

“The Beatles”

“Beatles, The”.

And why, yes, this does completely **** up browsability! I’m forced to weed these things out and fix them manually.
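The “Beatles, The” variant, at least, is mechanical enough to fix with a script instead of by hand. Here is a small Python sketch of that one repair. The function names `canonical_artist` and `group_by_artist` and the two-track sample list are my own inventions for illustration, and this has nothing to do with how iTunes or the CDDB actually work.

```python
from collections import defaultdict


def canonical_artist(name: str) -> str:
    """Turn 'Beatles, The' into 'The Beatles'; leave other names alone."""
    name = " ".join(name.split())  # tidy up stray whitespace
    head, sep, tail = name.rpartition(", ")
    if sep and tail.casefold() == "the":
        return f"The {head}"
    return name


def group_by_artist(tracks):
    """Group (artist, title) pairs under one canonical artist name."""
    groups = defaultdict(list)
    for artist, title in tracks:
        groups[canonical_artist(artist)].append(title)
    return dict(groups)


# Hypothetical sample data, not an actual library listing.
tracks = [
    ("The Beatles", "Hey Jude"),
    ("Beatles, The", "Let It Be"),
]

print(group_by_artist(tracks))
```

This only handles a trailing “, The”. Deciding whether a name with no article at all belongs to the same band is a judgment call that still needs a canonical source, so the sketch deliberately leaves those alone.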

The “the” problem screws lots of things up. It’s “Spider-Man,” not “The Spider-Man.” But purists will insist that it’s supposed to be “The Batman.” And as big a fan of this band as I am, I’m not 100% sure if it’s “Foo Fighters” or “The Foo Fighters” until I consult a canonical source.

(Usually, I call up Dave Grohl. “How the hell did you get this number?” he shouts. “You know perfectly goddamned well that the court order forbids you from ever calling me or any other member of Foo Fighters!” And there’s my answer.)

All of this is simply part of our daily burden as free-thinking members of this planet’s alpha species. It’s on my mind tonight thanks to a conversation that Marco Arment has been having on Twitter about how his lovely new podcatcher app sorts show titles.

I don’t think that’s the right way to go. I’m looking at my list of podcast subscriptions and I reckon that by this scheme, about a third of the shows I regularly listen to will be clumped under “T.” I reckon that this is why so many music apps (like Google Play Music) will display “The Beatles” but sort it as though the band name starts with “B”. I reckon if I use the word “reckon” a third and fourth time and point that out, it’ll sound like I used it over and over again to be entertaining, when in truth, I was just lazy.

If you’re a stickler, you could just say “common rules of indexing command that ‘the’ be treated as though it were the last word in a business name or title.”
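In code, “display one thing, sort on another” is only a few lines. This is a hedged Python sketch of that indexing rule, assuming English titles and only the article “the”. The `sort_key` function and the sample list are hypothetical, and I’m not claiming that this is how Google Play Music or any podcast app implements it.

```python
def sort_key(title: str) -> str:
    """Build a sorting key that treats a leading 'The' as the last word."""
    words = title.split()
    if len(words) > 1 and words[0].casefold() == "the":
        words = words[1:] + words[:1]  # move the article to the end
    return " ".join(words).casefold()


# Sample titles borrowed from this post, purely for illustration.
titles = [
    "The Treatment",
    "Judge John Hodgman",
    "The Bugle",
    "Foo Fighters",
    "The Beatles",
]

# The list still shows "The Beatles"; only the ordering ignores the article.
for title in sorted(titles, key=sort_key):
    print(title)
# Prints: The Beatles, The Bugle, Foo Fighters, Judge John Hodgman, The Treatment
```

A plain `sorted(titles)` would clump three of those five under “T.” The key function never changes what the user sees, which is the whole point: the display string and the sorting string are allowed to be different things.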

(Go on and check out the Chicago Manual Of Style’s Q&A page about alphabetizing. It’s a hoot, and reminds me that if I took a job as a librarian, I’d last about three weeks before I shot myself in the foot to get a discharge stateside.)

But all of this skips over the real point, when designing software. Rules should be damned: the choice just has to make sense and it has to be consistent. The developer needs to ask “where will people expect to find ‘The Beatles’?” and act accordingly.

At some point, he or she just has to make the choice that feels right. Then, send baby out into traffic and see how well that choice works.

This is a good example of what I think of as a “big endian/little endian” problem. These terms have nothing to do with how data is stored in address space; I’m referring to the original Jonathan Swift idea of a society in which people who slice open their hard-boiled eggs from the little end can’t understand the people who slice them open from the big end because, obviously, their way is totally the right way to do this. The other way seems so bizarre that those people might as well be of some other species or something.

So: you can argue endlessly about the “right” way. But it’s almost (not quite) an arbitrary choice. By trying to satisfy people who will never agree with the “other” way of doing things, you’ll just screw things up for everyone. It’s best to just have a point of view and stick with it until user feedback makes you second-guess your choice.

Alphabetizing things will never work smoothly, anyway.

John Hodgman and Jesse Thorn refer to their show as “The Judge John Hodgman Podcast” on-mic, but it’s canonically listed without the definite article. Where do I find Elvis Mitchell’s swell entertainment interview show, “The Treatment”? Is it under “T” for “Treatment,” or “T” for “The”?

Trick question! Both words start with the letter “T”!

AHA! DOUBLE trick-question! Because it’s listed as “KCRW’s ‘The Treatment’”!

My mental eye paints White-Out over the “The” in a podcast title almost every time. But I never think of my favorite podcast as anything other than “The Bugle.”

This sort of problem goes way, way back. When I was a kid, Marvel Comics inflicted the first of what would become a decades-long string of abusive editorial decisions by renaming the comic “Peter Parker, The Spectacular Spider-Man” as “Spectacular Spider-Man.” Well, crap. Now where do I file these issues? I checked the indicia. Marvel didn’t start a new numbering scheme and this was still Volume 1.

Nerdy kids who grew up in the Eighties are united by two traumatic events that affected all of us: “Spectacular Spider-Man,” and the Challenger disaster. I am convinced that if I were to send a one hundred item questionnaire to 500 comic book fans, the answer to Question 1 (“How did you choose to sort your Spectacular Spider-Man comics?”) would let me predict the answers to many questions about the respondent’s views on politics and ethics, after all 500 sets of answers were submitted to proper analysis.

(My introduction to formal data structures came when I wrote an app to keep track of my comics. Through high school and college, I solved so many problems and added so many features to it. But I never figured out an elegant way to handle a comic that runs for 131 consecutively-numbered issues across three or four titles.)

What I’m saying is that alphabetizing things is a big mess…maybe the biggest mess there is, if ranked as a ratio of “how difficult this problem is” to “how difficult it appears to be.” I always expect, and hope, that “the” is invisible for sorting purposes…but I can forgive a developer for doing what makes sense to him or her.

All of this reminds me of a brilliant name for a band, which I came up with when I was a teen: “Miscellaneous M.” It guaranteed that the band would get its own divider in every store’s CD department even if it only released one album. The only way this scheme could possibly fail would have been if the entire market for physical media were to collapse over a short period of time.