THE DREADED T-WORD, OR, WHY DOESN’T GOOGLE KNOW HOW TO CLASSIFY BLOGS?

In knowledge management it is referred to as “the dreaded T-word”. It’s taxonomy, which in short means the classification scheme by which information content is organized and browsed. We’ve come a long way from the Dewey Decimal System (or its successor, the Master Reference File) and the Library of Congress classification scheme.

The Open Directory Project (ODP), whose taxonomy is used by all the major search engines and which is managed by a huge horde of volunteer taxonomists, has now broken down everything on the Internet into 17 meta-categories, shown below, and 100,000 sub-categories. Pragmatically, all non-English-language content is, for the time being, grouped under the World meta-category, all content specifically certified as safe for all ages is grouped under the Kids & Teens meta-category, all content deemed unsuitable for children is grouped under the unlisted Adult meta-category, and all content that is of parochial interest is grouped under the Regional meta-category. The sub-categories under World and Regional mirror, to some extent, the meta-categories for the ODP as a whole. Most meta-categories have several layers of sub-categories, allowing you to home in on precisely what you’re looking for. Because some of these sub-categories are recursive, this is to some extent a classification network (a ‘semantic web’) rather than a linear taxonomy.
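For the programmers in the audience, here is a rough sketch of what "network rather than linear taxonomy" means in practice. It is not how the ODP stores its data. The `Taxonomy` class, its method names, and the cross-link between the two categories are all my own assumptions for illustration. The point is only that once a sub-category can point sideways into another branch, browsing down from one meta-category can land you in a different one.

```python
from collections import defaultdict


class Taxonomy:
    """Toy category hierarchy with optional sideways cross-links (hypothetical)."""

    def __init__(self):
        self.children = defaultdict(set)  # parent path -> child paths (the tree)
        self.links = defaultdict(set)     # path -> cross-linked paths (the network)

    def add_path(self, path):
        # Register every parent/child pair along a colon-delimited path.
        parts = path.split(":")
        for depth in range(1, len(parts)):
            parent = ":".join(parts[:depth])
            child = ":".join(parts[:depth + 1])
            self.children[parent].add(child)

    def add_link(self, source, target):
        # A cross-link lets a category point into another branch entirely.
        self.links[source].add(target)

    def reachable(self, start):
        # Depth-first walk that follows both tree edges and cross-links.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.children.get(node, set()) | self.links.get(node, set()))
        return seen


odp = Taxonomy()
odp.add_path("Computers:Internet:OnTheWeb:Weblogs:Personal")
odp.add_path("Society:People:Personal Homepages")

tree_only = odp.reachable("Society")

# Assumed cross-link, purely for illustration.
odp.add_link("Society:People:Personal Homepages",
             "Computers:Internet:OnTheWeb:Weblogs:Personal")

with_link = odp.reachable("Society")
print(sorted(with_link - tree_only))
```

With only the tree edges, everything reachable from Society stays inside Society. Add one cross-link and a Computers sub-category shows up in the walk, which is the sense in which the scheme behaves like a web.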

Weblogs have proven to be one of the most perplexing taxonomy challenges for the ODP. They’re scattered all over the place. Many of them are listed under Computers:Internet:OnTheWeb:Weblogs:Personal, but quite a few are listed under Society:People:Personal Homepages or Arts:OnlineWriting:Journals:Personal. This blog, How to Save the World, has its Business category listed under Reference:Knowledge Management, but my other categories, Environmental Philosophy, Blogs & Blogging, Politics & Economics, Arts & Sciences and Creative Works aren’t indexed in ODP at all. Blogs as far-ranging as mine without categories are even more challenging to index. Fortunately, the vast majority of web users use search engines, not catalogues, to find information, so this problem isn’t yet of great consequence: Many uncategorizable blogs get a lot of search engine traffic, and a significant proportion of that traffic stays to look around, and sometimes those visitors even find what they were looking for, and/or become regular readers. For the majority of us, however, most of our readers are regular readers, and most of them were initially referred by another regular reader. Few readers come via catalogues, and until an innovative mechanism is devised to allow people to browse weblogs, even uncategorizable ones, the same way they browse for books in a bookstore, that’s unlikely to change.
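To put a number on "scattered all over the place", here is a small sketch using the three category paths mentioned above. The helper name `shared_ancestor` is my own invention, and the colon-delimited format is simply how I have written the paths in this post, not necessarily the ODP's internal representation. The idea is to ask whether the places weblogs are listed have any parent category in common.

```python
def shared_ancestor(paths):
    """Return the leading path segments common to every colon-delimited path."""
    split_paths = [path.split(":") for path in paths]
    shared = []
    # zip stops at the shortest path, so we only compare levels all paths have.
    for segments in zip(*split_paths):
        if len(set(segments)) != 1:
            break
        shared.append(segments[0])
    return ":".join(shared)


# The three locations named in the paragraph above.
weblog_listings = [
    "Computers:Internet:OnTheWeb:Weblogs:Personal",
    "Society:People:Personal Homepages",
    "Arts:OnlineWriting:Journals:Personal",
]

meta_categories = sorted({path.split(":")[0] for path in weblog_listings})
print(meta_categories)
print(repr(shared_ancestor(weblog_listings)))
```

The three listings fall under three different meta-categories and share no ancestor at all, so no single spot in the catalogue lets a browser see every weblog at once.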

Arts: Movies, Television, Music
Business: Jobs, Real Estate, Investing
Computers: Internet, Software, Hardware
Games: Video Games, RPGs, Gambling
Health: Fitness, Medicine, Alternative
Home: Family, Consumers, Cooking
Kids and Teens: Arts, School Time, Teen Life
News: Media, Newspapers, Weather
Recreation: Travel, Food, Outdoors, Humor
Reference: Maps, Education, Libraries
Regional: US, Canada, UK, Europe
Science: Biology, Psychology, Physics
Shopping: Autos, Clothing, Gifts
Society: People, Religion, Issues
Sports: Baseball, Soccer, Basketball
World: Deutsch, Español, Français, Italiano, Japanese, Nederlands, Polska, Dansk, Svenska

Interestingly, Amazon uses a different taxonomy for books, perhaps reflecting the greater proportion of fiction versus non-fiction in bookstores compared to the Internet (no smart remarks please):

Arts & Photography
Audiocassettes
Audio CDs
Audio Downloads
Bargain Books
Biographies & Memoirs
Business & Investing
Children’s Books
Christian Books
Comics & Graphic Novels
Computers & Internet
Cooking, Food & Wine
e-Books
Entertainment
Español
Gay & Lesbian
Health, Mind & Body
History
Home & Garden
Horror
Literature & Fiction
Mystery & Thrillers
Nonfiction: Social Sciences, Law, Politics, Current Events, Philosophy
Outdoors & Nature
Parenting & Families
Professional & Technical
Reference
Religion & Spirituality
Romance
Science
Science Fiction & Fantasy
Sports
Teens
Travel
Women’s Fiction


When I was young, I loved to wander the shelves of libraries at random, browsing whatever subjects serendipitously caught my attention and piqued my interest. A typical male, I never consulted the card catalogue: I knew I’d find what I was looking for eventually, and I was convinced that finding what I wanted was a prelude to disappointment, since the wanted volume would inevitably be checked out by someone else. As I’ve grown older, my random walks have moved from small libraries to large libraries (in universities, still a decadent pleasure) and thence to large bookstores (where I can spend a half day without noticing the passage of time). Now that I buy books online, and maintain long lists of “to buy” books (mostly suggested by readers like you), I find browsing bookstores both frustrating (the selection is terribly limited and the inventory management atrocious, except at the new breed of giant independents like Canada’s McNally Robinson) and unnecessary. But I miss it. My favourite fiction writers have all been discovered by browsing.

All right, to see if you’ve been paying attention, here’s your homework assignment:

  1. Where do you think weblogs should be put in the ODP taxonomy? Since everything under the sun is called a weblog these days, use this definition (from whatis): A weblog is a Web site of personal or non-commercial origin that uses a dated log format that is updated on a daily or very frequent basis with new information about a particular subject or range of subjects. The information can be written by the site owner, gleaned from other Web sites or other sources, or contributed by users.
  2. What subjects, from the top 2-3 layers of the ODP taxonomy, would you like to read more about? I’ll start the ball rolling with these suggestions:
    • Art: prehistoric, primitive, aesthetics
    • Literature: from exotic cultures and rarely-translated languages; writing humour
    • News: investigative reporting, news and in-depth analysis from countries outside the English-speaking world, first-hand accounts
    • Home: home improvement, entertaining, interior design
    • Currently Uncategorizable: environmentalism, the education system, political system reform, philosophy of science, sex education, cultural studies, demographic studies, consumer education, the learning process, creativity, psychology of stress, proxemics, perception, sensation and synaesthesia, cognition

If you’ve written about any of these subjects, please point me to your posts.

I realize that some of the subjects in my Currently Uncategorizable list do appear in the ODP taxonomy, but they’re like square pegs in round holes — they don’t really fit where they’ve been put, suggesting to me the taxonomy is incomplete and counter-intuitive in places. The ODP says they’re open to adding categories and changing the taxonomy — anyone want to take them up on it?

Now you know why it’s the dreaded T-word. No wonder Dewey went nuts.
