I have a mystery to solve. Up until last August, this blog was averaging about 450 hits per day, of which about 20-25% came from Google. But then suddenly, Google stopped crawling How to Save the World, except for a very few pages and some of my Stories posts. Since then, while my daily hits have risen to about 700 per day, the percentage from Google has steadily dropped, and now accounts for only 5-10% of my traffic. And virtually all of this diminished number of hits points to posts from before last August.

In addition to costing me a couple of hundred serendipitous visits per day, the lack of Google indexing is aggravating for those looking for things in my archives. And the search bar in my right sidebar is only catching pre-September posts. Besides, lots of other search tools are also powered by Google.

Here’s a couple of examples. I’ve written two posts on parrots. One was on Alex, the gray parrot, on Nov.12/03, and the second was N’kisi, the gray parrot, on Feb.1/04. If you Google “grey parrot” you’ll come up empty, at least as far as references to my blog are concerned.

A second example: I’ve written two articles about the work of Hendrik Hertzberg: One on Liberal Radio on Aug.9/03 and the second on Unstead State on Jan.31/04. Google search returns the first of these — pre Sept.03, but not the second.

The irony is that the Google results include other bloggers’ references to my newer post on Hertzberg, but not my post.

Aalia Wayfare, who fixed my problem with the gap in the middle column of my permalinks, suggested I add some metatags in my home page, which I’ve done. It hasn’t helped.

And Robert Scoble says it’s illogical that someone with 350 inbound blogs isn’t getting spidered by Google.

So what’s the answer? Is Google deliberately omitting How to Save the World hits because I’m so prolific and perhaps drawing traffic away from other sources? Was I too successful in getting Google traffic and hence “cut off”? Or is there a more innocent, technical explanation? I offer a modest reward, plus deepest thanks and publicity for your brilliance, to the first person who can solve the mystery.

This entry was posted in Using Weblogs and Technology.


  1. meg says:

    Dave, it’s because you’re Canadian, you poor kid. Google only gives me kooks, like people looking up “very short arms woman”, “what does a “clint” (yes, clint) look like” and my favourite, “man crapping in litter box”. Keep saving the world, and the hits will come.

  2. Philip says:

    David, there is something wrong with your web design. If you look at the Google cache of your site, the only thing Google finds on your web site is your blogroll. I would suggest that you did something in the past that renders your content invisible to Google. I would consult with Lawrence Lee at Radio Userland, and perhaps send out a plea to Rogers Cadenhead. It is probably as simple as a screwed-up HTML tag; with all the tables in your HTML, it’s a wonder how you keep it all straight. Does your site validate as valid HTML? Do you maintain a change log? When you modify your code it is a good idea to note the changes with date and time. That way, if something strange comes up, you can go back in time and look at the changes you made. Seems to me you know about when Google quit knocking (my bet is it didn’t quit knocking, it just isn’t seeing any content). What did you do to your blog then? There lies the culprit.

  3. I think you should self-destruct in 15 seconds.

  4. I think the fault is your own:

    1. Valid markup. http://validator.w3.org/check?uri=http%3A%2F%2Fblogs.salon.com%2F0002007%2F&doctype=%28detect+automatically%29&charset=iso-8859-1+%28Western+Europe%29 should give you some hints. (This might be the sole problem Google has with indexing your content.)
    2. Semantic markup. Google prefers h1-h6 over font-size+b. Google also prefers simple markup. Drop the tables.
    3. Content before navigation. Keeping all but the most important navigation after any content in the source helps.

    Yes. This means that you’ll probably have to do an extensive redesign.

  5. You’re over the size limit for frequent indexing. It’s about 110 K. If you look at when Google stopped frequently indexing you, it’s when your files went over that size.

  6. Ken Hirsch says:

    “3. Content before navigation. Keeping all but the most important navigation after any content in the source helps.”

    Your page is 166,453 bytes. That’s enormous! Users on dial-up modems are going to avoid your page altogether: it could take 40 seconds to load. By contrast, Yahoo’s home page is 34,000 bytes. When I went to Amazon.com, I got a page of 86,000 bytes (and I wouldn’t be surprised if it tries to detect your connection speed and give a smaller page to slower users).

    Your new content doesn’t start until 140,000 bytes. I wouldn’t be surprised if Google just stops at 128K.

    Google has no problems with tables. It ignores almost all markup.
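The arithmetic in this comment can be checked with a rough sketch. The page size and content offset are the figures quoted above; the modem speeds and the 128 KB cutoff (read here as 128 × 1024 bytes) are assumptions for illustration only, and the transfer time ignores latency, compression and protocol overhead.

```python
PAGE_BYTES = 166_453      # page size quoted in the comment
CONTENT_OFFSET = 140_000  # byte offset where new content starts, per the comment


def download_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Ideal transfer time: total bits divided by line speed."""
    return size_bytes * 8 / bits_per_second


def content_visible(offset: int, limit_bytes: int) -> bool:
    """True if content starting at `offset` lies inside a crawler's byte cutoff."""
    return offset < limit_bytes


# Assumed dial-up speeds (nominal line rates, not measured throughput).
for label, bps in (("33.6 kbps", 33_600), ("56 kbps", 56_000)):
    print(f"{label}: {download_seconds(PAGE_BYTES, bps):.1f} s")

# Assumed cutoff of 128 KB, the figure Ken guesses at.
print(content_visible(CONTENT_OFFSET, 128 * 1024))
```

At the assumed 33.6 kbps rate the estimate comes out at roughly 40 seconds, in line with the comment, and content starting at byte 140,000 would fall outside a 128 KB cutoff.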

  7. The solution is to redesign so your posts are under the limit; shoot for < 90 K to be safe. If that works, I love publicity for my brilliance :-).

  8. Correction: “110 K” should have been “101 K”. If you look at the results of
    http://www.google.com/search?q=allinurl:blogs.salon.com/0002007&num=100&filter=0
    many of them have hit the 101 K limit.

    And if you page down that list, you can see the effects of what Ken Hirsch said: Google thinks many of your pages are duplicates with identical content, since it’s stopped at 101 K:
    http://www.google.com/search?q=allinurl:blogs.salon.com/0002007&num=100&hl=en&lr=&ie=UTF-8&start=400&sa=N&filter=0
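A minimal sketch of why a fixed byte cutoff could make different posts look like duplicates. Everything here is hypothetical: the page builder, the filler sidebar, and the reading of “101 K” as 101 × 1024 bytes are assumptions, and nothing below describes how Google actually detects duplicates. It only shows that if the shared material at the top of every page is longer than the cutoff, the truncated pages are byte-for-byte identical.

```python
import hashlib

LIMIT = 101 * 1024  # assumed cutoff in bytes, based on the "101 K" figure above


def indexed_fingerprint(page: bytes, limit: int = LIMIT) -> str:
    """Hash only the part of the page a truncating indexer would keep."""
    return hashlib.sha256(page[:limit]).hexdigest()


def make_page(post_text: str, sidebar_bytes: int) -> bytes:
    """Hypothetical page: identical sidebar filler first, then the post itself."""
    sidebar = b"x" * sidebar_bytes
    return sidebar + post_text.encode("utf-8")


# Sidebar longer than the cutoff: the post never makes it into the kept part.
big_a = make_page("Post about Alex the parrot", 140_000)
big_b = make_page("Post about Hendrik Hertzberg", 140_000)

# Sidebar well under the cutoff: the post text is part of what gets hashed.
small_a = make_page("Post about Alex the parrot", 38_000)
small_b = make_page("Post about Hendrik Hertzberg", 38_000)

print(indexed_fingerprint(big_a) == indexed_fingerprint(big_b))
print(indexed_fingerprint(small_a) == indexed_fingerprint(small_b))
```

With the 140,000-byte sidebar the two fingerprints match even though the posts differ; with the smaller sidebar they do not.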

  9. snapmikey says:

    Dave, did you read Google

  10. Ken Hirsch says:

    The biggest problem is the list of links on the left side. You are wrapping each individual link in table tr td table tr td div. You could cut 100K (60% of the total) by changing this to just the BR tag. That 60 second fix will get you going again, but you do need to spend a while redesigning your page for the long term.
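To get a feel for the overhead Ken describes, here is a small sketch comparing two ways of emitting the same list of links. The link names, URLs and the count of 300 are made up, and the nested markup is a bare-bones stand-in for “table tr td table tr td div” with no attributes, so real template output would likely be heavier than this.

```python
def table_wrapped(links):
    """Each link wrapped in nested table/div markup, as described in the comment."""
    row = (
        "<table><tr><td><table><tr><td><div>"
        '<a href="{url}">{name}</a>'
        "</div></td></tr></table></td></tr></table>\n"
    )
    return "".join(row.format(url=url, name=name) for name, url in links)


def br_separated(links):
    """The same links, separated only by a BR tag."""
    row = '<a href="{url}">{name}</a><br>\n'
    return "".join(row.format(url=url, name=name) for name, url in links)


# Hypothetical blogroll: 300 placeholder entries.
links = [(f"Blog {i}", f"http://example.com/blog{i}") for i in range(300)]

heavy = len(table_wrapped(links).encode("utf-8"))
light = len(br_separated(links).encode("utf-8"))
print(f"table-wrapped: {heavy} bytes, BR-separated: {light} bytes")
print(f"saved: {heavy - light} bytes ({100 * (heavy - light) / heavy:.0f}%)")
```

The exact saving depends on the real template, but the per-link wrapper cost is fixed, so it grows with the length of the blogroll.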

  11. This is all very interesting. Since simplifying my layout and optimizing my HTML (and making sure it validates) I’ve been getting more traffic from Google. So, that all makes sense!

  12. Dave Pollard says:

    First of all, thank you all for what must have been some time-consuming research. All this is extremely interesting, but for someone who just uses Radio themes and doesn’t generate any of my own code, it’s a little intimidating. Here’s what I plan to do, in order; let me know if you think this is fair and logical:

    1. I’m going to copy Lawrence Lee at Radio and tell him what the problem is and what I’m proposing to do. After all, since I’m just doing what Radio tells me to do in its manual, any prolific Radio user who selects this theme should be having the same problem, and that means Radio needs to do some fixes at its end. If he says OK, then:
    2. First thing I’m going to try is the quick fix of swapping my left and right sidebars, since the left sidebar alone is consuming some 70% of the code of my posts. If it’s just a case of Google indexing only the first 101k of each post, then that should fix it and I then owe Philip and Arve a debt of gratitude. If this blog suddenly disappears in the next day or so, it’s because I screwed up the change, so stay tuned. I suspect I’ll have to republish my entire blog to get these changes to apply retroactively to all my older posts, so Google picks them up.
    3. If that doesn’t work, or isn’t enough, the second thing I will do is to revamp my blogroll to change it from tables to a simple list, cutting out a huge amount of blogroll code. I would do that first, but Radio uses a macro to code the blogroll, so this is going to need Lawrence’s help to override the macro. If that does the trick, I’ll owe Seth and Ken a debt of gratitude as well.
    4. If none of the above are sufficient, then I’m going to have to think about giving up on Radio and start using a tool that doesn’t generate all the invalid markup, and allows me to select font type and size. I’m praying it doesn’t come to that.

    This is a hobby for me, and if I have to get that deep into the technology underlying publishing my little daily column, I’m going to have to seriously question whether it’s worth it.

    So here we go. Contacting Lawrence now, and saving a backup of my Home Page as I start to fiddle with the column layout. Keep your fingers crossed for me! And thank you all again; your quick and thoughtful response is greatly appreciated.

    /-/- Dave

  13. Jim Byrne says:

    The problem is unlikely to be anything to do with your design; it is more likely to be the changes Google made to their search algorithm. It happened to our site as well: traffic from Google to our Glasgow West End community website halved due to the Google changes. We have a huge amount of content and we are the main information site for the area, but type in Glasgow West End and you find the local fencing club, with their one page, before you find us. Do a search for ‘Google changes cause uproar’, or ‘Google changes killed my business’, or similar phrases. Or more to the point – try ‘Florida update’ – and you will find out why lots of people are mad at Google.

    All the best,
    Jim

  14. Dave, you can use my templates as a starting guide. I designed them to be extremely Google friendly. My referers page shows they work.
    http://radio.weblogs.com/0001011/template.zip

    Also, it’s one reason I only display the past two days of content. That makes my page about 60K complete.

  15. Dave, I was in the middle of optimising that left hand table, but all of a sudden you went all right-handed on me :0) I had got the table down from 211K to 38K.

    I still think you could shave about 50% of the page size by optimising that links table somehow. If the Radio macro is at fault then perhaps it could be replaced by a table that you’d have to maintain by hand.

    I think that Jim is right though, as I’ve seen Google index even larger pages in the past. With Google’s acquisition of Blogger last year you have to be just a bit suspicious when this sort of thing starts happening to other blog formats.

    The changes are looking good and a lot of the validation errors have gone now. (Better go and fix all the ones on my site now LOL)

    I’m still puzzled by the usage of this Expires tag on your site, Jim’s and Robert’s: <META HTTP-EQUIV="Expires" CONTENT="Mon, 01 Jan 1990 01:00:00 GMT"> Doesn’t this mean that after that date search engines should delete your pages from their database? I know that sounds like it might encourage search engines to come back, but maybe it would be safer to get rid of that tag and rely on the “Revisit-After” tag.

  16. mscandide says:

    I googled “gray parrot” pollard and you were the first hit on the results page.

  17. mscandide says:

    Just “gray parrot” got me this page, so it’s been googled already. It was about 175th.

  18. Dave Pollard says:

    Thanks. Unfortunately, the link it picks up is the one complaining that the articles actually about grey parrots aren’t being picked up by Google. So Google is at least picking up new posts in the last couple of days, but since I republished all the old articles on my blog with the new format, I’d hoped Google would go and spider them too. I’ll wait a bit longer and see if that happens.

  19. To Jim Byrne: I’ve analyzed the Florida Update; it has to do with spam filtering. See my report:
    http://sethf.com/anticensorware/google/bayesian-spam.php

    That’s definitely your Glasgow problem. But it affects *rankings*, not *whether indexed*. Sites which “look spammish” are penalized within the order of search results.

    Per above, the problem here is *indexing*: the pages aren’t being indexed in the first place.
