On March 16th, 2006, Matt Cutts chronicled the Big Daddy infrastructure rollout at Google. Early results of the crawling/indexing upgrade debuted in January, and the rollout was completed by the end of March 2006.

Matt characterizes Big Daddy as…

more comprehensive (by far, in my opinion) than the previous crawl/index.

Yet, Big Daddy refuses to index certain sites in their entirety based upon how many and what type of links they have. How is this more comprehensive?

I’m rather frustrated to read that the Big Daddy timeline and functionality have proceeded as expected, as I watch pages show up in the regular index one day and the supplemental index the next. Back in the regular index, and then back into the supplemental index again. Is this how Big Daddy is supposed to work?

Matt’s synopsis…

CrankyDave, the supplemental results are typically refreshed less often than the main results. If your page is showing up as supplemental one day and then as a regular result the next, the most likely explanation is that your page is near the crawl fringe. When it’s in the main results, we’ll show that url. If we didn’t crawl the url to show in the main results, then you’ll often see an earlier version that we crawled in the supplemental results. Hope that helps explain things. BTW, CrankyDave, your site seems like an example of one of those sites that might have been crawled more before because of link exchanges. I picked five at random and they were all just traded links. Google is less likely to give those links as much weight now. That’s the simple explanation for why we don’t crawl you as deeply, in my opinion.

What exactly does how often a page is crawled have to do with what index it belongs in? What exactly do links to a site have to do with what index a page belongs in?

Matt went on to say…

I picked five links to the domain at random and they were all reciprocal links. My guess is that’s the cause. I mentioned that example because CrankyDave still has an open road ahead of him; he just needs to concentrate more on quality links instead of things like reciprocal links if he wants to get more pages indexed. (Again, in my opinion. I was just doing a quick/dirty check.)

I see. Sites that choose to exchange links with good, relevant sites will be “punished” by having fewer pages indexed.

And in his opening post, Matt refers to a Health Care directory…

– Some one sent in a health care directory domain. It seems like a fine site, and it’s not linking to anything junky. But it only has six links to the entire domain. With that few links, I can believe that out toward the edge of the crawl, we would index fewer pages.

Let’s see, only six links, so this site gets “punished” too.

Quite frankly, I don’t find this type of crawl/index more comprehensive. I consider “penalizing” sites because they have too few links, or happen to exchange links with highly relevant sites, by refusing to index all the pages of the site, a serious regression. One link should be enough to get any site crawled and indexed. How you rank them is another matter altogether. Which, incidentally, brings up another question. How can you penalize a site for having too few links or reciprocal links and still pass on the ranking advantage of those links? How exactly does this make sense?

By intentionally penalizing in this manner, Google is pushing webmasters and site owners to manufacture links just to get their sites crawled and indexed. This is lunacy.

PhilC over at the SEO Forum has a terrific perspective on this called The Madness of King Google that’s well worth the read.

Google has decided to deprive searchers and site owners by refusing to index good and useful pages based upon the number and kind of links they have. How is this a good thing?