bluebloomer blog

Bluebloomer – Not a pair of pants

Faster Faster!


Yesterday I decided things were running a bit slow. When building search engines, there are a few factors that really affect speed:

  1. Word table size (every distinct word occurring across all pages)
  2. Page word table size (every word appearing on each page)
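To make the two tables concrete, here is a rough Python sketch. It assumes a simple layout that the post doesn't actually describe: the word table is just the set of distinct words, and the page word table holds one (page_id, word, count) row per distinct word on each page. The names `build_tables` and `tokenize` are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_tables(pages):
    """Build a hypothetical word table and page word table.

    word_table: every distinct word occurring across all pages.
    page_word_table: one (page_id, word, count) row per distinct word on each page.
    """
    word_table = set()
    page_word_table = []
    for page_id, text in pages.items():
        counts = Counter(tokenize(text))
        word_table.update(counts)
        for word, count in sorted(counts.items()):
            page_word_table.append((page_id, word, count))
    return word_table, page_word_table


pages = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "The dog sleeps in the sun",
}
words, rows = build_tables(pages)
print(len(words), len(rows))  # prints "11 13"
```

Even with two tiny pages, the page word table outgrows the word table, because the same word ("the", "dog") gets a row on every page it appears on.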

Bluebloomer’s tables were getting huge, so for the last few days I’ve been optimizing.

One quick way of shrinking both tables is to use a more aggressive stop word filter, which throws away more of the most commonly used words before they are ever stored. Easy enough. Cool.
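A quick sketch of why a bigger stop list shrinks both tables at once. The post doesn't say which words are filtered, so both stop lists below are hypothetical, as is the `table_sizes` helper. Every filtered word disappears from the word table and also from every page's rows.

```python
import re

# Hypothetical stop lists; not the ones Bluebloomer actually uses.
SMALL_STOP_WORDS = {"the", "a", "an"}
AGGRESSIVE_STOP_WORDS = SMALL_STOP_WORDS | {"in", "over", "is", "of", "and", "to", "on"}


def tokenize(text, stop_words):
    """Lower-case, split into words, and drop any word found in stop_words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stop_words]


def table_sizes(pages, stop_words):
    """Return (word table size, page word table size) after stop word filtering."""
    word_table = set()
    page_rows = 0
    for text in pages.values():
        distinct = set(tokenize(text, stop_words))
        word_table |= distinct
        page_rows += len(distinct)  # one row per distinct word on this page
    return len(word_table), page_rows


pages = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "The dog sleeps in the sun",
}
small = table_sizes(pages, SMALL_STOP_WORDS)
aggressive = table_sizes(pages, AGGRESSIVE_STOP_WORDS)
print(small, aggressive)  # prints "(10, 11) (8, 9)"
```

On a real crawl the common words appear on nearly every page, so each extra stop word removes far more page rows than this toy example suggests.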

Bluebloomer had been storing word positions to aid in phrase finding. This resulted in a massive page word table. Not good. Yes, the phrase finding was perfectly accurate, but the cost was too high. So instead of being truly accurate, Bluebloomer now guesses which pages contain particular phrases and how important those phrases are to a page. The result is a greatly reduced page word table and faster searches.
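Here's a rough sketch of the trade-off. A positional index stores one entry per word occurrence, which is what makes exact phrase matching possible; a count-only index stores one entry per distinct word per page. The post doesn't say how Bluebloomer guesses phrases, so `guessed_phrase_pages` uses a made-up heuristic (every phrase word is present, scored by the rarest phrase word's count on that page) purely to illustrate the kind of compromise involved. All function names here are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def positional_index(pages):
    """page_id -> word -> list of positions (one stored entry per occurrence)."""
    index = {}
    for page_id, text in pages.items():
        positions = defaultdict(list)
        for pos, word in enumerate(tokenize(text)):
            positions[word].append(pos)
        index[page_id] = positions
    return index


def count_index(pages):
    """page_id -> Counter(word -> count): one entry per distinct word, no positions."""
    return {page_id: Counter(tokenize(text)) for page_id, text in pages.items()}


def exact_phrase_pages(index, phrase):
    """Exact match: the phrase words must sit at consecutive positions."""
    words = tokenize(phrase)
    if not words:
        return []
    hits = []
    for page_id, positions in index.items():
        starts = positions.get(words[0], [])
        if any(all(start + i in positions.get(w, []) for i, w in enumerate(words))
               for start in starts):
            hits.append(page_id)
    return hits


def guessed_phrase_pages(index, phrase):
    """Hypothetical guess: a page probably has the phrase if it has every
    phrase word; rank pages by the smallest count among those words."""
    words = tokenize(phrase)
    scores = {}
    for page_id, counts in index.items():
        if words and all(counts[w] > 0 for w in words):
            scores[page_id] = min(counts[w] for w in words)
    return sorted(scores, key=lambda p: (-scores[p], p))


pages = {
    1: "the lazy dog sleeps and the lazy dog dreams",
    2: "the dog is not lazy",
    3: "a cat naps",
}
pos = positional_index(pages)
cnt = count_index(pages)
positional_entries = sum(len(p) for positions in pos.values() for p in positions.values())
count_entries = sum(len(c) for c in cnt.values())

print(exact_phrase_pages(pos, "lazy dog"))    # prints "[1]"
print(guessed_phrase_pages(cnt, "lazy dog"))  # prints "[1, 2]"
print(positional_entries, count_entries)      # prints "17 14"
```

Page 2 shows up as a false positive under the guess, which is exactly the accuracy given up in exchange for the smaller table; the real page still ranks first because it mentions the words more often.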

That’s as far as I’ve got for now. I think the lesson here is that compromises must be made (unless you’re Google). Don’t let perfectionism get in the way of speed and usability.

Written by bluebloomer

June 16, 2009 at 1:00 am
