Finding the Site
David Dorn examines how we use the WorldWide Wait/Web and ponders the usefulness of search engines
There was once a news item on UK websites being 'invisible' to search engines and it got me thinking a little about how I use the Web and find information. It struck me that search engines may not be the ideal tool. They should be, of course, but in a world where two million new sites enter the fray every month, it's fairly unreasonable to expect the engines to keep up on a day-by-day basis... or is it?
I think it's all down to the way they work. These days, search engines work on the keyword principle. Those keywords, though, have to be entered into each page manually (in real terms), as a tag in the part of the gubbins that you, the surfer, don't see - the HTML Header code. The search engine will find a site - usually as a result of its address being entered into the database by its webmaster - and will, at some point, follow links. At each page, it reads the Keywords tag, and lists the keywords in it.
What happens next depends on the engine, but most compare the keyword list with the text that appears on the page, make a judgement on keyword relevance, and then rank the page accordingly in their databases for each keyword. Sounds complicated? It is - and not only for the search engine, but also for the page author.
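To make that a little more concrete, here is a rough Python sketch of the comparison step described above: read the Keywords tag from the header, then count how often each declared keyword actually turns up in the visible text. The names `PageScanner` and `keyword_relevance` are my own inventions for illustration, and the simple count is an assumption; no real engine is claimed to work exactly this way.

```python
from html.parser import HTMLParser
import re


class PageScanner(HTMLParser):
    """Collects the Keywords meta tag and the readable text of a page."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.text_parts = []
        self._skip = 0  # depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip().lower() for k in content.split(",") if k.strip()]
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def keyword_relevance(html):
    """Return {keyword: number of whole-word occurrences in the page text}."""
    scanner = PageScanner()
    scanner.feed(html)
    text = " ".join(scanner.text_parts).lower()
    return {
        kw: len(re.findall(r"\b" + re.escape(kw) + r"\b", text))
        for kw in scanner.keywords
    }
```

A keyword that scores zero is declared in the tag but never mentioned on the page, which is exactly the sort of mismatch an engine would want to mark down.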
There are websites and newsletters out there that give advice on how to configure your page for maximum impact on search engines - and software to help you do it - but I can't help thinking that it's all pretty unnecessary, unwieldy and not particularly helpful to the surfer.
Let me tell you why. I spend quite a lot of time on the web, quite often researching a particular topic, and I do a lot of web searching. Generally speaking, I'll use a search engine to get the first step under way. Frequently, I've followed links from a site highly ranked on the engines, but which is, frankly, not very good, and navigated to a superb site, crammed full of information on the topic I'm researching.
A quick check back at the search engine will show that the good site is either not there at all, or appears on page 113 of 115 - way too many clicks down the list to be of immediate appeal to most people. Yet, as I've said, it's the bee's knees - the seminal site on the subject.
Why is this? It's because the author has spent too little time on keywords and search engine optimisation. Too little? My feeling is that s/he shouldn't need to spend any time on search engine optimisation, and here's why:
If the search engines are capable of comparing keywords to body text for relevance, then they must have an algorithm for determining what the keywords should be. If that's the case, then the search engines should rank according to content - perhaps on the frequency of key words (note the space there!) in the body text. If the word 'Fuchsia' exists on the page, and appears 35 times, the chances are that the page is about fuchsias. If it also has 'Mrs. Popple' and 'Standard', 'Bush' and 'Trailing', then the chances of it being a page on fuchsias are very high.
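The frequency idea above is easy enough to sketch. The snippet below is only an illustration under my own assumptions: the `top_terms` name, the tiny stop-word list and the three-letter minimum are all made up for the example, and it makes no attempt to treat 'fuchsia' and 'fuchsias' as the same word.

```python
from collections import Counter
import re

# A deliberately tiny, hypothetical stop-word list; a real one would be far longer.
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}


def top_terms(text, n=3):
    """Return the n most frequent words in the body text as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return counts.most_common(n)
```

Run over a page that mentions 'Fuchsia' 35 times, the word would come out on top without the author ever touching a Keywords tag.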
It's possible for the engines to maintain a list of related words against which to test for relevance, like the example above. The American Fuchsia Society, for instance, maintains 2000+ Fuchsia variety names - it would cause no great problems for a search engine to read them all in and link them to the keyword 'Fuchsia' as a cross check. Then, when it does its crawling around sites, it can assess the relevance of the site to the most repeated key word.
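As a rough sketch of that cross check, assume the engine holds a list of related terms for a topic (here just the four names mentioned earlier, standing in for the full list of variety names). The `cross_check` function and its return format are hypothetical, purely for illustration.

```python
import re


def _count(term, text):
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def cross_check(text, topic, related_terms):
    """Return (times the topic word appears, sorted related terms found on the page)."""
    matched = sorted(t for t in related_terms if _count(t, text) > 0)
    return _count(topic, text), matched


# Illustrative related-word list taken from the fuchsia example in the text.
FUCHSIA_TERMS = ["Mrs. Popple", "Standard", "Bush", "Trailing"]
```

A page that mentions the topic word but matches none of the related terms, such as a guest house that merely has 'Fuchsia' in its name, would come back with an empty list and could be ranked lower.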
The technicalities aside, one other thing that would gall most people is the tendency for search engines to rely too heavily on this keyword relevance search. For instance, on my 'Fuchsia' search, while AOL's Netfind correctly brought up the American Fuchsia Society home page as its number 1 site, AltaVista UK gave me a guest house with 'Fuchsia' in its name! I guess their keyword tag must have fooled AltaVista.
It does make you wonder whether some of the search engines are really worth using, doesn't it?


