February 9, 1998
Web Site Administrivia
[I've been doing a lot of work on my web site, and thought I'd share
some of the problems and products I've been working on during this process.
Some of this stuff may be old hat to a few of you, but there's some
new stuff here as well. -- ed.]
Framed No More
When http://www.ehsco.com/ first came on-line, I made
heavy use of frames. However, as time has passed, I've come to the realization
that frames just don't work, although not for the reasons you might expect.
The biggest problem with frames is that they aren't compatible with
many HTTP clients. Duh. I'm not talking about browsers, though. Most
late-model browsers are frames-compliant (and most users are running
late-model browsers). The HTTP clients I'm referring to are some of the
search engine spiders that don't deal with frames appropriately. Some
of them, such as Lycos' spider, summarize Web pages with "this
site uses frames but your browser doesn't support them" error
messages. That doesn't quite convey the focused positioning statements
I've been working on for so long. Conversely, AltaVista's bot understands
frames too well, showing all
of the child documents found in a framed page. This isn't what I
wanted either.
Yeah, there are ways around this. By embedding content directly into
the <NOFRAMES> tag you can support these agents, but then you have
to manage the same content in two areas in order to support both the
frames-aware and the frames-ignorant clients. Managing frequently-changed
content in two locations is a recipe for trouble, and is more work than
I care to do.
Another option is to use custom-generated pages whenever a robot hits
my site. This happens more often than you might think. Some sites check
to see if the HTTP client is a search engine and then redirect it
to a custom-made page. Again, designing and building this kind of setup
is much more work than I care to do, considering the number of pages
on my site.
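For illustration only, the robot-detection approach described above could be sketched in Python along these lines. The signature strings and the names is_search_robot and choose_page are assumptions made up for this example; they are not an authoritative list of spider identifiers, and a real setup would need to maintain its own.

```python
# Hypothetical sketch: pick a page variant based on the User-Agent header.
# The signature list is illustrative, not an authoritative registry of robots.
ROBOT_SIGNATURES = ("scooter", "lycos", "slurp", "architext", "infoseek")


def is_search_robot(user_agent):
    """Return True if the User-Agent string looks like a known spider."""
    agent = (user_agent or "").lower()
    return any(signature in agent for signature in ROBOT_SIGNATURES)


def choose_page(user_agent, framed_page, flat_page):
    """Serve the frameless page to robots and the framed one to browsers."""
    return flat_page if is_search_robot(user_agent) else framed_page
```

Even in this tiny form the drawback is visible: every document needs both a framed and a flat variant, which is the double maintenance burden that makes the approach unattractive for a site with many pages.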
Frames are a problem for standard Web browsers as well, of course.
I would like to be able to determine if a browser is frames-compliant,
and if so, build a frame-based document that provides navigational controls
and other fixed elements appropriately. If the browser isn't aware of
frames, then merge all the docs together and send down one big, monolithic
file. Server-side development platforms like Allaire's Cold
Fusion let you do this, but the end-result isn't workable due to
reasons outside of Cold Fusion's control.
In order for this type of model to work, the server has to be able
to communicate with the client, asking for its current state ("are
frames already loaded?", etc.). Without this knowledge, the server
just keeps re-framing new documents. Cookies, URL parameters and other
technologies just don't work here due to the dynamic nature of web linking.
Since a user can come into a page directly or indirectly, there's no
way to force a "start" and "end" to their session. Sites that use
these techniques can't be counted on to function properly every time.
Another way to do this is with JavaScript, which provides the ability
to build
and destroy frames dynamically, although this approach falls down
for a couple of reasons. First of all, the various implementations of
JavaScript across the different browsers are so divergent that it's impossible
to rely on the language as an overall content management and presentation
language. The number
of IF statements required to make this work is phenomenal. Second,
many of the search engines don't ignore JavaScript, even though they
don't support it, meaning that the web pages are summarized with "function
build_frame()" and whatnot.
Overall, dynamic web documents just aren't going to work until we have
a stateful protocol that allows the server to communicate with the client
on a continual basis. HTTP 1.1 only addresses part of this problem with
the concept of "connection
management," although it is still very much a work-in-progress
with no real, usable implementations. Even then, it'll take years before
a significant number of clients support this technology and it
becomes functionally usable. In the meantime, I've eliminated frames
from my site. I've been beaten on this.
Search Engine Madness
After converting all of my Web pages to monolithic form, I went about
the task of updating the various search engines with pointers to the
new pages. Before doing this, I made sure to incorporate the appropriate META
tags, specifically making use of the DESCRIPTION
and KEYWORDS tags.
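As a rough sketch of what those two tags look like, the following Python helper builds them from a description and a keyword list. The function name build_meta_tags and the sample text are hypothetical, chosen only for this example; the point is simply that the CONTENT values should be escaped so that quotes or ampersands in a description don't break the tag.

```python
from html import escape


def build_meta_tags(description, keywords):
    """Return DESCRIPTION and KEYWORDS META tags as two lines of HTML."""
    keyword_list = ", ".join(keywords)
    return "\n".join([
        '<META NAME="DESCRIPTION" CONTENT="%s">' % escape(description, quote=True),
        '<META NAME="KEYWORDS" CONTENT="%s">' % escape(keyword_list, quote=True),
    ])


# Hypothetical sample values, for illustration only.
print(build_meta_tags("Essays on DHCP and directory services",
                      ["DHCP", "directory services"]))
```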
After playing with the various search engines and directories for a
couple of months, I've made some interesting discoveries. First of all,
some sites are really good about indexing your pages quickly. AltaVista, InfoSeek and HotBot are
the best at this, typically responding to new submissions within a couple
of days.
Meanwhile, Excite, Lycos and Northern
Light are the worst: none of these engines have re-visited my site
in over a month, even after multiple requests to do so. Lycos only did
it after I reported a bug in their service and then demanded they reindex
my server as pay-back. Excite hasn't even been accepting new submissions
for at least six weeks, although they've been promising
a new search bot would be available in "two weeks." This reminds
me of the running "two weeks" estimate in "The Money Pit" with Tom Hanks
and Shelley Long.
AltaVista seems to do the best job of completely scanning a remote
site. Once you submit the base URL, AltaVista will seek and index all
of your pages within a few days. With InfoSeek and HotBot, you will need
to provide each Web page URL individually in order to get the site completely
indexed quickly.
Another important aspect here is the elimination of old pages. Since
my older documents used frames, there were many orphaned files in the
conversion to monolithic pages. Almost none of these sites provide an
easy, direct method for removing old pages from their databases, instead
suggesting that they will re-walk your site within a couple of weeks,
and any dead files will be removed at that time. This doesn't
always work. Currently, InfoSeek is the only site that lets you delete
a page from their database. I would really like to see this feature added
to the other services.
From a webmaster perspective, AltaVista, InfoSeek and HotBot are all
great places to go for an up-to-date search of the Web. My personal favorite
is InfoSeek, simply because they have what I consider to be the most
usable interface.
Excite, Lycos and Northern Light are the worst of the bunch. If you're
using any of these services as a starting point for Web searches, I guarantee
that you are getting bad or out-dated results. WebCrawler used to be
great, but they recently switched to using Excite's technology and as
a result have dropped in value tremendously. Whereas they used to be
a regular visitor, my site doesn't even show up in their searches anymore.
What about Yahoo? Yahoo sucks. Because they rely on humans to add and
update entries in their database, the process of getting listed in their
system is an exercise in futility. Once something does get added, you'd
better not ever change it 'cause Yahoo probably won't update their entry.
This isn't just my opinion, either. A recent survey found that this inability
to communicate with Yahoo was a major problem for webmasters everywhere.
Even the folks who did somehow manage to get their sites listed complained
about the process. I don't even bother searching Yahoo anymore.
It seems like the only relevant links I get back are on the AltaVista
sub-page, so why not just go straight to AltaVista instead of supporting
Yahoo's madness?
How I Became A Porn King
AltaVista, InfoSeek and HotBot all use the DESCRIPTION meta tag for
the page summaries they store in their databases. This really makes it
easy for a user to see what the page is about without having to come
visit the site. Some sites don't use the DESCRIPTION tag, but instead
use the first few lines of text to build a summary. Since my site uses
a common menu in the top right-hand corner, the menu elements are interpreted
as this text. On Lycos, for example, all
of my pages share the same description text.
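The two summarizing behaviors described above can be approximated in a few lines of Python. This is only a guess at the general technique, not a description of how any particular engine actually works; the names SummaryParser and summarize and the 80-character limit are assumptions made for the example.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collect the DESCRIPTION meta tag and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.text_parts = []
        self._skip_depth = 0  # non-zero while inside <title>, <script>, <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag in ("title", "script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("title", "script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def summarize(html, limit=80):
    """Prefer the DESCRIPTION tag; otherwise use the first visible text."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    if parser.description:
        return parser.description
    return " ".join(parser.text_parts)[:limit]
```

With the fallback path, a page whose first visible text is a navigation menu ends up summarized by that menu, which is why every page on a site with a shared menu can wind up with the same description.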
Because of their heavy use of meta tags, AltaVista, InfoSeek and HotBot
are all fairly similar in the type of matches that they return. My logs
show that each of these search engines generates a substantial amount
of traffic that's appropriate to the content I'm providing.
The most common hits are for topics I've written about extensively
over the years, like DHCP and directory
services. There are also the odd hits for product reviews I've written
("RadioLAN" and "Sonic
Interpol" are the leaders). And of course there are hits for
things that don't make any sense at all, like "National Directory
of Education Programs inGeron." Yeah, I can see "Directory" matching,
but the overall phrase definitely shouldn't have returned a pointer to
my site. I feel particularly bad about this one because this visitor
spent a good fifteen minutes poking around my site, searching for "Page
115 August" and the like. Poor fella.
And then there are the hits I don't want at all, but because of some
poor wording on my part, I'm going to be getting them for a long time
to come. In "A Call to Arms" I
wrote that "the future problem with spam is not the 'meet horney
women' junk we get today, but the 'new from McDonalds' ads we will get
tomorrow." This seemed a fine way to summarize the essay, so I used
that in the DESCRIPTION meta tag for that page. Now my log is littered
with hits from folks looking to "meet horney women." On both InfoSeek and HotBot,
searching for "meet horney women" returns only my page. <groan>
So, META tags work, but sometimes they work too well. If you use them
- and I strongly suggest that you do - make sure they contain exactly
what you want people to see.
For those of you who are interested in finding out more about search
engines and how they work, I highly recommend that you subscribe to Danny
Sullivan's Search
Engine Report newsletter (subscribe at http://searchenginewatch.com/sereport/list.html).
Although it's quite large (it comes in two e-mail messages), it's an
easy read and extremely informative. It's a "must" for any
webmaster looking to leverage search engines to the hilt.
Miscellany
I've recently switched from running Apache on RedHat
Linux to Netscape's
FastTrack server on NT. The motivation for this was to make full
use of some NT-centric products, such as Allaire's Cold
Fusion and some log-file analyzers.
I had looked at Cold Fusion a while back and was somewhat unimpressed,
but this latest release truly dominates. It's got integrated support
for ODBC, LDAP, SMTP and POP3 protocols, among many others, allowing
me to build a very smart and highly-integrated web server. It also comes
with a slimmed-down version of the Verity
search engine, which is capable of indexing HTML files as well as
database contents. Because of the advanced integration features, I'm
now able to offer this newsletter in an HTML format for those of you
with modern-day e-mail clients. If you want to get the HTML version,
go to http://www.ehsco.com/opinion/subscriber.html and
change your profile.
Regarding the log-file analysis products, it is my learned opinion
that most of these tools are little more than watered-down report writers.
If you really want to analyze your site traffic, dump the logs into a
database, and then use real data-analysis tools like Seagate
Crystal Reports to extract what you want. This model allows you to
conduct complex queries on the fly, whereas the log-file analyzers only
generate static summaries.
For example, using ODBC-based logs and Crystal Reports, I can find "the
top ten pages for the top ten visitors for last month," which is
impossible to do with WebTrends or
the other file-centric products. The most difficult aspect of this process
is getting the data into ODBC on a timely basis. Microsoft's IIS already does
this directly, allowing you to log all traffic into an ODBC database
(instead of a text file). I would request that Netscape, O'Reilly and
the other Web server vendors add this feature to their products as well.
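To make the "top ten pages for the top ten visitors" idea concrete, here is a minimal sketch that swaps in Python's built-in sqlite3 module for the ODBC database and Crystal Reports combination mentioned above. The hits table, its column names, and the sample rows are all hypothetical; a real log database would have its own schema.

```python
import sqlite3


def top_pages_for_top_visitors(conn, month, limit=10):
    """Return {visitor: [(page, hits), ...]} for the busiest visitors.

    `month` is a 'YYYY-MM' string matched against the start of hit_date.
    """
    visitors = conn.execute(
        "SELECT visitor FROM hits WHERE substr(hit_date, 1, 7) = ? "
        "GROUP BY visitor ORDER BY COUNT(*) DESC, visitor LIMIT ?",
        (month, limit),
    ).fetchall()
    report = {}
    for (visitor,) in visitors:
        report[visitor] = conn.execute(
            "SELECT page, COUNT(*) AS n FROM hits "
            "WHERE visitor = ? AND substr(hit_date, 1, 7) = ? "
            "GROUP BY page ORDER BY n DESC, page LIMIT ?",
            (visitor, month, limit),
        ).fetchall()
    return report


# Hypothetical sample data standing in for a real traffic log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (visitor TEXT, page TEXT, hit_date TEXT)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", [
    ("10.0.0.1", "/index.html", "1998-01-05"),
    ("10.0.0.1", "/opinion/", "1998-01-06"),
    ("10.0.0.1", "/opinion/", "1998-01-07"),
    ("10.0.0.2", "/index.html", "1998-01-09"),
    ("10.0.0.2", "/index.html", "1998-02-01"),
])
report = top_pages_for_top_visitors(conn, "1998-01")
```

Because the raw hits stay in a table, the month, the limit, or the grouping can be changed and the query rerun on the spot, which is the flexibility a static summary report lacks.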
Finally, on another note, I'm in the process of converting my ISDN
circuit to DSL, which has recently become available in my area. I'll
be sure to let you know how this goes.
That's all for now.
Regards,
Eric A. Hall
Top Dog, EHS Company
Written by Eric A. Hall.
Copyright © 1998, EHS Company. net.Opinion is a trademark of EHS
Company. All rights reserved.