I am trying to get the Swish-E Spider to read links in HTML files. I have
found that the spider is VERY sensitive to the presence of certain bits of
HTML. For example, the spider DOES work with this:
<HTML>
<HEAD>
<title>Chris'</title>
</HEAD>
<BODY>
<BR>
<BR>
<A HREF="http://chris/hreftest.htm">
<!-- This HTML lets the user click on the image -->
<!-- (below) to link to the HTML file (above) -->
<IMG SRC="d:\My Documents\swish-e2.gif">
</A>
<BR>
<BR>
</BODY>
</HTML>
But the spider falls over when presented with this:
<HTML>
<HEAD>
<title>Chris'</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
</HEAD>
<BODY>
<BR>
<BR>
<A HREF="http://chris/hreftest.htm">
<!-- This HTML lets the user click on the image -->
<!-- (below) to link to the HTML file (above) -->
<IMG SRC="d:\My Documents\swish-e2.gif">
</A>
<BR>
<BR>
</BODY>
</HTML>
I have narrowed the problem down to the presence of this single piece of
HTML:
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
Why should this cause the spider to cease reading, or fail to read, links
to other files? This line was clearly placed in the HTML file by FrontPage
Express, and I know that FrontPage Express is not the most wonderful HTML
generator in the world, but I can't quite see why the spider would have
such trouble with it, especially if all the spider is doing is recursively
hunting for hrefs in the document.
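To illustrate that expectation, here is a minimal sketch in Python. It is not
the Swish-E spider's code and says nothing about how the spider parses HTML;
it only shows what "hunting for hrefs" could look like with the standard
library's html.parser. The names LinkCollector and extract_links are made up
for this illustration. A parser that only looks at anchor tags would be
expected to report the same link whether or not the meta tags are present.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the list of href values found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# The shared body of the two test files from this message (raw string,
# because the IMG path contains backslashes).
BODY = r"""<BODY>
<BR>
<BR>
<A HREF="http://chris/hreftest.htm">
<!-- This HTML lets the user click on the image -->
<!-- (below) to link to the HTML file (above) -->
<IMG SRC="d:\My Documents\swish-e2.gif">
</A>
<BR>
<BR>
</BODY>
</HTML>
"""

# First file: no meta tags in the head.
WITHOUT_META = """<HTML>
<HEAD>
<title>Chris'</title>
</HEAD>
""" + BODY

# Second file: the head as written by FrontPage Express.
WITH_META = """<HTML>
<HEAD>
<title>Chris'</title>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
</HEAD>
""" + BODY

if __name__ == "__main__":
    print(extract_links(WITHOUT_META))
    print(extract_links(WITH_META))
```

If a link extractor along these lines finds the link in both files, that
would suggest the meta tag affects some other step in the spider (for
example, how it interprets the document's declared content type or
character set) rather than the href search itself. That is only a guess.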
Can someone shed some light on this problem?
Many thanks in advance,
Chris Humphries
chrisjmh@vermilion99.freeserve.co.uk
Received on Tue Dec 21 04:40:28 1999