On Wed, Aug 02, 2006 at 01:17:36PM -0700, Ken Schweigert wrote:
> I'm having trouble getting Swish-e to write the correct title to the
> index. I use the "swishspider" to index the site because it is a
> dynamic site and uses mod_rewrite.
swishspider might be a problem.
> 1000 http://www.cedarhomes.com/cedar_homes/featured_articles_1//?
> expand=44 "?expand=44" 11555
(Isn't it amazing that modern email clients can't paste without
wrapping?)
Hmm, this doesn't make much sense:
moseley@bumby:~/ken$ swishspider ken http://www.cedarhomes.com/cedar_homes/featured_articles_1
moseley@bumby:~/ken$ head ken.contents
Cedar Homes :: Cedar Homes :: FEATURED ARTICLES<br><!-- START TEMPLATE: portal/menu.php -->
<ul id="p7PMnav">
<li><a href="/site_map.php" title="Cedar Homes Site Map">SITE MAP</a></li>
<li><a href="/design_center/" title="Design Center">DESIGN CENTER</a></li>
<li><a href="http://www.cedarhomes.com/cedar_homes/contact_us_1" class="p7PMtrg">CONTACT US</a>
<ul>
<li><a href="http://www.cedarhomes.com/cedar_homes/contact_us_1/contact_our_design_staff">CONTACT OUR DESIGN STAFF</a>
</li><li><a href="http://www.cedarhomes.com/cedar_homes/contact_us_1/about_our_design_managers">ABOUT OUR DESIGN MANAGERS</a>
Try the new spider -- looks better.
moseley@bumby:~/ken$ /usr/local/lib/swish-e/spider.pl default http://www.cedarhomes.com/cedar_homes/featured_articles_1 | head
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
Path-Name: http://www.cedarhomes.com/cedar_homes/featured_articles_1
Content-Length: 22973
Document-Type: html*
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Cedar Homes :: Cedar Homes :: FEATURED ARTICLES</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Which is weird, because both spiders use Perl's LWP module to fetch
the remote document.
Looks like the swishspider program isn't fetching the document
correctly. But if I try it on other sites it looks fine:
moseley@bumby:~/ken$ swishspider ken http://slashdot.org/
moseley@bumby:~/ken$ head ken.contents
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Slashdot: News for nerds, stuff that matters</title>
<link rel="stylesheet" type="text/css" media="screen, projection" href="//images.slashdot.org/base.css?T_2_5_0_120">
<link rel="stylesheet" type="text/css" media="screen, projection" href="//images.slashdot.org/ostgnavbar.css?T_2_5_0_120">
That's really odd. You have validation errors on your site, but
nothing that I can see would confuse things. And like I said,
swishspider, GET, and spider.pl all use the same Perl module to fetch
that page.
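
If you want to narrow it down, one quick check is whether each saved
copy still has the pieces that went missing above. This is only a
rough sketch: the function name check_html_head is made up for
illustration, and it assumes the swishspider output was saved to
ken.contents as in the session above.

```shell
# Hypothetical helper (not part of Swish-e): report whether a saved
# page still contains its doctype declaration and a <title> tag.
check_html_head() {
    file=$1

    # Case-insensitive search for the doctype declaration.
    if grep -qi '<!DOCTYPE' "$file"; then
        echo "doctype: present"
    else
        echo "doctype: missing"
    fi

    # Case-insensitive search for an opening <title> tag.
    if grep -qi '<title>' "$file"; then
        echo "title tag: present"
    else
        echo "title tag: missing"
    fi
}
```

Run it as "check_html_head ken.contents", then again on a copy of the
spider.pl output saved to a file, and compare the two reports. If only
the swishspider copy is missing the tags, the page is being altered
somewhere between the fetch and the .contents file.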
Anyway, any reason you are not using spider.pl to spider your pages?
--
Bill Moseley
moseley@hank.org
Received on Wed Aug 2 18:46:04 2006