I just got up and running with Swish-e and hit a few speed bumps along
the way, so I thought I'd share them.
1) Spidering a site (-S http vs. -S prog + spider.pl)
The docs say that spider.pl is the better choice, but it really didn't
work for me. It turns out that spider.pl decodes the content and then
re-encodes it, whereas swishspider just fetches the content and doesn't
worry about encoding. Some pages of our site are served as 'utf-8' and
others as 'ISO-8859-1', and a few contained the odd character that
couldn't be decoded with the declared charset. I didn't find this out
until I hacked spider.pl:
my %opts = ( 'raise_error' => 1 );
$content = $request->decoded_content( %opts );
When I did this, spider.pl died while decoding content that had a
charset of utf-8 but contained an odd character ("\x92"); see the error
message below. It took me way too long to figure that one out.
Perhaps we should set raise_error by default. The warning that the document
had no content was good, but it would be better if a warning fired
before that when the content couldn't be decoded b/c it had the wrong
charset.
utf8 "\x92" does not map to Unicode at /usr/lib/perl/5.8/Encode.pm line 164.
at /usr/local/lib/swish-e/spider2.pl line 1139
The HTML Validator is a great tool for figuring out where your source is
messing up. In our case the culprit was a value pulled in from the
database that showed up on pages everywhere. What a mess. :)
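For anyone hitting the same error, here is a minimal sketch of how the
failure can be reproduced and worked around outside of spider.pl. It is
not spider.pl's code: decode_page is a hypothetical helper, the URL is a
placeholder, and falling back to cp1252 (Windows-1252) is my assumption,
based on "\x92" usually being a Windows-1252 right single quote.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of spider.pl: try a strict UTF-8 decode
# first; if the bytes are not valid UTF-8, warn and retry as cp1252.
sub decode_page {
    my ($bytes, $url) = @_;

    # decode() may modify its input when a CHECK value is given,
    # so work on a copy.
    my $copy = $bytes;
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $text if defined $text;

    warn "$url: content is not valid UTF-8, retrying as cp1252\n";
    return decode('cp1252', $bytes);
}

# "\x92" on its own is not a valid UTF-8 sequence, so the strict decode
# croaks and the fallback kicks in.
my $text = decode_page("It\x92s here", 'http://example.com/page');
printf "U+%04X\n", ord(substr($text, 2, 1));   # prints U+2019
```

The point is only that the warning fires at the moment the decode fails,
which is the behaviour I was missing; whether a fallback charset is
appropriate depends on where your bad bytes come from.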
2) Using a template system
I was excited to see that you can use HTML::Template with the search
results, as that's our template language of choice, but I couldn't find
good documentation on how to configure .swishcgi.conf for it
until I dove into the source code for swish.cgi. Here is my .swishcgi.conf:
return {
    title        => 'QSR magazine search results',
    swish_binary => '/usr/local/bin/swish-e',
    swish_index  => '/var/www/qsr/web/search/index.swish-e',
    template     => {
        package => 'SWISH::TemplateHTMLTemplate',
        options => {
            filename          => 'swish.tmpl',
            path              => '/var/www/qsr/web/search',
            die_on_bad_params => 0,
            loop_context_vars => 1,
            cache             => 1,
        },
    },
}
I got stuck b/c I thought the file parameter was named 'file' and was
its own top-level key/value, when it is actually 'filename' and is
nested inside 'options'.
We may want to add that to the swish.cgi documentation:
http://swish-e.org/docs/swish.cgi.html
I hope this may help others - now that I'm up & running, I'm moving
along and am really enjoying it. Keep up the great work.
Cheers,
Jason
Received on Wed Feb 21 09:49:22 2007