[swish-e] First time Swish-e user with some thoughts/feedback

From: Jason Purdy <jason(at)not-real.journalistic.com>
Date: Wed Feb 21 2007 - 14:51:25 GMT
I just got up & running with Swish-e and hit a few speedbumps along 
the way, so I thought I'd share them.

1) Spidering a site (-S http vs. -S prog + spider.pl)

The docs say that spider.pl is the better choice, but it really didn't 
work for me.  It turns out that spider.pl decodes the content and then 
re-encodes it, whereas swishspider just fetches the content and doesn't 
worry about the encoding.  Then I found out that some pages of our site 
use 'utf-8' and others use 'ISO-8859-1', and the odd character couldn't 
be decoded under the declared charset.  I didn't find this out until I 
hacked spider.pl:

my %opts = ( 'raise_error' => 1 );
$content = $request->decoded_content( %opts );

When I did this, spider.pl died while decoding content that declared a 
charset of utf-8 but contained an odd character ("\x92"); see the error 
message below.  Took me way too long to figure that one out.  Perhaps 
raise_error should be on by default.  The warning that the document had 
no content was good, but it would be better if a warning fired before 
that, saying the content couldn't be decoded b/c it had the wrong 
charset.

utf8 "\x92" does not map to Unicode at /usr/lib/perl/5.8/Encode.pm line 164.
  at /usr/local/lib/swish-e/spider2.pl line 1139
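
One way to keep a spider from dying on a page like this is to try the 
declared charset strictly and fall back to a single-byte encoding with 
a warning.  Below is a minimal sketch using the core Encode module. 
decode_with_fallback is a hypothetical helper, not part of spider.pl, 
and the cp1252 fallback is an assumption ("\x92" is the curly apostrophe 
in Windows-1252, a common source of such bytes in pasted text).

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of spider.pl): decode strictly with the
# declared charset; on failure, warn and fall back to Windows-1252.
sub decode_with_fallback {
    my ( $bytes, $declared ) = @_;

    # FB_CROAK makes decode() die on malformed input instead of
    # substituting characters; LEAVE_SRC keeps $bytes unmodified.
    my $text = eval { decode( $declared, $bytes, FB_CROAK | LEAVE_SRC ) };
    return ( $text, $declared ) if defined $text;

    warn "Could not decode as $declared, falling back to cp1252: $@";
    return ( decode( 'cp1252', $bytes ), 'cp1252' );
}

# A byte string that claims to be utf-8 but contains a cp1252 apostrophe.
my $bytes = "It\x92s here";
my ( $text, $used ) = decode_with_fallback( $bytes, 'utf-8' );
print "decoded using $used\n";
```

The warning fires before any "no content" message would, which is the 
ordering suggested above.  Whether a fallback or a hard failure is the 
better default depends on the site being indexed.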

The HTML Validator is a great tool to figure out where your source is 
messing up.  Come to find out, it was an included database value that 
was everywhere.  What a mess. :)

2) Using a template system

I was excited to see that you could use HTML::Template w/ the search 
results, as that's our template language of choice, but I couldn't find 
really good documentation on how to configure .swishcgi.conf accordingly 
until I dove into the source code for swish.cgi.  Here is my .swishcgi.conf:

return {
     title        => 'QSR magazine search results',
     swish_binary => '/usr/local/bin/swish-e',
     swish_index  => '/var/www/qsr/web/search/index.swish-e',
     template     => {
             package         => 'SWISH::TemplateHTMLTemplate',
             options         => {
                 filename            => 'swish.tmpl',
                 path                => '/var/www/qsr/web/search',
                 die_on_bad_params   => 0,
                 loop_context_vars   => 1,
                 cache               => 1,
             },
         },
}

I got stuck b/c I thought the file parameter was named 'file' and was 
its own key/value vs. being nested in 'options'.
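
For anyone who trips over the same thing, here is a small sketch of a 
sanity check for the 'template' section.  check_template_config is a 
hypothetical helper, not something swish.cgi provides, and the rules it 
checks are only the ones described in this message.

```perl
use strict;
use warnings;

# Hypothetical sanity check; not part of swish.cgi.
sub check_template_config {
    my ($config) = @_;
    my @problems;

    my $template = $config->{template};
    return ('no template section') unless ref($template) eq 'HASH';

    push @problems, "template 'package' is missing"
        unless defined $template->{package};

    # The mistake described above: 'file' as its own key under 'template'.
    push @problems, "'file' is not used here; put 'filename' under 'options'"
        if exists $template->{file};

    my $options = $template->{options};
    push @problems, "'options' must contain 'filename'"
        unless ref($options) eq 'HASH' && defined $options->{filename};

    return @problems;
}

# The layout I tried first, and the layout that works.
my @bad = check_template_config({
    template => { package => 'SWISH::TemplateHTMLTemplate', file => 'swish.tmpl' },
});
my @good = check_template_config({
    template => {
        package => 'SWISH::TemplateHTMLTemplate',
        options => { filename => 'swish.tmpl', path => '/var/www/qsr/web/search' },
    },
});
print "$_\n" for @bad;
```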

We may want to add that to the swish.cgi documentation:
http://swish-e.org/docs/swish.cgi.html

I hope this may help others - now that I'm up & running, I'm moving 
along and am really enjoying it.  Keep up the great work.

Cheers,

Jason
Received on Wed Feb 21 09:49:22 2007