On Wed, Feb 21, 2007 at 09:51:25AM -0500, Jason Purdy wrote:
> The docs say that spider.pl is a better choice, but I found that it
> really didn't work for me. Come to find out, spider.pl does a decoding
> of the content and then recoding vs. swishspider just gets the content
> and doesn't worry about coding. Then I found out some pages of our site
> use 'utf-8' and others use 'ISO-8859-1' and there would be the odd
> character that couldn't decode accordingly. I didn't find this out
> until I hacked spider.pl:
>
> my %opts = ( 'raise_error' => 1 );
> $content = $request->decoded_content( %opts );
>
> When I did this, spider.pl died when it was decoding content that had a
> charset of utf-8 and there was some odd character ("\x92") in there
> (see error msg below). Took me way too long to figure that one out.
> Perhaps we should raise_error by default. The warning that the document
> had no content was good, but it would be better if a warning was fired
> before that if the content couldn't be retrieved b/c it had the wrong
> charset.

What about just printing $@?

Something like:

    unless ( $content ) {
        warn "Failed decode of $uri: $@\n" if $@;

        my $empty = '';
        output_content( $server, \$empty, $uri, $response )
            unless $server->{no_index};
        return;
    }

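For what it's worth, a "\x92" byte is a common sign of Windows-1252 text (where it is a curly apostrophe) being served under a utf-8 label. A lone 0x92 is not valid UTF-8, which is why a strict decode dies on it. Below is a minimal sketch of that failure and one possible fallback, using only the core Encode module rather than spider.pl's own code. The helper name decode_with_fallback is hypothetical, and falling back to cp1252 is an assumption about the content, not something spider.pl does.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of spider.pl): try a strict UTF-8 decode
# first, and fall back to Windows-1252 if the bytes are not valid UTF-8.
# Returns the decoded text and the name of the charset that worked.
sub decode_with_fallback {
    my ($bytes) = @_;

    # decode() may modify its input when a CHECK value is given,
    # so work on a copy.
    my $copy = $bytes;
    my $text = eval { decode( 'UTF-8', $copy, Encode::FB_CROAK ) };
    return ( $text, 'UTF-8' ) if defined $text;

    # This is the message that otherwise gets lost.
    warn "Strict UTF-8 decode failed: $@";

    # Assumption: the mislabeled content is really Windows-1252.
    return ( decode( 'cp1252', $bytes ), 'cp1252' );
}

my ( $text, $charset ) = decode_with_fallback("It\x92s here");
print "decoded as $charset\n";
```

The point is only that the eval error gets reported instead of swallowed; whether to guess at a second charset or just skip the document is a separate decision.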
> 2) Using a template system
>
> I was excited to see that you could use HTML::Template w/ the search
> results, as that's our template language of choice, but I couldn't find
> really good documentation on how to configure .swishcgi.conf accordingly
> until I dove into the source code for swish.cgi. Here is my .swishcgi.conf:
>
> return {
>     title        => 'QSR magazine search results',
>     swish_binary => '/usr/local/bin/swish-e',
>     swish_index  => '/var/www/qsr/web/search/index.swish-e',
>     template     => {
>         package => 'SWISH::TemplateHTMLTemplate',
>         options => {
>             filename          => 'swish.tmpl',
>             path              => '/var/www/qsr/web/search',
>             die_on_bad_params => 0,
>             loop_context_vars => 1,
>             cache             => 1,
>         },
>     },
> }
>
> I got stuck b/c I thought the file parameter was named 'file' and was
> its own key/value vs. being nested in 'options'.

Ya, the example in the source isn't very clear:

    xtemplate => {
        package => 'SWISH::TemplateHTMLTemplate',
        options => {
            filename          => 'swish.tmpl',
            path              => '@@pkgdatadir@@',
            die_on_bad_params => 0,
            loop_context_vars => 1,
            cache             => 1,
        },
    },

Not sure why it's not consistent.

BTW -- did you look at the search.cgi example? swish.cgi is kind of a
mess since it tries to do so much, and if you have Perl experience
it's not that hard to write a script that is customized to your needs.

--
Bill Moseley
moseley@hank.org
Received on Mon Feb 26 14:41:50 2007