Jason Purdy scribbled on 2/21/07 8:51 AM:
> I just got up & running with Swish-e and I hit a few speed bumps along
> the way, so I thought I'd share them.
>
> 1) Spidering a site (-S http vs. -S prog + spider.pl)
> my %opts = ( 'raise_error' => 1 );
> $content = $request->decoded_content( %opts );
Maybe set raise_error and then wrap the decoded_content() call in an eval {}
block, so that a single doc that fails to decode doesn't cost you all your work?
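A minimal sketch of that idea, assuming spider.pl-style code that already holds an HTTP::Response for each fetched doc. The helper name safe_decode() and the two sample documents are hypothetical, made up for illustration; only decoded_content() and its raise_error option come from HTTP::Message itself. The bogus Content-Encoding makes decoded_content() die, which is the kind of failure raise_error surfaces:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper (not part of spider.pl): decode one fetched document.
# With raise_error => 1, decoded_content() dies on an unsupported
# Content-Encoding or charset; the eval {} traps that, so the caller can
# skip this one document and keep indexing the rest.
sub safe_decode {
    my ( $response, $label ) = @_;
    my $content = eval { $response->decoded_content( raise_error => 1 ) };
    if ($@) {
        warn "Skipping $label: $@";
        return;    # undef tells the caller to move on
    }
    return $content;
}

# Usage sketch with in-memory responses, so no network access is needed.
my %docs = (
    'good.html' => HTTP::Response->new( 200, 'OK',
        [ 'Content-Type' => 'text/html; charset=UTF-8' ], '<p>hello</p>' ),
    'bad.html' => HTTP::Response->new( 200, 'OK',
        [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'bogus' ], 'x' ),
);

for my $name ( sort keys %docs ) {
    my $content = safe_decode( $docs{$name}, $name );
    next unless defined $content;    # failed doc: warned above, skipped here
    print "indexing $name (", length($content), " chars)\n";
}
```

Returning undef instead of dying leaves the skip-or-abort decision with the caller, which matches swish-e's general carp-and-continue approach.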
> The HTML Validator is a great tool to figure out where your source is
> messing up. Come to find out, it was an included database value that
> was everywhere. What a mess. :)
>
In general, swish-e plows ahead with indexing despite parse failures at every
level: in spider.pl, in the libxml2 parser, etc. If the indexer fails on one
doc, it carps (to one degree or another) and moves on to the next doc. That
design decision does seem a little reckless. On the other hand, because swish-e
lacks truly incremental indexing, stopping on a single doc that failed to parse
late in the process could throw away hours of indexing work.
But you're right: a more helpful error message would be appropriate here, imo.
> 2) Using a template system
>
> I was excited to see that you could use HTML::Template w/ the search
> results, as that's our template language of choice, but I couldn't find
> really good documentation on how to configure .swishcgi.conf accordingly
> until I dove into the source code for swish.cgi. Here is my .swishcgi.conf:
>
> return {
>     title        => 'QSR magazine search results',
>     swish_binary => '/usr/local/bin/swish-e',
>     swish_index  => '/var/www/qsr/web/search/index.swish-e',
>     template     => {
>         package => 'SWISH::TemplateHTMLTemplate',
>         options => {
>             filename          => 'swish.tmpl',
>             path              => '/var/www/qsr/web/search',
>             die_on_bad_params => 0,
>             loop_context_vars => 1,
>             cache             => 1,
>         },
>     },
> }
>
> I got stuck b/c I thought the file parameter was named 'file' and was
> its own key/value vs. being nested in 'options'.
>
So the file parameter is called 'filename' in your example above? Or is it 'path'?
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Mon Feb 26 14:04:16 2007