Re: PropertyNames not working in 2.1-dev-24

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Nov 19 2001 - 14:25:55 GMT
At 11:54 AM 11/19/2001 +0000, Julian Perry wrote:

>The thing that actually fixed the problem was
>forcing use of the libxml2 parser.

I used the HTML you posted and your config file with the built-in HTML
parser and it worked.

 > ./swish-e -f jindex -w not dkdk -p description 
# SWISH format: 2.1-dev-24
# Search words: not dkdk
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.005 seconds
1000 j.html "Title" 437 "UK based Wine Shop offering 1000's of wines for
delivery world-wide. Award winning Web site with wine games, quiz, and
extensive back-ground information."
.

So, perhaps there's something else in that HTML doc that's causing
problems.  Please send complete examples.

>I guess I'm
>happy with the fix, but surely it should have
>worked with the other parser (in the ideal world
>what we all live in)!

If the other parser worked perfectly I wouldn't have spent time adding
libxml2 to swish.  Something fun for everyone is to index your documents
using both parsers, then use swish's -T index_words_only on both indexes
and run diff to see the differences.
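
A rough sketch of that comparison follows. The index and word-list file names are made up for illustration, and it assumes the -T index_words_only dump is line-oriented, so that sorting and diffing it makes sense. The two indexes are assumed to have been built beforehand from the same documents, one per parser.

```shell
#!/bin/sh
# Sketch only: compare the words stored in two swish-e indexes.
# All file names here are hypothetical, not taken from the original post.

# dump_words INDEX OUT
# Dump the words of INDEX with -T index_words_only and save them sorted.
# Assumes ./swish-e is in the current directory, as in the transcript above.
dump_words() {
    ./swish-e -f "$1" -T index_words_only | LC_ALL=C sort > "$2"
}

# compare_words A B
# Show the lines that differ between two sorted word lists.
# diff exits with status 1 when the lists differ; that is not an error here.
compare_words() {
    diff "$1" "$2"
}

# Typical use, run by hand once both indexes exist:
#   dump_words index.builtin words.builtin
#   dump_words index.libxml2 words.libxml2
#   compare_words words.builtin words.libxml2
```

Lines marked "<" appear only in the first index and lines marked ">" only in the second, which gives a quick picture of where the two parsers disagree.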

>I've tried that, and I've got a pretty recent
>version of perl5 (5.6.1) and I've loaded all
>the modules that seem to be required - but I
>still can't get it running:
>    Name "HTML::Tagset::linkElements" used only once: possible typo at ./swish-spider.pl line 503.
>
>and then a bunch of:
>    Use of uninitialized value in hash element at ./swish-spider.pl line 509.
>    Use of uninitialized value in hash element at ./swish-spider.pl line 509.
>    Use of uninitialized value in hash element at ./swish-spider.pl line 509.
>
>Any thoughts?

Not really.  I have a different spider.pl than you seem to have.  If you
can use CVS then we can use the same version and find the problem.  I just
installed new everything on a new machine (including a new perl, and all
the modules) and it went without a problem.  I also can't offer any help
without knowing your config, and the commands you are using.  It's a lot
easier, obviously, if I can duplicate your problem.

>Another problem, and I've been looking at this
>on 2.1-dev-24 because it's been a long-standing
>problem with 1.3.2, I get a Bus Error from swish
>when building an index.
>
>Can you suggest the best set of command line
>options to help debug this?  Failing that I
>guess I'll be looking at running under GDB.
>It fails towards the end, as I remember.

gdb is the way to go, I'd think.  Can you put together a few documents that
demonstrate the problem?


>I used to get problems when there were invalid
>characters in HREF's - i.e. single quotes, is
>swish particularly sensitive to things like
>that?

Not by design.  Unlikely with libxml2, but hard to say without seeing an
example.  When I first started using libxml2 I had some problems with a few
HTML issues -- mostly with it hanging when swish tried to abort processing
in the middle of a doc.  AFAIK, that's all been fixed in current versions
of libxml2.

So, see if you can get together an example document and a config file that
demonstrates the problem you are having.

http://www.swish-e.org/2.2/docs/INSTALL.html#When_posting_please_provide_the_

Thanks,


Bill Moseley
mailto:moseley@hank.org
Received on Mon Nov 19 14:26:32 2001