Re: MetaName search not working, yet

From: Bill Moseley <moseley(at)>
Date: Tue Jan 29 2002 - 17:56:26 GMT
At 09:28 AM 01/29/02 -0800, Gordon Jessop wrote:
>Thanks to Bill for the upgrade hint!  Got swish-e 2.1-dev-25 to index (built
>*without* libxml2).

That may or may not work in your favor.  libxml2 will sometimes fix bad
HTML, but the default HTML parser may be able to parse things that are
not really HTML.

>However, I am now saddled with this problem: I have to index a series of
>jscript source files and the MetaNames function within swish-e does not seem
>to be catching.

Swish knows how to parse HTML, XML, and text.  So I'm not sure what you
will get from parsing JavaScript.

>    EnableAltSearchSyntax yes
>    SwishSearchOperators   AND OR NOT
>    SwishSearchDefaultRule AND

Those are not implemented, AFAIK.

>    FollowSymLinks no
>    FileRules filename contains "\.jsp$"

I think that should be the same as IndexOnly .jsp

>The Content:
>I have to index a series of jscript source files.  Each file would contain
>something like:
>// <title>Guns and Butter</title>
>globalPackage.description = '<meta_description>Some indexable words like
>supply and demand, guns and butter.</meta_description>';
> = '<meta_author>Gordon Jessop</meta_author>';
> = '1';
> = 'checked';
>globalPackage.blah = '123456';

I think you can only use <meta_description> with libxml2.  If I remember
correctly, the HTML parser thinks everything of the form <foo> is an HTML
tag.  libxml2 knows which tags are HTML tags, so when it passes me a tag
(this is in parser.c), I know whether it's a real HTML tag.  If not, then I
pretend it's a metaname.  That's how that hack works.  That's probably why
your metanames are not working.
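
To illustrate the idea only (this is not the code in parser.c), here is a
rough Python sketch of that decision.  The tag list and the function name
classify_tags are made up for the example; the real parser gets its
knowledge of HTML tags from libxml2.

```python
import re

# Assumption: a tiny stand-in for the parser's knowledge of real HTML tags.
KNOWN_HTML_TAGS = {"html", "head", "title", "body", "p", "a", "b", "meta"}

def classify_tags(text):
    """Split the opening tags found in text into HTML tags and metanames."""
    html_tags, metanames = [], []
    # Match opening tags only; closing tags start with "</" and are skipped.
    for name in re.findall(r"<([A-Za-z_][\w-]*)", text):
        name = name.lower()
        if name in KNOWN_HTML_TAGS:
            html_tags.append(name)
        else:
            # Not a real HTML tag, so pretend it is a metaname.
            metanames.append(name)
    return html_tags, metanames
```

A parser without such a list has no way to make this distinction, so it
treats <meta_description> like any other tag.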

Back to your source file.  What exactly are you expecting to search for?
Once you know that, you can adjust your content as necessary to make that
possible.  Are you trying to make a JavaScript library searchable?

This is what I'd do:  I would take and use that to grab your
files.  Then parse the file by regular expressions extracting out what you
need.  Then format as XML or HTML and send it off to swish.

That way you have full control over what is indexed, and under what
metanames and properties.  Does that make sense?
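
As a rough sketch of the extract-and-reformat step described above, something
like the following Python could pull the fields out with regular expressions
and write them as XML.  The tag names come from your sample file; the
function names, the <doc> wrapper element, and the choice of Python are
assumptions for the example.  How the result is handed to swish is left out.

```python
import re
from xml.sax.saxutils import escape

# Tag names taken from the sample source file; adjust to your content.
FIELDS = ("title", "meta_description", "meta_author")

def extract_fields(source):
    """Return {tag: text} for each FIELDS tag found in the source text."""
    found = {}
    for tag in FIELDS:
        # DOTALL lets a value span several lines, as meta_description does.
        match = re.search(r"<%s>(.*?)</%s>" % (tag, tag), source, re.DOTALL)
        if match:
            # Collapse line breaks and runs of whitespace to single spaces.
            found[tag] = " ".join(match.group(1).split())
    return found

def to_xml(fields):
    """Format the extracted fields as a small XML document."""
    lines = ["<doc>"]  # hypothetical wrapper element
    for tag, text in fields.items():
        lines.append("  <%s>%s</%s>" % (tag, escape(text), tag))
    lines.append("</doc>")
    return "\n".join(lines)

if __name__ == "__main__":
    sample = """// <title>Guns and Butter</title>
globalPackage.description = '<meta_description>Some indexable words like
supply and demand, guns and butter.</meta_description>';
globalPackage.blah = '123456';
"""
    print(to_xml(extract_fields(sample)))
```

Each element name would then need to be listed as a metaname in the swish
config for it to be searchable under that name.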

>Note: Due to imposed constraints, I am unable to use the proper <META
>Name="name" CONTENT="content"> syntax and have settled for the option
>described in the 2.2 docs (i.e. <meta_description>...</meta_description>)

Again, I think that's only for libxml2.  Can you remind me where that is in
the 2.2 docs?

Bill Moseley
Received on Tue Jan 29 17:57:00 2002