Re: (searching by metaname)

From: Bill Moseley <moseley(at)>
Date: Thu Feb 06 2003 - 22:31:32 GMT
On Thu, 6 Feb 2003, Rons Dixon wrote:

> > I am currently running swish-e version 2.2 and i would like to know
> > how do i limit my search to a property and the body of the document.
> >
> > for example
> >
> > <title>Main Heading</title>
> > <doctitle>Test the waters</doctitle>
> > <body>
> > This is just a test to see if it would. This is the body of the file
> > </body>
> >
> > The above is the layout of my file structure. I want to limit my
> > search to <doctitle> and <body>

This URL shows how to limit to part of a document.

If you are indexing as HTML then <body> will be indexed as the default
metaname (called "swishdefault").  So to limit a search to both the body
and doctitle you would use one of these queries:

   -w foo OR doctitle=foo
   -w swishdefault=foo OR doctitle=foo
   -w swishdefault=(foo) OR doctitle=(foo)  (if "foo" is more than one word)

Of course, if that's HTML then doctitle is not a valid HTML tag, although
when indexing with libxml2 (the HTML2 parser type) you can, to some
degree, use non-HTML tags in the body.
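
One practical note on typing those queries at a shell prompt: the
parentheses have to be quoted, or the shell will try to interpret them.
Here is a minimal sketch of a helper that builds the third query form.
The function name build_query and the index file name index.swish-e in
the commented usage line are only assumptions for illustration.

```shell
#!/bin/sh
# Build a query that looks for the same words in the default metaname
# (swishdefault) or in the doctitle metaname.
# build_query is a hypothetical helper name, not part of swish-e.
build_query() {
    printf 'swishdefault=(%s) OR doctitle=(%s)\n' "$1" "$1"
}

# Print the query for a two-word search.
build_query 'test waters'

# Hypothetical usage (assumes an index file named index.swish-e exists).
# The double quotes keep the parentheses away from the shell:
# swish-e -f index.swish-e -w "$(build_query 'test waters')"
```

Passing the query as a single quoted argument to -w is the main point;
the helper itself is just a convenience.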

If you are indexing XML then you would need to specify both the doctitle
and body as metanames.

  MetaNames doctitle body

but then <body> words will be indexed under the metaname "body" instead of
"swishdefault", and words in <doctitle> will be indexed under both the
"doctitle" metaname and the "body" metaname, since <doctitle> would
probably need to be within the body tag, as in:

<title>Main Title</title>

$ ./swish-e -i 2.xml -c c  -T indexed_words
Indexing Data Source: "File-System"
Indexing "2.xml"
    Adding:[1:swishdefault(1)]   'main'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:swishdefault(1)]   'title'   Pos:3  Stuct:0x7 ( HEAD TITLE FILE )
2.xml:6: error: Tag doctitle invalid
    Adding:[1:doctitle(10)]   'doctitle'   Pos:7  Stuct:0x89 ( META BODY FILE )
    Adding:[1:body(11)]   'doctitle'   Pos:7  Stuct:0x89 ( META BODY FILE )
    Adding:[1:body(11)]   'bodyword'   Pos:9  Stuct:0x89 ( META BODY FILE )
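
For the XML case, a config file along these lines might be a starting
point. This is only a sketch: the MetaNames line is the one given above,
but the IndexContents line (selecting the libxml2 XML parser for .xml
files) and the use of a temporary file are my assumptions, not the
actual config used for the run shown.

```shell
#!/bin/sh
# Write a small swish-e config to a temporary file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Assumption: parse .xml files with the libxml2 XML parser.
IndexContents XML2 .xml
# Index words inside <doctitle> and <body> under those metanames.
MetaNames doctitle body
EOF

# Show what was written.
cat "$conf"

# Same kind of debugging run as above, pointed at the generated config:
# swish-e -i 2.xml -c "$conf" -T indexed_words
```

With this setup a search limited to both parts would name the metanames
explicitly, for example "body=(foo) OR doctitle=(foo)".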


Bill Moseley
Received on Thu Feb 6 22:32:56 2003