On Thu, Sep 04, 2003 at 07:27:18AM -0700, John McGowan wrote:
> I'm indexing a site that has an entirely textual navigation system, and
> I want to configure swish-e to ignore those menus when indexing the
> site. The site is at http://www.emiliemcgowan.com/
Is your site generated from templates, or built dynamically? See the FAQ:
http://swish-e.org/current/docs/SWISH-FAQ.html#How_do_I_prevent_indexing_parts_of_a_document_
> the menu code is in <td> tags, like the following...
>
> <TD CLASS="MENUTEXT"><A CLASS=MENUAT HREF="main.taf?p=0">Home</A></TD>
>
> but of course I don't want to ignore all TD's or all A's.
Well, if you tell swish-e that your documents are XML, you can do this:
moseley@laptop:~$ cat 1.html
<html>
<head><title>Title</title>
</head>
<body>
<table><tr><td>first</td></tr></table>
<table><tr><td class="foo">second</td><td>second2</td></tr></table>
<table><tr><td>third</td></tr></table>
</body>
</html>
moseley@laptop:~$ cat c
DefaultContents XML2
XMLClassAttributes class
IgnoreMetaTags td.foo
moseley@laptop:~$ swish-e -c c -i 1.html -T indexed_words -v0
Adding:[1:swishdefault(1)] 'title' Pos:17 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'first' Pos:18 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'second2' Pos:31 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'third' Pos:32 Stuct:0x1 ( FILE )
Notice that "second" is not indexed.
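The following toy Python model shows what those three config lines do to the sample page. It is a simplification, not swish-e's actual parser. The model assumes that each element is named "tag.class" when it carries a class attribute (roughly what XMLClassAttributes does), and that words inside any element whose name is listed in IgnoreMetaTags are dropped. The names ClassAwareIndexer and indexed_words are made up for illustration.

```python
from html.parser import HTMLParser


class ClassAwareIndexer(HTMLParser):
    """Toy model of the config above; NOT swish-e's real parser.

    Assumptions: an element is named "tag.class" when it has a class
    attribute (roughly XMLClassAttributes), and text inside any element
    named in `ignored` is skipped (roughly IgnoreMetaTags). Markup is
    assumed well-formed, as the XML parser requires.
    """

    def __init__(self, ignored):
        super().__init__()
        self.ignored = set(ignored)
        self.stack = []  # one flag per open element: True if it is ignored
        self.words = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        name = f"{tag}.{cls}" if cls else tag
        self.stack.append(name in self.ignored)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep the words only if no enclosing element is ignored.
        if not any(self.stack):
            self.words.extend(w.lower() for w in data.split())


def indexed_words(markup, ignored):
    parser = ClassAwareIndexer(ignored)
    parser.feed(markup)
    parser.close()
    return parser.words


if __name__ == "__main__":
    page = """<html><head><title>Title</title></head><body>
<table><tr><td>first</td></tr></table>
<table><tr><td class="foo">second</td><td>second2</td></tr></table>
<table><tr><td>third</td></tr></table>
</body></html>"""
    print(indexed_words(page, {"td.foo"}))  # ['title', 'first', 'second2', 'third']
```

If you pass an empty set, "second" comes back. The class-qualified name means that only the td with class "foo" is skipped. Every other td is still indexed.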
But that only works for XML documents, and I'm not sure why it is limited
that way; I'd have to spend some time looking at the code to find out.
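For ordinary HTML, one possible workaround is to strip the menu cells before swish-e ever sees the page. The sketch below is a hypothetical pre-filter built on Python's standard html.parser, and it is not a swish-e feature. It drops every <td> whose class is MENUTEXT, matching the markup quoted above, and passes everything else through unchanged. The names MenuStripper and strip_cells are made up. How the filtered output gets handed to swish-e (for example, from a program run through the "-S prog" input method) is not shown.

```python
from html.parser import HTMLParser


class MenuStripper(HTMLParser):
    """Hypothetical pre-filter: re-emit HTML minus <td> cells whose class
    matches one of `drop_classes`. Assumes each dropped cell has an explicit
    closing </td> (HTML lets authors omit it)."""

    def __init__(self, drop_classes):
        # Keep entity/char references raw so they are re-emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.drop_classes = set(drop_classes)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped <td>

    def _emit(self, text):
        if self.skip_depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            if self.skip_depth:
                self.skip_depth += 1  # nested cell inside a dropped cell
                return
            classes = (dict(attrs).get("class") or "").split()
            if self.drop_classes.intersection(classes):
                self.skip_depth = 1
                return
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "td" and self.skip_depth:
            self.skip_depth -= 1
            return
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def strip_cells(page, drop_classes):
    parser = MenuStripper(drop_classes)
    parser.feed(page)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    row = ('<table><tr><TD CLASS="MENUTEXT"><A CLASS=MENUAT HREF="main.taf?p=0">'
           'Home</A></TD><td>Body text</td></tr></table>')
    print(strip_cells(row, {"MENUTEXT"}))
```

Counting only td opens and closes while skipping means that void tags such as <br> or <img> inside a menu cell do not confuse the filter. A cell that omits its </td> will still swallow the rest of the page.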
--
Bill Moseley
moseley@hank.org
Received on Thu Sep 4 15:50:43 2003