Re: [swish-e] partial indexing

From: Peter Karman <peter(at)>
Date: Wed Apr 01 2009 - 00:54:31 GMT
Zhou Xiang wrote on 3/31/09 4:09 PM:
> Hi Peter,
> Thank you!
> It works well now.
> Another question:
> When i tried to index only one page:
> What if I only want to index a specific meta tag (or meta tags) in the
> source file? And I do not want to index what is shown on the page.
> Say, i just want to index the meta name with "last name", so that if i
> search for "lnameAche", the page will be returned. (Pls see the source file
> of the page)
> I included the following line to the swish.config file:
> # Specify which meta names to include in the index
>   MetaNames employer
> It does not work.
> (What about the tag "last name"? It has two words.)
> Any ideas? Thanks!

If you just want to index the metadata from the page and not the content, you'll
have to filter your input (content) before passing to swish-e.

If your web pages are being generated dynamically, why not just generate
index-ready content instead?

Or alternately, spider your pages as-is, then pass them through a filter to
swish-e. A simple regex in a Perl script should strip out the <body> content:


and then pass to swish-e -S prog.
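
A minimal sketch of such a filter, assuming the spidered pages are already saved as local files. The script name filter.pl and the helpers strip_body and prog_record are made up for illustration. It empties the <body> element and prints each page with the Path-Name and Content-Length headers that the -S prog input format expects:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Empty the <body> element so only the <head> metadata is left to index.
sub strip_body {
    my ($html) = @_;
    # /s lets . match newlines, /i ignores tag case.
    $html =~ s,<body\b[^>]*>.*?</body>,<body></body>,is;
    return $html;
}

# Prefix a document with the headers read by swish-e -S prog.
sub prog_record {
    my ($path, $html) = @_;
    # Files are read without decoding, so length() is a byte count here.
    my $length = length $html;
    return "Path-Name: $path\nContent-Length: $length\n\n$html";
}

binmode STDOUT;

# Usage (hypothetical): filter.pl page1.html page2.html ...
for my $file (@ARGV) {
    my $fh;
    unless (open $fh, '<', $file) {
        warn "skipping $file: $!\n";
        next;
    }
    local $/;               # slurp the whole file at once
    my $html = <$fh>;
    close $fh;
    print prog_record($file, strip_body($html));
}
```

With -S prog, the script path is given with -i, and the file list can be passed through SwishProgParameters in the config. A regex like this is only a rough cut; pages with unusual markup may need a real HTML parser.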

As for MetaNames with spaces in them, you'll have to filter those too.

 s,<meta name="([^"\s]+)\s+([^"\s]+)",<meta name="$1.$2",g;

and put:

 MetaNames last.name

etc., in your swish-e config.
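
To show what the rename does, here is a small sketch run on two sample tags. The helper name join_meta_names and the sample content values are made up for illustration. The pattern uses [^"\s]+ for each word, an assumption chosen so that a match cannot run past the closing quote of the name attribute:

```perl
use strict;
use warnings;

# Join two-word meta names with a dot, e.g. "last name" becomes "last.name".
# [^"\s]+ keeps each captured word inside the quoted name attribute.
sub join_meta_names {
    my ($html) = @_;
    $html =~ s,<meta name="([^"\s]+)\s+([^"\s]+)",<meta name="$1.$2",g;
    return $html;
}

# Hypothetical sample input: one two-word name, one single-word name.
my $sample = <<'HTML';
<meta name="last name" content="lnameAche">
<meta name="employer" content="Some Company">
HTML

# The first tag is renamed; the second is passed through unchanged.
print join_meta_names($sample);
```

Names with three or more words would need a loop or a different substitution; this only handles the two-word case.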

Peter Karman  .  .  peter(at)
