Re: [swish-e] partial indexing

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Wed Apr 01 2009 - 00:54:31 GMT

Zhou Xiang wrote on 3/31/09 4:09 PM:
> Hi Peter,
> 
> Thank you!
> It works well now.
> 
> Another question:
> When i tried to index only one page:
> http://rust.cc.lehigh.edu/beyondsteel/swish-title.php
> 
> What if I only want to index a specific meta tag (or meta tags) in the
> source file? And I do not want to index what is shown on the page.
> Say, i just want to index the meta name with "last name", so that if i
> search for "lnameAche", the page will be returned. (Pls see the source file
> of the page)
> 
> I included the following line to the swish.config file:
> # Specify which meta names to include in the index
>   MetaNames employer
> 
> It does not work.
> 
> (What about the tag "last name"? It has two words.)
> 
> Any ideas? Thanks!

If you just want to index the metadata from the page and not the content, you'll
have to filter your input (content) before passing to swish-e.

If your web pages are being generated dynamically, why not just generate
index-ready content instead?

Alternatively, spider your pages as-is, then run them through a filter on the
way to swish-e. A simple regex in a Perl script should strip out the <body>
content:

 s,<body.*?>.*</body>,,sgi;

and then pass to swish-e -S prog.
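
For anyone who would rather not write the filter in Perl, here is a rough
Python sketch of the same idea. It applies the same <body> substitution and
prints each document with the Path-Name and Content-Length headers that
swish-e reads in -S prog mode. The script name (strip_body.py), the function
names, and the choice of "Document-Type: HTML*" are my assumptions for
illustration, not something from the original setup.

```python
#!/usr/bin/env python3
import re
import sys

# Same idea as the Perl substitution: drop everything from <body ...> to </body>.
BODY_RE = re.compile(r"<body.*?>.*</body>", re.IGNORECASE | re.DOTALL)


def strip_body(html: str) -> str:
    """Return the page with its <body> element removed (head/meta tags stay)."""
    return BODY_RE.sub("", html)


def prog_record(path: str, html: str) -> bytes:
    """Build one document record: headers, a blank line, then the content."""
    content = strip_body(html).encode("utf-8")
    headers = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(content)}\n"  # length in bytes, not characters
        "Document-Type: HTML*\n"             # assumption: parse as HTML
        "\n"
    ).encode("utf-8")
    return headers + content


def main(paths):
    # Each path is a page already saved to disk by the spider (assumed).
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            sys.stdout.buffer.write(prog_record(path, fh.read()))


if __name__ == "__main__":
    main(sys.argv[1:])
```

You would run it with something like "swish-e -c swish.config -S prog -i
./strip_body.py", with the list of files supplied through SwishProgParameters
in the config. Check the -S prog section of the docs for your version before
relying on the details.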

As for MetaNames with spaces in them, you'll have to filter those too.

 s,<meta name="([^"\s]+)\s+([^"\s]+)",<meta name="$1.$2",gi;

and put:

 MetaNames last.name

etc., in your swish-e config.
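
If some of your meta names have more than two words, the same rewrite can be
generalized. This is only a sketch in Python, and the function name is made
up for the example: it joins every whitespace-separated word inside a
name="..." attribute with dots and leaves the content attribute alone.

```python
import re

# Capture the opening '<meta name="', the name itself, and the closing quote.
META_NAME_RE = re.compile(r'(<meta\s+name=")([^"]*)(")', re.IGNORECASE)


def dot_meta_names(html: str) -> str:
    """Rewrite name="last name" as name="last.name" in every <meta> tag."""
    def fix(match):
        # split() with no argument collapses runs of spaces and tabs.
        name = ".".join(match.group(2).split())
        return match.group(1) + name + match.group(3)

    return META_NAME_RE.sub(fix, html)


print(dot_meta_names('<meta name="last name" content="lnameAche">'))
```

Each rewritten name then needs its own entry on the MetaNames line, e.g.
"MetaNames last.name".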


-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Tue Mar 31 20:54:34 2009