Re: Getting a description out of the html <body>

From: Eric Lease Morgan <eric_morgan(at)not-real.infomotions.com>
Date: Wed Mar 27 2002 - 13:09:29 GMT

Markus Strickler <markus@braindump.ms> wrote:

> I was just wondering how other people get a decent description out of an HTML
> page's body tag. Most of the time the first few characters are navigation or
> similar things. Is there a way to use for example the contents of <span
> id="content">..</span> or something similar as description?

I get a description (abstract) out of some of my searches, but I don't get
it out of the body tag. Rather, I extract it from a meta tag. Here's what I
do:

  1) create an HTML file with a meta tag named abstract
     and fill it with content
 
  2) index my document(s), making sure I include both "PropertyNames abstract"
     and "MetaNames abstract" in my swish configuration file

  3) search my resulting index using the -p flag as in "-p abstract"

  4) parse each line of the search results, which will now contain the
     abstract

  5) display the parsed results
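
Steps 4 and 5 might look roughly like the sketch below. It is not the script
I actually run, only an illustration. It assumes each result line has the
shape: rank, path, quoted title, size, then the quoted abstract added by
"-p abstract". That layout is an assumption to check against your own swish
output. The names RESULT_LINE, parse_result_line, and parse_results are
made up for this example.

```python
import re

# Assumed result line shape (verify against your own swish output):
#   rank path "title" size "abstract"
RESULT_LINE = re.compile(
    r'^(?P<rank>\d+)\s+(?P<path>\S+)\s+"(?P<title>.*?)"\s+'
    r'(?P<size>\d+)\s+"(?P<abstract>.*)"\s*$'
)


def parse_result_line(line):
    """Return a dict for one result line, or None for any other line."""
    match = RESULT_LINE.match(line)
    if match is None:
        # Header comments, the end-of-results marker, blank lines, etc.
        return None
    record = match.groupdict()
    record["rank"] = int(record["rank"])
    record["size"] = int(record["size"])
    return record


def parse_results(output):
    """Parse the whole search output and keep only the result lines."""
    records = (parse_result_line(line) for line in output.splitlines())
    return [record for record in records if record is not None]
```

Each returned dict can then be handed to whatever displays the hits, for
example printing the title followed by the abstract. Paths containing spaces
would not match this pattern and would need a looser expression.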

This technique also works very well for controlled vocabulary (subject)
terms, and since I know the shape of my swish search URLs, I can mark up
these controlled vocabulary terms in order to address the perennial problem
of, "Find me more like this one."

To see this in action, visit the following URL and search for "library":

  http://www.infomotions.com/search/

For a more detailed description of how I did this, see:

  http://www.infomotions.com/musings/smart-pages/

View the source of the Smart Pages page to see how the abstract is encoded
in the head of the document.
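
If you would rather check the encoding programmatically than by viewing the
source, something like the following sketch would do. It uses only Python's
standard html.parser module and assumes the abstract lives in a tag of the
form <meta name="abstract" content="...">. The names AbstractFinder and
find_abstract are made up for this example.

```python
from html.parser import HTMLParser


class AbstractFinder(HTMLParser):
    """Remember the content of the first <meta name="abstract"> tag seen."""

    def __init__(self):
        super().__init__()
        self.abstract = None

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag != "meta" or self.abstract is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "abstract":
            self.abstract = attributes.get("content")


def find_abstract(html_text):
    """Return the abstract from the HTML text, or None if there is none."""
    finder = AbstractFinder()
    finder.feed(html_text)
    return finder.abstract
```

Calling find_abstract() on the text of a page returns the abstract string,
or None when the page has no such meta tag.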

-- 
Eric Lease Morgan
http://www.infomotions.com/
Received on Wed Mar 27 13:09:33 2002