Re: Searching only a specific div class

From: Peter Karman <karman(at)not-real.cray.com>
Date: Fri Mar 12 2004 - 19:23:25 GMT
This seems related to an earlier post this week. Make sure you declare 
div.product-authors as a PropertyName as well as a MetaName.
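
As a rough sketch of what declaring both looks like in the index config (the name div.product-authors is taken from the question below; whether the parser ever produces that name is a separate issue): a MetaName makes the text searchable as a field, while a PropertyName stores it so it can be returned or sorted on.

```
# Sketch only: make the same name both searchable and storable.
# MetaNames     -> can be searched as a field
# PropertyNames -> can be returned with results and used for sorting
MetaNames     div.product-authors
PropertyNames div.product-authors
```

The name given when searching or sorting has to match the declared name exactly, so div.authors would not match a property declared as div.product-authors.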

However, I still don't think that alone is going to help you. I'm not sure 
that the HTML parser (even the libxml2-based HTML2 parser) is smart 
enough to treat tags in the <body> as metanames. I think it only works 
with <meta> tags in the <head>.

You might have better luck using the XML2 parser in your config, which 
should treat the tags as XML instead of HTML, and thus recognize your 
special tagset.
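
A minimal, untested sketch of that idea, assuming swish-e was built with libxml2 and that the pages are well-formed enough for an XML parser (XML is case-sensitive, so a <DIV> closed by </div> would not parse). The file extensions are an assumption:

```
# Sketch only: parse .html files as XML rather than HTML.
IndexContents XML2 .html .htm

# Combine the element name with the value of its class attribute
# to form a metaname such as div.product-authors.
XMLClassAttributes class

# Skip tags that are not declared as metanames.
UndefinedMetaTags ignore
```

The resulting name would still need to be listed under MetaNames (and PropertyNames, if it is to be displayed or sorted on) for it to be usable in a search.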

But Bill will probably give you a better answer than this.

pek

Thomas Sewell supposedly wrote on 3/12/04 1:00 PM:

> I have a site that is structured in html with multiple items per page, with sets of information about each item delimited by div tags with a descriptive class attribute.
> 
> Shortened Example:
> <DIV class="content">
> <div class="product-details">
> <div class="product-authors">
> John Doe
> </div>
> </div>
> <div class="product-details">
> <div class="product-authors">
> Jane Doe
> </div>
> </div>
> </div>
> 
> Currently I am just indexing the full text of the page and the default swish properties for each page. The source is html, so I assume it's defaulting to use the HTML parser.
> 
> I would like to make a search available that searches just the contents of the "Author" divs, for example.
> 
> I've been trying to define and use a property for the Author class, but without success.
> 
> I think I need to use some combination of metanames in the index config file and in the search cgi, but I've been unable to figure out the exact format to use.
> 
> I assume it's going to be something along the lines of:
> 
> UndefinedMetaTags ignore
> XMLClassAttributes class # Not supported by the HTML parser?
> MetaNames swishtitle swishdocpath swishdescription div.product-authors
> 
> in the index config file.
> 
> Is this possible? Would I have to convert to strict xhtml in order to use the XML parser to use the class attribute as a property/metatag? Or am I missing something else?
> 
> What occurs when I try the above is that the index appears to work (it reports "4 properties sorted." without any errors), but the search script returns "Unknown property name to sort by: Property 'div.authors' is not defined in index '<my index file>'" when I try to search by div.authors.
> 
> Anyone have an example of something like this working?
> 
> Thanks for any help,
> 
> Thomas Sewell

-- 
Peter Karman - Software Publications Engineer - Cray Inc
phone: 651-605-9009 - mailto:karman@cray.com
Received on Fri Mar 12 11:23:25 2004