Re: Relative Newbie Swish-e indexing query

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Nov 22 2002 - 01:01:38 GMT
At 04:37 PM 11/21/02 -0800, Tref Gare wrote:
>I'm trying to define them as attributes via the following lines in my
>swish.config
>
>MetaNames description event keywords oneLiner title
>XMLClassAttributes htmlLocation startDate endDate
># adding the property names line
>PropertyNames oneLiner keywords htmlLocation startDate endDate
>
>Anyone got any thoughts as to why I can't seem to access/reference/index
>them?

I'm not clear what you want to do.

XMLClassAttributes does this:

<event version="1.0">
    <datesList>
        <interval startDate="2002-11-21">
           firstone
        </interval>
    
        <interval startDate="2002-11-19">
           secondone
        </interval>
    </datesList>
</event>

> cat t
IndexContents XML2 .xml
XMLClassAttributes startDate

> ./swish-e -c t -i 2.xml -T parsed_tags  -v0                         
<event> (undefined meta name - no action)
<dateslist> (undefined meta name - no action)
<interval> (undefined meta name - no action)
<interval.2002-11-21> (undefined meta name - no action)
<interval> (undefined meta name - no action)
<interval.2002-11-19> (undefined meta name - no action)

So notice it's making a tag by combining the tag <interval> with the
*value* of the startDate attribute.
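
To make the naming rule concrete, here is a small Python sketch that reproduces the tag names from the parsed_tags trace above. It is only an illustration of the pattern visible in that output, not swish-e's actual parser code; the function name class_attribute_tags is made up for this example, and lowercasing the tag while leaving the attribute value untouched is an assumption based on the trace.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<event version="1.0">
    <datesList>
        <interval startDate="2002-11-21">firstone</interval>
        <interval startDate="2002-11-19">secondone</interval>
    </datesList>
</event>"""


def class_attribute_tags(xml_text, class_attrs):
    """Return tag names in document order, adding "tag.value" for each class attribute.

    Hypothetical helper: it mimics the names seen in the -T parsed_tags output.
    """
    names = []
    for element in ET.fromstring(xml_text).iter():
        tag = element.tag.lower()  # the trace shows lowercased tags
        names.append(tag)
        for attr in class_attrs:
            value = element.get(attr)
            if value is not None:
                # the attribute *value*, not its name, becomes part of the tag
                names.append(f"{tag}.{value}")
    return names


if __name__ == "__main__":
    for name in class_attribute_tags(SAMPLE, ["startDate"]):
        print(f"<{name}>")
```

Each distinct startDate value yields a different name, which is why a PropertyNames line would have to list every date explicitly.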

So by adding this to the config:

  PropertyNames interval.2002-11-19

you get

 ./swish-e -c t -i 2.xml -T properties  -v0            
          swishdocpath: 6 (  5) S: "2.xml"
          swishdocsize: 8 (  4) N: "235"
     swishlastmodified: 9 (  4) D: "2002-11-21 16:43:12"
   interval.2002-11-19:10 (  9) S: "secondone"

I doubt that's what you want.   Do you want to index the *value*?

This doesn't work well, but:

> cat 2.xml
<event version="1.0">
    <datesList>
        <interval startDate="2002-11-21" />
        <interval startDate="2002-11-19" />
    </datesList>
</event>

> cat t
IndexContents XML2 .xml
UndefinedXMLAttributes ignore
PropertyNames interval.startdate

> ./swish-e -c t -i 2.xml -T properties  -v0 
          swishdocpath: 6 (  5) S: "2.xml"
          swishdocsize: 8 (  4) N: "153"
     swishlastmodified: 9 (  4) D: "2002-11-21 16:52:22"
    interval.startdate:10 ( 21) S: "2002-11-21 2002-11-19"

Notice that swish-e now creates an "interval.startdate" metaname (a
property in this example) whose data is the startDate value from each
<interval> tag, joined with spaces.

There are a bunch of weird problems with this xml parsing.  For one thing,
it's hard to index only some deeply nested content.  That's because if
an outer tag is ignored then the tags inside it are not seen.

Also, I think it's sometimes hard to convert the nested xml structure into
the flattened metanames that swish-e uses.  XML gives a flexible way to
represent data, and that doesn't always map onto a few simple config
options for swish-e.

If you have complex xml data where you only want to index specific parts
then it's probably smart to use -S prog with an XML SAX or DOM parser and
extract just the data you want.
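
A minimal sketch of that -S prog approach in Python, under a few assumptions: the program writes each document to stdout as a Path-Name and Content-Length header, a blank line, and then the content, which is how I understand the -S prog format; the function names and the <doc>/<startdate> wrapper are made up for this example; and the config used with it would need to treat the output as XML and define startdate as a metaname or property.

```python
import sys
import xml.etree.ElementTree as ET

SAMPLE = """<event version="1.0">
    <datesList>
        <interval startDate="2002-11-21" />
        <interval startDate="2002-11-19" />
    </datesList>
</event>"""


def extract_start_dates(xml_text):
    """Collect the startDate attribute of every <interval>, in document order."""
    root = ET.fromstring(xml_text)
    return [el.get("startDate") for el in root.iter("interval") if el.get("startDate")]


def swish_record(path, dates):
    """Build one document record: headers, a blank line, then a small XML body.

    The <doc>/<startdate> element names are hypothetical; pick whatever names
    the swish-e config defines.
    """
    body = "<doc><startdate>{}</startdate></doc>\n".format(" ".join(dates)).encode("utf-8")
    # Content-Length is the size of the body in bytes, not characters
    header = "Path-Name: {}\nContent-Length: {}\n\n".format(path, len(body)).encode("utf-8")
    return header + body


if __name__ == "__main__":
    # A real script would loop over files; here the sample stands in for 2.xml
    sys.stdout.buffer.write(swish_record("2.xml", extract_start_dates(SAMPLE)))
```

Doing the extraction outside swish-e means only the chosen values reach the indexer, so the nesting and ignored-tag problems above don't come into play.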


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Fri Nov 22 01:01:47 2002