Re: Relative Newbie Swish-e indexing query

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Nov 22 2002 - 02:36:24 GMT
At 05:35 PM 11/21/02 -0800, Tref Gare wrote:
>I'm trying to index a variety of elements and possibly attributes of the
>xml files.
>
>I then need to be able to search the index for texts and/or dates in
>those specific fields.
>
>Do I understand you right that this isn't quite in swish-e's scope, and
>if so is it the attribute stuff that is stretching the envelope or the
>indexing of the values.

Swish-e has a limited way to work with xml, yet xml has a lot more
flexibility in representing complex data.  How's that for a vague answer?


>I have some control over the xml's design and could change most things
>into elements if that would bring the app back into swish-e's world, as
>it's basically there for this searching functionality.
>
>As such I've made some adjustments to the xml such that it appears like
>this
>
><event id="1341341234">
>	<htmlLocation>http://blesdfs.sdlfsf.sflsdf/sfdfs.htm</htmlLocation>
>	<eventTitle>Run Lola Run</eventTitle>
>	<oneLiner>it's a goodun</oneLiner>
>	<description>no a.. really really goodun</description>
>	... other stuff
>	<interval>
>		<startDate>22/11/2002</startDate>
>		<endDate>24/11/2002</endDate>
>	</interval>

That's basically what UndefinedXMLAttributes does, although that builds a
metaname from the combined tag and the attribute name.

http://swish-e.org/current/docs/SWISH-CONFIG.html#item_UndefinedXMLAttributes


>	<interval>
>		<startDate>25/11/2002</startDate>
>	</interval>

But I don't know your data.  You have two interval start dates, so I'm not
clear how you will search on those.


>To extract which I'm adding the following to my config file
>
>MetaNames oneLiner eventTitle htmlLocation startDate endDate
>PropertyNames oneLiner htmlLocation startDate endDate eventTitle
>
>
>All this seems to make sense to me, however I'm still not getting the
>fields back (except, strangely enough, for the oneLiner
>element).

This is what I get (I like to use single words):

> cat 3.xml
<event id="1341341234">

<htmlLocation>htmltext</htmlLocation>
        <eventTitle>eventitletext</eventTitle>
        <oneLiner>onelinertext</oneLiner>
        <description>descripttext</description>
        otherstuff
        <interval>
                <startDate>startdate1</startDate>
                <endDate>enddate1</endDate>
        </interval>
        <interval>
                <startDate>startdate2</startDate>
        </interval>
</event>

> cat tt
DefaultContents XML2
MetaNames oneLiner eventTitle htmlLocation startDate endDate
PropertyNames oneLiner htmlLocation startDate endDate eventTitle

> ./swish-e -c tt -i 3.xml -v0 -T indexed_words properties
      
    Adding:[1:htmllocation(12)]   'htmltext'   Pos:3  Stuct:0x1 ( FILE )
    Adding:[1:eventtitle(11)]   'eventitletext'   Pos:6  Stuct:0x1 ( FILE )
    Adding:[1:oneliner(10)]   'onelinertext'   Pos:9  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'descripttext'   Pos:15  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'otherstuff'   Pos:16  Stuct:0x1 ( FILE )
    Adding:[1:startdate(13)]   'startdate1'   Pos:17  Stuct:0x1 ( FILE )
    Adding:[1:enddate(14)]   'enddate1'   Pos:20  Stuct:0x1 ( FILE )
    Adding:[1:startdate(13)]   'startdate2'   Pos:25  Stuct:0x1 ( FILE )
          swishdocpath: 6 (  5) S: "3.xml"
          swishdocsize: 8 (  4) N: "451"
     swishlastmodified: 9 (  4) D: "2002-11-21 18:30:50"
              oneliner:15 ( 12) S: "onelinertext"
          htmllocation:16 (  8) S: "htmltext"
             startdate:17 ( 21) S: "startdate1 startdate2"
               enddate:18 (  8) S: "enddate1"
            eventtitle:19 ( 13) S: "eventitletext"
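
Note how the two startDate elements end up in a single startdate property,
"startdate1 startdate2".  As a rough illustration of that effect (this is
not how swish-e is implemented, just a minimal Python sketch that mimics the
output above), the following collects the text of each named element and
joins repeated values with a space.  The function name collect_properties
and the SAMPLE string are made up for the example; the element names come
from 3.xml.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Same structure as 3.xml above, inlined so the sketch is self-contained.
SAMPLE = """<event id="1341341234">
<htmlLocation>htmltext</htmlLocation>
    <eventTitle>eventitletext</eventTitle>
    <oneLiner>onelinertext</oneLiner>
    <description>descripttext</description>
    otherstuff
    <interval>
        <startDate>startdate1</startDate>
        <endDate>enddate1</endDate>
    </interval>
    <interval>
        <startDate>startdate2</startDate>
    </interval>
</event>"""

# The names listed on the PropertyNames line of the config.
PROPERTY_NAMES = ["oneLiner", "htmlLocation", "startDate", "endDate", "eventTitle"]


def collect_properties(xml_text, names):
    """Hypothetical helper: gather element text per (lowercased) name.

    Repeated elements are joined with a single space, in document order,
    which mimics the combined startdate value in the output above.
    """
    wanted = {name.lower() for name in names}
    values = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        key = elem.tag.lower()
        if key in wanted and elem.text and elem.text.strip():
            values[key].append(elem.text.strip())
    return {key: " ".join(found) for key, found in values.items()}


props = collect_properties(SAMPLE, PROPERTY_NAMES)
for name, value in props.items():
    print(f"{name}: {value}")
```

The point is only that one record gets one value per property name, so two
intervals in the same event are no longer distinguishable once stored.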

If you are planning on limiting searches to a date range then look at the
-L switch.  That's really a database function rather than a full-text
search engine function, but it works reasonably well for indexes that are
not too huge.  Whatever that means.  You can only have one date of a given
property name per file/record, though.
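
One thing to watch with ranges: startdate shows up above as a string
property (the "S:" in the output), and dates written as dd/mm/yyyy do not
sort chronologically when compared as strings.  Here is a minimal Python
sketch of rewriting such dates into a sortable yyyymmdd form before
indexing.  It assumes the range comparison is a plain string comparison;
the helper name to_sortable is made up for the example, and the second
date is invented to show the problem.

```python
from datetime import datetime


def to_sortable(date_text):
    """Hypothetical helper: convert dd/mm/yyyy into yyyymmdd."""
    return datetime.strptime(date_text, "%d/%m/%Y").strftime("%Y%m%d")


later = "22/11/2002"    # start date from the original message
earlier = "25/10/2002"  # invented date, one month before

# As raw strings the November date sorts BEFORE the October one,
# because the comparison starts with the day digits.
raw_order_is_wrong = later < earlier

# After conversion, string order matches chronological order.
fixed_order_is_right = to_sortable(earlier) < to_sortable(later)

print(to_sortable(later), raw_order_is_wrong, fixed_order_is_right)
```

If the dates are stored that way, a low/high pair in the same yyyymmdd form
should bracket them sensibly.  It doesn't help with the one-date-per-record
limit, so you would still have to decide which interval's date to store.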


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Fri Nov 22 02:36:48 2002