At 05:35 PM 11/21/02 -0800, Tref Gare wrote:
>I'm trying to index a variety of elements and possibly attributes of the
>xml files.
>
>I then need to be able to search the index for texts and/or dates in
>those specific fields.
>
>Do I understand you right that this isn't quite in swish-e's scope, and
>if so is it the attribute stuff that is stretching the envelope or the
>indexing of the values.
Swish-e has only limited XML support, while XML itself is far more
flexible in how it can represent complex data. How's that for a vague answer?
>I have some control over the xml's design and could change most things
>into elements if that would bring the app back into swish-e's world, as
>it's basically there for this searching functionality.
>
>As such I've made some adjustments to the xml such that it appears like
>this
>
><event id="1341341234">
>
><htmlLocation>http://blesdfs.sdlfsf.sflsdf/sfdfs.htm</htmlLocation>
> <eventTitle>Run Lola Run</eventTitle>
> <oneLiner>it's a goodun</oneLiner>
> <description>no a.. really really goodun</description>
> ... other stuff
> <interval>
> <startDate>22/11/2002</startDate>
> <endDate>24/11/2002</endDate>
> </interval>
That's basically what UndefinedXMLAttributes does, although that directive
builds the metaname by combining the tag name and the attribute name.
http://swish-e.org/current/docs/SWISH-CONFIG.html#item_UndefinedXMLAttributes
> <interval>
> <startDate>25/11/2002</startDate>
> </interval>
But I don't know your data. You have two interval start dates, so I'm not
clear how you will search on those.
>To extract which I'm adding the following to my config file
>
>MetaNames oneLiner eventTitle htmlLocation startDate endDate
>PropertyNames oneLiner htmlLocation startDate endDate eventTitle
>
>
>All this seems to make sense to me, however I'm still not getting the
>fields back (except, strangely enough, for the oneLiner element).
This is what I get (I like to use single words as test values):
> cat 3.xml
<event id="1341341234">
<htmlLocation>htmltext</htmlLocation>
<eventTitle>eventitletext</eventTitle>
<oneLiner>onelinertext</oneLiner>
<description>descripttext</description>
otherstuff
<interval>
<startDate>startdate1</startDate>
<endDate>enddate1</endDate>
</interval>
<interval>
<startDate>startdate2</startDate>
</interval>
</event>
> cat tt
DefaultContents XML2
MetaNames oneLiner eventTitle htmlLocation startDate endDate
PropertyNames oneLiner htmlLocation startDate endDate eventTitle
> ./swish-e -c tt -i 3.xml -v0 -T indexed_words properties
Adding:[1:htmllocation(12)] 'htmltext' Pos:3 Stuct:0x1 ( FILE )
Adding:[1:eventtitle(11)] 'eventitletext' Pos:6 Stuct:0x1 ( FILE )
Adding:[1:oneliner(10)] 'onelinertext' Pos:9 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'descripttext' Pos:15 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'otherstuff' Pos:16 Stuct:0x1 ( FILE )
Adding:[1:startdate(13)] 'startdate1' Pos:17 Stuct:0x1 ( FILE )
Adding:[1:enddate(14)] 'enddate1' Pos:20 Stuct:0x1 ( FILE )
Adding:[1:startdate(13)] 'startdate2' Pos:25 Stuct:0x1 ( FILE )
swishdocpath: 6 ( 5) S: "3.xml"
swishdocsize: 8 ( 4) N: "451"
swishlastmodified: 9 ( 4) D: "2002-11-21 18:30:50"
oneliner:15 ( 12) S: "onelinertext"
htmllocation:16 ( 8) S: "htmltext"
startdate:17 ( 21) S: "startdate1 startdate2"
enddate:18 ( 8) S: "enddate1"
eventtitle:19 ( 13) S: "eventitletext"
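Note the startdate line in the output above: the two <startDate> elements
end up in a single property string. The Python sketch below only mimics that
observed result for the sample file; it is not swish-e's code, and the names
SAMPLE, PROPERTY_NAMES and collect_properties are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Same content as 3.xml above.
SAMPLE = """<event id="1341341234">
<htmlLocation>htmltext</htmlLocation>
<eventTitle>eventitletext</eventTitle>
<oneLiner>onelinertext</oneLiner>
<description>descripttext</description>
otherstuff
<interval>
<startDate>startdate1</startDate>
<endDate>enddate1</endDate>
</interval>
<interval>
<startDate>startdate2</startDate>
</interval>
</event>"""

# Mirrors the PropertyNames line of the config file.
PROPERTY_NAMES = ["oneLiner", "htmlLocation", "startDate", "endDate", "eventTitle"]


def collect_properties(xml_text, names):
    """Gather element text per (lowercased) name, joining repeats with a space."""
    wanted = {name.lower() for name in names}
    values = defaultdict(list)
    # iter() walks the tree in document order, so repeated elements keep their order.
    for elem in ET.fromstring(xml_text).iter():
        key = elem.tag.lower()
        if key in wanted and elem.text and elem.text.strip():
            values[key].append(elem.text.strip())
    return {key: " ".join(parts) for key, parts in values.items()}


props = collect_properties(SAMPLE, PROPERTY_NAMES)
for name in PROPERTY_NAMES:
    print(f'{name.lower()}: "{props.get(name.lower(), "")}"')
```

The point of the sketch is that a repeated element gives you one combined
value per record, which matters as soon as you want to treat that value as a
single date.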
If you are planning on searching (limiting searches to) a date range then
look at the -L switch. That's really a database function rather than a full
text search engine function, but it works reasonably well for indexes that
are not too huge. Whatever that means. You can only have one date of a
given property name per file/record, though.
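
One caveat with range limits on the dates in the sample XML: if the property
is stored as a plain string (an assumption here; check the docs for how your
property type is compared), DD/MM/YYYY values will not sort chronologically.
A minimal sketch, assuming you can rewrite the dates before indexing; the
function name to_sortable and the sample values are hypothetical.

```python
from datetime import datetime


def to_sortable(date_text):
    """Convert DD/MM/YYYY to YYYY-MM-DD so string order matches date order."""
    return datetime.strptime(date_text, "%d/%m/%Y").strftime("%Y-%m-%d")


# Hypothetical start dates in the format used in the sample XML.
raw = ["25/11/2002", "22/11/2002", "03/12/2002"]

# Sorting the raw strings compares the day first, so December comes out first.
print(sorted(raw))

# After conversion, plain string order is also chronological order.
normalized = sorted(to_sortable(d) for d in raw)
print(normalized)

# A string range check now behaves like a date range check.
low, high = "2002-11-22", "2002-11-24"
print([d for d in normalized if low <= d <= high])
```

The low and high values given to -L would then be written in the same
YYYY-MM-DD form. Since only one date per property name is kept per record,
you would also have to decide which start date (for example the earliest)
goes into the file.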
--
Bill Moseley
mailto:moseley@hank.org
Received on Fri Nov 22 02:36:48 2002