Re: [XWarn] Re: Re: attribute value attaching to words

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Nov 14 2003 - 01:32:25 GMT
On Thu, Nov 13, 2003 at 04:19:22PM -0800, Dave Moreau wrote:
> Here's an example

Perfect! Yep I can duplicate that.

I'm not sure how easy this is to get right.  The parser is not very
smart in this regard.  That's somewhat on purpose, so that the parser
stays generic -- that is, it doesn't try to understand what the tags are.
It's probably easier in this case, since special handling is already
happening.

But here's why it is happening:

$ cat 2
start<b>bold</b>end

$ swish-e -i 2 -v0 -T indexed_words
    Adding:[1:swishdefault(1)]   'startboldend'   Pos:2  Stuct:0x49 ( EM BODY FILE )

Which is the right thing there.

In your case you have:

 <line indent="3em">the thunder of one heart beating--but</line>

which UndefinedXMLAttributes translates into:

  <line><line.indent>3em</line.indent>the thunder ...</line>
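To make the effect concrete, here is a rough sketch in Python.  It is not
swish-e's code; the function names (attrs_to_elements, strip_tags, words)
are made up for illustration, and the regex-based handling is only an
assumption that mimics the translation shown above followed by tags being
dropped without leaving a word break.

```python
import re

def attrs_to_elements(markup):
    """Rewrite <tag attr="v"> as <tag><tag.attr>v</tag.attr> (illustrative only)."""
    def expand(match):
        tag, attrs = match.group(1), match.group(2)
        children = "".join(
            f"<{tag}.{name}>{value}</{tag}.{name}>"
            for name, value in re.findall(r'([\w:-]+)="([^"]*)"', attrs)
        )
        return f"<{tag}>{children}"
    # Only opening tags that carry at least one attribute are rewritten.
    return re.sub(r'<([\w:-]+)((?:\s+[\w:-]+="[^"]*")+)\s*>', expand, markup)

def strip_tags(markup):
    """Drop every tag without inserting whitespace in its place."""
    return re.sub(r"<[^>]+>", "", markup)

def words(text):
    """Split lowercased text into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

line = '<line indent="3em">the thunder of one heart beating--but</line>'
expanded = attrs_to_elements(line)
print(expanded)
print(words(strip_tags(expanded)))
```

The attribute value ends up directly in front of the element text, so once
the tags are dropped the first word comes out as '3emthe'.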

Or another way to see the same thing:

$ cat 1.xml
<xml>
<foo>word</foo><bar>another</bar>
</xml>

$ swish-e -c c -i 1.xml -v0 -T indexed_words parsed_tags
<xml> (undefined meta name - no action)
<foo> (undefined meta name - no action)
<bar> (undefined meta name - no action)
    Adding:[1:swishdefault(1)]   'wordanother'   Pos:7  Stuct:0x1 ( FILE )

Since those tags are undefined they are basically ignored, so the text
on either side of them runs together.  I'm not sure what is the Correct
Behavior, though.  Suggestions?
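The trade-off between the two examples above can be sketched like this.
Again this is only an illustration, not swish-e code: index_words and its
tag_replacement parameter are hypothetical, and the tokenizing is a
simplification.

```python
import re

def index_words(markup, tag_replacement):
    """Replace every tag with tag_replacement, then split into words."""
    text = re.sub(r"<[^>]+>", tag_replacement, markup)
    return re.findall(r"[a-z0-9]+", text.lower())

inline = "start<b>bold</b>end"
block = "<xml>\n<foo>word</foo><bar>another</bar>\n</xml>"

# Policy 1: tags vanish entirely (what the output above shows).
print(index_words(inline, ""))
print(index_words(block, ""))

# Policy 2: every tag acts as a word break.
print(index_words(inline, " "))
print(index_words(block, " "))
```

Dropping tags keeps 'startboldend' together but also produces
'wordanother'; treating every tag as a word break splits both.  A parser
that doesn't know which tags are inline can't pick the right policy per
tag.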


I'm curious.  Why are you using UndefinedXMLAttributes?



> 
> D:\SWISH-E2_4\test>type t
> IndexFile       D:\swish-e2_4\test\i
> IndexDir        D:\swish-e2_4\test
> UndefinedMetaTags       index
> UndefinedXMLAttributes  index
> IndexOnly .xml
> IndexContents   XML2    .xml
> 
> D:\SWISH-E2_4\test>type test.xml
> <stanza>
> <line>Winds</line>
> <line indent="3em"> steal warmth;</line>
> <line>Silence echoes</line>
> <line indent="3em">the thunder of one heart beating--but</line>
> </stanza>
> 
> D:\SWISH-E2_4\test>..\swish-e -c t -T indexed_words -v3
> Parsing config file 't'
> Indexing Data Source: "File-System"
> Indexing "D:\swish-e2_4\test"
> 
> Checking dir "D:/swish-e2_4/test"...
>   test.xml - Using XML2 parser -     Adding:[1:swishdefault(1)]   'winds'   Pos:5  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   '3em'   Pos:12  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'steal'   Pos:13  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'warmth'   Pos:14  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'silence'   Pos:15  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'echoes'   Pos:16  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   '3emthe'   Pos:21  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'thunder'   Pos:22  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'of'   Pos:23  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'one'   Pos:24  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'heart'   Pos:25  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'beating'   Pos:26  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'but'   Pos:27  Stuct:0x1 ( FILE )
> (13 words)
> 
> Removing very common words...
> no words removed.
> Writing main index...
> Sorting words ...
> Sorting 13 words alphabetically
> Writing header ...
> Writing index entries ...
>   Writing word text: Complete
>   Writing word hash: Complete
>   Writing word data: Complete
> 13 unique words indexed.
> 4 properties sorted.
> 1 file indexed.  175 total bytes.  13 total words.
> Elapsed time: 00:00:00 CPU time: 00:00:00
> Indexing done!
> 
> You can see, it indexed 3emthe
> 
> dave
> 

-- 
Bill Moseley
moseley@hank.org
Received on Fri Nov 14 01:33:11 2003