Re: period in meta name

From: Bill Moseley <moseley(at)>
Date: Fri Oct 04 2002 - 03:59:40 GMT
At 08:41 PM 10/03/02 -0700, Roy Tennant wrote:
>Sorry, I should have known better. And I realize from your answer that 
>I'm in big trouble. I have books that are contained entirely within 
>this tag:
><TEI.2 id="ark:/13030/ft2p30058m" bnum="bn5464">
>stuff here

I think you are in big trouble.

Look at this:

~/swish-e/src > cat c
defaultcontents XML2 .xml
UndefinedXMLAttributes auto

~/swish-e/src > cat 1.xml
<?xml version="1.0"?>
<TEI.2 id="ark:/13030/ft2p30058m" bnum="bn5464">
stuff here

~/swish-e/src > ./swish-e -c c -i 1.xml -T indexed_words  -v0
    Adding:[]   'ark'   Pos:4  Stuct:0x1 ( FILE )
    Adding:[]   '13030'   Pos:5  Stuct:0x1 ( FILE )
    Adding:[]   'ft2p30058m'   Pos:6  Stuct:0x1 ( FILE )
    Adding:[1:tei.2.bnum(11)]   'bn5464'   Pos:9  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'stuff'   Pos:13  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'here'   Pos:14  Stuct:0x1 ( FILE )

I'm not exactly sure why it's called UndefinedXMLAttributes.  But that
still indexes the content of the tag.  I mentioned this the other day, but
it would be nice if you could say:

   IgnoreMetaTags TEI.2

and avoid indexing that content -- but since the attributes are within that
tag, they would all be ignored as well.  Too many ways to parse XML, I fear.
Maybe we can figure out something better for the next release...

My suggestion would be to use one of the CPAN XML parsers and pull out the
attributes you want indexed.
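
To illustrate the idea of pulling out the attributes yourself, here is a
rough sketch.  It uses Python's standard xml.etree.ElementTree rather than
a CPAN parser, purely as an illustration.  The extract_attributes helper,
the "tag.attribute" naming, and the closing </TEI.2> added to the sample
are assumptions of mine, not anything swish-e provides:

```python
import xml.etree.ElementTree as ET

# Sample document from the transcript, with a closing tag added so that
# it is well-formed (an assumption; the original sample was cut short).
SAMPLE = """<?xml version="1.0"?>
<TEI.2 id="ark:/13030/ft2p30058m" bnum="bn5464">
stuff here
</TEI.2>"""


def extract_attributes(xml_text, wanted):
    """Collect the values of the wanted attributes from every element.

    Returns a dict keyed by "tag.attribute" (a hypothetical naming scheme),
    mapping to the list of values found in document order.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        for name in wanted:
            if name in elem.attrib:
                key = f"{elem.tag}.{name}"
                found.setdefault(key, []).append(elem.attrib[name])
    return found


attrs = extract_attributes(SAMPLE, ["id", "bnum"])
for key, values in attrs.items():
    for value in values:
        print(f"{key}: {value}")
```

The values come out whole (the ark id is not split on ':' and '/'), so you
can decide for yourself how to hand them to the indexer.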

Bill Moseley
Received on Fri Oct 4 04:03:30 2002