
RE: EncodeProperty breaks swishdocpaths with multiple spaces

From: <jcunning(at)not-real.coppermountain.com>
Date: Wed Jul 09 2003 - 19:00:03 GMT
-----Original Message-----
From: Jim Cunning 
Sent: Wednesday, July 09, 2003 11:04 AM
To: Multiple recipients of list
Subject: [SWISH-E] EncodeProperty breaks swishdocpaths with multiple spaces


Hello,
I am trying to index (with swish-e 2.2.3) a number of files that have multiple consecutive spaces in their names. EncodeProperty() collapses each run of spaces into a single space, so the value of "swishdocpath" stored in the index is incorrect. For example, here is the debug output from a test run with '-T REGEX PROPERTIES':

Indexing Data Source: "External-Program"
Indexing "./rfc-index.pl"
FStest/Vantedge Pegasus  Release FS.doc
Original String: 'FStest/Vantedge Pegasus  Release FS.doc'
replace FStest/Vantedge Pegasus  Release FS.doc =~ m[^[^/]+][]: Matched
  Result String: '/Vantedge Pegasus  Release FS.doc'
 - Using TXT parser -
Original String: 'FStest/Vantedge Pegasus  Release FS.doc'
replace FStest/Vantedge Pegasus  Release FS.doc =~ m[^([^/]+)/.*$][$1]: Matched
  Result String: 'FStest'
 (6199 words)
          swishdocpath: 6 ( 32) S: "/Vantedge Pegasus Release FS.doc"
          swishdocsize: 8 (  4) N: "0000000091483"
     swishlastmodified: 9 (  4) D: "2003-07-09 09:12:22"
              category:10 (  6) S: "FStest"

Removing very common words...
no words removed.
Writing main index...

I suspect the "if" statement at line 731 in docprop.c is incorrect:

            if ( (int)((unsigned char)*source) <= (int)' ' )

really should be:

            if ( (int)((unsigned char)*source) < (int)' ' )

Comments anyone?

Jim Cunning

----------------------------------
As a follow-up to my original post: I made the change suggested above to docprop.c, rebuilt swish-e, and reindexed my document tree. Things work fine as far as my application is concerned, but I am not certain whether the change breaks anything for other people or applications.

At least I'm past my original stumbling block.

Jim
Received on Wed Jul 9 19:01:57 2003