-----Original Message-----
From: Jim Cunning
Sent: Wednesday, July 09, 2003 11:04 AM
To: Multiple recipients of list
Subject: [SWISH-E] EncodeProperty breaks swishdocpaths with multiple spaces
Hello,
I am trying to index (with swish-e 2.2.3) a number of files that have multiple consecutive spaces in their names. EncodeProperty() replaces each run of spaces with a single space, so the value of "swishdocpath" stored in the index is incorrect. For example, here is the debug output from '-T REGEX PROPERTIES':
Indexing Data Source: "External-Program"
Indexing "./rfc-index.pl"
FStest/Vantedge Pegasus Release FS.doc
Original String: 'FStest/Vantedge Pegasus Release FS.doc'
replace FStest/Vantedge Pegasus Release FS.doc =~ m[^[^/]+][]: Matched
Result String: '/Vantedge Pegasus Release FS.doc'
- Using TXT parser -
Original String: 'FStest/Vantedge Pegasus Release FS.doc'
replace FStest/Vantedge Pegasus Release FS.doc =~ m[^([^/]+)/.*$][$1]: Matched
Result String: 'FStest'
(6199 words)
swishdocpath: 6 ( 32) S: "/Vantedge Pegasus Release FS.doc"
swishdocsize: 8 ( 4) N: "0000000091483"
swishlastmodified: 9 ( 4) D: "2003-07-09 09:12:22"
category:10 ( 6) S: "FStest"
Removing very common words...
no words removed.
Writing main index...
I suspect the "if" statement at line 731 in docprop.c is incorrect:
if ( (int)((unsigned char)*source) <= (int)' ' )
really should be:
if ( (int)((unsigned char)*source) < (int)' ' )
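To illustrate the difference between the two tests, here is a minimal sketch. It is not the swish-e source: collapse_sketch() and its "strict" flag are hypothetical, and it assumes the surrounding loop writes one space for each run of characters that pass the test. Under that assumption, "<=" treats the space character itself as collapsible, while "<" only touches control characters such as tabs and newlines.

```c
#include <stdio.h>

/* Hypothetical stand-in for the loop around the test in docprop.c.
 * Any character that passes the test is written as a single space,
 * and a run of such characters produces only one space.
 *   strict == 0 : test is  c <= ' '  (spaces are collapsed too)
 *   strict == 1 : test is  c <  ' '  (only control characters)
 * dest must have room for strlen(source) + 1 bytes. */
static void collapse_sketch(const char *source, char *dest, int strict)
{
    char *start = dest;

    for (; *source; source++) {
        int c = (int)((unsigned char)*source);
        int hit = strict ? (c < (int)' ') : (c <= (int)' ');

        if (hit) {
            /* emit at most one space per run */
            if (dest == start || dest[-1] != ' ')
                *dest++ = ' ';
        } else {
            /* ordinary character (including ' ' when strict) is copied */
            *dest++ = *source;
        }
    }
    *dest = '\0';
}
```

With a made-up name containing two consecutive spaces, such as "a  b", the non-strict form yields "a b" and the strict form leaves "a  b" unchanged. Newlines are still folded to one space in both forms.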
Comments anyone?
Jim Cunning
----------------------------------
As a follow-up to my original post, I made the change suggested above to docprop.c, rebuilt, and reindexed my document tree. It works fine for my application, but I am not certain whether the change breaks anything for other people or applications.
At least I'm past my original stumbling block.
Jim
Received on Wed Jul 9 19:01:57 2003