Hello,
I am trying to index (with swish-e 2.2.3) a number of files that have multiple consecutive spaces in their names. EncodeProperty() replaces each run of spaces with a single space, so the value of "swishdocpath" stored in the index no longer matches the real file name. For example, here is the debug output from '-T REGEX PROPERTIES':
Indexing Data Source: "External-Program"
Indexing "./rfc-index.pl"
FStest/Vantedge Pegasus Release FS.doc
Original String: 'FStest/Vantedge Pegasus Release FS.doc'
replace FStest/Vantedge Pegasus Release FS.doc =~ m[^[^/]+][]: Matched
Result String: '/Vantedge Pegasus Release FS.doc'
- Using TXT parser -
Original String: 'FStest/Vantedge Pegasus Release FS.doc'
replace FStest/Vantedge Pegasus Release FS.doc =~ m[^([^/]+)/.*$][$1]: Matched
Result String: 'FStest'
(6199 words)
swishdocpath: 6 ( 32) S: "/Vantedge Pegasus Release FS.doc"
swishdocsize: 8 ( 4) N: "0000000091483"
swishlastmodified: 9 ( 4) D: "2003-07-09 09:12:22"
category:10 ( 6) S: "FStest"
Removing very common words...
no words removed.
Writing main index...
I suspect the "if" statement at line 731 in docprop.c is incorrect:
if ( (int)((unsigned char)*source) <= (int)' ' )
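As written, the test also matches the space character itself, not just the control characters below it. I think it really should be: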
if ( (int)((unsigned char)*source) < (int)' ' )
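To make the difference between the two comparisons concrete, here is a minimal sketch in C. It is not the actual code from docprop.c: squeeze_low_chars() and its space_is_low flag are hypothetical names, and it assumes the surrounding loop emits one space for each run of characters that pass the test. The caller must supply a destination buffer at least as large as the source string plus its terminator.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the whitespace handling in EncodeProperty();
 * this is an illustration, not the swish-e source.
 *
 * Copies src to dst, replacing each run of "low" characters with one space.
 *   space_is_low != 0  ->  test is  c <= ' '  (spaces count as low)
 *   space_is_low == 0  ->  test is  c <  ' '  (only control characters do)
 * Returns the length of the string written to dst.
 */
static size_t squeeze_low_chars(const char *src, char *dst, int space_is_low)
{
    char *out = dst;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        int c = (int)((unsigned char)*src);
        int is_low = space_is_low ? (c <= (int)' ') : (c < (int)' ');

        if (is_low) {
            /* Emit at most one space per run of low characters. */
            if (out == dst || out[-1] != ' ')
                *out++ = ' ';
        } else {
            /* Ordinary character (including a space when using '<'). */
            *out++ = *src;
        }
    }
    *out = '\0';
    return (size_t)(out - dst);
}
```

With the "<=" test, a name containing two consecutive spaces comes out with one. With the "<" test, the spaces are copied unchanged, while tabs and newlines are still replaced by a space.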
Comments anyone?
Jim Cunning
Received on Wed Jul 9 18:04:12 2003