EncodeProperty breaks swishdocpaths with multiple spaces

From: <jcunning(at)not-real.coppermountain.com>
Date: Wed Jul 09 2003 - 18:04:00 GMT
Hello,
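I am trying to index (with swish-e 2.2.3) a number of files that have multiple consecutive spaces in their names. EncodeProperty() replaces each run of spaces with a single space, so the value of "swishdocpath" stored in the index is incorrect. For example, here is the debug output from '-T REGEX PROPERTIES' for one such file. Note that the two spaces between "Pegasus" and "Release" survive the regex replacement but are reduced to one in the stored swishdocpath property: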

Indexing Data Source: "External-Program"
Indexing "./rfc-index.pl"
FStest/Vantedge Pegasus  Release FS.doc
Original String: 'FStest/Vantedge Pegasus  Release FS.doc'
replace FStest/Vantedge Pegasus  Release FS.doc =~ m[^[^/]+][]: Matched
  Result String: '/Vantedge Pegasus  Release FS.doc'
 - Using TXT parser -
Original String: 'FStest/Vantedge Pegasus  Release FS.doc'
replace FStest/Vantedge Pegasus  Release FS.doc =~ m[^([^/]+)/.*$][$1]: Matched
  Result String: 'FStest'
 (6199 words)
          swishdocpath: 6 ( 32) S: "/Vantedge Pegasus Release FS.doc"
          swishdocsize: 8 (  4) N: "0000000091483"
     swishlastmodified: 9 (  4) D: "2003-07-09 09:12:22"
              category:10 (  6) S: "FStest"

Removing very common words...
no words removed.
Writing main index...

I suspect the "if" statement at line 731 in docprop.c is incorrect:

            if ( (int)((unsigned char)*source) <= (int)' ' )

really should be:

            if ( (int)((unsigned char)*source) < (int)' ' )
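
To illustrate the difference between the two comparisons, here is a minimal standalone sketch. It is not the swish-e source: squeeze_copy() and path_survives() are hypothetical helpers, and the loop assumes only that characters matching the test are collapsed into a single space, which is what the debug output above suggests. The include_space flag selects between the `<=` test and the proposed `<` test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT taken from docprop.c: copies src into dst and
 * collapses each run of characters that satisfy the test into one space.
 * include_space != 0 uses `<= ' '` (the behaviour reported above);
 * include_space == 0 uses `<  ' '` (the proposed change). */
static void squeeze_copy(const char *src, char *dst, size_t dstlen,
                         int include_space)
{
    size_t out = 0;
    int in_run = 0;   /* nonzero while inside a run being collapsed */

    assert(src != NULL && dst != NULL && dstlen > 0);

    for (; *src != '\0' && out + 1 < dstlen; src++) {
        int c = (int)((unsigned char)*src);
        int squeeze = include_space ? (c <= (int)' ') : (c < (int)' ');

        if (squeeze) {
            if (!in_run)
                dst[out++] = ' ';   /* emit one space per run */
            in_run = 1;
        } else {
            dst[out++] = *src;      /* ordinary character, copied as-is */
            in_run = 0;
        }
    }
    dst[out] = '\0';
}

/* Hypothetical helper: returns 1 if the path comes through unchanged.
 * Paths longer than the local buffer are truncated and so report 0. */
static int path_survives(const char *path, int include_space)
{
    char buf[256];

    squeeze_copy(path, buf, sizeof buf, include_space);
    return strcmp(path, buf) == 0;
}
```

In this sketch, path_survives("/Vantedge Pegasus  Release FS.doc", 1) reports that the path was altered, while the same call with 0 reports it unchanged. Control characters such as tabs are still replaced by a space in both cases.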

Comments anyone?

Jim Cunning
Received on Wed Jul 9 18:04:12 2003