Hi,
If I index a directory as follows:
-------------------- indexing start --------------------
swishlin:~/swishtest# swish-e -v 3 -c swish.conf
Parsing config file 'swish.conf'
Indexing Data Source: "File-System"
Indexing "files"
Checking dir "files"...
SDB-BS-01-Acidiol-480203.pdf - Using DEFAULT (HTML2) parser - (1307 words)
SDB-BS-02-Detergent.pdf - Using DEFAULT (HTML2) parser - (1011 words)
SDB-BS-03-Buffer Powder.pdf - Using DEFAULT (HTML2) parser - (1028 words)
SDB-BS-04-Enzym 150.pdf - Using DEFAULT (HTML2) parser - (1127 words)
SDB-BS-05-Ethidiumbromid-Farbstoffloesung.pdf - Using DEFAULT (HTML2) parser - (992 words)
SDB-BS-07-Kaliumsorbat-Granulat-105119.pdf - Using DEFAULT (HTML2) parser - (903 words)
SDB-BS-08-Ringertabletten-115525.pdf - Using DEFAULT (HTML2) parser - (774 words)
SDB-BS-09-Glycerin-2289.pdf - Using DEFAULT (HTML2) parser - (916 words)
SDB-BS-10-BCS.pdf - Using DEFAULT (HTML2) parser - (959 words)
SDB-BS-11-PCS.pdf - Using DEFAULT (HTML2) parser - (990 words)
SDB-BS-12-Rinse Concentrate.pdf - Using DEFAULT (HTML2) parser - (1299 words)
SDB-BS-13-Ammoniak-105428.pdf - Using DEFAULT (HTML2) parser - (1198 words)
SDB-BS-05-Ethidiumbromid.pdf - Using DEFAULT (HTML2) parser - (993 words)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 2,054 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
2,054 unique words indexed.
4 properties sorted.
13 files indexed. 593,511 total bytes. 13,524 total words.
Elapsed time: 00:00:02 CPU time: 00:00:00
Indexing done!
-------------------- indexing end --------------------
and then search for files containing SDB-BS in the file name, the result
does not list these two files:
SDB-BS-05-Ethidiumbromid-Farbstoffloesung.pdf
SDB-BS-07-Kaliumsorbat-Granulat-105119.pdf
-------------------- searching start --------------------
swishlin:~/swishtest# swish-e -f index.swish-e -H 1 -w swishdocpath=sdb-bs*
# SWISH format: 2.4.3
# Search words: swishdocpath=sdb-bs*
# Removed stopwords:
# Number of hits: 11
# Search time: 0.002 seconds
# Run time: 0.053 seconds
1000 files/SDB-BS-01-Acidiol-480203.pdf "SDB-BS-01-Acidiol-480203.pdf" 16930
1000 files/SDB-BS-13-Ammoniak-105428.pdf "SDB-BS-13-Ammoniak-105428.pdf" 20078
1000 files/SDB-BS-12-Rinse Concentrate.pdf "SDB-BS-12-Rinse Concentrate.pdf" 49950
1000 files/SDB-BS-11-PCS.pdf "SDB-BS-11-PCS.pdf" 46071
1000 files/SDB-BS-10-BCS.pdf "SDB-BS-10-BCS.pdf" 42995
1000 files/SDB-BS-09-Glycerin-2289.pdf "SDB-BS-09-Glycerin-2289.pdf" 150080
1000 files/SDB-BS-08-Ringertabletten-115525.pdf "SDB-BS-08-Ringertabletten-115525.pdf" 15251
1000 files/SDB-BS-04-Enzym 150.pdf "SDB-BS-04-Enzym 150.pdf" 49771
1000 files/SDB-BS-03-Buffer Powder.pdf "SDB-BS-03-Buffer Powder.pdf" 47372
1000 files/SDB-BS-02-Detergent.pdf "SDB-BS-02-Detergent.pdf" 44943
1000 files/SDB-BS-05-Ethidiumbromid.pdf "SDB-BS-05-Ethidiumbromid.pdf" 47066
-------------------- searching end --------------------
If I shorten the file name to fewer than 41 characters (40 in the run below),
the file shows up in the search result:
-------------------- another run start --------------------
swishlin:~/swishtest# mv files/SDB-BS-07-Kaliumsorbat-Granulat-105119.pdf files/SDB-BS-07-Kaliumsorbat-Granulat-1051.pdf
swishlin:~/swishtest# swish-e -v 3 -c swish.conf
[....]
swishlin:~/swishtest# swish-e -f index.swish-e -H 1 -w swishdocpath=sdb-bs*
[....]
1000 files/SDB-BS-07-Kaliumsorbat-Granulat-1051.pdf "SDB-BS-07-Kaliumsorbat-Granulat-1051.pdf" 15938
-------------------- another run end --------------------
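To check the length observation against the file names in the index run, here is a minimal sketch in Python. The limit of 40 characters is only inferred from the rename test above, not taken from any documentation, and the helper name split_by_length is made up for illustration.

```python
# Assumption: names of 40 characters or fewer are found by the search,
# longer ones are not. This is inferred from the rename test, nothing more.
OBSERVED_LIMIT = 40


def split_by_length(names, limit=OBSERVED_LIMIT):
    """Return (within, over): names at or below the limit, and names above it."""
    within = [n for n in names if len(n) <= limit]
    over = [n for n in names if len(n) > limit]
    return within, over


# File names exactly as they appear in the indexing output.
names = [
    "SDB-BS-01-Acidiol-480203.pdf",
    "SDB-BS-02-Detergent.pdf",
    "SDB-BS-03-Buffer Powder.pdf",
    "SDB-BS-04-Enzym 150.pdf",
    "SDB-BS-05-Ethidiumbromid-Farbstoffloesung.pdf",
    "SDB-BS-07-Kaliumsorbat-Granulat-105119.pdf",
    "SDB-BS-08-Ringertabletten-115525.pdf",
    "SDB-BS-09-Glycerin-2289.pdf",
    "SDB-BS-10-BCS.pdf",
    "SDB-BS-11-PCS.pdf",
    "SDB-BS-12-Rinse Concentrate.pdf",
    "SDB-BS-13-Ammoniak-105428.pdf",
    "SDB-BS-05-Ethidiumbromid.pdf",
]

within, over = split_by_length(names)

# Print the length and name of every file above the observed limit.
for name in over:
    print(len(name), name)
```

With these 13 names, the only two above the limit are the two files missing from the search result (45 and 42 characters), and the renamed SDB-BS-07 file is exactly 40 characters long.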
Is this a bug, or am I missing a setting or parameter?
Thanks in advance
Sascha
My swish.conf:
IndexDir "files"
IndexOnly .doc .xls .pdf .txt .ppt .chm
FileRules filename contains ^~\$
MetaNames swishdocpath
WordCharacters abcdefghijklmnopqrstuvwxyzÄÖÜäöüß0123456789.-
FileFilter .pdf /root/swish/pdf2html.php
PropertyNamesMaxLength 1000 swishdocpath
Received on Thu Jul 20 05:14:41 2006