
Indexing files without an extension

From: dennis lastor <dennis.lastor(at)>
Date: Tue Feb 07 2006 - 03:27:11 GMT
I am trying to index a wiki page that contains links to other wiki pages
without extensions.

For example one of the pages could be http://internal_site/Page_With_Text

I have read through several of the FAQs and threads but have not been able
to find anything on this topic.  I have no trouble indexing PDFs, DOCs, TXTs,
etc., and everything works GREAT!  I would just like to index these extensionless pages as well.

I am using the "prog" method by running:

swish-e -S prog -c swish.conf

My swish.conf looks like:

# Example for spidering
# Use the "" program included with Swish-e

#Path to filters
FilterDir /tool/bin/

# Define what sites to index.  Just add to the bottom of this

SwishProgParameters default http://Internal_Site/WegPage1

# ? DefaultContents HTML2
IndexContents HTML* .htm .html .shtml .pdf .doc .ppt .xls
StoreDescription HTML* <body> 300

# Look at PDFs
#FileFilter .pdf /tool/bin/pdftotext   "'%p' -"

# Break words down to their stems
FuzzyIndexingMode Stemming_en

# Show ALL info while indexing
IndexReport 3

CompressPositions yes

Whenever I run swish-e it correctly indexes all of the PDFs, etc., but
not the internal wiki pages (the ones without extensions); instead it
reports that there are no unique words to index.
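My hunch, illustrated here with Python's standard mimetypes module purely as an analogy (this is not Swish-e's actual code), is that content-type detection keyed on the file extension simply finds nothing to map for an extensionless URL:

```python
import mimetypes

# Hypothetical illustration: extension-based type guessing has nothing
# to go on for a URL like /Page_With_Text, while .pdf maps cleanly.
for url in ("http://internal_site/Page_With_Text",
            "http://internal_site/manual.pdf"):
    guessed, _ = mimetypes.guess_type(url)
    print(url, "->", guessed)
# -> http://internal_site/Page_With_Text -> None
# -> http://internal_site/manual.pdf -> application/pdf
```

If that analogy holds for Swish-e, the fix would presumably involve telling it what default content type to assume for files with no extension.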

I am also not sure whether 'CompressPositions yes' will actually compress
the index files.

Any help would be greatly appreciated.  Swish-e has been invaluable in
indexing our tech documents, and I would love to have it index these wiki
pages, where most of our documents exist.

Thanks again!
