I am trying to index a wiki page that contains links to other wiki pages
without extensions.
For example, one of the pages could be http://internal_site/Page_With_Text
I have read through several of the FAQs and threads but have not been able
to find anything on this topic. I have no trouble indexing PDFs, DOCs, TXT,
HTML, etc., and everything works GREAT! I would just like to index these
pages without extensions.
I am using the "prog" method by running:
swish-e -S prog -c swish.conf
My swish.conf looks like:
# Example for spidering
# Use the "spider.pl" program included with Swish-e
IndexDir spider.pl
#Path to filters
FilterDir /tool/bin/
# Define what sites to index. Just add to the bottom of this
SwishProgParameters default http://Internal_Site/WegPage1 \
    http://Internal_Site/WebPage2 \
    http://Internal_Site/WebPage3
# ? DefaultContents HTML2
IndexContents HTML* .htm .html .shtml .pdf .doc .ppt .xls
StoreDescription HTML* <body> 300
# Look at PDFs
#FileFilter .pdf /tool/bin/pdftotext "'%p' -"
#Break the words up into stemmed words
FuzzyIndexingMode Stemming_en
# Show ALL info while indexing
IndexReport 3
#compress
CompressPositions yes
Whenever I run swish-e it correctly indexes all of the PDFs, etc., but not
the internal wiki sites (without extensions); for those it says there are
no unique words to index.
I am also not sure whether 'CompressPositions yes' will compress the index
files or not.
Any help would be greatly appreciated. Swish-e has been invaluable in
indexing our tech documents, and I would love to have it index these wiki
pages, where most of our documents exist.
Thanks again!
Dennis
Received on Mon Feb 6 19:27:15 2006