Re: Indexing files without an extension

From: Bill Moseley <moseley(at)>
Date: Tue Feb 07 2006 - 03:39:49 GMT
On Mon, Feb 06, 2006 at 07:23:33PM -0800, dennis lastor wrote:
> I am trying to index a wiki page that contains links to other wiki
> pages without extensions.
> For example one of the pages could be
> http://internal_site/Page_With_Text

Should be no problem if you are using the spider.

> I have read through several of the FAQs and threads but have not
> been able to find anything on this topic.  I have no trouble
> indexing PDFs, DOCs, TXT, HTML, etc, and everything works GREAT!  I
> would just like to index these pages without extensions.

What's the problem?

> I am using the "prog" method by running:
> swish-e -S prog -c swish.conf
> My swish.conf looks like:
> # Example for spidering
> # Use the "" program included with Swish-e
> IndexDir
> #Path to filters
> FilterDir /tool/bin/

Don't need that.

> # Define what sites to index.  Just add to the bottom of this
> SwishProgParameters default http://Internal_Site/WegPage1 \
>     http://Internal_Site/WebPage2 \
>     http://Internal_Site/WebPage3

> # ? DefaultContents HTML2
> IndexContents HTML* .htm .html .shtml .pdf .doc .ppt .xls

Should not need that.  The spider tells swish what parser to use.
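
To illustrate what "tells swish what parser to use" means: with -S prog,
the program writes a few header lines ahead of each document, and a
Document-Type header names the parser, so the file extension never comes
into it. Below is a rough sketch in Python of what one such record looks
like. The function name format_document is made up for this example, and
the header names are the ones I recall from the prog input docs, so check
them against your version before relying on this.

```python
import sys


def format_document(path_name, content, doc_type=None, last_mtime=None):
    """Build one document record in the style swish-e reads with -S prog.

    Hypothetical helper for illustration only; it is not part of swish-e.
    """
    # Content-Length must count bytes, not characters.
    body = content.encode("utf-8") if isinstance(content, str) else content
    headers = [
        f"Path-Name: {path_name}",
        f"Content-Length: {len(body)}",
    ]
    if last_mtime is not None:
        headers.append(f"Last-Mtime: {int(last_mtime)}")
    if doc_type is not None:
        # This header selects the parser, e.g. HTML*, regardless of the
        # URL or file name having an extension.
        headers.append(f"Document-Type: {doc_type}")
    # A blank line separates the headers from the document body.
    head = "\n".join(headers) + "\n\n"
    return head.encode("utf-8") + body


if __name__ == "__main__":
    page = "<html><body>Some wiki text</body></html>"
    record = format_document(
        "http://internal_site/Page_With_Text", page, doc_type="HTML*"
    )
    sys.stdout.buffer.write(record)
```

The spider does the equivalent for every page it fetches, which is why
IndexContents is not needed for spidered documents.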

> Whenever I run swish-e it correctly indexes all of the PDFs, etc., but
> not the internal wiki sites (without extensions)
> but rather says there are no unique words to index.

What happens if you point the spider at one of those wiki pages?
Turn on debugging as described in the docs.

> I am also not sure if the 'CompressPositions yes' will compress the index
> files or not.

Ignore that setting for now.

Bill Moseley

Received on Mon Feb 6 19:39:50 2006