HTML2 problem

From: Rich Thomas <thomasr(at)not-real.buffalo.edu>
Date: Tue Feb 05 2002 - 19:06:50 GMT
I seem to have some bad HTML, but I can't figure out what the problem is.

Results:

# /usr/local/bin/swish-e -c swish.config -v 9
Indexing Data Source: "File-System"
Indexing "/ublin/E/E/"

Checking dir "/ublin/E/E/"...

In dir "/ublin/E/E/A/":

In dir "/ublin/E/E/A/0/":
  000.html - Using HTML2 parser -  (170 words)
  001.html - Using HTML2 parser -  (122 words)
  002.html - Using HTML2 parser -  (148 words)
  003.html
It just sits at this spot forever; the indexer never gets past 003.html.

My swish.config:

# cat swish.config
IndexDir /ublin/E/E/
StoreDescription HTML2 <body> 100000
DefaultContents HTML2
IndexOnly .html
ReplaceRules replace "/ublin/" "http://ublin.lib.buffalo.edu/webcat/bibcat/"
IgnoreWords english.txt

My command line:

/usr/local/bin/swish-e -c swish.config -v 9

When I change HTML2 to HTML it works.
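
Assuming that change is made in both directives that name the parser, and every other line of swish.config stays as shown above, the variant that works would read:

StoreDescription HTML <body> 100000
DefaultContents HTML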

I've tried using the -T debug options but can't seem to see any errors.

I compiled swish-e with libxml2 (finally using the right one!).

The URL to the file is:
ublin.lib.buffalo.edu/webcat/bibcat/E/E/A/0/003.html

Any help would be appreciated.

Rich







Received on Tue Feb 5 19:07:23 2002