Tian Xinchun wrote on 3/6/08 2:16 AM:
> Hi Bill,
>
> Thanks for your help. See below.
>
>> ------------------------------
>>
>> Message: 6
>> Date: Wed, 5 Mar 2008 06:11:42 -0800
>> From: Bill Moseley <moseley@hank.org>
>> Subject: Re: [swish-e] Searching remote mail archive problem
>> To: Swish-e Users Discussion List <users@lists.swish-e.org>
>> Message-ID: <20080305141142.GA6428@hank.org>
>> Content-Type: text/plain; charset=utf-8
>>
>> On Wed, Mar 05, 2008 at 08:03:06PM +0800, Tian Xinchun wrote:
>>> Hi Peter,
>>>
>>> I am sorry, but I do not quite understand what you mean. Take an example:
>>>
>>> $ swish-e -c swish.conf -S prog
>>> Indexing Data Source: "External-Program"
>>> Indexing "spider.pl"
>>> External Program found: /usr/local/lib/swish-e/spider.pl
>>> /usr/local/lib/swish-e/spider.pl: Reading parameters from 'spider.conf'
>>> https://www.lbl.gov/lists.archives/theta13-eng.archive/:1: error:
>>> htmlParseStartTag: invalid element name
>>> <?xml version="1.0" encoding="ISO-8859-1"?>
>>> ^
>>> https://www.lbl.gov/lists.archives/theta13-eng.archive/:2: error: Misplaced
>>> DOCTYPE declaration
>>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>>> ^
>> You have two errors. That first one above is simply saying you are
>> trying to index an xml document with Libxml's *html* parser.
>> So you need to use the XML* parser type.
>>
>
> Actually, I have tried using XML*, but I still get the same error messages.
> Thanks for the information. Is there any plan to fix it?
>
If you can provide us with a small, reproducible test case, then we can
attempt to fix the problem.
An example document and config file are all you should need to send.
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Thu Mar 6 10:10:17 2008