That fixed it!
Thanks. I'll see if I can dig something else up.
rgs,
Robert Keith
> -----Original Message-----
> From: swish-e@sunsite.berkeley.edu
> [mailto:swish-e@sunsite.berkeley.edu]On Behalf Of moseley@hank.org
> Sent: Tuesday, April 29, 2003 11:58 PM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: Problem on Parser with TXT/HTML and Spider.pl
>
>
> On Tue, Apr 29, 2003 at 03:30:23PM -0700, Robert Keith wrote:
> >
> > I am having a strange problem indexing a combination of MSWord, .txt and PHP
> > documents using spider.pl and feeding this into swish-e. If I index the PHP
> > urls first, the documents are parsed and loaded as HTML. If I select the
> > MSWord and other documents, which are filtered by the spider.pl filter
> > routines, the MSWord and other documents are parsed as TXT (correctly), then
> > when the subsequent PHP and HTML documents are parsed, they are parsed as
> > TXT. The SwishSpiderConfig.pl file contains two entries, the URL with the
> > MSWord links, and the URL with only PHP links.
>
> This is a better fix (I actually tried it this time!)
>
> --- extprog.c.old 2003-04-29 23:51:34.000000000 -0700
> +++ extprog.c 2003-04-29 23:52:04.000000000 -0700
> @@ -272,7 +272,10 @@
>
>      /* Set the doc type from the header */
>      if ( docType )
> +    {
>          fprop->doctype = docType;
> +        docType = 0;
> +    }
>
>
>      /* set real_path, doctype, index_no_content, filter, stordesc */
>
>
> That error doesn't show up on the dev version because the doctype is
> set on all files instead of just the filtered ones.
>
> Sorry for the trouble.
>
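
For anyone reading this in the archive, here is a minimal sketch of why the one-line reset matters. It assumes docType lives longer than a single document, which the patch implies but which I have not checked against the rest of extprog.c. The FileProp struct, the DOC_* constants, read_header() and both apply_* functions are hypothetical stand-ins for illustration, not swish-e code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical document types; not the real swish-e constants. */
enum { DOC_NONE = 0, DOC_TXT = 1, DOC_HTML = 2 };

/* Hypothetical stand-in for the per-file properties. */
typedef struct {
    int doctype;
} FileProp;

/* Assumed to outlive one document, as the patch suggests. */
static int docType = DOC_NONE;

/* Only filtered documents send a doc type in their header. */
static void read_header(int header_doctype)
{
    if (header_doctype != DOC_NONE)
        docType = header_doctype;
}

/* Before the patch: the value from an earlier header is left in place. */
void apply_doctype_unpatched(FileProp *fprop, int header_doctype)
{
    assert(fprop != NULL);
    read_header(header_doctype);
    if (docType)
        fprop->doctype = docType;
}

/* After the patch: the value is consumed once, then cleared. */
void apply_doctype_patched(FileProp *fprop, int header_doctype)
{
    assert(fprop != NULL);
    read_header(header_doctype);
    if (docType) {
        fprop->doctype = docType;
        docType = DOC_NONE;
    }
}
```

With the unpatched version, a filtered MSWord file that arrives with a TXT header followed by a PHP page with no doc type header leaves the PHP page marked TXT as well. With the patched version the PHP page keeps whatever default it started with.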
Received on Wed Apr 30 08:22:53 2003