RE: swish-e 2.4.3 windows 2003 iis success!

From: Revillini, James <JRevillini(at)not-real.txcc.commnet.edu>
Date: Thu Jun 23 2005 - 14:04:16 GMT
> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org]
> Sent: Wednesday, June 22, 2005 5:46 PM
> To: Revillini, James
> Cc: Multiple recipients of list
> Subject: Re: swish-e 2.4.3 windows 2003 iis success!
> 
> On Wed, Jun 22, 2005 at 05:05:53PM -0400, Revillini, James wrote:
> > RTF's are killing it now.  As soon as it runs into one, the output file
> > from dirtree.pl goes like this:
> 
> By the way, this is all in the docs, but here's a quick executive
> summary:
> 

Thanks - I just figured out how to use -man.  I played with the Linux
command line a bit in the past but I forgot everything.  It's coming
back now.  I read the entire documentation for dirtree.pl and it was
very helpful.  Thanks for summarizing here.

> DirTree.pl finds files and then passes each file name to the
> SWISH::Filter module.
> 
> SWISH::Filter uses MIME::Types to look up the mime type of the file.
> Then all the available SWISH::Filter modules are scanned for a regular
> expression that matches the file's mime type.  When one is found, that
> filter is used, and the filter changes the content type to something
> else (like text/plain or text/html).
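
The lookup described above can be sketched in plain Perl.  This is only an illustration of the idea, not SWISH::Filter's actual code: the filter table, the filter names and the pick_filter helper are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical filter table (not SWISH::Filter's real data): each entry
# pairs a mime-type pattern with the content type the filter would produce.
my @filters = (
    { name => 'doc_to_text', match => qr!^application/msword$!, output => 'text/plain' },
    { name => 'pdf_to_html', match => qr!^application/pdf$!,    output => 'text/html'  },
);

# Return the first filter whose pattern matches the mime type,
# or nothing if no filter claims it.
sub pick_filter {
    my ($mime_type) = @_;
    for my $filter (@filters) {
        return $filter if $mime_type =~ $filter->{match};
    }
    return;
}

for my $type (qw( application/msword application/pdf image/png )) {
    my $filter = pick_filter($type);
    print "$type: ",
        ( $filter ? "$filter->{name} -> $filter->{output}" : 'no filter' ),
        "\n";
}
```

A file whose mime type matches no pattern is left unfiltered, so its content type stays whatever MIME::Types reported.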
> 
> The individual filters normally need helper programs, like catdoc, to
> be installed before they will work.  The swish distribution on windows
> includes catdoc, IIRC.
> 
> When SWISH::Filter is done, DirTree.pl then skips any files that are
> "binary", which only means they are not of some kind of text/* type.
> Really, it should skip everything except text/xml, text/plain, and
> text/html, as that's all swish can index.  After all, there are a lot
> of other text types:
> 
>     $ fgrep 'text/' /etc/mime.types | wc -l
>     62
> 
> You might want to add that test into DirTree.pl -- check for only
> those three mime types:
> 
>     unless ( $doc->content_type =~ m!^text/(?:plain|xml|html)$! ) {
>         warn "Can't index $path because it's " . $doc->content_type . "\n";
>         return;
>     }
> 
> Anyway, that's how it all works.
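
For trying that content-type test outside DirTree.pl, here is a minimal standalone sketch.  It assumes a hypothetical helper named is_indexable that takes the content-type string directly, so it runs without SWISH::Filter installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DirTree.pl): true only for the three
# content types named above as indexable by swish-e.
sub is_indexable {
    my ($content_type) = @_;
    return 0 unless defined $content_type;
    return $content_type =~ m!^text/(?:plain|xml|html)$! ? 1 : 0;
}

# Show which sample types would be indexed and which would be skipped.
for my $type (qw( text/plain text/html text/xml text/rtf application/msword )) {
    printf "%-20s %s\n", $type, is_indexable($type) ? 'index' : 'skip';
}
```

With this test, text/rtf is skipped even though it is a text/* type, which is the case the plain "binary" check lets through.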

Ok, thanks for the advice; we'll see how it goes.

> --
> Bill Moseley
> moseley@hank.org
> 
Received on Thu Jun 23 07:04:24 2005