> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org]
> Sent: Wednesday, June 22, 2005 5:46 PM
> To: Revillini, James
> Cc: Multiple recipients of list
> Subject: Re: swish-e 2.4.3 windows 2003 iis success!
>
> On Wed, Jun 22, 2005 at 05:05:53PM -0400, Revillini, James wrote:
> > RTF's are killing it now. As soon as it runs into one, the output
> > file from dirtree.pl goes like this:
>
> By the way, this is all in the docs, but here's a quick executive
> summary:
>
Thanks. I just figured out how to use -man. I played with the Linux
command line a bit in the past, but I had forgotten everything. It's
coming back now. I read the entire documentation for dirtree.pl and it
was very helpful. Thanks for summarizing here.
> DirTree.pl finds files and then passes each file name to the
> SWISH::Filter module.
>
> SWISH::Filter uses MIME::Types to look up the mime type of the file.
> Then all the available SWISH::Filter modules are scanned for a regular
> expression that matches the file's mime type. When one is found, that
> filter is used, and it changes the content type to something else
> (like text/plain or text/html).
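
The lookup described above can be sketched in a few lines of plain Perl.
This is only an illustration of the idea, not the actual SWISH::Filter
code: the filter names, the table, and the subs find_filter and
filtered_type are all made up for the example, and nothing here says
which filters swish-e really ships.

```perl
use strict;
use warnings;

# Hypothetical filter table (NOT the real SWISH::Filter modules): each
# entry pairs a MIME-type pattern with the content type the filter
# would produce.
my @filters = (
    { name => 'DocToText', pattern => qr{^application/msword$}, output => 'text/plain' },
    { name => 'PdfToHtml', pattern => qr{^application/pdf$},    output => 'text/html'  },
);

# Return the first filter whose pattern matches the MIME type, or undef.
sub find_filter {
    my ($mime_type) = @_;
    for my $filter (@filters) {
        return $filter if $mime_type =~ $filter->{pattern};
    }
    return;
}

# Return the content type a document would have after filtering.
# A document with no matching filter keeps its original type.
sub filtered_type {
    my ($mime_type) = @_;
    my $filter = find_filter($mime_type);
    return $filter ? $filter->{output} : $mime_type;
}

print filtered_type('application/msword'), "\n";
print filtered_type('application/rtf'), "\n";
```

In this made-up table nothing matches application/rtf, so that type
passes through unchanged and would later be treated as "binary".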
>
> The individual filters normally need helper programs, like catdoc, to
> be installed before they will work. The swish distribution on windows
> includes catdoc, IIRC.
>
> When SWISH::Filter is done, DirTree.pl then skips any files that are
> "binary", which only means they are not of some kind of text/* type.
> Really, it should skip everything except text/xml, text/plain, and
> text/html, as that's all swish can index. After all, there are a lot
> of other text types:
>
> $ fgrep 'text/' /etc/mime.types | wc -l
> 62
>
> You might want to add that test into DirTree.pl -- check for only
> those three mime types:
>
>     unless ( $doc->content_type =~ m!^text/(?:plain|xml|html)$! ) {
>         warn "Can't index $path because it's "
>             . $doc->content_type . "\n";
>         return;
>     }
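
For trying the three-type check outside DirTree.pl, here is a minimal
self-contained sketch. The sub name is_indexable and the list of sample
types are assumptions for illustration; only the pattern itself comes
from the check quoted above.

```perl
use strict;
use warnings;

# Accept only the three types named in the quoted check.
# is_indexable is a hypothetical helper, not part of DirTree.pl.
sub is_indexable {
    my ($content_type) = @_;
    return $content_type =~ m{^text/(?:plain|xml|html)$} ? 1 : 0;
}

# Sample types chosen for illustration only.
for my $type (qw( text/plain text/html text/xml text/css application/rtf )) {
    printf "%-16s %s\n", $type, is_indexable($type) ? 'index' : 'skip';
}
```

text/css and application/rtf come out as "skip". Because the pattern is
anchored at both ends, a value that carries parameters, such as
"text/html; charset=utf-8", would not match either; whether content_type
ever returns a value like that is not something this sketch knows.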
>
> Anyway, that's how it all works.
Ok, thanks for the advice; we'll see how it goes.
>
> --
> Bill Moseley
> moseley@hank.org
>
> Unsubscribe from or help with the swish-e list:
> http://swish-e.org/Discussion/
>
> Help with Swish-e:
> http://swish-e.org/current/docs
> swish-e@sunsite.berkeley.edu
Received on Thu Jun 23 07:04:24 2005