RE: Indexing non-HTML files... (PDF, DOC, ...)

From: Ibon Aizpurua <epiola(at)not-real.jet.es>
Date: Mon May 10 1999 - 08:56:48 GMT
I'm interested in this patch too, because on an
intranet, or on the Internet, you can have a lot of documents
that aren't HTML or .txt. So I think it's a good
idea to make it available on the web.
I'm waiting...
                                    Ibon Aizpurua
                            University of the Basque Country
> Hi!
>
> In August last year I wrote a message to this mailing list
> saying that I've made some enhancements which enable swish (1.1) to index
> non-HTML files like PDF or other document types (filter option).
>
> Since then I have occasionally received requests about how to do this and
> where to get the source. Because of these requests, I'm adapting the small
> enhancements to swish-e 1.3.2.
>
> If there is public interest, I will try to get some web space
> to provide the source instead of sending it by email on each request.
>
>
> ---
> In short, the patch adds two new config directives to swish:
>      FilterDir   <path-to-filter-progs>
>      FileFilter  <file-ext> <filterprog>
>
> e.g.:
>      FilterDir   /usr/local/etc/httpd/sbin/filters
>      FileFilter  .pdf   pdf-filter.sh
>      FileFilter  .doc   ms-wword-filter.sh
>      FileFilter  .ps    ps-filter.sh
>      FileFilter  .gz    gzip-filter.sh
>
> e.g. the pdf-filter.sh script:
> ---
> #!/bin/sh
> # Convert file in arg1 to txt on stdout
> /usr/local/bin/pdftotext "$1" - 2>/dev/null
> ---
>
>
> Regards Rainer
>
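To illustrate how the FilterDir and FileFilter directives fit together, here is a minimal sketch in POSIX shell, assuming the indexer looks up a file's extension, runs the matching program from FilterDir with the file path as its only argument, and indexes whatever that program writes to stdout (the same contract pdf-filter.sh follows). The function names filter_for and extract_text are hypothetical, and passing unfiltered files through unchanged is an assumption; this is not the code from Rainer's patch.

```shell
#!/bin/sh
# Hypothetical sketch of FilterDir/FileFilter dispatch; NOT the code from
# the actual swish patch. Function names are illustrative only.

# Corresponds to the FilterDir directive (can be overridden via environment).
FILTER_DIR=${FILTER_DIR:-/usr/local/etc/httpd/sbin/filters}

# Print the filter program registered for a file's extension, mirroring
# the example FileFilter lines; print nothing if no filter matches.
filter_for() {
    case "$1" in
        *.pdf) echo "pdf-filter.sh" ;;
        *.doc) echo "ms-wword-filter.sh" ;;
        *.ps)  echo "ps-filter.sh" ;;
        *.gz)  echo "gzip-filter.sh" ;;
    esac
}

# Write the text to be indexed to stdout: run the registered filter with the
# file path as its only argument, or (assumption) pass the file through
# unchanged when no filter is registered for its extension.
extract_text() {
    prog=$(filter_for "$1")
    if [ -n "$prog" ]; then
        "$FILTER_DIR/$prog" "$1"
    else
        cat "$1"
    fi
}
```

For example, `extract_text manual.pdf` would run `$FILTER_DIR/pdf-filter.sh manual.pdf`, while an .html file would be read as-is. Keeping filters as external programs with a "path in, text on stdout" contract means supporting a new format only needs a new script and one more FileFilter line.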
Received on Mon May 10 01:55:16 1999