Re: Configuration

From: Lars D. Noodén <lars(at)not-real.umich.edu>
Date: Thu Dec 15 2005 - 16:45:28 GMT
The filtering for PDF is the default; I have not changed that.  I have
added a filter for ODF.

'swish-filter-test -v' works for both file formats, or at least for two
test files.  Here are two excerpts from the output of swish-filter-test:

    Document ./ODF/deliverable_1.pdf was  filtered.
       Document:     ./ODF/deliverable_1.pdf  (./ODF/deliverable_1.pdf)
       Content-Type: text/html
       Parser type:  HTML*

    ...
    Document ./ODF/bio100_2.odt was  filtered.
       Document:     ./ODF/bio100_2.odt  (./ODF/bio100_2.odt)
       Content-Type: text/xml
       Parser type:  XML*

According to the output from swish-e, the documents are being found:
    Checking dir "/Library/WebServer/ODF"...
      bio100_1.odt - Using DEFAULT (HTML2) parser -  (1237 words)
      bio100_2.odt - Using DEFAULT (HTML2) parser -  (1258 words)
      deliverable_1.pdf - Using DEFAULT (HTML2) parser -  (100017 words)

Since the tests work, the problem seems to be with the config file, but
I'm not sure how to debug it.  The files (PDF and ODF) get processed by
the default parser (HTML2), but are not run through either
SWISH::Filters::* module.

-Lars
Lars Nooden (lars@umich.edu)
    On the Internet, nobody knows you're a dog ...
    ... until you start barking.

On Thu, 15 Dec 2005, Bill Moseley wrote:

> On Thu, Dec 15, 2005 at 11:08:38AM -0500, Lars D. Noodén wrote:
>> Yes, I meant swish-filter-test.  That works.
>>
>> The filter is a SWISH::Filters::* filter and I want to make sure it gets
>> run.  It does not seem to be happening automatically, even for PDF.
>
> What's your code look like?  How does it compare with the other PDF
> module?
>
> Did you run swish-filter-test -v test.pdf and watch the output?
>
> Does it need a helper program?  Here you can see a filter looking for
> "pdftotext" and finding it:
>
>>> Loading filter: [SWISH/Filters/Pdf2HTML.pm]
> Filter: SWISH::Filters::Pdf2HTML=HASH(0x88b72f4): Find path of [pdftotext] in /home/moseley/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/usr/local/lib/swish-e
> Filter: SWISH::Filters::Pdf2HTML=HASH(0x88b72f4):   Not found at path [/home/moseley/bin/pdftotext]
> Filter: SWISH::Filters::Pdf2HTML=HASH(0x88b72f4):   Not found at path [/usr/local/bin/pdftotext]
> Filter: SWISH::Filters::Pdf2HTML=HASH(0x88b72f4):  * Found program at: [/usr/bin/pdftotext]
>
>
> And here's searching for a filter to handle application/pdf:
>
>>> Starting to process new document: application/pdf
> ++Checking filter [SWISH::Filters::Doc2txt=HASH(0x84322f4)] for application/pdf
> ++Checking filter [SWISH::Filters::Doc2html=HASH(0x8440d40)] for application/pdf
> ++Checking filter [SWISH::Filters::ID3toHTML=HASH(0x844c46c)] for application/pdf
> ++Checking filter [SWISH::Filters::XLtoHTML=HASH(0x8351ea0)] for application/pdf
> ++Checking filter [SWISH::Filters::Pdf2HTML=HASH(0x88b72f4)] for application/pdf
> ++ application/pdf *WAS* filtered by SWISH::Filters::Pdf2HTML=HASH(0x88b72f4)
>
>
>
>>>> e.g.    /\.pdf$/ to Pdf2HTML.pm
>>>>         /\.od[tspmhgcif]$/ to ODF2xml.pm
>
> Those are checking for the file name.  The filters work by content
> type.
>
>
>
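
To illustrate the distinction Bill draws above between matching on the file name and selecting a filter by content type, here is a minimal Python sketch. It is not swish-e or SWISH::Filter code. The extension table, the function names, and the mapping of content types to filter names are all assumptions made for illustration; only the names Pdf2HTML and ODF2xml come from this thread.

```python
import os

# Hypothetical extension-to-content-type table (illustration only; the
# table swish-e's helper scripts actually use is not shown in this thread).
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".odt": "application/vnd.oasis.opendocument.text",
}

# Filters are keyed by content type, not by file name pattern.
# Pdf2HTML and ODF2xml are the names mentioned in the thread; the
# association shown here is an assumption.
FILTERS_BY_TYPE = {
    "application/pdf": "Pdf2HTML",
    "application/vnd.oasis.opendocument.text": "ODF2xml",
}


def content_type_for(path):
    """Step 1: derive a content type from the file name (None if unknown)."""
    ext = os.path.splitext(path)[1].lower()
    return CONTENT_TYPES.get(ext)


def pick_filter(path):
    """Step 2: choose a filter from the content type alone."""
    return FILTERS_BY_TYPE.get(content_type_for(path))


if __name__ == "__main__":
    for name in ("./ODF/deliverable_1.pdf", "./ODF/bio100_2.odt", "notes.txt"):
        print(name, "->", content_type_for(name), "->", pick_filter(name))
```

The point of the two-step lookup is that a filter never sees the file name pattern: if the content type assigned to a document is not one the filter declares, the filter is skipped even though the extension looks right.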


Received on Thu Dec 15 08:45:29 2005