(no subject)

From: Chad Day <CDay(at)not-real.mindshare.net>
Date: Wed Nov 30 2005 - 19:44:33 GMT
I'm trying to find a way to properly highlight the search words in the
description for a PDF file.

If the only result returned is the PDF, it works perfectly: it returns
the relevant part of the document with the keyword highlighted.

However, if other results are returned as well (a mix of HTML files and
PDFs, for instance), the HTML files return correct results. The PDFs
match the key term, but their description starts at the top of the
document, not at the part that contains the keyword.

My simple configuration file:

# Tell Swish-e what to index (same as -i switch above)
IndexDir http://mysite.org/index.php
IndexFile /usr/local/apache/htdocs/my.index
IndexOnly .php .txt .html .htm .pdf .xml .htm .shtml

# Index the PDF files
FileFilter .pdf /usr/X11R6/bin/pdftotext '"%p" -'

# Tell Swish-e that .txt files are to use the text parser.
IndexContents TXT* .txt .pdf
IndexContents XML* .xml
IndexContents HTML* .htm .html .shtml .php

PropertyNamesMaxLength 1000 swishdescription
PropertyNameAlias swishdescription body

StoreDescription TXT* 1000

# Otherwise, use the HTML parser
DefaultContents HTML*

Delay 1

I'm not sure how to set up the PropertyNames* directives and
StoreDescription to pick up the relevant parts of the PDF file, and I
couldn't find anything about it in the list archives.
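
To make the symptom concrete, here is a rough Python sketch of what I
think may be happening. It is only an illustration under assumptions: it
assumes StoreDescription keeps just the first 1000 characters of the
extracted text, and that the highlighting code falls back to the start
of the description when the keyword is not inside it. The function names
(stored_description, make_snippet) are made up for this example and are
not part of Swish-e.

```python
import re


def stored_description(full_text, max_chars=1000):
    # Assumption: only the first max_chars characters are kept as the
    # description property, as with "StoreDescription TXT* 1000".
    return full_text[:max_chars]


def make_snippet(description, term, width=60):
    # Return a window of the description around the first match of term,
    # with the match wrapped in ** markers. If the term is not present,
    # fall back to the first `width` characters of the description.
    match = re.search(re.escape(term), description, re.IGNORECASE)
    if match is None:
        return description[:width]
    start = max(0, match.start() - width // 2)
    end = min(len(description), match.end() + width // 2)
    window = description[start:end]
    rel_start = match.start() - start
    rel_end = match.end() - start
    return (window[:rel_start] + "**" + window[rel_start:rel_end]
            + "**" + window[rel_end:])


# A document where the keyword only appears after the first 1000 characters.
full_text = "intro " * 300 + "the keyword appears here"
truncated = stored_description(full_text)

print(make_snippet(truncated, "keyword"))  # falls back to the top of the text
print(make_snippet(full_text, "keyword"))  # window around the match
```

If that guess is right, a keyword that first appears past the stored
length would never be available to highlight, which would match what I
see for the PDFs.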

Also, a quick thanks to everyone here for all the useful info out
there. It got me to this point at least (indexing PDFs correctly,
highlighting results, etc.) much quicker than I would have on my own.

Thanks!

Chad Day




Received on Wed Nov 30 11:44:43 2005