Re: Using a translated link for the 'found' hyperlink, rather than

From: J. David Boyd <david(at)>
Date: Tue Nov 22 2005 - 15:00:27 GMT
Bill Moseley wrote:
> On Mon, Nov 21, 2005 at 08:02:32AM -0800, J. David Boyd wrote:
>>Is it possible to hook into the index generation code of swish-e, and
>>insert my own translation code, such that when the indexer sees
>>S2171_TABLE.pdf, I can look in my translation table, and stuff the value
>>'Module 72 Tables' to be displayed in the hyperlink of the search
>>result?  Of course, the hyperlink still has to point to the original file.
> Easily.  It should already do that, but maybe you don't have titles in
> your pdf docs.
> Anyway, in SWISH::Filters::Pdf2HTML (assuming that's what you are
> using), just set the title:
>     $title ||= lookup_title( $file_name );

I've been looking through SWISH::Filters::Pdf2HTML, and I realize more
and more that I'm no Perl expert.

I don't see any code that looks like what you have there.  I see code in
sub filter() that sets a title.  Do I monkey around in there?  That
looks, to me, like a good way to break something.
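
I take it lookup_title() is something I would have to write myself.  A
minimal sketch of what I imagine it looking like, assuming a
hand-maintained hash keyed on the bare file name (%title_for and the
/docs/ path are made up for illustration; only the "||=" line comes from
your message):

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical translation table: file name => text to show in the link.
my %title_for = (
    'S2171_TABLE.pdf' => 'Module 72 Tables',
);

# Hypothetical helper: returns undef for files not in the table, so the
# "||=" below leaves $title alone in that case.
sub lookup_title {
    my ($file_name) = @_;
    return $title_for{ basename($file_name) };
}

my $title;    # stands in for whatever title the filter found (none here)
$title ||= lookup_title('/docs/S2171_TABLE.pdf');
print "$title\n";
```

Because of the "||=", a title that is already set would win, and the
table would only be consulted when the PDF has no title of its own.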

Now, as an alternative, I find that I can actually set a title in my PDF
file, using pdftk.  It's kind of convoluted, but it works okay.

I see that Pdf2HTML mentions that it can store the title, but it doesn't
work by default.  (By which I mean that I have manually set some titles
in my PDF files, run the index, performed a search, and it still shows
the file name as the hyperlink, rather than the PDF file's internal title.)

You may pass into SWISH::Filter's new method a tag to use as the html
<title> if found in the PDF info tags:

    my %user_data;
    $user_data{pdf}{title_tag} = 'title';

    $was_filtered = $filter->filter(
        document  => $filename,
        user_data => \%user_data,
    );

Then if a PDF info tag of "title" is found that will be used as the HTML
<title>.

Does this mean that if I copy the actual code (skipping comments, of
course) from the section quoted above and place it into the sub new()
function, I will be adding the ability to read the titles?  If so,
where do I put it?  Before the return statement, obviously (even to
me), but does it go inside of bless(), before it, or after it?

Am I even close here?
