Ignoring session ids when distinguishing files as being different

From: Stefan Seiz <TalkLists(at)>
Date: Tue Feb 22 2005 - 11:30:49 GMT

I am new to Swish-e, coming from HtDig.

While evaluating Swish-e, I discovered two show-stoppers for our environment.

1) Session IDs in URLs.
Our site is served dynamically and the app server includes session IDs in URLs,
which I cannot turn off.

These session IDs change, and thus Swish-e will recognize pages as being
different, although they are in fact the same pages (only the session ID
changes).

Using htdig, I could work around this problem with one simple configuration rule:
    url_rewrite_rules: (.*)&pb-id=.* \\1
    (where pb-id=XXXXX is my session id)
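
To make the intent of that rule concrete, here is a minimal Python sketch of
the same normalization. It is only an illustration of what the htdig rule does
to a URL, not Swish-e configuration; the example URLs and the function name
strip_session_id are made up for this sketch.

```python
import re

# Same pattern as the htdig rule: keep everything before "&pb-id=",
# drop the session parameter and whatever follows it.
SESSION_RE = re.compile(r"(.*)&pb-id=.*")


def strip_session_id(url: str) -> str:
    """Return the URL with the '&pb-id=...' tail removed (hypothetical helper)."""
    return SESSION_RE.sub(r"\1", url)


# Two made-up URLs for the same page, differing only in session id.
urls = [
    "http://example.com/page?doc=42&pb-id=AAA111",
    "http://example.com/page?doc=42&pb-id=BBB222",
]

# After normalization both URLs collapse to a single entry.
normalized = {strip_session_id(u) for u in urls}
```

Note that, like the original rule, this also discards any parameters that come
after pb-id, so it only fits if the session id is the last parameter.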

Is there anything similar in Swish-e to make it ignore the session ID when
deciding whether two files are different?

2) Password-protected PDF files.
All our PDFs are protected with the same password, so I can easily pass the
password to pdftotext on the command line.

So I tried modifying
and tried to add "-opw MyPasswd" to the call to $self->run_pdftotext, but
failed miserably. I tried many different variations of adding the -opw
option to pdftotext.
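
For reference, this is roughly the command line I am trying to get, written as
a small Python sketch rather than as the Swish-e filter code (I do not know how
run_pdftotext builds its arguments internally, so that part is an assumption).
One possible cause of the failure is passing "-opw MyPasswd" as a single
argument; in an argument list the option and its value are separate elements.
The helper names below are hypothetical.

```python
import subprocess


def build_pdftotext_cmd(pdf_path: str, owner_password: str) -> list:
    """Build the pdftotext argument list (hypothetical helper).

    The option and its value are separate list elements; "-" as the
    output file makes pdftotext write the text to stdout.
    """
    return ["pdftotext", "-opw", owner_password, pdf_path, "-"]


def pdf_to_text(pdf_path: str, owner_password: str) -> str:
    """Run pdftotext and return the extracted text.

    Requires pdftotext to be installed and on PATH; raises
    CalledProcessError if pdftotext exits with a non-zero status.
    """
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, owner_password),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

If the PDFs use a user password rather than an owner password, pdftotext takes
-upw instead of -opw.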

Can anyone help me out with how I need to add the -opw option to the call to

Stefan Seiz
Received on Tue Feb 22 03:30:51 2005