duplicate entries in DB after regex performed on URLs?

From: Chad Day <CDay(at)not-real.mindshare.net>
Date: Tue Dec 06 2005 - 20:39:32 GMT
I'm using spider.pl to index my Joomla! site, and the spider is picking up URLs with the PHP session variable (?PHPSESSID=askjhdskljashdk) appended.  While the site is being indexed, the session ID changes a few times, so multiple copies of the same document get indexed.  The links also appear in the DB with the session variable still in them.

I was able to modify my swish.conf file to remove the PHP session ID variables:

       ReplaceRules regex /\?PHPSESSID.*$//i
       ReplaceRules regex /&PHPSESSID.*$//i
       
but multiple entries still appear in the database for each document.  What am I doing wrong?
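
To make the effect of the two rules concrete, here is a minimal Python sketch (not swish-e or spider.pl code). The patterns are the ones from the ReplaceRules lines above; the example.com URLs and the helper names strip_session and dedupe are hypothetical. It shows that rewriting each URL on its own still leaves one entry per fetched URL, and that duplicates only collapse if the rewritten URL is also used as the "already seen" key. Whether that is what happens inside the indexer here is an assumption, not something confirmed by this post.

```python
import re

# The two patterns from the ReplaceRules lines, translated to Python.
# The trailing "i" flag in the config becomes re.IGNORECASE.
RULES = [
    re.compile(r"\?PHPSESSID.*$", re.IGNORECASE),
    re.compile(r"&PHPSESSID.*$", re.IGNORECASE),
]


def strip_session(url: str) -> str:
    """Apply both rules in order, as the config would."""
    for rule in RULES:
        url = rule.sub("", url)
    return url


def dedupe(urls):
    """Keep one entry per URL after the session ID is stripped."""
    seen = set()
    unique = []
    for url in urls:
        key = strip_session(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical URLs: the same two pages, each seen under two session IDs.
crawled = [
    "http://example.com/index.php?PHPSESSID=aaa111",
    "http://example.com/index.php?PHPSESSID=bbb222",
    "http://example.com/index.php?option=com_content&PHPSESSID=ccc333",
    "http://example.com/index.php?option=com_content&PHPSESSID=ddd444",
]

# Rewriting alone: still four entries, now with identical paths.
rewritten = [strip_session(u) for u in crawled]

# Rewriting plus a seen-set keyed on the rewritten URL: two entries.
unique = dedupe(crawled)

print(len(rewritten), len(unique))
```

One side effect of the patterns as written: because of the trailing `.*$`, everything after PHPSESSID is removed, including any query parameters that follow it.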

Thanks,
Chad Day
Developer
Mindshare Interactive Campaigns, LLC
202.654.0832 - www.mindshare.net 
Received on Tue Dec 6 12:39:33 2005