

Re: Spidering phpBB

From: Shaffer, Chris <Chris.Shaffer(at)not-real.BellSouth.COM>
Date: Tue Aug 31 2004 - 20:05:33 GMT
Because phpBB's search engine, in my opinion, is really crummy...  For
those who don't know, here's how phpBB's search 'engine' works:

1.) When a post is made, all common language words are stripped out, as
well as all words that have already been indexed.
2.) What's left is put into a table, one record for each new word.
3.) An intersection table is then updated with cross-references between
all non-common words and the post number.

That makes phrase and context searching impossible.
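
To illustrate, here is a rough Python sketch of the scheme described above.
It is not phpBB's actual code or schema: the names COMMON_WORDS, word_table
and match_table are made up for the example, and the stop-word list is an
assumption. It only shows that once word order is thrown away, two posts
with the same words in a different order look identical to the search.

```python
import re

# Hypothetical stop-word list; phpBB's real list is not shown in this thread.
COMMON_WORDS = {"the", "a", "is", "to", "and", "of"}


def index_post(post_id, text, word_table, match_table):
    """Index one post the way the three steps above describe."""
    # Step 1: strip common words (word order and duplicates are lost here).
    words = set(re.findall(r"[a-z0-9]+", text.lower())) - COMMON_WORDS
    for word in words:
        # Step 2: one record per word that has not been seen before.
        if word not in word_table:
            word_table[word] = len(word_table) + 1
        # Step 3: intersection table of (word id, post id) pairs.
        match_table.add((word_table[word], post_id))


def search(terms, word_table, match_table):
    """Return the ids of posts that contain every term, in any order."""
    result = None
    for term in terms:
        word_id = word_table.get(term.lower())
        posts = {p for (w, p) in match_table if w == word_id}
        result = posts if result is None else result & posts
    return result or set()


word_table, match_table = {}, set()
index_post(1, "spider the forum", word_table, match_table)
index_post(2, "the forum spider", word_table, match_table)

# Both posts match, although only post 2 contains the phrase "forum spider".
hits = search(["forum", "spider"], word_table, match_table)
```

Nothing in either table records where a word sat in the post, so there is
no way to tell the two posts apart for a phrase query.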

And yes, I'd love to be able to search our intranet site (including
forums) from one form...  That would be sweet...

As far as my problem crawling the forums...  I think I know what is
going on...  The session_id is changing occasionally, causing the
spider to go in circles...  Is there any way I can filter out something
matching 'sid=....' from the end of the path before the spider decides
whether or not it has crawled it yet?
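
To make it clearer what I mean, here is a minimal Python sketch of the kind
of filtering I have in mind. This is not SWISH-E's spider configuration,
just an illustration of the idea; strip_sid and already_crawled are
hypothetical helper names, and the example.com URLs are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_sid(url):
    """Return the URL with any 'sid' query parameter removed."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "sid"
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


seen = set()


def already_crawled(url):
    """Record the sid-less URL and report whether it was seen before."""
    key = strip_sid(url)
    if key in seen:
        return True
    seen.add(key)
    return False


first = already_crawled("http://example.com/viewtopic.php?t=5&sid=abc123")
second = already_crawled("http://example.com/viewtopic.php?t=5&sid=def456")
```

With the sid removed before the comparison, the same topic fetched under
two different session ids counts as one page, so the crawl should stop
going in circles.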

Chris Shaffer
Application Developer, BSTCAD/BSTProcess
BSTCAD Support Forums
(404) 927-1227

-----Original Message-----
[] On Behalf Of Greg Fenton
Sent: Tuesday, August 31, 2004 2:27 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Spidering phpBB

--- "Shaffer, Chris" <> wrote:
> The problem is, the page content changes constantly, due to a 'Members
> Online' section and a time stamp.  Also, you are correct that there
> are
> many different ways to access the same data...

BTW: why are you trying to index a phpBB2 forum?  It has its own Search

There might be something you could do at the PHP-level to tie phpBB2's
search results with those from SWISH-E (I am assuming you are trying to
have searches for both your website and the forums in one location??).


Greg Fenton

Received on Tue Aug 31 13:06:41 2004