Re: Spidering phpBB

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Aug 31 2004 - 14:24:42 GMT
On Tue, Aug 31, 2004 at 07:14:42AM -0700, Shaffer, Chris wrote:
> I do...  How would swish know which page to go to, though?

Maybe it's not that easy.  From a *very* quick look it seems like
content is organized by topic:

http://www.phpbb.com/phpBB/viewtopic.php?t=134922

So, assuming there's a topic table that links to articles, maybe you
could index all the articles for a given topic under that topic id.
Then search results would point to the topic.

That's all guessing, but maybe something like that would work.
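
A minimal sketch of that idea, in Python purely for illustration.  It
assumes you can already pull (topic id, post text) pairs out of the
forum database; the function name, the row layout, and the base URL
are all hypothetical, not phpBB's actual schema or any swish-e API.
It only shows the grouping step: one combined document per topic,
keyed by the URL that search results would point to.

```python
from collections import defaultdict


def build_topic_documents(posts, base_url):
    """Combine post texts per topic.

    posts    -- iterable of (topic_id, post_text) pairs (hypothetical layout)
    base_url -- forum root, e.g. "http://example.com/phpBB" (assumption)

    Returns a dict mapping each topic URL to the joined text of its posts,
    in the order the posts were supplied.
    """
    by_topic = defaultdict(list)
    for topic_id, text in posts:
        by_topic[topic_id].append(text)

    return {
        f"{base_url}/viewtopic.php?t={topic_id}": "\n\n".join(texts)
        for topic_id, texts in by_topic.items()
    }
```

Each resulting URL/text pair could then be handed to the indexer as a
single document, so a hit on any post lands on its topic page.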

Otherwise, I'm not sure why the spider is looping.  Do you have a small
test phpBB setup that you can try this on?  The problem may just be
that there are too many different ways to reach the same data -- or
just too many dynamically generated links in general, so it takes too
long to visit all of them.  You might just need to restrict which
URLs you follow when spidering.  For example, only follow links whose
query string has a single "t" parameter with a numeric value, and
ignore all the other links.
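
Here is a rough sketch of that kind of URL filter, written in Python
only to show the logic.  The function name should_follow is made up
for this example, and the assumption that topic pages live at
viewtopic.php comes from the sample URL above.  The same test would
have to be ported to whatever URL-filtering hook your spider offers.

```python
import re
from urllib.parse import urlsplit, parse_qsl


def should_follow(url):
    """Return True only for viewtopic.php URLs whose query string is
    exactly one parameter, "t", with an all-digit value."""
    parts = urlsplit(url)

    # Only topic pages are of interest (assumed script name).
    if not parts.path.endswith("/viewtopic.php"):
        return False

    # keep_blank_values so that something like "t=" is seen and rejected.
    params = parse_qsl(parts.query, keep_blank_values=True)
    if len(params) != 1:
        return False

    name, value = params[0]
    return name == "t" and re.fullmatch(r"[0-9]+", value) is not None
```

Note that this rejects any link carrying extra parameters, so if a
long topic is split across several pages you would need to loosen the
rule to let those pages through.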

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Tue Aug 31 07:27:37 2004