Re: Swish-e is cataloguing pages that aren't linked

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Feb 07 2002 - 16:23:19 GMT
At 07:58 AM 02/07/02 -0800, Chris Blackstone wrote:
>I have a page on my site and there are absolutely no links to it except 
>for a commented out link on 1 page.
>The page that isn't linked to is being returned in search results, 
>however.
>I'm also having other pages being returned in search results that aren't 
>linked to.

That's how swish stays ahead of the competition.  Swish indexes 20% more
files than its closest competitor, and uses less energy, too.

I call pages that are in the web directory tree, but not linked, "orphans".
If you are indexing with the spider, I don't know how spider.pl could find
those orphans.
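
To make "orphan" concrete, here is a rough sketch in Python (not part of
swish-e or spider.pl). It assumes you already have the set of paths that
some page links to, relative to the document root; the names find_orphans,
doc_root and linked_paths are made up for the illustration.

```python
import os


def find_orphans(doc_root, linked_paths):
    """Return files under doc_root that no page links to.

    linked_paths is an assumed input: a set of paths relative to
    doc_root, written with forward slashes, that are known to be linked.
    """
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            # Path relative to the document root, normalised to "/".
            rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
            rel = rel.replace(os.sep, "/")
            if rel not in linked_paths:
                orphans.append(rel)
    return sorted(orphans)
```

Anything this returns lives in the directory tree but is reachable only if
something other than a link-following spider picks it up.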

>Is this expected behavior? This happens with yesterday's swish-e daily.

If you are using a swish with libxml2 linked in, you should be able to
index with HTMLLinksMetaName. This will index HREFs and then you should be
able to find what pages link to your page.

The other thing would be to run spider.pl (without swish), capture
STDERR to a file, and set the DEBUG_URL debug option.  IIRC, DEBUG_URL will
print each URL and its parent.  So you could grep for the page in question
and see its parent.
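
If grep isn't handy, the same search can be sketched in a few lines of
Python. This makes no assumption about the exact DEBUG_URL output format
beyond the URL and its parent appearing on the same line, and the function
name lines_mentioning is made up for the illustration.

```python
def lines_mentioning(log_lines, page):
    """Return every captured log line that mentions the given page.

    log_lines is any iterable of strings, e.g. an open file holding the
    STDERR output captured from the spider run.
    """
    return [line.rstrip("\n") for line in log_lines if page in line]
```

With the captured output saved to a file, you would open that file, pass it
as log_lines along with the name of the page in question, and read the
parent off whatever lines come back.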

Or is it possible you are still using an old index?




-- 
Bill Moseley
mailto:moseley@hank.org
Received on Thu Feb 7 16:24:08 2002