Re: [swish-e] Using ExtractPath to Exclude Some Subdirectory from Search Result

From: Peter Karman <peter(at)>
Date: Sat Sep 19 2009 - 03:07:23 GMT

Ronny Rahardjo wrote on 9/18/09 5:48 PM:
> Hi Peter,
> Please ignore my question no.1. I was able to figure out which
> it is called. However, could you please let me know how can I check
> whether my is using I found
> in the same folder as swish.config, but I don't see any reference in the

try putting a:

 die "yes, you are using me!";

statement at the top of and then run the

However, this line in the config you posted here:

SwishProgParameters default

suggests that you are using the default config, not your file.
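
As a rough illustration of that check, a custom spider config could look like the sketch below. The file name my_spider_config.pl, the email address, and the start URL are placeholders, and the @servers layout is written from memory of the spider documentation, so verify it against the docs that ship with your version.

```perl
# my_spider_config.pl (hypothetical file name, not taken from this thread)

# Temporary check: if this file is actually loaded, the run stops here
# and prints this message. Remove the line once you have your answer.
die "yes, you are using me!";

# Assumed layout of a spider config: a global @servers list of hashes.
@servers = (
    {
        base_url => 'http://localhost/~karpet/tab.html',  # placeholder start URL
        email    => 'you@example.com',                    # placeholder contact address
    },
);

1;
```

For the file to be read at all, the SwishProgParameters line would need to name it in place of 'default'.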

> And secondly, how can I exclude "a href=#tab" link in

I think a link like '#tab' will be ignored, since that's just a
self-referential link. Example:

[karpet@pekmac:~/Sites]$ SPIDER_DEBUG=url,links default
/Users/karpet/bin/ Reading parameters from 'default'

 -- Starting to spider: http://localhost/~karpet/tab.html --
>> +Fetched 0 Cnt: 1 GET  http://localhost/~karpet/tab.html  200 OK text/html
141 parent: depth:0

Extracting links from http://localhost/~karpet/tab.html:

Looking at extracted tag '<a href="#tab">'
  tag did not include any links to follow or is a duplicate
Path-Name: http://localhost/~karpet/tab.html
Content-Length: 141
Last-Mtime: 1253329219
Document-Type: html*

  <title>test doc</title>

  foo bar <a href="#tab">nothing to see here</a> and more here


Summary for: http://localhost/~karpet/tab.html
Connection: Close:   1  (1.0/sec)
       Duplicates:   1  (1.0/sec)
      Total Bytes: 141  (141.0/sec)
       Total Docs:   1  (1.0/sec)
      Unique URLs:   1  (1.0/sec)
        text/html:   1  (1.0/sec)
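
To see why a link like '#tab' gives the spider nothing new to fetch, here is a minimal sketch in plain Perl. It is not the spider's own code; is_fragment_only and strip_fragment are hypothetical helpers written only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when an href points only at a fragment
# inside the current document (for example '#tab').
sub is_fragment_only {
    my ($href) = @_;
    return 0 unless defined $href;
    return $href =~ /\A\s*#/ ? 1 : 0;
}

# Hypothetical helper: drop the '#...' part, leaving the URL that
# would actually be requested from the server.
sub strip_fragment {
    my ($url) = @_;
    ( my $bare = $url ) =~ s/#.*\z//s;
    return $bare;
}

for my $href ( '#tab', 'other.html' ) {
    printf "%-12s %s\n", $href,
        is_fragment_only($href) ? 'same document' : 'candidate link';
}
```

Once the fragment is dropped, the href resolves to the page that was just fetched, which is consistent with the "did not include any links to follow or is a duplicate" message in the debug output above.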

So I think you need to run with your config against a test document
and see what kind of output you get. Turn on the debugging options like I
suggested. Ultimately, you're the only one who is going to discover the answer
to your problem. I'm just suggesting approaches to try.

Peter Karman  .  .  peter(at)
Received on Fri Sep 18 23:07:23 2009