On Tue, Jul 01, 2003 at 11:37:12AM -0700, Ken-Yu Lin wrote:
> Using swish-e-2.2.3 (make with libxml2.so.2.5.7) on a Sun machine.
>
> Whenever I try to index this URL
> (http://groups.yahoo.com/group/SB-r-us/message/79), I get
> segmentation fault (core dumped).

Sorry, I can't duplicate this.

> But with other websites, swish-e works just fine.
>
> BTW, I didn't use any special setting.

What un-special things did you use? Can you provide enough details to
reproduce your problem, or do I have to guess? ;)

moseley@bumby:~$ cat spider.config
@servers = (
    {
        base_url    => 'http://groups.yahoo.com/group/SB-r-us/message/79',
        agent       => 'swish-e spider http://swish-e.org/',
        email       => 'spider@hank.org',
        max_indexed => 1,
    },
);
1;
moseley@bumby:~$ /usr/local/lib/swish-e/spider.pl spider.config > test.html
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'spider.config'
/usr/local/lib/swish-e/spider.pl: Max indexed files Reached
Summary for: http://groups.yahoo.com/group/SB-r-us/message/79
Duplicates: 2 (0.2/sec)
Off-site links: 7 (0.6/sec)
Total Bytes: 3,495 (291.2/sec)
Total Docs: 1 (0.1/sec)
Unique URLs: 3 (0.2/sec)
moseley@bumby:~$ head test.html
Path-Name: http://groups.yahoo.com/group/SB-r-us/auth?check=G&done=%2Fgroup%2FSB-r-us%2Fmessage%2F79
Content-Length: 3495
Document-Type: html*
<HTML>
<HEAD>
moseley@bumby:~$ cat test.html | swish-e -S prog -i stdin
Indexing Data Source: "External-Program"
Indexing "stdin"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 58 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
58 unique words indexed.
4 properties sorted.
1 file indexed. 3495 total bytes. 78 total words.
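
When repeating the two steps above on the Sun machine, it may help to record how the indexer exits, not just what it prints. The following is only a rough sketch: `classify_exit` is a hypothetical helper, not part of swish-e, and it assumes a shell such as bash or dash that reports death by signal as 128 plus the signal number (some shells, e.g. ksh93, use a different convention).

```shell
#!/bin/sh
# Hypothetical helper (not part of swish-e): run a command, discard its
# output, and report whether it exited normally or was killed by a signal.
classify_exit() {
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        # Assumption: the shell encodes "killed by signal N" as 128 + N.
        echo "killed by signal $((status - 128))"
    else
        echo "failed with status $status"
    fi
}
```

Something like `classify_exit swish-e -S prog -i stdin < test.html` would then report a signal number if the indexer crashes (SIGSEGV is normally signal 11), or `ok` if it indexes the saved spider output cleanly as it did here.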
--
Bill Moseley
moseley@hank.org
Received on Tue Jul 1 20:01:44 2003