On Tue, Jul 01, 2003 at 01:49:23PM -0700, Ken-Yu Lin wrote:
> Hi! Bill, I use the config file below to index just one URL using -S
> http method.
>
> and get the following result:
>
> 0> /home/kenyulin/swish-e/swish-e -S http -c test.config
> Indexing Data Source: "HTTP-Crawler"
> Indexing "http://groups.yahoo.com/group/SB-r-us/message/79"
> Segmentation fault (core dumped)
You will be happier in the long run using -S prog with spider.pl.
The problem seems to be this:
moseley@bumby:~/swish-e/src$ GET http://groups.yahoo.com/robots.txt
User-agent: *
Disallow:
So swish-e isn't dealing correctly with an empty Disallow: value. Let me look...

Yuck, that's some ugly code in there. Your classic buffer overrun: isolatevalue() never checks for an empty value before it strips trailing spaces.

You might try this patch:
moseley@bumby:~/swish-e/src$ cvs diff httpserver.c
Index: httpserver.c
===================================================================
RCS file: /cvsroot/swishe/swish-e/src/httpserver.c,v
retrieving revision 1.14
diff -u -r1.14 httpserver.c
--- httpserver.c 28 Mar 2003 16:31:35 -0000 1.14
+++ httpserver.c 1 Jul 2003 21:59:55 -0000
@@ -364,13 +364,18 @@
static char *isolatevalue(char *line, char *keyword, int *plen)
{
- /* Find the beginning of the value
- **/
- for (line += strlen(keyword); isspace((int)((unsigned char)*line)); line++ ) { /* cast to int 2/22/00 */
+
+ /* Find the beginning of the value **/
+ for (line += strlen(keyword); *line && isspace((int)((unsigned char)*line)); line++ ) { /* cast to int 2/22/00 */
+ }
+
+ if ( !strlen(line) )
+ {
+ *plen = 0;
+ return line;
}
- /* Strip off trailing spaces
- **/
+ /* Strip off trailing spaces **/
for (*plen = strlen(line); isspace((int)((unsigned char)*(line + *plen - 1))); (*plen)--) { /* cast to int 2/22/00 */
}
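
The effect of the patch can be seen in isolation with a minimal sketch like the one below. The loop bodies are copied from the diff; the names isolatevalue_old and isolatevalue_new, the const qualifier on keyword, and the explicit int cast are illustrative and not from httpserver.c. It also assumes the robots.txt line reaches the function with its trailing newline still attached, which has not been checked against the caller.

```c
#include <ctype.h>
#include <string.h>

/* Pre-patch logic (the "-" lines of the diff). With an empty value the
 * second loop starts at *plen == 0 and inspects line[-1], so it walks
 * backwards into the keyword and can leave *plen negative. */
static char *isolatevalue_old(char *line, const char *keyword, int *plen)
{
    /* Find the beginning of the value */
    for (line += strlen(keyword); isspace((int)((unsigned char)*line)); line++) {
    }

    /* Strip off trailing spaces */
    for (*plen = (int)strlen(line); isspace((int)((unsigned char)*(line + *plen - 1))); (*plen)--) {
    }
    return line;
}

/* Post-patch logic (the "+" lines of the diff). An empty value is
 * reported as length 0 before the trailing-space loop can run. */
static char *isolatevalue_new(char *line, const char *keyword, int *plen)
{
    /* Find the beginning of the value, stopping at the terminator */
    for (line += strlen(keyword); *line && isspace((int)((unsigned char)*line)); line++) {
    }

    /* Nothing left after the keyword: empty value */
    if (!strlen(line)) {
        *plen = 0;
        return line;
    }

    /* Strip off trailing spaces; line[0] is known to be non-space here,
     * so the loop cannot run past the start of the value */
    for (*plen = (int)strlen(line); isspace((int)((unsigned char)*(line + *plen - 1))); (*plen)--) {
    }
    return line;
}
```

Under that newline assumption, calling isolatevalue_old on the buffer "Disallow:\n" with keyword "Disallow:" leaves *plen at -1, and a negative length handed back to the caller is a plausible source of the segfault. isolatevalue_new returns 0 for the same input and 9 for "Disallow: /private/ \n".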
--
Bill Moseley
moseley@hank.org
Received on Tue Jul 1 22:03:17 2003