Query String Being Converted to HTML Entity

From: Jon Sorensen <jon(at)not-real.starkmedia.com>
Date: Tue Nov 23 2004 - 17:53:58 GMT
I have been trying to spider a site like so:

my %serverA = (
    base_url   => 'http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?id=183&use=&price=&psi=&order=1',
    keep_alive => 0,
    test_url   => sub {
        my $uri = shift;
        # Only follow links to pressure_washer.cfm
        return $uri->path =~ /pressure_washer\.cfm/ ? 1 : 0;
    },
    use_md5    => 1,
    max_files  => 30,
);

@servers = ( \%serverA, );

#######################################

In the output, swish was getting hung up on "&psi=" in the query string.
It was converting it to the character entity for the Greek letter psi
(&psi;, "ψ") and getting caught in an infinite loop:

#######################################

Parsing config file '/www/search.starkmedia.com/cgi-bin/test/generac.cfg'
Indexing Data Source: "External-Program"
Indexing "spider.pl"
External Program found: /usr/local/lib/swish-e/spider.pl
/usr/local/lib/swish-e/spider.pl: Reading parameters from '/www/search.starkmedia.com/cgi-bin/test/generac.spider.config'
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?id=183&use=&price=&psi=&order=1
- Using HTML2 parser -  (221 words)
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=2&id=214&use=&price=ψ=
- Using HTML2 parser -  (221 words)
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=1&id=183&use=&price=ψ=ψ=
- Using HTML2 parser -  (221 words)
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=3&id=154&use=&price=ψ=ψ=
- Using HTML2 parser -  (310 words)
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=2&id=214&use=&price=ψ=ψ=ψ=
- Using HTML2 parser -  (221 words)
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=4&id=138&use=&price=ψ=ψ=ψ=
- Using HTML2 parser -  (298 words)
http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?order=1&id=183&use=&price=ψ=ψ=ψ=ψ=
- Using HTML2 parser -  (221 words)

######################################
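A possible stopgap, sketched under assumptions: every runaway URL in the log has a name=value pair containing a second "=" (for example "price=ψ="), so test_url can refuse such URLs. The helper has_mangled_pair is hypothetical and not part of spider.pl; the sketch relies only on test_url receiving a URI object and skipping the URL when it returns 0, as in the config above. It breaks the loop, but it also skips any page that is reachable only through a mangled link.

```perl
# Hypothetical helper (not a spider.pl feature): returns a true count when
# any name=value pair in the query string contains more than one '=',
# which is what the runaway "price=ψ=ψ=..." URLs look like.
sub has_mangled_pair {
    my ($query) = @_;
    return 0 unless defined $query;
    return scalar grep { tr/=// > 1 } split /[&;]/, $query;
}

my %serverA = (
    base_url   => 'http://www.generac-portables.com/pressure_washers/pressure_washer.cfm?id=183&use=&price=&psi=&order=1',
    keep_alive => 0,
    test_url   => sub {
        my $uri = shift;    # URI object passed in by spider.pl
        return 0 unless $uri->path =~ /pressure_washer\.cfm/;
        return 0 if has_mangled_pair( $uri->query );    # skip mangled URLs
        return 1;
    },
    use_md5    => 1,
    max_files  => 30,
);

@servers = ( \%serverA, );
```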

Is this a bug in some part of swish-e? I wouldn't expect (though I'm not a
programmer) it to convert this to an entity, since it's missing the
trailing semicolon and it's inside a URL.
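The log fits a link extractor that treats the ";" as optional. As far as I know, HTML::Parser, which spider.pl uses to pull out links, decodes a known entity name such as "psi" even when no semicolon follows it. The toy decoder below models that behaviour with a four-entry table. decode_loose and the table are illustrative assumptions, not HTML::Parser's code. "&price" survives because "price" is not an entity name, but "&psi" does not.

```perl
use strict;
use warnings;

# Toy entity table (an assumption for illustration; HTML::Parser ships
# the full HTML 4 entity table).
my %entity = (
    amp => '&',
    lt  => '<',
    gt  => '>',
    psi => "\x{3C8}",    # GREEK SMALL LETTER PSI
);

# Hypothetical decoder: a known entity name is replaced whether or not a
# ';' follows it; unknown names (and their ';', if any) are left untouched.
sub decode_loose {
    my ($s) = @_;
    $s =~ s/&(\w+)(;?)/exists $entity{$1} ? $entity{$1} : "&$1$2"/ge;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';
print decode_loose('id=183&use=&price=&psi=&order=1'), "\n";
# prints id=183&use=&price=ψ=&order=1
print decode_loose('id=183&amp;use=&amp;price=&amp;psi=&amp;order=1'), "\n";
# prints id=183&use=&price=&psi=&order=1
```

The second call shows why the site's own markup matters. HTML 4.01 recommends writing "&" inside attribute values as "&amp;". If the links are written that way, for example href="pressure_washer.cfm?id=183&amp;use=&amp;price=&amp;psi=&amp;order=1", a decoder returns the literal "&psi=" and no ψ appears.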


*********************************************************************
Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
*********************************************************************
Received on Tue Nov 23 09:54:05 2004