(no subject)

From: Jones, David H <david.h.jones(at)not-real.boeing.com>
Date: Mon Oct 23 2006 - 15:44:58 GMT
Bill or others,

Thanks for the suggestion; I spent some time looking at spider.pl and
various config files, but haven't seen the light yet.

Can you be more explicit about how to tell the spider to index the
image/svg+xml or .svg content type?  In the latest release of Swish-e I
don't see any example of SwishSpiderConfig.pl or some similar config
file.  As I look down the options in:
 http://swish-e.org/docs/spider.html#configuration_options

I see the use_default_config, which shows the config for the "default"
mode:

    @servers = (
    {
        email               => 'swish@user.failed.to.set.email.invalid',
        link_tags           => [qw/ a frame /],
        keep_alive          => 1,
        test_url            => sub { $_[0]->path !~ /\.(?:gif|jpeg|png)$/i },
        test_response       => $response_sub,
        use_head_requests   => 1,  # Due to the response sub
        filter_content      => $filter_sub,
    } );

It's not clear to me what to change in this to get the spider to process
svg.

Can you be more explicit?  For testing purposes I've included a .svg
file.

Thanks again.

Dave Jones


-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org]
Sent: Saturday, October 21, 2006 11:35 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Problems indexing svg files?


On Sat, Oct 21, 2006 at 10:59:42PM -0700, Jones, David H wrote:
>
> > SwishE is skipping .svg files with mime type image/svg+xml, giving
> > the message:
> >
> > Now fetching
> > [http://e352837.nw.nos.boeing.com:8080/hyperslate/IndexTest/pub.svg]
> > ...
> > Status: 200. image/svg+xml
> > Skipping
> > http://e352837.nw.nos.boeing.com:8080/hyperslate/IndexTest/pub.svg:
> > Wrong content type: image/svg+xml.

You have to tell the spider to index that content type.  It won't by
default.
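
As a rough, untested sketch of what that could look like in a
SwishSpiderConfig.pl: the test_response callback accepts image/svg+xml
alongside the usual text types, and filter_content relabels SVG as
text/xml.  The base_url value and the %wanted list are placeholders
invented for illustration.  The callback argument order ($uri, $server,
$response, ...) and the meaning of the return values are taken from the
spider.pl docs as I understand them, and whether relabeling the content
type is enough to get past the "Wrong content type" check is an
assumption to verify against your spider.pl version.

```perl
# SwishSpiderConfig.pl -- illustrative sketch only, not a tested config.

# Content types to let through (hypothetical list; adjust as needed).
my %wanted = map { $_ => 1 } qw( text/html text/plain image/svg+xml );

@servers = (
    {
        base_url       => 'http://localhost/',    # placeholder URL
        email          => 'swish@user.failed.to.set.email.invalid',
        link_tags      => [qw/ a frame /],
        keep_alive     => 1,

        # .svg is not in the skip pattern, so SVG links are still fetched.
        test_url       => sub { $_[0]->path !~ /\.(?:gif|jpeg|png)$/i },

        # Return true only when the response's content type is wanted.
        test_response  => sub {
            my ( $uri, $server, $response ) = @_;
            return $wanted{ $response->content_type };
        },

        # Assumption: relabeling SVG as XML text lets it be indexed as XML.
        filter_content => sub {
            my ( $uri, $server, $response, $content_ref ) = @_;
            $response->header( 'Content-Type' => 'text/xml' )
                if $response->content_type eq 'image/svg+xml';
            return 1;    # true = keep the document
        },
    },
);

1;
```

If you go this way, leave out use_default_config so that your own
@servers entry is the one the spider uses.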

> > My config file says the following:
> >
> > IndexOnly .htm .html .svg

I think IndexOnly applies only when indexing the file system.


> > <g id="node2" class="node"><title>A Separate Peace</title></g>
> >
> > I haven't found many examples of indexing XML.  Would it be:
> >=20
> > MetaNames g.title
> > Or
> > MetaNames title

I suspect MetaNames title

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list:
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu




*********************************************************************
Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
*********************************************************************
Received on Mon Oct 23 08:45:06 2006