(no subject)

From: SMALL, SHERIDAN <S.SMALL(at)not-real.staff.covcollege.ac.uk>
Date: Thu May 18 2006 - 14:17:45 GMT

Hi,

I must not have explained the problem clearly.

I want to be able to search by swishdefault and swishtitle, and then use
select_by_meta to limit the results to files in my /courses/ directory
or /news/ directory, without including other files that merely have
"courses" or "news" in them.

I have also tried

    meta_groups => {
        site => [qw/courses News/],
    },

but this will not work as my select_by_meta metaname.

Here are the swish.conf and swishcgi.conf files, which give me everything
except limiting a search to the News directory.

swish.conf

# Use the "spider.pl" program included with Swish-e
IndexDir spider.pl
IndexFile /var/www/search/index.swish-e

# Define what site to index
SwishProgParameters /var/www/search/swishspider.conf

IndexContents HTML* .htm .html .shtml .php
IndexContents TXT* .pdf .doc .ppt .xls
UndefinedMetaTags index
MetaNames swishdocpath swishtitle News courses
ExtractPath News regex !^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$!$1!
ExtractPath courses regex !^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$!$1!
StoreDescription HTML* <body> 10000
StoreDescription TXT* 10000

swishcgi.conf

return {
    title        => 'Search Website',
    page_size    => 10,
    sorts        => [qw/swishrank swishtitle/],
    swish_binary => '/usr/local/bin/swish-e',
    swish_index  => '/var/www/search/index.swish-e',
    metanames    => [qw/swishdefault swishtitle/],

    name_labels => {
        swishdefault      => 'Search All',
        swishtitle        => 'Page Title',
        swishrank         => 'Rank',
        swishlastmodified => 'Modified',
    },

    select_by_meta => {
        #method     => 'radio_group',  # pick: radio_group, popup_menu, or checkbox_group
        method      => 'checkbox_group',
        #method     => 'popup_menu',
        columns     => 1,
        metaname    => 'courses',     # Can't be a metaname used elsewhere!
        values      => [qw/ courses /],
        labels      => {
            courses => 'Only search courses: ',
            # News  => 'News',
        },
        description => '',
    },

    template => {
        package => 'SWISH::TemplateToolkit',
        file    => 'swish.tt',
        options => {
            INCLUDE_PATH => '/var/www/search',
            #PRE_PROCESS => 'config',
        },
    },
}

Regards,
Sheridan

-----Original Message-----
From: Peter Karman [mailto:peter@peknet.com]
Sent: 18 May 2006 14:48
To: SMALL, SHERIDAN
Cc: Multiple recipients of list
Subject: Re: [SWISH-E]

If you are trying to limit ALL searches to docs with specified paths,
better to not put anything else in your index in the first place. See
the filter_content() callback in spider.pl docs.
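
The suggestion above could be sketched roughly as follows. This is a minimal sketch, not a tested spider configuration: it assumes the callback receives the URI as its first argument and that a false return value keeps the page out of the index, so check the spider.pl documentation for the exact callback signature. The helper name wanted_path is hypothetical; the host name and directories are the ones mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for URLs under /courses/ or /sections/News/.
# Accepts a URI object or a plain string, since both stringify to the URL.
sub wanted_path {
    my ($uri) = @_;
    return "$uri" =~ m{^http://nymphswww\.covcollege\.ac\.uk/(?:courses|sections/News)/} ? 1 : 0;
}

# Candidate value for the filter_content key of the spider config.
# Assumption: the callback gets the URI first and a false return skips the page.
my $filter_content = sub {
    my ($uri) = @_;
    return wanted_path($uri);
};
```

Note that this restricts the whole index to the two directories, which only fits if every search should be limited that way.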

SMALL, SHERIDAN scribbled on 5/18/06 5:59 AM:

Hi,

I am having some trouble using the select_by_meta directive in the
swishcgi.conf file. I have also tried variations of ExtractPath,
with no success.

The problem is that I want to limit the search to paths containing
"/courses/" and "/news/", not just the words "courses" or "news".

The reason is that I have URLs like the following:

http://nymphswww.covcollege.ac.uk/email.php?title=A/AS%20levels%20-%20Part%20Time%20Courses

http://nymphswww.covcollege.ac.uk/courses/template.php?debug=1&cat=null&code=IDAAD1F1&idx=null&title=BTEC%20Introductory%20Diploma%20in%20Art%20%20and%20Design

I do not want the first included, only the second, which includes /courses/.

And

http://nymphswww.covcollege.ac.uk/email.php?title=News%20item%20-%20A%20Slice%20of%20Success

http://nymphswww.covcollege.ac.uk/sections/News/index.php?id=227&title=A%20Slice%20of%20Success

Note that the /News/ which I want is preceded by /sections/.

I have tried using

    select_by_meta => {
        ***
        metaname => 'swishdocpath',     # Can't be a metaname used elsewhere!
        values   => [qw '/courses/', '/news/' ],
        ***

But the "/"s are ignored.

I am indexing over http using
base_url => 'http://nymphswww.covcollege.ac.uk/index.php',

I have tried

ExtractPath News regex !^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$!$1!
ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$!$1!

(Both work one at a time.)

With both

    metaname => 'site', 'News',     # Can't be a metaname used elsewhere!

Or else with both of these together.

    select_by_meta => {
        ***
        metaname => 'site',     # Can't be a metaname used elsewhere!
        values   => [qw/ courses /],
        ***

    select_by_meta => {
        ****
        metaname => 'News',     # Can't be a metaname used elsewhere!
        values   => [qw/ News /],
        ****

But only the second gets displayed.

And

MetaNames swishdocpath swishtitle site

ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$!$1!
ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$!$1!

With

    select_by_meta => {
        ***
        metaname => 'site',     # Can't be a metaname used elsewhere!
        values   => [qw/ courses News /],
        ***

Only the first regex gets used.
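
One possibility worth trying is a single pattern that covers both URL shapes, so that only one ExtractPath rule is needed for the site metaname. The Perl sketch below only checks the pattern itself against URLs shortened from the ones quoted in this thread. Whether Swish-e's own regex engine accepts the optional group and the $2 replacement is an assumption that would need verifying, and extract_site is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper that mimics an ExtractPath-style substitution in Perl.
# Group 1 optionally consumes "sections/"; group 2 is the directory name.
# Untested swish.conf equivalent (assumes Swish-e accepts "?" and "$2"):
#   ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/(sections/)?([^/]+)/.*$!$2!
sub extract_site {
    my ($url) = @_;
    (my $site = $url)
        =~ s{^http://nymphswww\.covcollege\.ac\.uk/(sections/)?([^/]+)/.*$}{$2};
    return $site;    # the unchanged URL when the pattern does not match
}

# URLs shortened from the ones in this thread.
for my $url (
    'http://nymphswww.covcollege.ac.uk/courses/template.php?debug=1&cat=null',
    'http://nymphswww.covcollege.ac.uk/sections/News/index.php?id=227',
    'http://nymphswww.covcollege.ac.uk/email.php?title=News%20item',
) {
    print extract_site($url), "\n";
}
```

With this pattern the first two URLs reduce to the directory name, while the email.php URL has no directory after the host and is left unchanged, so it would not be tagged as courses or News.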

Is there a way to get the search I want?

Thanks in advance,

Sheridan Small
______________________________
Website Co-ordinator
City College Coventry
Butts Centre
The Butts
COVENTRY
CV1 3GD

024 7679 1540
______________________________

> *********************************************************************
> Due to deletion of content types excluded from this list by policy,
> this multipart message was reduced to a single part, and from there
> to a plain text message.
> *********************************************************************

--
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com




Received on Thu May 18 07:17:45 2006