If you are trying to limit ALL searches to docs with specified paths,
it's better not to put anything else in your index in the first place.
See the filter_content() callback in the spider.pl docs.
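Below is a minimal sketch of that approach for a SwishSpiderConfig.pl file. It assumes spider.pl's usual @servers layout and its per-URL no_index flag. As I understand the spider.pl docs, that flag means "fetch this page and follow its links, but don't pass it to swish-e". Check the callback section before relying on it. The sketch uses the test_url callback rather than filter_content because the decision depends only on the URL. The email address is a placeholder, and the path pattern is an assumption based on the example URLs in the message below.

```perl
# SwishSpiderConfig.pl -- sketch only; the email address is a placeholder
# and the path pattern is assumed from the example URLs.
@servers = (
    {
        base_url => 'http://nymphswww.covcollege.ac.uk/index.php',
        email    => 'webmaster@example.com',    # placeholder contact address

        # Called for each URL before it is fetched. Returning true lets the
        # spider fetch the page and extract its links. no_index is assigned
        # on every call, so pages outside /courses/ and /sections/News/ are
        # still crawled for links but (assuming no_index behaves as
        # documented) are not sent to swish-e for indexing.
        test_url => sub {
            my ( $uri, $server ) = @_;

            # Test the path only, so words in the query string
            # (e.g. email.php?title=...Courses) cannot cause a match.
            $server->{no_index} = $uri->path !~ m{^/(?:courses|sections/News)/};

            return 1;
        },
    },
);

1;
```

The test looks only at $uri->path, so the query string is ignored and email.php?title=...Courses never matches. Pages such as index.php are still fetched and their links followed. They just don't end up in the index, so no select_by_meta restriction should be needed in swish.cgi.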
SMALL, SHERIDAN scribbled on 5/18/06 5:59 AM:
> Hi,
>
> I am having some trouble using the select_by_meta directive in the
> swishcgi.conf file. I have also tried using variations of ExtractPath,
> with no success.
>
> The problem is that I want to limit the search to paths with the
> values "/courses/" and "/news/", not "courses" or "news".
>
> The reason is that I have URLs like the following:
>
> http://nymphswww.covcollege.ac.uk/email.php?title=A/AS%20levels%20-%20Part%20Time%20Courses
>
> http://nymphswww.covcollege.ac.uk/courses/template.php?debug=1&cat=null&code=IDAAD1F1&idx=null&title=BTEC%20Introductory%20Diploma%20in%20Art%20%20and%20Design
>
> I do not want the first included, only the second, which includes /courses/.
>
> And
>
> http://nymphswww.covcollege.ac.uk/email.php?title=News%20item%20-%20A%20Slice%20of%20Success
>
> http://nymphswww.covcollege.ac.uk/sections/News/index.php?id=227&title=A%20Slice%20of%20Success
>
> Note that the /News/ I want is preceded by /sections/.
>
> I have tried using
>
> select_by_meta => {
>     ***
>     metaname => 'swishdocpath', # Can't be a metaname used elsewhere!
>     values   => [qw '/courses/', '/news/' ],
>     ***
>
> But the "/"s are ignored.
>
> I am indexing over HTTP using base_url =>
> 'http://nymphswww.covcollege.ac.uk/index.php'.
>
> I have tried
>
> ExtractPath News regex !^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$!$1!
> ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$!$1!
>
> (Both work one at a time.)
>
> With both
>
>     metaname => 'site', 'News', # Can't be a metaname used elsewhere!
>
> Or else with both of these together:
>
> select_by_meta => {
>     ***
>     metaname => 'site', # Can't be a metaname used elsewhere!
>     values   => [qw/ courses /],
>     ***
>
> select_by_meta => {
>     ****
>     metaname => 'News', # Can't be a metaname used elsewhere!
>     values   => [qw/ News /],
>     ****
>
> But only the second gets displayed.
>
> And
>
> MetaNames swishdocpath swishtitle site
>
> ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$!$1!
> ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$!$1!
>
> With
>
> select_by_meta => {
>     ***
>     metaname => 'site', # Can't be a metaname used elsewhere!
>     values   => [qw/ courses News /],
>     ***
>
> Only the first regex gets used.
>
> Is there a way to get the search I want?
>
> Thanks in advance,
>
> Sheridan Small
>
> ______________________________
>
> Website Co-ordinator
>
> City College Coventry
> Butts Centre
> The Butts
> COVENTRY
> CV1 3GD
>
> 024 7679 1540
>
> ______________________________
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Thu May 18 06:48:44 2006