Re: Fun with ? _ in ExtractPath

From: Bill Conlon <bill(at)not-real.tothept.com>
Date: Thu Oct 28 2004 - 21:30:14 GMT
Well, this has perplexed me for about 4 hours.  Briefly, your regex 
didn't work for me.

I had been running swish-e-2.5.1-2004-06-24 when I posted my question.

I've used a succession of ExtractPath statements, allowing me to strip 
the property down to '?999'.  When I try to remove '?':

ExtractPath uid1 remove /?

I get a segmentation fault -- ./swish-index: line 5:  1503 Segmentation 
fault

I haven't had any luck quoting or unquoting the '?' -- this gives me a 
regex compilation error when I try to spider.
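
As an illustration of why the bare '?' is likely rejected, here is a
small Python sketch. It assumes the "remove" argument is treated as a
regular expression, which the compilation error suggests. Python's re
module is not the engine swish-e uses, so this only shows the general
regex behaviour, not swish-e's own.

```python
import re

value = "?_999"  # property value after the two 'remove' steps quoted below

# A bare '?' is a quantifier with nothing before it to repeat,
# so a regex engine refuses to compile it.
try:
    re.compile("?")
    bare_ok = True
except re.error:
    bare_ok = False

# '/?' compiles, but it means "an optional slash", not a literal '?',
# so it leaves the question mark in place.
unchanged = re.sub("/?", "", "?999")

# Escaped with a backslash, '?' matches a literal question mark.
stripped = re.sub(r"\?_", "", value)

print(bare_ok, unchanged, stripped)  # False ?999 999
```

By analogy, a backslash-escaped pattern such as \?_ would be the thing
to try in the config, though I have not verified how swish-e's config
parser handles the backslash.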

So I backed down to 2.4.2, but my content (a pdf download referenced 
by _uid1=999) would not get indexed. User supplied function #1 died 
with: 'Can't locate object method "export_to_level" via package 
"MP3::Tag" at /usr/local/lib/swish-e/perl/SWISH/Filter.pm line 662.'

So I tried today's daily, but got failures indexing regular files:  
Can't call method "as_string" on an undefined value at 
/usr/local/lib/swish-e/spider.pl line 780.

Suggestions welcome.
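
For reference, the single-regex idea from the quoted reply below can
be checked outside swish-e. This is a sketch in Python, whose regex
engine understands the Perl-style \d shorthand. Whether swish-e's
engine does is an assumption I have not verified; if it is a
POSIX-style engine, an explicit [0-9] class may be the safer spelling.

```python
import re

url = "http://domain.com/appfile.ext?_uid1=999"

# Pattern from the quoted reply; \d is a Perl-style shorthand for a digit.
perl_style = re.sub(r"^.+uid1=(\d+)$", r"\1", url)

# Same pattern with an explicit character class instead of \d
# (assumed, not confirmed, to be more portable across regex engines).
portable = re.sub(r"^.+uid1=([0-9]+)$", r"\1", url)

print(perl_style, portable)  # 999 999
```

The corresponding config line would hypothetically be
ExtractPath uid1 regex !^.+uid1=([0-9]+)$!$1! but that form is
untested here.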

On Thursday, October 28, 2004, at 12:22  PM, Bill Moseley wrote:

> On Thu, Oct 28, 2004 at 12:18:35PM -0700, Bill Conlon wrote:
>> Given a url such as:
>>
>> http://domain.com/appfile.ext?_uid1=999
>>
>> I want to store the number 999 in the index.
>
> Look at man SWISH-CONFIG.  I didn't try this, but something like:
>
>   ExtractPath uid1 regex !^.+uid1=(\d+)$!$1!
>
>
>
>>
>> So I set up spider.config:
>>
>> PropertyNames uid1
>> ExtractPath uid1 remove "http://domain.com/appfile.ext"
>> ExtractPath uid1 remove uid1=
>>
>> First ExtractPath gives ?_uid1=999
>> Second ExtractPath gives ?_999
>>
>> How do I remove ?_
>>
>> thx
>>
>>
>
> -- 
> Bill Moseley
> moseley@hank.org
Received on Thu Oct 28 14:30:17 2004