Re: [swish-e] Swish3 vs Omega

From: Peter Karman <peter(at)>
Date: Thu Feb 05 2009 - 15:14:38 GMT

Kevin Bowling wrote on 02/05/2009 09:08 AM:
> On Thursday 05 February 2009 07:32:46 Peter Karman wrote:
>> Kevin Bowling wrote on 02/04/2009 11:57 PM:
>>>> Ranking should be somewhat better.
>>> I look forward to the better rankings!
>> have you tried the different RankScheme options in Swish-e 2.4?
>>> What really kills performance is PDF/PS to HTML conversions on my box. 
>>> It would be really nice to thread the indexing and converting so it
>>> doesn't block on this case.
>> you can do that yourself. Just filter your PDF/PS separately and cache
>> the output, then index your cache. That's a common approach.
> I assume I am not the only person in the world indexing PDF and PS.  
> Everything in FOSS search seems to be DIY, but really everybody has similar 
> requirements.  I'm actually only indexing the PS as text (with pstotext) since 
> Swish-e doesn't have a suitable script included.  Again, I'm sure plenty of 
> people do this.  Why not include robust scripts to deal with PDF, PS, DOC, et
> al.?

SWISH::Filter should handle all of those except .ps, I think. It would be
easy to add one -- why not give it a try based on what you do already
and submit a patch?
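
For the filter-and-cache approach mentioned above, here is a rough sketch
in Python (not Perl, and not part of Swish-e or SWISH::Filter), just to
show the shape of it. The names refresh_cache and run_converter are made
up for illustration, and it assumes pstotext and pdftotext are installed
and write plain text to stdout when called as shown.

```python
import os
import subprocess

CONVERTIBLE = (".ps", ".pdf")


def run_converter(path):
    """Convert one file to text with an external tool (assumed to be installed)."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".ps":
        cmd = ["pstotext", path]        # assumption: text goes to stdout
    else:
        cmd = ["pdftotext", path, "-"]  # "-" asks pdftotext to write to stdout
    return subprocess.run(cmd, capture_output=True, check=True).stdout


def refresh_cache(src_dir, cache_dir, convert=run_converter):
    """Convert new or changed PS/PDF files; return the cache files written."""
    written = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            if os.path.splitext(name)[1].lower() not in CONVERTIBLE:
                continue
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(cache_dir, rel + ".txt")
            # Skip files whose cached text is at least as new as the source.
            if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
                continue
            text = convert(src)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            with open(dst, "wb") as fh:
                fh.write(text)
            written.append(dst)
    return written
```

Run something like that from cron and point the indexer at the cache
directory instead of the originals. Only new or changed files get
reconverted, so the slow conversions no longer hold up every indexing run.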

The latest SWISH::Filter is on CPAN:

> Also, using 'file' or MIME information to index rather than file extensions, 
> even when run on the local file system, would be pretty nice.  I know there is 
> a lot of data on my box that isn't indexed because of this.
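
On detecting type by content rather than by extension, here is a minimal
sketch of the idea in Python, checking the leading bytes of a file much as
'file' does. sniff_type and the MAGIC table are hypothetical names for
illustration, only a few signatures are covered, and the label given to
the OLE2 container is a placeholder rather than a registered MIME type.

```python
# Leading-byte signatures: a tiny illustrative subset, not a full magic database.
MAGIC = [
    (b"%PDF-", "application/pdf"),
    (b"%!PS", "application/postscript"),  # typical "%!PS-Adobe-..." header
    # OLE2 compound file, used by legacy .doc and other old Office formats.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
]


def sniff_type(path):
    """Return a type label based on the first bytes of the file, or None."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for magic, label in MAGIC:
        if head.startswith(magic):
            return label
    return None
```

A PostScript file saved without a .ps extension would still be recognized
this way, which is the case the extension-based approach misses.
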
>>>> If all you need is a local filesystem indexer for a website with 200k+
>>>> docs (which I would call medium-sized -- these days folks deal with
>>>> multi-million doc collections), and you don't need UTF-8 or incremental
>>>> indexing, Swish-e 2.4.5 is about as good as it gets. Don't let its age
>>>> fool you. :)
>>> Yes improvements and a nicer interface are really all I would like to
>>> see. It's 2009 and the interface looks like it is 10+ years old (not that
>>> that is a bad thing, but a 'modern' interface would be nice as well).
>> by 'interface' do you mean the swish.cgi script? or the options to the
>> swish-e cli? or ...?
> Yes, swish.cgi. A nice, full-featured and modern interface that could be 
> themed or integrated into other pages would really round out the turn-key 
> search solution.
> This all seems like low-hanging fruit.  We both agree that Swish-e is fast and 
> has good results.  I think improving the included interface and filters would 
> go a long way.  Hopefully you can start pushing releases as well.  Long 
> release cycles are no good for FOSS.

agreed, in theory. low-hanging fruit and pushing releases still require
tuits, however. have any to share?

> I hope I don't come off sounding selfish or sound like 'do this work for me
> please'.  I just use this on a non-commercial site.  Hopefully I am at least
> providing useful critique as an end user.

become more than an end user and send along a PS filter. :)

Peter Karman  .  peter(at)  .

