Re: [swish-e] indexing with DirTree.pl - help needed

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Sun Jan 20 2008 - 04:27:30 GMT
Matt Miller wrote on 1/18/08 10:13 AM:
> On Dec 21, 2007 11:03 PM, Peter Karman <peter@peknet.com> wrote:
>>> * I'd like the name of the file to show up in searches by "Title &
>>> Body" in swish.cgi even if swish-e doesn't know how to filter the
>>> contents, including text documents with no extension. This is not
>>> happening. What can I do to make this happen?
>>
>> see http://swish-e.org/docs/swish-config.html#nocontents
> 
> I have NoContents set up but am getting errors still. Particularly
> with .indd files and files with no extension.
> For example: Failed to set content type for document
> '/var/local/liver/Newsletter/fall 2006
> newsletter/Finals/prometheus_newsletter_winter_2007.indd'
> 

That error is coming from SWISH::Filter as it tries to figure out whether or not 
it can convert the file. The NoContents directive isn't recognized by DirTree.pl 
or SWISH::Filter, only by the swish-e command that those two feed content to. 
So SWISH::Filter still tries, and fails, to work out whether the file needs 
filtering.

I had never heard of .indd files. I suspect neither has the MIME::Types Perl 
module. Check if you have MIME::Types installed on your system. If so, you can 
read its docs on how to add a custom type. If not, then modify SWISH::Filter to 
add a type for .indd. Look for the %mime_types hash.

Files with no extension are trickier, and SWISH::Filter ought to be smarter 
about that common case. Patches to SWISH::Filter are welcome.
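
As a rough illustration of the extension lookup described above, here is a 
minimal sketch. It is not SWISH::Filter's actual code: the guess_type function, 
the contents of the table, the 'application/x-indesign' label for .indd, the 
'application/octet-stream' fallback and the example paths are all assumptions 
made for the example.

```perl
use strict;
use warnings;

# Hypothetical extension-to-type table, standing in for a %mime_types hash.
# 'application/x-indesign' is an assumed label for .indd files.
my %mime_types = (
    txt  => 'text/plain',
    html => 'text/html',
    indd => 'application/x-indesign',
);

# Return a content type for a path, or a fallback when the file name has
# no extension or the extension is not in the table.
sub guess_type {
    my ($path, $fallback) = @_;
    $fallback = 'application/octet-stream' unless defined $fallback;

    # Look only at the final path component, so dots in directory names
    # are not mistaken for an extension.
    my ($name) = $path =~ m{([^/]+)\z};
    return $fallback unless defined $name;

    # Capture the text after the last dot, if there is one.
    my ($ext) = $name =~ m{\.([^.]+)\z};
    return $fallback unless defined $ext;

    return $mime_types{ lc $ext } || $fallback;
}

print guess_type('/var/local/liver/Newsletter/example.indd'), "\n";
print guess_type('/var/local/liver/README'), "\n";
```

With a fallback like this, a file with no extension still gets some content 
type instead of an error, which is the behaviour the no-extension case would 
need.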


>>> sub check_dir {
>>>     my $dir = shift;
>>>     return ! m[^\.]; # don't process .directories
>> should be:
>>
>>       return $dir =~ m[^\.]; # don't process .directories
> 
> 
>     return if $path =~ /\/\./;  # don't index .files
> 
>     return if $dir =~ /\/\./; # don't process .directories
> 
> are the two regexes that worked for me...
> 

Cool. I see DirTree.pl sets the no_chdir option in File::Find, which would 
explain the full paths being passed to the check_* functions. Shame on me for 
not checking that before suggesting a fix.
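
To make the difference concrete, here is a small sketch comparing the two 
patterns on full paths, as the check_* functions would see them under no_chdir. 
The helper names and the two example paths are hypothetical; only the patterns 
themselves come from this thread.

```perl
use strict;
use warnings;

# With no_chdir set, the check functions receive full paths, so a pattern
# anchored at the start of the string never sees the leading dot of a
# hidden entry further down the path.
sub is_hidden_anchored {
    my $path = shift;
    return $path =~ m[^\.] ? 1 : 0;
}

# Matching "/." anywhere in the path catches hidden entries at any depth.
sub is_hidden_anywhere {
    my $path = shift;
    return $path =~ m[/\.] ? 1 : 0;
}

for my $path ('/var/local/liver/.cache', '/var/local/liver/Newsletter') {
    printf "%-30s anchored=%d anywhere=%d\n",
        $path, is_hidden_anchored($path), is_hidden_anywhere($path);
}
```

Note that the unanchored pattern also matches paths containing a /./ or /../ 
component, so it assumes the paths handed to it are already clean.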

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Sat Jan 19 23:27:33 2008