Re: [swish-e] Empty index error in an index that isn't empty

From: Parker, Peter A CONTRACTOR WRAIR-Wash DC <Peter.Parker(at)not-real.AMEDD.ARMY.MIL>
Date: Tue Sep 18 2007 - 17:45:09 GMT
Thanks for your help, guys. I figured out what the problem was. I noticed
that the index file was always at 100 MB when the indexing stopped. I
checked ulimit and there was a default limit on max file size for that
account of, you guessed it, 100 MB. Since, as Swish-e's site instructs, I
was not running the process as root, this limit stopped the indexing
process once the file reached 100 MB. A quick edit to limits.conf fixed
the problem.
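
For anyone chasing the same symptom, here is a minimal sketch of how the
file size limit could be inspected from the account that runs the indexer.
It uses Python's standard resource module; the function names are only
illustrative, and the actual limits.conf entry is not shown in this
message, so it is not reproduced here. Note that getrlimit reports bytes,
while the shell's ulimit -f reports blocks.

```python
import resource


def format_limit(limit_bytes):
    """Render an RLIMIT_FSIZE value (in bytes) in a readable form."""
    if limit_bytes == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit_bytes / (1024 * 1024):.0f} MB"


def file_size_limits():
    """Return the (soft, hard) max file size limits for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = file_size_limits()
    print(f"max file size: soft={soft}, hard={hard}")
```

Run as the indexing account, a soft limit at or below the expected index
size would point to the kind of truncated index described above.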

About the FileFilter directives: it is my understanding that catdoc
handles conversion of both doc and ppt files to text
(http://freshmeat.net/projects/catdoc/), and it is not necessary to have
a separate directive for those in the configuration file. Thanks again.

Peter

-----Original Message-----
From: users-bounces@lists.swish-e.org
[mailto:users-bounces@lists.swish-e.org] On Behalf Of
users-request@lists.swish-e.org
Sent: Tuesday, September 18, 2007 12:00 PM
To: users@lists.swish-e.org
Subject: Users Digest, Vol 9, Issue 13

Today's Topics:

   1. Re: Empty index file(s) error (Peter Karman)
   2. [autoreply] Out of the Office (Patrick O'Lone)
   3. Empty index error in an index that isn't empty (Thomas R. Bruce)
   4. Re: Empty index error in an index that isn't empty (Peter Karman)


----------------------------------------------------------------------

Message: 1
Date: Mon, 17 Sep 2007 11:26:40 -0500
From: Peter Karman <peter@peknet.com>
Subject: Re: [swish-e] Empty index file(s) error
To: Swish-e Users Discussion List <users@lists.swish-e.org>
Message-ID: <46EEAAC0.2030106@peknet.com>
Content-Type: text/plain; charset=UTF-8



On 09/14/2007 04:15 PM, William M Conlon wrote:
> The indexing process is not completing, hence the temp files.
> 
> Take a look at the indexer output.
> 
> Bill
> 
> 
> On Sep 14, 2007, at 2:03 PM, Parker, Peter A CONTRACTOR WRAIR-Wash DC
> wrote:
> 
>> Greetings,
>> I have recently completed installation of Swish-e on an Apache server
>> machine with the following details:
>>
>> Swish-e version: 2.4.5
>> Apache version: 2.0.52
>>
>> I now have approximately 50 files in the directory indexed, including
>> Word, Excel and PowerPoint documents and PDFs. I have gone through
>> the steps outlined for indexing non-text files. Initially, when there
>> were only about 7 files in the html directory the indexing worked
>> fine and command line searches worked flawlessly. Now after adding
>> more files to the directory (about 50 files), the indexing is not
>> working as it was.
>>

My guess is that one of the filter helper programs (pdftotext, catdoc, etc.)
is choking the indexer and not delivering all the content you expect.
Encodings are often an issue; there are others.

>> FileFilter .pdf share/doc/swish-e/examples/filter-bin/_pdf2html.pl

Try running that pdf2html script by itself on some docs.
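
One way to do that across a whole directory is sketched below. It is only
a sketch under assumptions: check_filter is a hypothetical helper, and it
assumes the filter takes the document path as its last argument and writes
the converted text to standard output.

```python
import subprocess


def check_filter(filter_cmd, paths, timeout=60):
    """Run filter_cmd on each path and collect the ones that look broken.

    Returns a dict mapping each problem path to a short description.
    A path is flagged if the filter cannot be started, times out,
    exits with a non-zero status, or produces no output at all.
    """
    problems = {}
    for path in paths:
        try:
            result = subprocess.run(
                list(filter_cmd) + [path],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            problems[path] = "timed out"
            continue
        except OSError as exc:
            problems[path] = f"could not run filter: {exc}"
            continue
        if result.returncode != 0:
            problems[path] = f"exit status {result.returncode}"
        elif not result.stdout.strip():
            problems[path] = "no output"
    return problems
```

Calling check_filter with the filter script's path as a one-element list
and the list of PDF files would return the documents the filter fails on,
which are the first candidates to remove or convert by hand.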

Also, I don't see any FileFilter lines for .doc, .ppt, etc. You might
want to try the DirTree.pl script instead, since it does all the filtering
with SWISH::Filter instead of FileFilter config options.

--
Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/



------------------------------

Message: 2
Date: Mon, 17 Sep 2007 11:28:01 -0500
From: "Patrick O'Lone" <polone@townnews.com>
Subject: [swish-e] [autoreply] Out of the Office
To: Swish-e Users Discussion List <users@lists.swish-e.org>
Message-ID:
	<200709171628.l8HGS1P5019829@mail1.ch2.l3.sys.townnews.com>

I will be out of the office the week of September 17th - 21st on my
honeymoon. I have received your e-mail, but I won't be responding until
September 24th. For urgent issues, please call 800-293-9576 or email
requests@townnews.com.



------------------------------

Message: 3
Date: Tue, 18 Sep 2007 05:34:04 -0400
From: "Thomas R. Bruce" <trb2@cornell.edu>
Subject: [swish-e] Empty index error in an index that isn't empty
To: Swish-e Users Discussion List <users@lists.swish-e.org>
Message-ID: <46EF9B8C.4020307@cornell.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Folks:

I'm getting empty index errors from an index that isn't empty.  Here's 
what I know about the problem.

a) What changed?  The search apparatus is constructed in Mason, using 
mod_perl and the swish-e perl API.  We recently upgraded mod_perl and 
Mason, with some difficulty.

b) However, there are six or seven similar collections on the same
machine, separately indexed, and all the others are operating fine using
the same code, so it is unlikely that the mod_perl or Mason setups are
at fault.  It is possible that the broken index is the only one that has
been reindexed since the change in the perl scaffolding.

c) Command line searches work fine, as do searches using the API from 
inside perl scripts run from the command line, as do searches of other 
indices from within the world of mod_perl.

d) The index returns sensible things in response to swish-e -T whatever 
when run from the command line.

e) Permissions on the index files are wide open.

It looks for all the world like it might be a library mismatch problem, 
given the symptoms, but I can't figure out why it would show up in only 
one index of many that are searched with the same code.  I suppose the 
most helpful thing would be to know under what circumstances swish-e 
throws the empty index error, aside from when indexes are empty (grin).

Best,
Tb.


------------------------------

Message: 4
Date: Tue, 18 Sep 2007 08:33:43 -0500
From: Peter Karman <peter@peknet.com>
Subject: Re: [swish-e] Empty index error in an index that isn't empty
To: Swish-e Users Discussion List <users@lists.swish-e.org>
Message-ID: <46EFD3B7.6070809@peknet.com>
Content-Type: text/plain; charset=UTF-8



On 09/18/2007 04:34 AM, Thomas R. Bruce wrote:

> b) However, there are six or seven similar collections on the same
> machine, separately indexed, and all the others are operating fine using
> the same code, so it is unlikely that the mod_perl or Mason setups are
> at fault.  It is possible that the broken index is the only one that has
> been reindexed since the change in the perl scaffolding.
> 


so if you just replace the path to the failing index with the path to an
index that you know works, the code works?

> c) Command line searches work fine, as do searches using the API from
> inside perl scripts run from the command line, as do searches of other
> indices from within the world of mod_perl.
> 

I would need to see some code snippets to have any hope of reproducing
the problem.

What version of Swish-e? What version of SWISH::API?

-- 
Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/



------------------------------

_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users


End of Users Digest, Vol 9, Issue 13
************************************
Received on Tue Sep 18 13:45:41 2007