Re: Indexing PDFs on Windows - Revisited....

From: Anthony Baratta <Anthony(at)not-real.2plus2partners.com>
Date: Thu Sep 23 2004 - 17:44:40 GMT
Bill Moseley wrote:

> Not sure what the issue is yet, but I'll read on.  Let me know if any
> of this doesn't make sense.  The short answer, I think, is to not use
> a spider.pl config file at first, and let it use the default config
> file.

OK - I've moved to the default spider.pl setup. Seems to be much better. 
Still having issues with PDFs but these issues do not appear to be 
related to swish.

> Also, you might find it less load on the web server to use keep_alive
> than using a one second delay.  And faster indexing, too.

I'm confused about how to tweak the default setup to use keep_alive 
rather than only the delay.

>>Common to both examples, the StoreDescription does not appear to be acted 
>>on. I have no descriptions available via <swishdescription>, I get some 
>>Date Time String (e.g " Local Time : 1:12:01 PM PT") instead.
> 
> Oh, you were asking about storing the descriptions:
> 
>     $ cat c
>     DefaultContents HTML*
>     StoreDescription HTML* <body> 50
> 
>     $ swish-e -e -S prog -i stdin -c c -v0 < pport 
> 
>     $ swish-e -w port -m1 -p swishdescription -H0
>     1000 http://test.portofoakland.com/pdf/boar_shee_040622.pdf "boar_shee_040622.pdf" 124467 "C JOHN PROTOPAPPAS President PATRICIA A. SCATES Fi"
> 
> Not sure where that first "C" (before John) comes from, but that's a separate issue.
> But that's the 50 chars stored in the description.

OK - I see that now. It appears that the PDFs are getting descriptions but 
my HTML/ASP pages are not. I think this might be because my body tags 
have attributes?

e.g.
     <body leftmargin="0" topmargin="0" rightmargin="0"
      marginwidth="0" marginheight="0">

Would this interfere with the

      StoreDescription HTML* <body> 320

directive? I'm currently running a test with the directive like this:

      StoreDescription HTML* '<body leftmargin="0" topmargin="0" rightmargin="0" marginwidth="0" marginheight="0">' 320

Will be interesting to see if that does anything or not.
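
To make the question concrete, here is a minimal Python sketch of the idea behind a "first N characters of the body" description. It is not swish-e's code, and whether swish-e's HTML parser matches the body tag this way is an assumption, not something confirmed in this thread. The names `BodyTextCollector` and `describe` are hypothetical. The point it illustrates: a parser that matches on the element name sees `<body>` and `<body leftmargin="0" ...>` as the same tag, so the attributes alone would not block the description.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Collect text that appears inside the <body> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # The parser reports the tag name separately from its attributes,
        # so <body> and <body leftmargin="0" ...> both arrive as "body".
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.chunks.append(data)


def describe(html, limit):
    """Return the first `limit` characters of whitespace-normalised body text."""
    collector = BodyTextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join("".join(collector.chunks).split())
    return text[:limit]


page = (
    '<html><head><title>Test</title></head>'
    '<body leftmargin="0" topmargin="0" rightmargin="0" '
    'marginwidth="0" marginheight="0">'
    '<p>Port   commission\nmeeting agenda</p></body></html>'
)
print(describe(page, 320))
```

If swish-e behaves like this sketch, the plain `<body>` form of the directive should be enough and the missing descriptions would have some other cause.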
Received on Thu Sep 23 10:44:55 2004