(no subject)

From: Shaffer, Chris <Chris.Shaffer(at)not-real.bellsouth.com>
Date: Fri Feb 11 2005 - 18:53:47 GMT
Hi... I've gotten swish-e (using spider.pl) to crawl a couple of our
intranet sites. The filters seem to be working okay for Excel, and it
also seems to be looking at Word documents. However, in swish.cgi I
don't get any descriptions for those Word docs.

I'm calling swish-e with 'swish-e -c swish.config -S prog' on a Fedora
Core 2 box.

I've installed catdoc, the Excel Perl modules, and xpdf. Running
swish-filter-test seems to work fine.

Any suggestions on where to look? I'm not sure where to begin digging.
Here are my config files, in case they help:

spider.config
--------------------
my ($filter_sub, $response_sub) = swish_filter();

my %main_site = (
	base_url    => 'http://.../',
	email       => 'chris.shaffer@bellsouth.com',
	debug       => 'errors, url, info',
	delay_sec   => 0,
	test_response       => $response_sub,
	filter_content      => $filter_sub,
);

my %bstcad_site = (
	base_url    => 'http://..../',
	email       => 'chris.shaffer@bellsouth.com',
	debug       => 'errors, url, info',
	delay_sec   => 0,
	test_response       => $response_sub,
	filter_content      => $filter_sub,
);

my %ecars_site = (
	base_url    => 'http://.../',
	email       => 'chris.shaffer@bellsouth.com',
	debug       => 'errors, url, info',
	delay_sec   => 0,
	test_response       => $response_sub,
	filter_content      => $filter_sub,
);

@servers = (\%ecars_site, \%bstcad_site, \%main_site);
1;
---------------------
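The three site hashes above differ only in base_url. As a possible tidy-up (a sketch, not something spider.pl requires), the shared settings could live in one hash that is merged into each site. This assumes, as the original config does, that swish_filter() is supplied by spider.pl when it loads the file; the name %defaults is hypothetical, and the elided base_url values are left exactly as posted.

```perl
# Sketch of a more compact spider.config (hypothetical refactor).
# Assumes swish_filter() is provided by spider.pl, as in the original file.
my ($filter_sub, $response_sub) = swish_filter();

# Settings shared by all three sites, copied from the original hashes.
# %defaults is a hypothetical name used only for this sketch.
my %defaults = (
    email          => 'chris.shaffer@bellsouth.com',
    debug          => 'errors, url, info',
    delay_sec      => 0,
    test_response  => $response_sub,
    filter_content => $filter_sub,
);

# Each site copies the defaults and adds its own base_url
# (URLs left elided as in the original post).
my %ecars_site  = (%defaults, base_url => 'http://.../');
my %bstcad_site = (%defaults, base_url => 'http://..../');
my %main_site   = (%defaults, base_url => 'http://.../');

# Same list, in the same order, as the original config.
@servers = (\%ecars_site, \%bstcad_site, \%main_site);
1;
```

Because later keys win when a Perl hash is built from a list, any site could also override a default (for example a different delay_sec) by listing it after %defaults.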

swish.config
------------------------
# Use spider.pl as the external program:
IndexDir spider.pl
IndexOnly .html .htm .xml .doc .pdf .xls .ppt
DefaultContents HTML*
StoreDescription HTML* <body> 200000
# And pass the name of the spider config file to the spider:
SwishProgParameters spider.config
-------------------------


Chris Shaffer
Application Developer, BSTCAD/BSTProcess
BSTCAD Support Forums
<http://forums.ecars.bst.bls.com/viewforum.php?f=2>
chris.shaffer@bellsouth.com
(404) 927-1227


*********************************************************************
Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
*********************************************************************
Received on Fri Feb 11 10:53:53 2005