Memory Problems while indexing

From: Klingensmith, Rick <klingensmith(at)not-real.hr.msu.edu>
Date: Thu Sep 11 2003 - 15:33:37 GMT
I experienced this problem when installing SWISH-e and resolved it by using
the -e option and pointing TmpDir to a drive that has lots of free space
(almost 100GB). Now it appears to be an issue again. I've also tried
indexing from a command window using my admin account and receive the same
error. We use the following command to run the indexer:

D:\ProgramFiles\Swish-E\swish-e -S http -e -c D:\ProgramFiles\Swish-E\conf\siteindex.config

The config file looks like this:

 

# ----- SiteIndex.config - Spider using "http" method -------
#
#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at http://swish-e.org
#
#
#  This example demonstrates how to use
#  the "http" method of spidering.
#
#  Indexing (spidering) is started with the following
#  command issued from the "d:\Program Files\Swish-e" directory:
#
#     swish-e -S http -c Siteindex.config
#
#  Note: You should have the current Bundle::LWP bundle
#  of perl modules installed.  This was tested with:
#     libwww-perl-5.53
#
#  ** Do not spider a web server without permission **
#
#---------------------------------------------------

# Include our site-wide configuration settings:

IncludeConfigFile D:/ProgramFiles/Swish-E/conf/Settings.config

# Specify the URL (or URLs) to index:
IndexDir http://www.hr.msu.edu/hrsite

# If a server goes by more than one name you can use this directive:

# EquivalentServer http://swish-e.org  http://www.swish-e.org

# This defines how many links the spider should
# follow before stopping.  A value of 0 configures the spider to
# traverse all links. The default is 5
# The idea is to limit spidering, but seems of questionable use
# since depth may not be related to anything useful.

MaxDepth 10

# The number of seconds to wait between issuing
# requests to a server.  The default is 60 seconds.

Delay 1

# Skip pages with Meta tag "noindex"

obeyRobotsNoIndex yes

# (default /var/tmp)  The location of a writeable temp directory
# on your system.  The HTTP access method tells the Perl helper to place
# its files there.  The default is defined in src/config.h and depends on
# the current OS.

TmpDir D:/Inetpub/Indexes/Temp

# The "http" method uses a perl helper program to fetch each document
# from the web called "swishspider" and is included in the src directory of
# the swish-e distribution.

SpiderDirectory D:/ProgramFiles/Swish-E

# Put the index files in the Inetpub/Indexes directory
IndexFile D:/Inetpub/Indexes/SiteIndex.New.index

# end of SiteIndex Config file

 

I am receiving the following warning in my log files from the indexing job:

Warning: Configuration setting for TmpDir 'D:/Inetpub/Indexes/Temp' will be
overridden by environment setting 'C:\DOCUME~1\rek\LOCALS~1\Temp' which does
not exist.

When I look in the specified temp directory I find SWISH-e work files, so
I'm not sure whether this is a problem or not.
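
The warning above says an environment setting takes precedence over TmpDir
and points at a directory that does not exist. A minimal sketch for checking
this from the account that runs the indexing job is shown here. It lists the
temp-related environment variables that are set and whether each one names an
existing directory. The variable names TMPDIR, TMP and TEMP are an assumption
(the warning does not say which variable swish-e read), and check_temp_env is
a hypothetical helper name, not part of swish-e.

```python
import os

# Assumption: common temp-directory variables. Which of these swish-e
# actually consults is not stated in the warning text.
CANDIDATE_VARS = ("TMPDIR", "TMP", "TEMP")


def check_temp_env(environ=None, names=CANDIDATE_VARS):
    """Return (name, value, exists) for every candidate variable that is set."""
    env = os.environ if environ is None else environ
    report = []
    for name in names:
        value = env.get(name)
        if value:
            # os.path.isdir is False for missing paths and for plain files.
            report.append((name, value, os.path.isdir(value)))
    return report


if __name__ == "__main__":
    for name, value, exists in check_temp_env():
        status = "exists" if exists else "MISSING"
        print(f"{name}={value} ({status})")
```

Running it under the same account and in the same way as the scheduled job
matters, since a command window and a scheduled task may not see the same
environment.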

 

The summary from the last good index run on 9/8 looks like this:

1468 files indexed.  39839610 total bytes.  810188 total words.
Elapsed time: 00:32:05 CPU time: 00:32:05
Indexing done!
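
For a sense of scale, the figures in that summary can be turned into per-file
and per-second averages. The sketch below parses the two summary lines exactly
as quoted above. The function name and the keys of the returned dictionary are
illustrative only and are not produced by swish-e itself.

```python
import re

SUMMARY = """1468 files indexed.  39839610 total bytes.  810188 total words.
Elapsed time: 00:32:05 CPU time: 00:32:05"""


def summarize(text):
    """Extract counts and elapsed time from a swish-e style summary."""
    counts = re.search(
        r"(\d+) files indexed\.\s+(\d+) total bytes\.\s+(\d+) total words\.",
        text,
    )
    elapsed = re.search(r"Elapsed time: (\d+):(\d+):(\d+)", text)
    if counts is None or elapsed is None:
        raise ValueError("summary lines not found")
    files, total_bytes, words = (int(g) for g in counts.groups())
    hours, minutes, seconds = (int(g) for g in elapsed.groups())
    elapsed_seconds = hours * 3600 + minutes * 60 + seconds
    return {
        "files": files,
        "total_bytes": total_bytes,
        "words": words,
        "elapsed_seconds": elapsed_seconds,
        "bytes_per_file": total_bytes / files,
        "files_per_second": files / elapsed_seconds,
    }


if __name__ == "__main__":
    for key, value in summarize(SUMMARY).items():
        print(f"{key}: {value}")
```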

 

We are using the latest Windows version of Swish-e on a Windows 2000 server.

 

The archives and FAQ point to the -e option to fix memory issues. What have
I missed?

 

Rick

 

Richard Klingensmith
MSU Human Resources Information Systems
1407 S. Harrison Road Ste. 40
East Lansing, MI 48823
(517) 432-4636 ext. 155
klingensmith@hr.msu.edu

 




Received on Thu Sep 11 15:33:46 2003