You're on the wrong list. This is Swish-e, not Swish++.
Try over at http://homepage.mac.com/pauljlucas/software/swish/
oscaruser@programmer.net wrote on 09/06/2005 10:28 AM:
> Folks,
> SWISH++ 6.1.2. I am trying to index 66,542 files and used the following command:
>
> $ cat myfiles-wget.log | httpindex -e'html:*' > idx-results.log
>
> The contents of idx-results.log look as follows. After a while the process exits, the output in idx-results.log abruptly halts, and swish++.index is left at file size 0. It takes a long time to run, and it is very annoying to see this result. My swish++.conf file is below as well. What is going wrong?
> TIA
>
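A generic way to narrow down why a long-running command stops is to record how it ended, not only what it printed. Below is a minimal sketch in Python. It assumes nothing about Swish++ internals, and run_and_report is a made-up helper name used only for illustration.

```python
import subprocess


def run_and_report(cmd, stdin_path, stdout_path):
    """Run cmd with stdin and stdout redirected to files; describe how it ended.

    Hypothetical helper for illustration; it is not part of Swish++.
    """
    with open(stdin_path, "rb") as src, open(stdout_path, "wb") as dst:
        # stderr is left attached to the terminal so error messages stay visible.
        proc = subprocess.run(cmd, stdin=src, stdout=dst)
    code = proc.returncode
    if code < 0:
        # On POSIX, a negative return code means the process was killed by a signal.
        return "killed by signal %d" % -code
    return "exited with status %d" % code
```

For the command quoted above, the call would be run_and_report(["httpindex", "-ehtml:*"], "myfiles-wget.log", "idx-results.log"). A signal number or a non-zero status at least tells you whether the process was killed or gave up on its own.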
> $ tail idx-results.log
>
> createtopic.php?method=newtopic&forum=7&sid=200509011 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509012 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509013 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509014 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509015 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509016 (488 words)
> createtopic.php?method=newtopic&forum=7&sid=200509017 (488 words)
> createtopic.php?method=newtopic&<EOF>
>
> $ cat swish++.conf
>
> Incremental no
> #
> # used by: index; when "yes", same as the -I option.
> #
> # When "yes", incrementally index files and add them to an existing
> # index.
>
> IndexFile /home/www-data/public_html5/swish++.index
> #
> # used by: index, search; same as the -i option.
> #
> # The name of the index file either generated or searched.
>
> #LaunchdCooperation no
> #
> # used by: search; same as the -l option
> #
> # If "search" is run as a daemon, cooperate with Mac OS X's launchd(8) by
> # not "daemonizing" itself since launchd handles that. When "yes", this
> # forces "SearchBackground no".
> #
> # This option is available only under Mac OS X, should be used only for
> # version 10.4 (Tiger) or later, and only when search will be started via
> # launchd.
>
> PidFile /var/run/search.pid
> #
> # used by: search; same as the -P option
> #
> # If "search" is run as a daemon, record its process ID in this file.
>
> RecurseSubdirs no
> #
> # used by: index, extract; when "no", same as the -r option.
> #
> # When "no", do not recursively index the files in subdirectories; that
> # is, when a directory is encountered, all the files in that directory are
> # indexed (modulo the filename patterns specified via the IncludeFile,
> # ExcludeFile, or ExtractFile variables), but subdirectories encountered
> # are ignored and therefore the files contained in them are not indexed.
> # (This variable is most useful when specifying the directories and files
> # via standard input.) The default is to index the files in
> # subdirectories recursively.
>
> ResultsMax 20
> #
> # used by: search; same as the -m option.
> #
> # The maximum number of results to return overriding the compiled-in
> # default (which is usually 100).
>
> ResultSeparator " "
> #
> # used by: search; same as the -R option
> #
> # The string to separate the parts in a search result when ResultsFormat
> # is "classic". Either single or double quotes can be used to preserve
> # whitespace. Quotes are stripped only if they match.
>
> ResultsFormat classic
> #
> # used by: search; same as the -F option
> #
> # The output format of search results: either "classic" or "XML".
>
> SearchBackground yes
> #
> # used by: search; when "no", same as the -B option.
> #
> # When "yes" and SearchDaemon is not "none", automatically detach from
> # the terminal and run in the background.
> #
> # This option is overridden by "LaunchdCooperation yes".
>
> SearchDaemon none
> #
> # used by: search; same as the -b option.
> #
> # When not "none", run "search" as a daemon process listening to either a
> # Unix domain ("unix") or TCP socket ("tcp") or both ("both") for
> # requests.
>
> SocketAddress *:1967
> #
> # used by: search; same as the -a option.
> #
> # Default IP address and port of the TCP socket; used only when
> # SearchDaemon is either "tcp" or "both".
>
> SocketFile /home/www-data/public_html5/tmp/search.socket
> #
> # used by: search; same as the -u option.
> #
> # Default name of the Unix domain socket file; used only when
> # SearchDaemon is either "unix" or "both".
>
> SocketQueueSize 511
> #
> # used by: search; same as the -q option.
> #
> # Maximum number of queued connections for a socket; used only when
> # SearchDaemon is not "none". The default 511 value is taken from
> # httpd.h in Apache:
> #
> # It defaults to 511 instead of 512 because some systems store it
> # as an 8-bit datatype; 512 truncated to 8-bits is 0, while 511
> # is 255 when truncated.
> #
> # If it's good enough for Apache, it's good enough for us.
>
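The 511-versus-512 remark in the comment above is plain bit arithmetic, and a few lines of Python show it. This is only a sketch of the truncation the comment describes. The function name is made up for illustration, and it assumes an unsigned 8-bit type that keeps the low 8 bits.

```python
def truncate_to_8_bits(value):
    """Keep only the low 8 bits, as storing into an unsigned 8-bit type would."""
    return value & 0xFF


for queue_size in (511, 512):
    print(queue_size, "->", truncate_to_8_bits(queue_size))
# prints:
# 511 -> 255
# 512 -> 0
```

So 512 would turn into a queue size of 0 on such a system, while 511 turns into 255.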
> SocketTimeout 10
> #
> # used by: search; same as the -o option.
> #
> # Number of seconds a client has to complete a search request before
> # being disconnected. This is to prevent a client from connecting, not
> # completing a request, and causing the thread servicing the request to
> # wait forever. This is used only when SearchDaemon is not "none".
>
> StemWords no
> #
> # used by: search; when "yes", same as the -s option.
> #
> # Perform stemming (suffix stripping) on words during searches. Words
> # that end in the wildcard character are not stemmed.
>
> #StopWordFile custom_stop_word_file
> #
> # used by: index, extract; same as the -s option.
> #
> # The name of a file containing the set of stop-words to use instead of
> # the built-in set.
>
> StoreWordPositions yes
> #
> # used by: index; when "no", same as the -P option.
> #
> # Store word positions during indexing needed to do "near" searches.
> # Storing said data approximately doubles the size of the generated
> # index.
>
> TempDirectory /home/www-data/public_html5/tmp
> #
> # used by: index
> #
> # Directory to use for temporary files during indexing. If your OS
> # mounts swap space on /tmp, as indexing progresses and more files get
> # created in /tmp, you will have less swap space, indexing will get
> # slower, and you may run out of memory. If this is the case, you can
> # specify a directory on a real filesystem, i.e., one on a physical
> # disk. The directory must exist.
>
> ThreadsMin 5
> ThreadsMax 100
> #
> # used by: search; same as the -t or -T option, respectively.
> #
> # The minimum/maximum number of simultaneous threads, respectively; used
> # only when SearchDaemon is not "none".
>
> ThreadTimeout 30
> #
> # used by: search; same as the -O option.
> #
> # Number of seconds until an idle spare thread times out and destroys
> # itself; used only when SearchDaemon is not "none".
>
> TitleLines 12
> #
> # used by: index; same as the -t option.
> #
> # For HTML and XHTML files only, the maximum number of lines into a file
> # to look at for HTML and XHTML <TITLE> tags. The default is 12. Larger
> # numbers slow indexing.
>
> Verbosity 4
> #
> # used by: index, extract; same as the -v option.
> #
> # Print additional information to standard output during indexing or
> # extraction. The verbosity levels are 0-4; see index(1) or extract(1)
> # for details.
>
> WordFilesMax infinity
> #
> # used by: index; same as the -f option.
> #
> # The maximum number of files a word may occur in before it is discarded
> # as being too frequent. The default is infinity.
>
> WordPercentMax 101
> #
> # used by: index; same as the -p option.
> #
> # The maximum percentage of files a word may occur in before it is
> # discarded as being too frequent. If you want to keep all words
> # regardless, specify 101.
>
> WordsNear 10
> #
> # used by: search; same as the -n option.
> #
> # The maximum number of words apart two words can be to be considered
> # "near" each other.
>
> WordThreshold 250000
> #
> # used by: index; same as the -W option.
> #
> # The word count past which partial indices are generated and merged
> # since all the words are too big to fit into memory at the same time.
> # If you index and your machine begins to swap like mad, lower this
> # value. The above works OK in a 64MB machine. A rule of thumb is to
> # add 250000 words for each additional 64MB of RAM you have. These
> # numbers are for a SPARC machine running Solaris. Other machines
> # running other operating systems use memory differently. You simply
> # have to experiment. Only the super-user can specify a value larger
> # than the compiled-in default.
>
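The rule of thumb in the comment above (250000 words on a 64MB machine, plus 250000 for each additional 64MB) can be written out as a small calculation. This is a sketch of that arithmetic only. The function and constant names are made up for illustration, and the comment itself says the numbers come from a SPARC machine running Solaris, so treat the result as a starting point for experiments.

```python
BASE_RAM_MB = 64         # machine size the quoted comment calls "OK"
WORDS_PER_STEP = 250000  # words per 64MB, per the quoted rule of thumb


def suggested_word_threshold(ram_mb):
    """Starting WordThreshold guess from the rule of thumb in swish++.conf."""
    # Below 64MB the rule does not apply; return the base value, which
    # may still need lowering on such a machine.
    steps = max(1, ram_mb // BASE_RAM_MB)
    return steps * WORDS_PER_STEP


for ram_mb in (64, 128, 256):
    print(ram_mb, "MB ->", suggested_word_threshold(ram_mb))
# prints:
# 64 MB -> 250000
# 128 MB -> 500000
# 256 MB -> 1000000
```

Per the same comment, only the super-user can specify a value larger than the compiled-in default.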
> # the end
>
>
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Tue Sep 6 08:33:12 2005