Indexing errors and also Indexing time

From: <Mike.Fountain(at)not-real.worldspan.com>
Date: Wed Feb 08 2006 - 14:17:38 GMT
A couple of questions on indexing errors and indexing time:

How fast should indexing run on a properly configured system?  It's taking
me a little over 1 minute to index about 200 files.  I'm using DirTree
piped to the index command.  Off the top of my head, I think this box has an
800 MHz CPU and 128 MB of RAM, running Ubuntu Linux.

Is 1 minute or so for that few files OK?  I've got nothing to compare it
to, so I have no idea how fast an index should run.

Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 40,752 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
40,752 unique words indexed.
5 properties sorted.
173 files indexed.  39,043,196 total bytes.  315,064 total words.
Elapsed time: 00:01:13 CPU time: 00:00:05
Indexing done!
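
As a rough sanity check on those numbers, here is the arithmetic on the
summary lines above.  This is only a sketch: the helper names (parse_hms,
throughput) are made up for illustration, and the figures are copied by hand
from the output rather than parsed from it.

```python
def parse_hms(text):
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def throughput(files, total_bytes, elapsed, cpu):
    """Return simple per-second rates from the indexer's summary figures."""
    elapsed_s = parse_hms(elapsed)
    cpu_s = parse_hms(cpu)
    return {
        "elapsed_s": elapsed_s,
        "cpu_s": cpu_s,
        "files_per_sec": files / elapsed_s,
        "bytes_per_sec": total_bytes / elapsed_s,
        # Share of the wall-clock time that the indexer reported as CPU time.
        "cpu_fraction": cpu_s / elapsed_s,
    }


if __name__ == "__main__":
    # Figures copied from the summary lines of the run above.
    stats = throughput(173, 39_043_196, "00:01:13", "00:00:05")
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

The reported CPU time is only a small share of the elapsed time, which may
mean most of the minute is spent outside the indexer itself (for example
waiting on whatever converts the PDF files), but that is a guess on my part.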




Watching the detailed output, it looks like what really slows down the
indexing is PDF files.  Some of them parse quickly with errors; others
seem to grind for quite a while before spitting out an error:

/www/pages/support/vendors/cisco/6500arch.pdf - Using HTML2 parser -
(16364 words)
Error (6594296): Internal: got 'EI' operator
Error (9730251): Internal: got 'EI' operator
/www/pages/support/vendors/cisco/Catalyst 4500 Update 2.pdf - Using HTML2
parser -  (9963 words)
Error (2539944): Unknown operator ''
Error (2539944): Internal: got 'EI' operator
Error (7208699): Unknown operator 'c'
Error (7208699): Internal: got 'EI' operator
/www/pages/support/vendors/cisco/Catalyst Update 1.pdf - Using HTML2 parser
-  (19283 words)
/www/pages/support/vendors/cisco/Cisco Config register.xls - Using HTML2
parser -  (228 words)


Googling doesn't give me any idea what is causing those errors.  Does anyone
here have any ideas?



The other question I have is: are the search features of the web site
available while the site is reindexing?  Say my site grows to the point
where it takes 5-10 minutes to index.  Is search unavailable for that whole
time, or does the indexer build a temp file and then do a quick swap of the
indexes once it's done?
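
To make clear what I mean by the temp file and swap, here is a minimal sketch
of that general technique.  It is not a description of what the indexer
actually does internally.  The names rebuild_and_swap and build are made up,
and build stands in for whatever command really writes the index.  It also
assumes the index is a single file; any companion files would need the same
treatment.

```python
import os
import tempfile


def rebuild_and_swap(live_path, build):
    """Build a new index beside live_path, then rename it into place.

    build is a caller-supplied (hypothetical) function that writes a
    complete index to the path it is given.
    """
    directory = os.path.dirname(os.path.abspath(live_path))
    # Create the temp file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".reindex-")
    os.close(fd)
    try:
        build(tmp_path)
        # os.replace overwrites the destination in a single step, so
        # searches see either the old index or the new one.
        os.replace(tmp_path, live_path)
    except BaseException:
        # On failure, leave the live index untouched and clean up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With something like this, searches would keep using the old index for the
whole 5-10 minutes and only switch over at the final rename.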
Received on Wed Feb 8 06:17:50 2006