Quick indexing

From: RAGHAVV <RAGHAVV(at)not-real.inf.com>
Date: Thu May 27 1999 - 15:01:39 GMT
 
I want to index all the HTML pages that my corporate users access while
browsing the Internet. I am planning to develop a plugin that sits alongside
the web proxy and captures every HTML page before it is delivered to the
user. The corporate web administrator can then view these pages by querying
the indexed database, and must be able to run such queries at any time of
the day.
 
Since search engine tools are generally designed to deliver search results
quickly rather than to index HTML pages quickly, I am not sure which tool to
choose for this requirement. Has anybody ever tried real-time indexing with
Swish-E? I would like to know whether Swish-E is suitable for this
requirement. As an approximation, I would like the system to be able to
index up to 5 HTML pages per second. I also need to keep these pages for
7 days, which will be around 210K pages.
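
For sizing, the two figures above can be related with a little arithmetic.
The sketch below is only an illustration: it assumes that 5 pages per second
is a peak rate and that 210K is the total kept over the 7-day window. The
function names are hypothetical and are not part of Swish-E.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pages_retained(pages_per_sec: float, days: int) -> int:
    """Pages accumulated if the given rate were sustained for `days` days."""
    return round(pages_per_sec * SECONDS_PER_DAY * days)


def average_rate(total_pages: int, days: int) -> float:
    """Average pages per second implied by a total kept over `days` days."""
    return total_pages / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Upper bound: 5 pages/sec sustained around the clock for 7 days.
    print(pages_retained(5, 7))                  # 3024000
    # Average rate implied by 210K pages in 7 days.
    print(round(average_rate(210_000, 7), 2))    # 0.35
```

Under these assumptions the indexer has to absorb bursts of 5 pages per
second, while the average load is closer to one page every three seconds.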
 
Thanks,
Raghvendra Varma,
Infosys Technologies Limited
 
Received on Thu May 27 15:23:02 1999