RE: http method v. file system method Performance

From: <stephane_meier(at)not-real.non.hp.com>
Date: Fri Feb 18 2000 - 15:28:34 GMT
Jim,

Yes, I meant the HTML title. It only means that the titles in index2 are not
included in the new index. All keywords in index2 are used though.

It looks like the merge was implemented to merge non-overlapping indexes.
Swish-e cannot merge two titles for the same file, as it does not know which
one is better. It looks like it loads the first index, then adds the new
data from the others. In the process, the keywords of index2 are still merged in.

It seems that a (simple?) change in the merge could allow it to ignore any
data in index2 already present in index1, thus doing a real merge with
replacement. It would just need to skip any file already in the
index.
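
A minimal sketch of that "merge with replacement" idea, in Python. It is a toy model only: the dict-based index, `merge_with_replacement`, and `keywords` are hypothetical names for illustration and say nothing about how Swish-e actually stores or merges its indexes.

```python
# Toy model: an index maps file path -> (title, set of keywords).
# This is NOT the Swish-e index format, just an illustration of the idea.

def merge_with_replacement(index1, index2):
    """Return a merged index in which entries from index1 win.

    Any file already present in index1 is skipped when reading index2,
    so stale titles and keywords from index2 do not reach the result.
    """
    merged = dict(index1)
    for path, entry in index2.items():
        if path not in merged:  # ignore files already in index1
            merged[path] = entry
    return merged


def keywords(index):
    """Collect every keyword that appears anywhere in the index."""
    found = set()
    for _title, words in index.values():
        found |= words
    return found


update_index = {"a.html": ("New title", {"swish", "merge"})}
old_index = {
    "a.html": ("Old title", {"swish", "obsolete"}),
    "b.html": ("Other page", {"index"}),
}

merged = merge_with_replacement(update_index, old_index)
print(merged["a.html"][0])       # title taken from the first index
print(sorted(keywords(merged)))  # "obsolete" is gone, "index" is kept
```

With this behaviour the keyword "obsolete", which only the old copy of a.html contained, no longer appears in the merged result.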

Stephane
-----Original Message-----
From: JIM BRANNAN /HP-Roseville,om4  
Sent: 18 February 2000 16:09
To: swish-e@sunsite.berkeley.edu
Cc: JIM BRANNAN /HP-Roseville,om4
Subject: [SWISH-E] RE: http method v. file system method Performance


Stephane,

what does "- the titles are from index1" mean?  does this really
mean the stuff in index 2 is useless?  Maybe I do not know what
"titles" refers to - my first thought was the html title....

Jim

-----Original Message-----
From: stephane_meier@non.hp.com [mailto:stephane_meier@non.hp.com]
Sent: Friday, February 18, 2000 02:24
To: swish-e@sunsite.berkeley.edu
Subject: FW: [SWISH-E] RE: http method v. file system method Performance


Merging indexes

You can merge with "swish-e -M index1 index2 merged_index". I noticed two
things:
- the titles are from index1
- all keywords of index1 and index2 will be present
So it can be okay for an update, but only a full re-indexing would remove
keywords that have since been deleted.

You would do it this way:
"swish-e -M update_index old_index new_index"

It's also possible to merge multiple indexes at once.
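
For scripting the update, the command line can be assembled programmatically. The sketch below is in Python; `merge_command` is a hypothetical helper, and it assumes that with more than two inputs the output index is still the last argument, as in the two-index form quoted above. Only the -M option shown in this message is used.

```python
import shlex


def merge_command(inputs, output, binary="swish-e"):
    """Build the argument list for a merge run.

    Assumption: input indexes come first and the output index comes last,
    mirroring "swish-e -M index1 index2 merged_index".
    """
    if len(inputs) < 2:
        raise ValueError("a merge needs at least two input indexes")
    return [binary, "-M", *inputs, output]


# The index listed first supplies the titles, so the update goes first.
cmd = merge_command(["update_index", "old_index"], "new_index")
print(shlex.join(cmd))

# To actually run it (requires swish-e on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```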

Stephane

-----Original Message-----
From: JIM BRANNAN /HP-Roseville,om4    
Sent: 18 February 2000 06:47
To: swish-e@sunsite.berkeley.edu
Cc: JIM BRANNAN /HP-Roseville,om4
Subject: [SWISH-E] RE: http method v. file system method Performance


Hi,

I have completed a performance test indexing 18000 HTML files.
Here's the data. This is FYI.

server: HPUX  K460  4Way,  2GB memory
disk: Nike array, RAID5, 64MB cache

ncsa_httpd web server

swish-e can index  42 docs/minute on this same server via http
swish-e can index 110 docs/minute on this same server via file sys

The 42 docs per minute test was done off hours - the system was reasonably quiet.
The 110 docs per minute test was done during the business day and saw peaks
of 150 docs per minute.
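
To put those rates in terms of total run time for the 18000 files, here is a quick back-of-the-envelope calculation in Python. It assumes each rate holds steady for the whole run, which the peaks mentioned above suggest is only roughly true.

```python
DOCS = 18000
RATES = {"http": 42, "file system": 110}  # docs per minute, from the test

for method, rate in RATES.items():
    minutes = DOCS / rate
    print(f"{method}: {minutes:.0f} min (~{minutes / 60:.1f} h)")

speedup = RATES["file system"] / RATES["http"]
print(f"speedup: {speedup:.1f}x")
```

That works out to roughly 7 hours via http against a little under 3 hours via the file system, about 2.6 times faster.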


So, I'm switching to file sys access.  Now I have to figure out how to
identify and index the new files and then just merge the indexes.
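
One possible way to identify the new files is to compare modification times against the time of the last index run. This is only a sketch in Python: `files_modified_since` is a hypothetical helper, the suffix list is an assumption, and it relies on file timestamps being trustworthy. It does not detect deleted files.

```python
import os
import tempfile
import time


def files_modified_since(root, since, suffixes=(".html", ".htm")):
    """Yield paths under root whose mtime is later than since (epoch seconds)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                yield path


# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "old.html")
    new = os.path.join(root, "new.html")
    for path in (old, new):
        with open(path, "w") as fh:
            fh.write("<html></html>")

    last_run = time.time() - 3600                  # pretend the last index ran an hour ago
    os.utime(old, (last_run - 60, last_run - 60))  # make old.html predate that run

    changed = sorted(files_modified_since(root, last_run))
    changed_names = [os.path.basename(p) for p in changed]
    print(changed_names)
```

The resulting list could then be indexed on its own and merged into the existing index.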

Jim
Received on Fri Feb 18 10:32:16 2000