[swish-e] Duplicates after merge

From: Gregorio Hernández Piris <grego(at)not-real.eleka.net>
Date: Thu Feb 21 2008 - 10:45:37 GMT
Hello,

I'm using swish-e 2.4.3, and I have a duplicate URL problem when merging 
two indexes. I have a "big" index and do an incremental indexing of XML 
files every day. Yesterday I changed one file, and after the merge the 
same URL (swishdocpath) appears twice, with different dates. 
Shouldn't I get only one entry (the newest one)?

index1:


$ swish-e -f index1 -w KEYWORD=Gregorio -x"%p -- %D\n"
# SWISH format: 2.4.3
# Search words: KEYWORD= Gregorio
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.008 seconds
http://localhost/page/bilaketadatuak?gid=332884 -- 2008-02-19 23:30:07 CET

index2:

$ swish-e -f index2 -w KEYWORD=Gregorio -x"%p -- %D\n"
# SWISH format: 2.4.3
# Search words: KEYWORD=Gregorio
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.010 seconds
http://localhost/page/bilaketadatuak?gid=332884 -- 2008-02-05 01:26:42 CET

Merged index:

$ swish-e -f index.merged -w KEYWORD=Gregorio -x"%p -- %D\n"
# SWISH format: 2.4.3
# Search words: GAKOAK=sorgortasuna
# Removed stopwords:
# Number of hits: 2
# Search time: 0.000 seconds
# Run time: 0.011 seconds
http://localhost/bilaketadatuak?gid=332884 -- 2008-02-05 01:26:42 CET
http://localhost/bilaketadatuak?gid=332884 -- 2008-02-19 23:30:07 CET
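Until the merge behaviour itself is understood, one possible workaround is to filter the search output so that only the newest hit per path is shown. The following is only a sketch: `dedupe_newest` is a hypothetical helper name, it assumes the `-x"%p -- %D\n"` output format used above, and it assumes all dates are in the same time zone so that they compare correctly as plain text. It does not change the merged index, and the "Number of hits" header is passed through unchanged.

```shell
# Hypothetical helper: keep only the newest hit per path.
# Expects hit lines shaped "path -- YYYY-MM-DD HH:MM:SS TZ" on stdin.
# Assumes every date uses the same time zone, so string order equals time order.
dedupe_newest() {
    awk -F' -- ' '
        # Header lines start with "#"; print them as they arrive.
        /^#/ { print; next }
        # Skip anything without the " -- " separator (blank lines, etc.).
        NF < 2 { next }
        {
            # Remember the order in which each path was first seen.
            if (!($1 in newest)) order[++n] = $1
            # Keep the lexicographically largest (newest) date per path.
            if (n_seen[$1]++ == 0 || $2 > newest[$1]) newest[$1] = $2
        }
        END {
            for (i = 1; i <= n; i++) print order[i] " -- " newest[order[i]]
        }
    '
}
```

It could then be used like this:

```shell
swish-e -f index.merged -w KEYWORD=Gregorio -x"%p -- %D\n" | dedupe_newest
```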


Received on Thu Feb 21 05:45:39 2008