
Re: [SWISH-E:217] Index files?

From: Giulia Hill <ghill(at)not-real.library.berkeley.EDU>
Date: Fri Apr 03 1998 - 20:45:47 GMT
About what information the swish-e index files contain:

Attached is an example of the index of a file that contains only two
words: "filename" associated with the meta name "keyword" and "tiger" not
associated with any meta name.

PIECE BY PIECE:

00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000013390000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000001359
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000001446000000000000137600000000000013770000000000001429

This part contains the offsets for the characters that you specified in
the variable indexchars (in swish.h) plus, in order: the offset in the
index file of the metanames, the offset of the stopwords, the offset of
the list of files, and the offset of the list of files' offsets.

The offset of the char indicates the point in the index file where you
will find the first word that starts with that char.
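
For anyone who wants to read this block from their own program, here is a
rough sketch in Python. It assumes every offset is a zero-padded decimal
field 16 digits wide, which is only inferred from the sample above, and
that the last four fields are the ones listed above, in that order. The
names parse_offset_table, FIELD_WIDTH and SECTION_NAMES are made up for
this example and are not part of swish-e.

```python
FIELD_WIDTH = 16  # assumption: inferred from the sample, not from swish.h

# Order of the four trailing fields, as described in the text above.
SECTION_NAMES = ("metanames", "stopwords", "file_list", "file_offsets")


def parse_offset_table(text):
    """Split the offset block into per-character offsets and section offsets."""
    digits = "".join(text.split())  # drop line breaks and other whitespace
    if len(digits) % FIELD_WIDTH != 0:
        raise ValueError("offset block is not a whole number of fields")
    fields = [
        int(digits[i:i + FIELD_WIDTH])
        for i in range(0, len(digits), FIELD_WIDTH)
    ]
    if len(fields) < len(SECTION_NAMES):
        raise ValueError("offset block is too short")
    # Everything before the last four fields belongs to the indexchars.
    char_offsets = fields[:-len(SECTION_NAMES)]
    sections = dict(zip(SECTION_NAMES, fields[-len(SECTION_NAMES):]))
    return char_offsets, sections


# Shortened sample: two character fields followed by the real last line.
sample = (
    "00000000000000000000000000001339\n"
    "0000000000001446000000000000137600000000000013770000000000001429\n"
)
char_offsets, sections = parse_offset_table(sample)
```

An offset of 0 in char_offsets can be read as "no word starts with this
character". Which character each position stands for depends on the
indexchars string that the index was built with.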
-----------------------------------------
filename: 1 58047 1 3
tiger: 1 58047 1 1

These are the entries themselves. For each word you have sets of 4
numbers, one set for each of the files in which the word is present.
The first number indicates the file number. The second is the rank, which
should indicate some sort of relevance of the word in the file - but it is
not reliable. The third is the structure in which the word appears, that
is, whether it is in the title, body, and so on. The fourth is the metaname
with which it is associated; a value of 1 indicates that there is no
metaname associated with it. See below about metanames.
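
A minimal Python sketch of reading one such entry, assuming the line looks
exactly like the two samples above (the word, a colon, then numbers
separated by spaces). The names Posting and parse_word_entry are invented
for this example. The structure value is kept as a plain number, since its
encoding is not described here.

```python
from typing import NamedTuple


class Posting(NamedTuple):
    file_number: int  # which indexed file the word occurs in
    rank: int         # relevance score (not reliable, see above)
    structure: int    # where the word appears (title, body, ...)
    metaname: int     # 1 means "no metaname"


def parse_word_entry(line):
    """Return (word, [Posting, ...]) for one entry line."""
    word, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("entry has no ':' separator")
    numbers = [int(n) for n in rest.split()]
    if len(numbers) % 4 != 0:
        raise ValueError("expected the numbers to come in sets of 4")
    postings = [
        Posting(*numbers[i:i + 4]) for i in range(0, len(numbers), 4)
    ]
    return word.strip(), postings


word, postings = parse_word_entry("filename: 1 58047 1 3")
```

With the sample line, postings holds a single set: file 1, rank 58047,
structure 1, metaname 3.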
---------------------------------------------------------
/home/ghill/swish/dir8/records/80.html "80.html" 66

This is just the list of the files that have been indexed.
-----------------------------------------------------------
0000000000001377

The list of offsets where the files can be found in the index file. Since
there is only one file, there is only one offset.
-----------------------------------------------------------
title keyword

The possible meta names that have been specified in the user's
configuration file. As mentioned before, 1 means no metaname; after that,
numbers are assigned in order, so "title" is 2 and "keyword" is 3.
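
The numbering rule can be sketched in a few lines of Python, assuming the
meta names are stored as one line separated by spaces, as in the sample.
The function name metaname_ids is made up for this example.

```python
def metaname_ids(line):
    """Map metaname numbers to names; number 1 is reserved for 'no metaname'."""
    ids = {1: None}
    # Names are numbered in the order they appear, starting at 2.
    for number, name in enumerate(line.split(), start=2):
        ids[number] = name
    return ids


ids = metaname_ids("title keyword")
```

Looking up 3 in ids gives "keyword", which matches the last number of the
"filename" entry above.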
-----------------------------------------------------------

The indexes of swish and swish-e are organized in pretty much the same
way, except that swish does not have any metanames.

I hope that this was of help.

Giulia

/* ----------------------------------------------------------- */
/*  Giulia Hill                        Library Systems Office  */
/*                                     386 Doe Library         */
/*  ghill@library.berkeley.edu         U.C. Berkeley,          */
/*                                     Berkeley, CA 94720      */
/* ----------------------------------------------------------- */


On Fri, 27 Mar 1998, NICKLAS ANDERSSON wrote:

> Hello, 
> 
> we are two students from Sweden studying in Granada, Spain
> for the moment. We are doing our degree-project here and are
> working with Swish. First we download a website with "wget"
> and then we index the site with swish. Our problem is that
> we would like so search the index file from a JAVA-program
> directly to speed up the search-process. Is there any way
> for us to get the information how the file is build up?
> We would be very pleased if a friedly soul out there know
> how to search directly in the index file.
> Is the indexfiles for swish and swish-e build up the same way?
> 
> Thank you in advance!
> 
> Regards,
>   Nicklas & Michael
>   nicklas@technologist.com
> 


<META name="keyword" content="filename">
tiger

Received on Fri Apr 3 12:53:55 1998