RE: how to get a description

From: <Jeffrey.Grunstein(at)not-real.ny.frb.org>
Date: Tue Nov 19 2002 - 20:47:37 GMT
In order for Swish-E to be able to display a summary, it needs to store the
contents of your documents.
If you want, you can even highlight the search string within the summary.
It's very cool!
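
For a rough idea of what the highlighting involves, here is a minimal Python sketch. It is not how swish.cgi does it; the function name `highlight` and the `<b>` tags are made up for illustration. It assumes the description is plain text that must be HTML-escaped before it goes on the page, and it matches substrings case-insensitively rather than whole words.

```python
import html
import re


def highlight(description, terms, open_tag="<b>", close_tag="</b>"):
    """Escape a stored description for HTML and wrap each search term in tags."""
    # Longest terms first so a longer term wins over a shorter one it contains.
    words = sorted((t for t in terms if t), key=len, reverse=True)
    if not words:
        return html.escape(description)
    pattern = re.compile(
        "(" + "|".join(re.escape(w) for w in words) + ")", re.IGNORECASE
    )
    # With one capturing group, re.split puts the matches at the odd indexes.
    parts = pattern.split(description)
    out = []
    for i, part in enumerate(parts):
        escaped = html.escape(part)
        out.append(open_tag + escaped + close_tag if i % 2 == 1 else escaped)
    return "".join(out)
```

Escaping each piece separately keeps markup in the document text from leaking into the results page while the highlight tags themselves stay intact.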

Am I correct to assume that your users will be doing searches from a web
form and not from a command line?
If so, the web page you use has to deal with setting the right switches.

Try using the swish.cgi that comes in the /examples directory of the
Swish-E distribution.
That's the one I'm using.  I had to modify it to make it look right for our
web site, but I left the guts intact.

As for your bad directives, I noticed that you had HTML* in there.  I don't
think you can have the * in there.  Did you try it just as I said below?
If you did and it still doesn't work, post the relevant parts of your config
file.  Also, it may be that you don't have the HTML2 parser installed.  You
can try using HTML instead of HTML2 in the IndexContents and
StoreDescription directives.

"Wolf, Dena" <dena.wolf@orcinc.com>
Sent by: swish-e@sunsite.berkeley.edu
11/19/2002 03:04 PM
Please respond to dena.wolf

To: Multiple recipients of list <swish-e@sunsite.berkeley.edu>
cc:
Subject: [SWISH-E] RE: how to get a description

I am doing this on the web, so I need my indexing to store the descriptions?
Users will just be searching for words on the website, and I want a document
summary or excerpt to appear below the links to the documents that contain
the words they are looking for.  Does this make sense?  They will not be
entering any switches when they search on the web.  I put in the HTML2 lines
and still get a bad directive error for those 3 lines.  I have been going at
this for 4 days now :(

Thanks a bunch for your email!

-----Original Message-----
From: Jeffrey.Grunstein@ny.frb.org [mailto:Jeffrey.Grunstein@ny.frb.org]
Sent: Tuesday, November 19, 2002 2:17 PM
To: dena.wolf@orcinc.com; swish-e@sunsite.berkeley.edu
Subject: Re: [SWISH-E] how to get a description



Try this in your config file:

IndexContents HTML2 .html
IndexContents HTML2 .htm
StoreDescription HTML2 <BODY> 100000

# To index PDF files as well, try something like this...
FilterDir /opt/sfw/bin
FileFilter .pdf pdftotext "'%p' -"
IndexContents TXT .pdf
StoreDescription TXT 250000

This will store the BODY tag text of all files that end in .htm and .html,
using the HTML2 parser.  If you're running a slower machine and performance
is an issue, lower the 100,000 number to something smaller.  If you have
mostly smaller HTML files, this number can be lower and you won't lose any
content when the descriptions are stored.
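
The stored description can be much longer than what you want to show on the results page, so the trimming to roughly 40 words can happen at display time. The following Python sketch shows one way to do that; `excerpt` is a hypothetical helper, not part of Swish-E, and it assumes a single search term matched as a case-insensitive substring. If the term only occurs beyond the stored portion of the document, it falls back to the first words of the description.

```python
def excerpt(description, term, max_words=40):
    """Return about max_words words of description, centered on the first hit."""
    words = description.split()
    if len(words) <= max_words:
        return " ".join(words)
    needle = term.lower()
    # Index of the first word containing the term; 0 when there is no hit.
    hit = next((i for i, w in enumerate(words) if needle in w.lower()), 0)
    # Center the window on the hit, clamped to the ends of the text.
    start = max(0, min(hit - max_words // 2, len(words) - max_words))
    window = words[start:start + max_words]
    prefix = "... " if start > 0 else ""
    suffix = " ..." if start + max_words < len(words) else ""
    return prefix + " ".join(window) + suffix
```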

The command you listed looks like something you'd use to create the index.
As long as your config file is right, you don't need to do anything else to
store your descriptions.  You just need the right switches when doing your
search.

Try doing a search like this once you've created the new index file:
cgi-bin/swish-e -w <your search string> -f index.swish -x '%t - %p\n%d\nlast updated %D\trank %r\tsize %l bytes\n\n'

This will actually return a lot more info than just the description.  The
%d part shows the description.
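
If the search is run from a web script rather than typed at a prompt, the script has to supply these switches itself. Below is a minimal Python sketch of that idea, not the way swish.cgi works. The function names are hypothetical, the binary path and index name are taken from the commands in this thread, and the tab-separated format string is my own choice to make parsing easy. It also assumes the output shape as I understand it: header lines starting with "#", errors starting with "err:", and a lone "." marking the end of the results.

```python
import subprocess

# Tab-separated fields, one result per line: path, title, description.
# %p, %t and %d are the same -x shortcuts used in the command above.
# The raw string passes the backslash escapes through for swish-e to expand.
RESULT_FORMAT = r"%p\t%t\t%d\n"


def build_search_argv(query, index_file="index.swish",
                      swish_binary="cgi-bin/swish-e"):
    """Return the argument list for one search."""
    return [swish_binary, "-w", query, "-f", index_file, "-x", RESULT_FORMAT]


def parse_results(output):
    """Turn swish-e's text output into a list of result dicts."""
    results = []
    for line in output.splitlines():
        if line == ".":  # assumed end-of-results marker
            break
        if not line or line.startswith("#") or line.startswith("err:"):
            continue  # skip blank lines, header lines, and error lines
        fields = line.split("\t", 2)
        if len(fields) == 3:
            path, title, description = fields
            results.append(
                {"path": path, "title": title, "description": description}
            )
    return results


def search(query):
    """Run swish-e and return parsed results (needs swish-e and an index)."""
    completed = subprocess.run(build_search_argv(query),
                               capture_output=True, text=True, check=False)
    return parse_results(completed.stdout)
```

Passing the arguments as a list keeps the user's search string away from the shell, which matters when the string comes from a web form.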

Take a look at
http://www.swish-e.org/current/docs/SWISH-RUN.html#Searching_Command_Line_Arguments
and scroll down to the section titled "-x formatstring (extended output
format)".

"Wolf, Dena" <dena.wolf@orcinc.com>
Sent by: swish-e@sunsite.berkeley.edu
11/19/2002 01:33 PM
Please respond to dena.wolf

To: Multiple recipients of list <swish-e@sunsite.berkeley.edu>
cc:
Subject: [SWISH-E] how to get a description

Two questions.  I've been reading the past archives that deal with this and
am understanding a little, but I don't know if I am doing this at all right.
My indexing is working and I am getting results now.  What I am trying to do
now is get a chunk of the document body onto the results page: say, 40 words
of the body, whether or not they include the search word.

In my config file:
IndexFile index.swish
#MetaNames keywords description
IndexReport 3
FollowSymLinks no
IgnoreTotalWordCountWhenRanking yes
ReplaceRules replace "/export/home/orcsolar/html/" "http://www.orcinc.com/"
ReplaceRules remove "html/"
IgnoreLimit 50 1000
FileRules pathname contains members
IndexComments 0
IndexOnly .html .doc .xls .htm .ppt .txt .pdf
IndexContents HTML* .html .htm
StoreDescription HTML <body> 40
NoContents .gif .xbm .au .mov .mpg .ps

I added the IndexContents line & the StoreDescription line.  I get a bad
directive error for both of those 2 new lines.  Why?  I checked that there is
no space.

Also, in my index command line, how do I add something to make the
description run (assuming I get the indexing to work)?
Right now my line says:
cgi-bin/swish-e -c cgi-bin/orcsolar/config -i html -v -f index.swish
Can I put -p swishdescription somewhere in that line?  If so, where?

I'm sorry I am having so much trouble trying to get all this to work.
Thanks for your help.

Dena Wolf
Web Developer
Organization Resources Counselors, Inc.
212-852-0387

E-mail: dena.wolf@orcinc.com
URL: http://www.orcinc.com
Received on Tue Nov 19 20:47:47 2002