Re: Adding files from external site - suggestions?

From: Rob de Santos AFANA <rdesantos(at)not-real.afana.com>
Date: Mon Apr 19 2004 - 16:24:00 GMT

> On Wed, Apr 14, 2004 at 09:05:58AM -0700, Rob de Santos AFANA wrote:
> > This is done.  All the files are .asp files but saved as .asp.html
> > to make them visible to Swish-e.
>
> Bill Moseley wrote: 
> That should not be necessary.  Swish doesn't do anything 
> special with ".html" files unless told to.

Understood.  This is easily changed via the wget options.  

> > The problem now is that it does not appear that Swish-e is indexing 
> > the necessary directory in total: 
> > http://www.afana.com/www.othersite.com/afl/
> 
> You can use -v (indexing verbose) to see what files are being 
> indexed. You can also use -T properties to list the files as 
> they are indexed. So you should be able to see what files are 
> indexed.  Use -T and -v and you might get an idea how 
> ReplaceRules is working.

It seems ReplaceRules is working just fine.  Because I am using -S prog
and not -S fs (see below), not all the files in the directory in
question are indexed, but the rest of the site is spidered and indexed
just fine.


> > Apparently, the other 600 files in my directory are skipped. Because
> > they are extracted from the dynamically generated pages at the other
> > site they aren't necessarily linked in a "spiderable" chain from the
> > index file but all of them need to be indexed.
> 
> Makes sense.  So either use -S fs method to index (instead of 
> spidering) or maybe try the --convert-links option of wget.  
> Read the wget man page for details.

I know about --convert-links and it doesn't do what I need.  At this
point it's simply a matter of getting this one directory included in the
index.  Wget is getting all the files, and the rest of the swish-e index
is working just fine.

So, is there a way via the configuration file to tell Swish-e to index
this one directory via the "fs" method and still do the rest of the
site via spidering? Or do I need to build two indexes, merge them, and
rename the result? [I gather from reading the docs that when using
swish-e -M the output index must not previously exist, so it would have
to be renamed each time to the one used for searching.]

Perhaps I can use multiple configuration files so swish-e does each task
in one indexing job? Thanks in advance for any advice.
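
To make the two-index idea concrete, here is a rough sketch of what I
have in mind. It is untested. The config and index file names are
placeholders I made up, and it assumes each config file carries its own
IndexDir (the spider program for -S prog, the one directory for -S fs)
and that every index has a companion .prop file:

```shell
#!/bin/sh
# Sketch only. File names are hypothetical; SWISH can point at another binary.
SWISH=${SWISH:-swish-e}

build_merged_index() {
    spider_conf=$1   # config for the spidered site (-S prog)
    fs_conf=$2       # config whose IndexDir names the one directory (-S fs)
    merged=$3        # merged index to create

    # Build each partial index with its own config and input method.
    "$SWISH" -c "$spider_conf" -S prog -f spider.index || return 1
    "$SWISH" -c "$fs_conf" -S fs -f fs.index || return 1

    # The merge target must not already exist, so clear any stale copy first.
    rm -f "$merged" "$merged.prop"
    "$SWISH" -M spider.index fs.index "$merged"
}

# Usage (hypothetical names): build, then rename over the live index.
# build_merged_index spider.conf fs.conf merged.index &&
#     mv merged.index.prop index.swish-e.prop &&
#     mv merged.index index.swish-e
```

The rename happens only after the merge succeeds, so the index used for
searching is never left half-built.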

Regards, 

-Rob
AFANA.com
Received on Mon Apr 19 09:24:00 2004