I am recovering from a nasty system crash (let's not go into that)
wherein I lost some of my files - including all of my swish-e config
stuff. After rebuilding everything, I'm seeing a problem that I have
seen before, but for the life of me I can't remember or figure out what
I did to solve it. I am prepared to be embarrassed by the solution (not
a new experience for me).
The problem is that I can't index my entire site. Indexing works on
some of the directories but not others. Here's a partial listing of my
document root:
drwxr-xr-x  16 root root   4096 Dec  5 14:06 FRANK
drwxr-xr-x  13 root root   4096 Dec  4 15:45 Peggy-Sue
drwxr-xr-x  25 root root   8192 Dec  4 15:45 SFCC
drwxr-xr-x  17 root sys    4096 Jul 25  2002 TSC
drwxrwxrwx   9 root sys   16384 Dec  7 00:08 Weather
drwxr-xr-x  26 root sys    4096 Dec  5 15:19 emily
-rwxr--r--   1 root fhunt 17784 Dec  6 22:43 index.html
I can index FRANK, Peggy-Sue, Weather and emily.
I cannot index SFCC, TSC or index.html.
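As a quick sanity check on the listing above, here is a minimal Python sketch that reads the mode strings and reports whether "other" users have access at the top level. It assumes the web server account falls under "other", which the post does not state. The helper name `world_readable` is made up for illustration. The check says nothing about permissions deeper in each tree or about what the web server actually returns to the spider.

```python
# Mode strings and names copied from the directory listing in the post.
LISTING = [
    ("drwxr-xr-x", "FRANK"),
    ("drwxr-xr-x", "Peggy-Sue"),
    ("drwxr-xr-x", "SFCC"),
    ("drwxr-xr-x", "TSC"),
    ("drwxrwxrwx", "Weather"),
    ("drwxr-xr-x", "emily"),
    ("-rwxr--r--", "index.html"),
]


def world_readable(mode: str) -> bool:
    """Return True if the 'other' permission triplet allows access.

    Hypothetical helper: directories need read and search (x),
    plain files only need read.
    """
    other = mode[7:10]  # last three characters: permissions for "other"
    if mode.startswith("d"):
        return other[0] == "r" and other[2] == "x"
    return other[0] == "r"


for mode, name in LISTING:
    print(f"{name:10} {mode} world-readable: {world_readable(mode)}")
```

Every entry passes this check, including the three that fail to index, so the top-level permission bits alone do not appear to separate the two groups.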
robots.txt does block indexing (Disallow: /FRANK/ works, etc.), but it
doesn't matter what the entries for SFCC, TSC and index.html are.
This:
User-agent: *
Disallow: /FRANK/
Disallow: /SFCC/
Disallow: /TSC/
Disallow: /Peggy-Sue/
Disallow: /Weather/
Disallow: /index.html
works the same as this:
User-agent: *
Disallow: /FRANK/
##Disallow: /SFCC/
##Disallow: /TSC/
Disallow: /Peggy-Sue/
Disallow: /Weather/
##Disallow: /index.html
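One way to confirm that the two robots.txt files above really differ for the problem paths is to run them through a parser. The sketch below uses Python's standard `urllib.robotparser`. This is an assumption for illustration only: spider.pl does its own robots.txt handling and sends its own user-agent string, so this only approximates what the spider sees. The names `is_allowed`, `ALL_BLOCKED` and `PARTLY_COMMENTED` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

BASE = "http://www.frankhunt.com"

# First robots.txt from the post: every listed path is disallowed.
ALL_BLOCKED = """\
User-agent: *
Disallow: /FRANK/
Disallow: /SFCC/
Disallow: /TSC/
Disallow: /Peggy-Sue/
Disallow: /Weather/
Disallow: /index.html
"""

# Second robots.txt from the post: three rules are commented out.
PARTLY_COMMENTED = """\
User-agent: *
Disallow: /FRANK/
##Disallow: /SFCC/
##Disallow: /TSC/
Disallow: /Peggy-Sue/
Disallow: /Weather/
##Disallow: /index.html
"""


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, BASE + path)


for path in ("/FRANK/", "/SFCC/", "/TSC/", "/index.html", "/emily/"):
    print(path,
          "first file:", is_allowed(ALL_BLOCKED, path),
          "second file:", is_allowed(PARTLY_COMMENTED, path))
```

If the parser reports /SFCC/, /TSC/ and /index.html as allowed under the second file, robots.txt itself is probably not what keeps the spider out of them, and the cause is more likely elsewhere (for example, in what the server returns for those URLs).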
I'm running swish-e version 2.4.3 on Red Hat 9 (kernel 2.4.20-31.9).
Here's the run string:
/usr/local/bin/swish-e -S prog -c /web/httpd/bin/swish_index/swish.conf
Here's the config file:
IndexDir spider.pl
IndexFile /web/httpd/bin/swish_index/index.swish-e
SwishProgParameters default http://www.frankhunt.com/
Metanames swishtitle swishdocpath
StoreDescription HTML* <body> 10000
IndexReport 3
I have checked the robots.txt file with an online syntax checker and it
reports no errors. I can index other sites. I'm going crazy here.
Ideas?
--
Frank Hunt
Confused Linux Admin
General Nuisance
Web Weasel
Received on Wed Dec 7 07:39:05 2005