[swish-e] swish-e not indexing symbolically linked directories

From: Cutts III, James H. <CuttsJ(at)>
Date: Tue Feb 26 2008 - 17:57:56 GMT
I am having an interesting problem where swish-e has stopped indexing
files after I moved them to a new disk and used a symbolic link to
reference the directory.  (It's all an effort to manage disk space.)
The discussion below has a very Linux bias, but since it's only swish-e
that is having problems with the symbolically linked directories, I
thought I'd post the problem here first.

Here is the setup:
  OS: Linux
  Swish-e Version: 2.4.3 (Yes, I know there's a newer version.)
  Directory Structure: (I've used ... to shorten the paths)
  /usr/.../docs/kbase => symbolic link 
       => /usr1/.../docs/kbase/


I have parallel directory structures on 3 different disks (/usr/,
/usr1/, /usr2/).  Disk /usr/ is near capacity. Disk /usr1/ is near
capacity. Disk /usr2/ had a bunch of space.

My swish-e setup was working very well.  I could successfully index
files with commands similar to:

swish-e -e -i /usr/.../docs/kbase/SEC/2001/ -f cori_2001.index -c cori_only.conf -S fs
swish-e -e -i /usr/.../docs/kbase/SEC/2002/ -f cori_2002.index -c cori_only.conf -S fs
swish-e -e -i /usr/.../docs/kbase/SEC/2003/ -f cori_2003.index -c cori_only.conf -S fs

Notice that these commands refer to /usr/.../docs/kbase, which is a
symbolic link to /usr1/.../docs/kbase/.

I thought - /usr1 is running out of disk space, I can move some of the
files to /usr2, drop in a symbolic link, just like I did with the
/kbase/ directory and I'll have more room to fill up.  So I did the
following steps:
1) I built a directory structure on /usr2/ parallel to the directory
structure on /usr/ (and /usr1/).
2) Created directories: /usr2/.../docs/kbase/SEC/2002/ and
/usr2/.../docs/kbase/SEC/2003/.
3) Moved the files from the /usr1/ directory to the parallel directory
on /usr2/.
4) Deleted the 2002/ and 2003/ directories on /usr1.
5) Created the symbolic links in /usr1/.../docs/kbase/SEC/ to point to
the /usr2/ directories.
6) I tested the ability to get to the files from the OS using the
original path of /usr/.../docs/kbase/SEC/2001.  I was able to list the
files (ls) using the /usr/... path, I was able to change to the
directory (cd) using the /usr/... path.  I was able to look at the files
(cat) using the /usr/... path.
7) These directories are published on our web site, so I made sure I was
able to get to the files from the web server using the original /usr/...
path.  It worked.
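
For illustration, the steps above can be sketched in a scratch directory.
This is only a minimal reproduction under assumptions: every name under
$root (usr, usr1, usr2, docs, the sample .html files) is a hypothetical
stand-in for the elided /usr/..., /usr1/... and /usr2/... paths, not the
real layout.

```shell
#!/bin/sh
# Scratch reproduction of the move. All names under $root are hypothetical
# stand-ins for the elided /usr/..., /usr1/... and /usr2/... paths.
set -eu
root=$(mktemp -d)

# Starting state: the data lives on "usr1" and "usr" reaches it via one link.
mkdir -p "$root/usr/docs" \
         "$root/usr1/docs/kbase/SEC/2001" \
         "$root/usr1/docs/kbase/SEC/2002" \
         "$root/usr1/docs/kbase/SEC/2003"
echo "sample 2002" > "$root/usr1/docs/kbase/SEC/2002/a.html"
echo "sample 2003" > "$root/usr1/docs/kbase/SEC/2003/b.html"
ln -s "$root/usr1/docs/kbase" "$root/usr/docs/kbase"

# Steps 1-2: build the parallel structure on "usr2".
mkdir -p "$root/usr2/docs/kbase/SEC/2002" "$root/usr2/docs/kbase/SEC/2003"

for year in 2002 2003; do
    # Step 3: move the files to the parallel directory.
    mv "$root/usr1/docs/kbase/SEC/$year"/* "$root/usr2/docs/kbase/SEC/$year/"
    # Step 4: delete the now-empty directory on "usr1".
    rmdir "$root/usr1/docs/kbase/SEC/$year"
    # Step 5: put a symbolic link in its place.
    ln -s "$root/usr2/docs/kbase/SEC/$year" "$root/usr1/docs/kbase/SEC/$year"
done

# Step 6: the original path still works from the OS.
ls "$root/usr/docs/kbase/SEC/2002"
cat "$root/usr/docs/kbase/SEC/2003/b.html"
```

After this runs, the moved files are reachable through the original "usr"
path even though they now sit two symbolic links away.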

  Revised Directory Structure:
  /usr/.../docs/kbase => symbolic link 
       => /usr1/.../docs/kbase/

  /usr1/.../docs/kbase/SEC/2002/ => symbolic link 
       => /usr2/.../docs/kbase/SEC/2002/
  /usr1/.../docs/kbase/SEC/2003/ => symbolic link 
       => /usr2/.../docs/kbase/SEC/2003/
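
One way to inspect a structure like this is to check, for each path handed
to the indexer, whether its last component is itself a symbolic link and
where it physically resolves. The sketch below does that in a scratch
directory; the names under $root and the describe helper are hypothetical.
In this layout the 2001 path ends in a real directory (the link sits in the
middle of the path), while the 2002 path ends in a link. Whether swish-e
treats those two cases differently is an assumption this sketch does not
test; the Swish-e configuration documentation does describe a
FollowSymLinks directive, which may be worth checking.

```shell
#!/bin/sh
# Hypothetical scratch layout mirroring the revised directory structure.
set -eu
root=$(mktemp -d)
mkdir -p "$root/usr/docs" \
         "$root/usr1/docs/kbase/SEC/2001" \
         "$root/usr2/docs/kbase/SEC/2002"
ln -s "$root/usr1/docs/kbase" "$root/usr/docs/kbase"
ln -s "$root/usr2/docs/kbase/SEC/2002" "$root/usr1/docs/kbase/SEC/2002"

# Report whether the last component of a path is a symbolic link, and
# print the physical directory the path resolves to.
describe() {
    if [ -L "$1" ]; then
        kind="symlink"
    else
        kind="directory"
    fi
    physical=$(cd "$1" && pwd -P)
    echo "$kind -> $physical"
}

d2001=$(describe "$root/usr/docs/kbase/SEC/2001")
d2002=$(describe "$root/usr/docs/kbase/SEC/2002")
echo "2001: $d2001"
echo "2002: $d2002"
```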

Since the OS and the Web Server were both able to see the files I
assumed everything was good to go.  Then overnight my swish-e indexing
script ran.  It FAILED to index the two directories that I moved.  I
didn't change the swish-e indexing script because it was working prior
to the move and since the OS was still locating the files with the same
path, I assumed that swish-e using the -S fs (file system) switch would
also have no problem with locating the files.

However, swish-e doesn't seem to find any files in the directories.
When I run the above swish-e command for the /2002/ directory I get
  Indexing Data Source: "File-System"
  Indexing "/usr/.../docs/kbase/SEC/2002/"

  Checking dir "/usr/.../docs/kbase/SEC/2002"...

  Removing very common words...
  no words removed.
  Writing main index...
  err: No unique words indexed!

Not what I was expecting!

When I changed the command to specify the correct physical path:
  swish-e -e -i /usr2/.../docs/kbase/SEC/2002/ -f cori_2002.index -c cori_only.conf -S fs
Everything worked well.

  Parsing config file '/usr/swish-e/cori_only-swish-e.conf'
  Indexing Data Source: "File-System"
  Indexing "/usr2/.../docs/kbase/SEC/2002/"

  Checking dir "/usr2/.../docs/kbase/SEC/2002"...

  In dir "/usr2/.../docs/kbase/SEC/2002/2002-01":
    18880_ex10_7.html - Using HTML2 parser -  (3504 words)
    460412_w67576exv10w2.html - Using HTML2 parser -  (2 words)

  Removing very common words...
  no words removed.
  Writing main index...
  Sorting words ...
  Sorting 132,954 words alphabetically
  Writing header ...
  Writing index entries ...
    Writing word text: Complete
    Writing word hash: Complete
    Writing word data: Complete
  132,954 unique words indexed.
  4 properties sorted.
  51,485 files indexed.  3,132,720,547 total bytes.  66,696,448 total words.
  Elapsed time: 00:11:47 CPU time: 00:10:15
  Indexing done!
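
The workaround of pointing swish-e at the physical path could be scripted
so the indexing script keeps using the logical path and resolves it just
before the call. The following is a dry-run sketch under assumptions: the
scratch layout and the physical_dir helper are hypothetical, and the
swish-e command is only printed, not executed. The flags are the ones from
the commands above.

```shell
#!/bin/sh
set -eu

# Resolve a directory to its physical location, with all symbolic links
# followed. Uses only POSIX cd and pwd -P.
physical_dir() {
    (cd "$1" && pwd -P)
}

# Hypothetical scratch layout: SEC/2002 on "usr1" is a link into "usr2".
root=$(mktemp -d)
mkdir -p "$root/usr2/SEC/2002" "$root/usr1/SEC"
ln -s "$root/usr2/SEC/2002" "$root/usr1/SEC/2002"

# Logical path as the indexing script knows it, then the resolved target.
logical="$root/usr1/SEC/2002"
target=$(physical_dir "$logical")

# Dry run: print the command instead of running it.
cmd="swish-e -e -i $target/ -f cori_2002.index -c cori_only.conf -S fs"
echo "$cmd"
```

Resolving at call time means the script does not need editing again if the
directories are moved to yet another disk.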

While it's not a really big issue to modify the swish-e script to point
to the physical directory, I am confused why swish-e could NOT follow
the path to the files when there were 2 (two) symbolic links in the
path, but it worked very nicely with just 1 (one) symbolic link present.
Any thoughts?


James H. Cutts III

Computer Project Manager
Contracting and Organizations Research Institute
University of Missouri - Columbia
143C Mumford Hall
Columbia, MO
Phone: (573) 882-6181
Programming is the eternal competition between programmers who try to
make apps more and more idiot proof and the universe that makes dumber
idiots. So far, the universe is winning... 
Received on Tue Feb 26 12:57:59 2008