
[swish-e] swish-e not indexing symbolically linked directories

From: Cutts III, James H. <CuttsJ(at)not-real.missouri.edu>
Date: Tue Feb 26 2008 - 17:57:56 GMT
I am having an interesting problem where swish-e has stopped indexing
files after I moved them to a new disk and used a symbolic link to
reference the directory.  (It's all an effort to manage disk space.)
The discussion below has a very Linux bias, but since it's only swish-e
that is having problems with the symbolically linked directories, I
thought I'd post the problem here first.

Here is the setup:
  OS: Linux
  Swish-e Version: 2.4.3 (Yes, I know there's a newer version.)
  Directory Structure: (I've used ... to shorten the paths)
  /usr/.../docs/
  /usr/.../docs/kbase => symbolic link 
       => /usr1/.../docs/kbase/

  /usr1/.../docs/kbase/SEC/
  /usr1/.../docs/kbase/SEC/2000/ 
  /usr1/.../docs/kbase/SEC/2001/ 
  /usr1/.../docs/kbase/SEC/2002/
  /usr1/.../docs/kbase/SEC/2003/
  /usr1/.../docs/kbase/SEC/2004/  

I have parallel directory structures on 3 different disks (/usr/,
/usr1/, /usr2/).  Disk /usr/ is near capacity. Disk /usr1/ is near
capacity. Disk /usr2/ had a bunch of space.

My swish-e setup was working very well.  I could successfully index
files with commands similar to:

swish-e -e -i /usr/.../docs/kbase/SEC/2001/ -f cori_2001.index -c cori_only.conf -S fs
swish-e -e -i /usr/.../docs/kbase/SEC/2002/ -f cori_2002.index -c cori_only.conf -S fs
swish-e -e -i /usr/.../docs/kbase/SEC/2003/ -f cori_2003.index -c cori_only.conf -S fs

Notice that these commands refer to /usr/.../docs/kbase, which is a
symbolic link to /usr1/.../docs/kbase/.

I thought: /usr1 is running out of disk space, so I can move some of the
files to /usr2, drop in a symbolic link just as I did with the /kbase/
directory, and have more room to fill up. So I did the following:
1) Built a directory structure on /usr2/ parallel to the directory
structure on /usr/ (and /usr1/).
2) Created directories: /usr2/.../docs/kbase/SEC/2002/ and
/usr2/.../docs/kbase/SEC/2003/. 
3) Moved the files from the /usr1/ directory to the parallel directory
on /usr2/.
4) Deleted the 2002/ and 2003/ directories on /usr1/.
5) Created the symbolic links in /usr1/.../docs/kbase/SEC/ to point to
the /usr2/ directories.
6) Tested access to the files from the OS using the original path of
/usr/.../docs/kbase/SEC/2001. Using the /usr/... path, I was able to
list the files (ls), change to the directory (cd), and look at the
files (cat).
7) These directories are published on our web site, so I made sure I was
able to get to the files from the web server using the original /usr/...
path.  It worked.
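Steps 1 to 6 can be replayed in a scratch directory to confirm the layout behaves as described before any indexer is involved. This is a sketch, not the author's actual procedure: $root/usr, $root/usr1 and $root/usr2 are stand-ins for the three disks, "docs" stands in for the elided .../docs part of the real paths, and sample.html is a placeholder file.

```shell
#!/bin/sh
# Sketch: replay steps 1-6 in a throwaway directory.
# $root/usr, $root/usr1, $root/usr2 stand in for the three disks;
# "docs" stands in for the elided ".../docs" part of the real paths.
set -e

root=$(mktemp -d)

# Starting layout: /usr/.../docs/kbase is a symlink to /usr1/.../docs/kbase
mkdir -p "$root/usr/docs" "$root/usr1/docs/kbase/SEC/2002" "$root/usr1/docs/kbase/SEC/2003"
ln -s "$root/usr1/docs/kbase" "$root/usr/docs/kbase"
echo "<html><body>sample</body></html>" > "$root/usr1/docs/kbase/SEC/2002/sample.html"
echo "<html><body>sample</body></html>" > "$root/usr1/docs/kbase/SEC/2003/sample.html"

for year in 2002 2003; do
  # Steps 1-2: parallel directory on the /usr2 stand-in
  mkdir -p "$root/usr2/docs/kbase/SEC/$year"
  # Step 3: move the files
  mv "$root/usr1/docs/kbase/SEC/$year/"* "$root/usr2/docs/kbase/SEC/$year/"
  # Step 4: remove the now-empty directory on the /usr1 stand-in
  rmdir "$root/usr1/docs/kbase/SEC/$year"
  # Step 5: symlink from the /usr1 stand-in to the /usr2 stand-in
  ln -s "$root/usr2/docs/kbase/SEC/$year" "$root/usr1/docs/kbase/SEC/$year"
done

# Step 6: the files are still reachable through the original /usr path
ls "$root/usr/docs/kbase/SEC/2002/"
cat "$root/usr/docs/kbase/SEC/2002/sample.html" > /dev/null
```

As in the post, ls, cd and cat all succeed through the logical /usr path, so this check alone says nothing about how a particular indexer treats the links.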

  Revised Directory Structure:
  /usr/.../docs/
  /usr/.../docs/kbase => symbolic link 
       => /usr1/.../docs/kbase/

  /usr1/.../docs/kbase/SEC/
  /usr1/.../docs/kbase/SEC/2000/ 
  /usr1/.../docs/kbase/SEC/2001/ 
  /usr1/.../docs/kbase/SEC/2002/ => symbolic link 
       => /usr2/.../docs/kbase/SEC/2002/
  /usr1/.../docs/kbase/SEC/2003/ => symbolic link 
       => /usr2/.../docs/kbase/SEC/2003/
  /usr1/.../docs/kbase/SEC/2004/  
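A quick way to double-check a layout like the one above is to list each year directory and, where it is a symbolic link, print the link's own target, so a link pointing at the wrong /usr2/ directory stands out. This is a minimal sketch assuming a Linux shell with readlink; show_links is a hypothetical helper name, not part of swish-e.

```shell
#!/bin/sh
# Sketch: list each entry under a SEC/ directory and, when it is a
# symbolic link, print the link's own target (one level, no -f).
# show_links is a hypothetical helper name, not part of swish-e.
show_links() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue        # the glob matched nothing
    dir=${dir%/}                     # strip the trailing slash so -L sees the link itself
    name=$(basename "$dir")
    if [ -L "$dir" ]; then
      printf '%s => %s\n' "$name" "$(readlink "$dir")"
    else
      printf '%s (real directory)\n' "$name"
    fi
  done
}

# Usage: substitute the real path for the elided one, e.g.
# show_links /usr1/.../docs/kbase/SEC
```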

Since both the OS and the web server could see the files, I assumed
everything was good to go. Then overnight my swish-e indexing script
ran, and it FAILED to index the two directories that I moved. I hadn't
changed the indexing script: it was working before the move, and since
the OS still located the files with the same path, I assumed that
swish-e with the -S fs (file system) switch would have no problem
locating them either.

However, swish-e doesn't seem to find any files in the directories.
When I run the above swish-e command for the /2002/ directory I get
back:
  Indexing Data Source: "File-System"
  Indexing "/usr/.../docs/kbase/SEC/2002/"

  Checking dir "/usr/.../docs/kbase/SEC/2002"...

  Removing very common words...
  no words removed.
  Writing main index...
  err: No unique words indexed!
  .

Not what I was expecting!

When I changed the command to specify the correct physical path:
  swish-e -e -i /usr2/.../docs/kbase/SEC/2002/ -f cori_2002.index -c cori_only.conf -S fs
Everything worked well.

  Parsing config file '/usr/swish-e/cori_only-swish-e.conf'
  Indexing Data Source: "File-System"
  Indexing "/usr2/.../docs/kbase/SEC/2002/"

  Checking dir "/usr2/.../docs/kbase/SEC/2002"...

  In dir "/usr2/.../docs/kbase/SEC/2002/2002-01":
    18880_ex10_7.html - Using HTML2 parser -  (3504 words)
  ...<snip>...
    460412_w67576exv10w2.html - Using HTML2 parser -  (2 words)

  Removing very common words...
  no words removed.
  Writing main index...
  Sorting words ...
  Sorting 132,954 words alphabetically
  Writing header ...
  Writing index entries ...
    Writing word text: Complete
    Writing word hash: Complete
    Writing word data: Complete
  132,954 unique words indexed.
  4 properties sorted.
  51,485 files indexed.  3,132,720,547 total bytes.  66,696,448 total words.
  Elapsed time: 00:11:47 CPU time: 00:10:15
  Indexing done!
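The manual workaround above (pointing -i at the /usr2/ path) could be automated so the nightly script keeps using the familiar /usr/... paths. This is a sketch under stated assumptions, not the author's script: it assumes GNU readlink -f (standard on Linux) and introduces a hypothetical index_dir helper plus a hypothetical SWISH_E override for dry runs; the swish-e switches are copied unchanged from the commands above.

```shell
#!/bin/sh
# Sketch: hand swish-e the physical directory instead of the logical
# /usr/... path. readlink -f (GNU coreutils) resolves every symbolic
# link in the path, including the last component.
# index_dir is a hypothetical helper; SWISH_E is a hypothetical override
# that defaults to the real swish-e binary and can name a stub instead.
index_dir() {
  logical_dir=$1                  # e.g. /usr/.../docs/kbase/SEC/2002/
  index_file=$2                   # e.g. cori_2002.index
  physical_dir=$(readlink -f "$logical_dir") || return 1
  # Same switches as the original command; only the -i path changes.
  ${SWISH_E:-swish-e} -e -i "$physical_dir/" -f "$index_file" -c cori_only.conf -S fs
}

# Usage (substitute the real path for the elided one):
# index_dir /usr/.../docs/kbase/SEC/2002/ cori_2002.index
```

Resolving the path in the script keeps the index pointed at the right disk even if a directory is moved again later, at the cost of recording /usr2/... rather than /usr/... paths in the index.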

While it's not a big issue to modify the swish-e script to point to the
physical directory, I am confused why swish-e could NOT follow the path
to the files when there were 2 (two) symbolic links in the path, yet it
worked very nicely with just 1 (one) symbolic link present.
Any thoughts?
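One possible explanation, not established in this thread: a directory walker that is set not to follow symbolic links typically checks each path with lstat(), which reports on the final path component only. In the 2001 path the kbase link sits in the middle, so the kernel resolves it silently and the final component is a real directory; in the 2002 path the final component is itself a link. The failing run's "Checking dir" line also shows the path with its trailing slash removed. swish-e's configuration documentation describes a FollowSymLinks directive that may be the relevant setting, though this post does not confirm how 2.4.3 applies it to the -i path. The sketch below reproduces the distinction in a scratch directory with test -L, which uses lstat() semantics; the layout is a simplified stand-in for the real one.

```shell
#!/bin/sh
# Sketch: only the *last* path component is reported as a symbolic
# link by lstat()-style checks such as `test -L`. Simplified layout.
t=$(mktemp -d)
mkdir -p "$t/usr1/kbase/SEC/2001" "$t/usr2/SEC/2002"
ln -s "$t/usr1/kbase" "$t/kbase"                   # like /usr/.../docs/kbase
ln -s "$t/usr2/SEC/2002" "$t/usr1/kbase/SEC/2002"  # like SEC/2002 on /usr1

for year in 2001 2002; do
  path="$t/kbase/SEC/$year"
  if [ -L "$path" ]; then
    echo "$year: final component is a symlink"
  else
    echo "$year: final component is a real directory (symlink only earlier in the path)"
  fi
done
```

Both paths work for ls, cd and cat because those follow every link; the difference only appears to a check that deliberately stops at the last component.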

Thanks,

James H. Cutts III

Computer Project Manager
Contracting and Organizations Research Institute
<http://cori.missouri.edu/> 
University of Missouri - Columbia
143C Mumford Hall
Columbia, MO
65211
 
Phone: (573) 882-6181
E-mail: cuttsj@missouri.edu
 
Programming is the eternal competition between programmers who try to
make apps more and more idiot proof and the universe that makes dumber
idiots. So far, the universe is winning... 
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Tue Feb 26 12:57:59 2008