I am having an interesting problem where swish-e has stopped indexing
files after I moved them to a new disk and used a symbolic link to
reference the directory. (It's all an effort to manage disk space.)
The discussion below has a very Linux bias, but since it's only swish-e
that is having problems with the symbolically linked directories, I
thought I'd post the problem here first.
Here is the setup:
OS: Linux
Swish-e Version: 2.4.3 (Yes, I know there's a newer version.)
Directory Structure: (I've used ... to shorten the paths)
/usr/.../docs/
/usr/.../docs/kbase => symbolic link
=> /usr1/.../docs/kbase/
/usr1/.../docs/kbase/SEC/
/usr1/.../docs/kbase/SEC/2000/
/usr1/.../docs/kbase/SEC/2001/
/usr1/.../docs/kbase/SEC/2002/
/usr1/.../docs/kbase/SEC/2003/
/usr1/.../docs/kbase/SEC/2004/
I have parallel directory structures on 3 different disks (/usr/,
/usr1/, /usr2/). Disk /usr/ is near capacity. Disk /usr1/ is near
capacity. Disk /usr2/ has plenty of free space.
My swish-e setup was working very well. I could successfully index
files with commands similar to:
swish-e -e -i /usr/.../docs/kbase/SEC/2001/ -f cori_2001.index -c cori_only.conf -S fs
swish-e -e -i /usr/.../docs/kbase/SEC/2002/ -f cori_2002.index -c cori_only.conf -S fs
swish-e -e -i /usr/.../docs/kbase/SEC/2003/ -f cori_2003.index -c cori_only.conf -S fs
Notice that these commands refer to /usr/.../docs/kbase, which is a
symbolic link to /usr1/.../docs/kbase/.
Since /usr1/ is also running out of disk space, I figured I could move
some of the files to /usr2/, drop in symbolic links just as I did for
the kbase/ directory, and gain more room to fill up. So I took the
following steps:
1) Built a directory structure on /usr2/ parallel to the directory
structure on /usr/ (and /usr1/).
2) Created directories: /usr2/.../docs/kbase/SEC/2002/ and
/usr2/.../docs/kbase/SEC/2003/.
3) Moved the files from the /usr1/ directory to the parallel directory
on /usr2/.
4) Deleted the 2002/ and 2003/ directories on /usr1/.
5) Created the symbolic links in /usr1/.../docs/kbase/SEC/ to point to
the /usr2/ directories.
6) Tested the ability to get to the files from the OS using the
original path of /usr/.../docs/kbase/SEC/2001. Using the /usr/... path,
I was able to list the files (ls), change to the directory (cd), and
look at the files (cat).
7) These directories are published on our web site, so I made sure I was
able to get to the files from the web server using the original /usr/...
path. It worked.
Revised Directory Structure:
/usr/.../docs/
/usr/.../docs/kbase => symbolic link
=> /usr1/.../docs/kbase/
/usr1/.../docs/kbase/SEC/
/usr1/.../docs/kbase/SEC/2000/
/usr1/.../docs/kbase/SEC/2001/
/usr1/.../docs/kbase/SEC/2002/ => symbolic link
=> /usr2/.../docs/kbase/SEC/2002/
/usr1/.../docs/kbase/SEC/2003/ => symbolic link
=> /usr2/.../docs/kbase/SEC/2003/
/usr1/.../docs/kbase/SEC/2004/
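To make the revised layout easy to reproduce, here is a rough sketch that rebuilds it in a scratch directory and confirms that the OS resolves the logical path through both symbolic links. The shortened paths (usr/docs, usr1/docs/kbase, and so on), the sample.html file, and the build_layout helper are all made up for illustration; they are not my real paths.

```python
import os
import tempfile

def build_layout(root):
    """Recreate the revised layout under a scratch root (hypothetical short paths)."""
    usr_docs = os.path.join(root, "usr", "docs")
    usr1_sec = os.path.join(root, "usr1", "docs", "kbase", "SEC")
    usr2_sec = os.path.join(root, "usr2", "docs", "kbase", "SEC")
    os.makedirs(usr_docs)
    # Years that stay as real directories on the second disk.
    for year in ("2000", "2001", "2004"):
        os.makedirs(os.path.join(usr1_sec, year))
    # Years moved to the third disk and replaced by symbolic links.
    for year in ("2002", "2003"):
        target = os.path.join(usr2_sec, year)
        os.makedirs(target)
        with open(os.path.join(target, "sample.html"), "w") as fh:
            fh.write("<html><body>sample</body></html>")
        os.symlink(target, os.path.join(usr1_sec, year))
    # The first link: docs/kbase on the first disk points at the second disk.
    os.symlink(os.path.join(root, "usr1", "docs", "kbase"),
               os.path.join(usr_docs, "kbase"))
    return os.path.join(usr_docs, "kbase", "SEC")

root = os.path.realpath(tempfile.mkdtemp())
sec = build_layout(root)
for year in ("2001", "2002", "2003"):
    logical = os.path.join(sec, year)
    # realpath shows the physical location; listdir mirrors the ls test.
    print(logical, "->", os.path.realpath(logical), os.listdir(logical))
```

This only mirrors the ls/cd/cat checks from step 6; it says nothing about how swish-e itself walks the tree.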
Since the OS and the web server were both able to see the files, I
assumed everything was good to go. Then overnight my swish-e indexing
script ran. It FAILED to index the two directories that I moved. I
didn't change the swish-e indexing script because it was working prior
to the move, and since the OS was still locating the files with the same
path, I assumed that swish-e using the -S fs (file system) switch would
also have no problem locating the files.
However, swish-e doesn't seem to find any files in the directories.
When I run the above swish-e command for the /2002/ directory I get
back:
Indexing Data Source: "File-System"
Indexing "/usr/.../docs/kbase/SEC/2002/"
Checking dir "/usr/.../docs/kbase/SEC/2002"...
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
.
Not what I was expecting!
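One difference I can see between the old and new layouts is that the directory named on the command line is now itself a symbolic link, where before only a parent directory (kbase) was. The sketch below is a guess at why that could matter, not something I have confirmed in the swish-e source: a walker that checks the last path component with lstat() would see a link rather than a directory, even though stat() sees a directory. The walker_sees_directory function and the short paths are hypothetical. The swish-e configuration documentation also describes a FollowSymLinks directive, which may be worth checking in the config file.

```python
import os
import stat
import tempfile

def walker_sees_directory(path, follow_symlinks):
    """Return True if a walker would treat `path` as a directory.

    follow_symlinks=False mimics a walker that inspects the last path
    component with lstat() and so sees a link instead of a directory.
    """
    st = os.stat(path) if follow_symlinks else os.lstat(path)
    return stat.S_ISDIR(st.st_mode)

root = os.path.realpath(tempfile.mkdtemp())

# Physical home of the moved year (stands in for the third disk).
real_2002 = os.path.join(root, "usr2", "2002")
os.makedirs(real_2002)

# Second disk: 2001 is a real directory, 2002 is a link to the third disk.
kbase_real = os.path.join(root, "usr1", "kbase")
os.makedirs(os.path.join(kbase_real, "2001"))
os.symlink(real_2002, os.path.join(kbase_real, "2002"))

# First disk: kbase is a link to the second disk.
os.makedirs(os.path.join(root, "usr"))
kbase_link = os.path.join(root, "usr", "kbase")
os.symlink(kbase_real, kbase_link)

for year in ("2001", "2002"):
    path = os.path.join(kbase_link, year)
    print(year,
          "lstat sees dir:", walker_sees_directory(path, False),
          "stat sees dir:", walker_sees_directory(path, True))
```

In this sketch the link in the middle of the path (kbase) is resolved either way; only the final component behaves differently between the two calls.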
When I changed the command to specify the correct physical path:
swish-e -e -i /usr2/.../docs/kbase/SEC/2002/ -f cori_2002.index -c cori_only.conf -S fs
Everything worked well.
Parsing config file '/usr/swish-e/cori_only-swish-e.conf'
Indexing Data Source: "File-System"
Indexing "/usr2/.../docs/kbase/SEC/2002/"
Checking dir "/usr2/.../docs/kbase/SEC/2002"...
In dir "/usr2/.../docs/kbase/SEC/2002/2002-01":
18880_ex10_7.html - Using HTML2 parser - (3504 words)
...<snip>...
460412_w67576exv10w2.html - Using HTML2 parser - (2 words)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 132,954 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
132,954 unique words indexed.
4 properties sorted.
51,485 files indexed. 3,132,720,547 total bytes. 66,696,448 total words.
Elapsed time: 00:11:47 CPU time: 00:10:15
Indexing done!
While it's not a really big issue to modify the swish-e script to point
to the physical directory, I am confused why swish-e could NOT follow
the path to the files when there were 2 (two) symbolic links in the
path, but worked very nicely with just 1 (one) symbolic link present.
Any thoughts?
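In the meantime, one way to avoid hard-coding the physical directory in the script would be to resolve the links just before calling swish-e. This is only a sketch of that workaround: the swish_command helper and the scratch "logical"/"physical" directories are hypothetical, and the switches are simply the ones from my commands above.

```python
import os
import tempfile

def swish_command(logical_dir, index_file, config_file):
    """Build the swish-e argument list with the symlinks already resolved."""
    physical = os.path.realpath(logical_dir)
    # Same switches as the commands in this post; only the -i path changes.
    return ["swish-e", "-e", "-i", physical + os.sep,
            "-f", index_file, "-c", config_file, "-S", "fs"]

# Scratch demo: "logical" is a symlink to "physical" (both hypothetical names).
root = os.path.realpath(tempfile.mkdtemp())
physical_dir = os.path.join(root, "physical")
logical_dir = os.path.join(root, "logical")
os.mkdir(physical_dir)
os.symlink(physical_dir, logical_dir)

cmd = swish_command(logical_dir, "cori_2002.index", "cori_only.conf")
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run() in the nightly script; the sketch only builds and prints it.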
Thanks,
James H. Cutts III
Computer Project Manager
Contracting and Organizations Research Institute
<http://cori.missouri.edu/>
University of Missouri - Columbia
143C Mumford Hall
Columbia, MO
65211
Phone: (573) 882-6181
E-mail: cuttsj@missouri.edu
Programming is the eternal competition between programmers who try to
make apps more and more idiot proof and the universe that makes dumber
idiots. So far, the universe is winning...
Received on Tue Feb 26 12:57:59 2008