On Fri, Mar 16, 2007 at 05:25:53PM +1300, Lesley Walker wrote:
> However, if we do that, we get a zillion warning messages saying that
> null characters have been substituted with newlines in .gif files.
> This would seem to indicate that Swish-e is attempting (rather
> uselessly) to index the content of the .gif files.
That's a bug, I suspect.
> > IndexOnly .html .htm .txt .cnt .gif .shm .xbm .au .mov .mpg .doc .pdf
> > NoContents .gif .xbm .au .mov .mpg
You have to tell swish that these are not HTML files; by default it
assumes that everything you are indexing is HTML.
Try adding:
IndexContents TXT .gif .xbm .au .mov .mpg
You will still get the warning about the embedded null chars.
I suspect you could get the swish-e source and modify file.c to not
do that substitution if the file is flagged index_no_contents.
That code to substitute nulls was problematic from the start many
years ago. Maybe this is all it takes:
Index: src/file.c
===================================================================
--- src/file.c (revision 1899)
+++ src/file.c (working copy)
@@ -279,7 +279,7 @@
 /* JFP - substitute null chars, VFC record may have null char in reclen word, try to discard them */
- if ( is_text && strlen( (char *)buffer ) < bytes_read )
+ if ( !fprop->index_no_content && is_text && strlen( (char *)buffer ) < bytes_read )
 {
 int i;
 int j = 0;
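For anyone who wants to see the intent of that guard outside of swish-e, here is a minimal standalone sketch. It is not the code in file.c: the struct `file_prop_sketch` and the function `substitute_nulls` are hypothetical names made up for illustration, and only the `index_no_content` flag and the `is_text` test are taken from the patch above. It assumes the substitution replaces each embedded null with a newline, as the warning message describes, and it uses memchr() instead of strlen() so the buffer does not need to be null-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for swish-e's file properties; only the one
 * flag the patch tests is modelled here. */
struct file_prop_sketch {
    int index_no_content;   /* non-zero: index the file name only */
};

/* Replace embedded null bytes in buffer[0..bytes_read) with '\n'.
 * Returns the number of bytes replaced. Nothing is touched when the
 * file is flagged name-only or is not being treated as text. */
static size_t substitute_nulls(const struct file_prop_sketch *fprop,
                               int is_text,
                               unsigned char *buffer, size_t bytes_read)
{
    size_t i;
    size_t replaced = 0;

    assert(fprop != NULL && buffer != NULL);

    /* The guard from the patch: skip files whose content is not indexed. */
    if (fprop->index_no_content || !is_text)
        return 0;

    /* Bounded search, so a buffer without a terminator is safe. */
    if (memchr(buffer, '\0', bytes_read) == NULL)
        return 0;

    for (i = 0; i < bytes_read; i++) {
        if (buffer[i] == '\0') {
            buffer[i] = '\n';
            replaced++;
        }
    }
    return replaced;
}
```

With the flag set, a .gif-like buffer full of nulls is left alone and no warning would be needed; with the flag clear, a text buffer containing a stray null gets it replaced.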
Do you really need to index your file names? It's not a feature that
seems to be used very often.
--
Bill Moseley
moseley@hank.org
Received on Fri Mar 16 02:08:47 2007