Re: Skipping articles while sorting

From: Thomas Dowling <tdowling(at)>
Date: Fri Jan 02 2004 - 22:33:11 GMT
Bill Moseley wrote:

>I'm planning on adding a feature to ignore articles while sorting.  The
>setting would be per-property.  So, for example:
>  SkipArticles swishtitle the a an
>Here's your chance for input:
>1) What should the config directive be called?  Is SkipArticles ok?
>2) Should it remove more than one word?  That is, should a title:
>   <title>The A An Word</title>
>be sorted as "Word" or as "A An Word"?
>3) I have not decided if this feature will only work during indexing
>or not.  If it works only during indexing then sorting when searching
>*multiple* indexes might not work as expected (because the indexes would
>be sorted differently when indexing and when searching).  But, I also
>want to keep the sorting code as lean as possible for speed reasons.

FWIW, this can be a frustratingly difficult job to tackle.  There will 
inevitably be documents with titles like "A & P hiring cashiers, 
baggers" or the famous "THE Journal" (Texas Higher Education, if memory 
serves).  You also need to think now about multilingual support, which 
opens cans of worms such as deciding whether "Die" is a German article 
or an English word.

This is why libraries gave up on the job years ago and rely on catalog 
records to tell any sorting routine how many characters to skip over.  :-\
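
To make the catalog-record approach concrete, here is a minimal Python sketch. The Record type and its skip field are invented for illustration and are not any real catalog format's API; the point is only that each record states how many leading characters to ignore, so the sort routine never has to guess what counts as an article.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    # Hypothetical field: number of leading characters to ignore when sorting.
    skip: int = 0


def sort_key(rec: Record) -> str:
    # Drop the first `skip` characters, then compare case-insensitively.
    return rec.title[rec.skip:].casefold()


records = [
    Record("The Tempest", skip=4),                     # "The " is an article
    Record("THE Journal", skip=0),                     # "THE" is part of the name
    Record("A & P hiring cashiers, baggers", skip=0),  # "A" is part of the name
    Record("Die Zeit", skip=4),                        # German article
    Record("Die Hard", skip=0),                        # English word
]

sorted_titles = [r.title for r in sorted(records, key=sort_key)]
```

Whoever creates the record makes the "Die Zeit" versus "Die Hard" decision once, which is exactly the judgment a word list cannot make.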

I'd recommend doing any article trimming at sort time rather than index 
time, with a way in the user interface to turn it off (or on, depending 
on what the default is set to).
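
A rough Python sketch of what sort-time trimming with a switch might look like. This is not swish-e code; the function name, the skip_articles flag, and the default article list are assumptions made up for the example. It strips at most one leading article.

```python
def article_sort_key(title, articles=("the", "a", "an"), skip_articles=True):
    """Return a sort key, dropping at most one leading article when enabled."""
    key = title.casefold()
    if skip_articles:
        first, _sep, rest = key.partition(" ")
        # Only trim when something is left over after the article.
        if first in articles and rest:
            key = rest.lstrip()
    return key


titles = ["The Zebra", "An Apple", "Banana", "A Cat"]

# Trimming on: sorts as apple, banana, cat, zebra.
trimmed = sorted(titles, key=article_sort_key)

# Trimming off: sorts on the full title.
untrimmed = sorted(
    titles, key=lambda t: article_sort_key(t, skip_articles=False)
)
```

Because the index keeps the untouched title, the switch costs nothing to flip at search time. It also shows why the switch is needed: with trimming on, "A & P hiring cashiers, baggers" gets the key "& p hiring cashiers, baggers".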

Thomas Dowling
OhioLINK - Ohio Library and Information Network
Received on Fri Jan 2 22:33:24 2004