Re: [swish-e] Indexing BLOG contents

From: Peter Karman <peter(at)>
Date: Mon Mar 26 2007 - 20:29:02 GMT
Don Hamilton scribbled on 3/26/07 3:09 PM:
> Hi List.
> I must be missing something obvious, but...
> Part of our website is a collection of wordpress blogs. I am
> told/asked:  'google finds them! why doesn't swish-e?'. I can't see an
> obvious 'directive' to make this happen.

WordPress stores its data in a database (MySQL, iirc). So you'd need to either 
(a) spider your site using the -S prog/ method, or (b) pull the blog 
text directly from the db and pass it to -S prog (see SWISH::Prog::DBI on 
CPAN or the example mysql script in the distrib).
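
For option (b), here is a rough sketch of what such a program could look like. 
It is not the distrib's example script: the file name wp_posts.py, the 
format_doc() helper and the hard-coded sample rows are all made up for 
illustration, and a real version would read the rows from the WordPress 
database instead. It assumes the usual -S prog input format, where each 
document is a block of headers (Path-Name, Content-Length in bytes, optionally 
Document-Type and Last-Mtime), then a blank line, then the content.

```python
import html
import sys


def format_doc(path, title, body_html, mtime=None):
    """Return one document as bytes: headers, a blank line, then the content."""
    page = "<html><head><title>{}</title></head><body>{}</body></html>".format(
        html.escape(title), body_html
    ).encode("utf-8")
    headers = [
        "Path-Name: {}".format(path),
        # Content-Length counts bytes, so measure the encoded page.
        "Content-Length: {}".format(len(page)),
        "Document-Type: HTML",
    ]
    if mtime is not None:
        headers.append("Last-Mtime: {}".format(int(mtime)))
    return ("\n".join(headers) + "\n\n").encode("utf-8") + page


# Hypothetical sample rows (id, title, content); replace with a query against
# your blog database.
SAMPLE_POSTS = [
    (1, "Hello world", "<p>First post.</p>"),
    (2, "Library hours", "<p>Open 9 to 5.</p>"),
]

if __name__ == "__main__":
    out = sys.stdout.buffer  # write raw bytes so the lengths stay exact
    for post_id, title, content in SAMPLE_POSTS:
        out.write(format_doc("/blog/?p={}".format(post_id), title, content))
```

If I remember the options right, you would then point swish-e at the script 
instead of a directory, something like: swish-e -c blog.conf -S prog -i ./wp_posts.py 
(blog.conf being a placeholder name here).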

> IndexFile abcindex
> IndexDir e:\libraryweb\public_html
> IndexOnly .htm .html .shtml .pdf
> IndexContents HTML .html .htm .shtml
> FileFilter .pdf       c:/swish-e/lib/swish-e/pdftotext.exe   "%p -"
> FollowSymLinks No
> FileRules dirname contains xxx
> FileRules dirname contains yyy
> FileRules dirname contains  aaa
> FileRules filename contains bbbb

The config above looks like it only indexes the filesystem. Your WordPress 
blog entries aren't stored there, I'm guessing.

