On Thu, May 29, 2003 at 04:31:45AM -0700, Patrick Tinley wrote:
> Hi all,
>
> I'm just wondering if it's possible to have the Summary Report
> (see below) which appears at the end of the indexing process
> detached & e-mailed to myself.
> Indexing is on a weekly cronjob.
> I'd like to keep a record of how the site grows.
Others may be better at answering this.
Cron normally emails any output to the user (see man 5 crontab to find
out whether you can set the email address; many cron versions support a
MAILTO variable for this).
You have output from two things below:
> Summary for: http://www.mysite.com/index.shtml
> Duplicates: 535,248 (214.4/sec)
> Off-site links: 12,996 (5.2/sec)
> Total Bytes: 246,865,706 (98904.5/sec)
> Total Docs: 16,824 (6.7/sec)
> Unique URLs: 17,329 (6.9/sec)
That's from spider.pl and it is written to stderr. That output can be
disabled by setting SPIDER_QUIET=1 in your environment when indexing.
> Removing very common words...
[...]
> 66602 unique words indexed.
> 9 properties sorted.
> 16824 files indexed. 246865706 total bytes. 12192895 total words.
> Elapsed time: 00:42:28 CPU time: 00:04:02
And that's from the swish-e binary, which writes it to stdout by
default. You can use -E <file> to append swish-e's output to a file,
or use -E without <file> to send the output to stderr instead of
stdout. Or just redirect stdout to a file.
So that gives you a few options since you can pick which output goes
where. You might capture the spider output one place and the swish-e
summary someplace else.
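
As a rough sketch of keeping the two streams apart, something like the
following Python could wrap the indexing run. It only assumes that the
spider statistics arrive on stderr and the swish-e summary on stdout, as
described above. The names run_split and save are made up for this
example, and the command in the demo is a stand-in, since the real
indexing command line is not shown in this thread.

```python
import subprocess
import sys


def run_split(cmd):
    """Run cmd and return (exit_code, stdout_bytes, stderr_bytes)."""
    # capture_output=True collects each stream separately (Python 3.7+).
    proc = subprocess.run(cmd, capture_output=True)
    return proc.returncode, proc.stdout, proc.stderr


def save(path, data):
    """Append captured bytes to a log file of your choosing."""
    with open(path, "ab") as handle:
        handle.write(data)


if __name__ == "__main__":
    # Stand-in command: prints one line to stdout and one to stderr.
    demo = [sys.executable, "-c",
            "import sys; print('summary'); print('spider stats', file=sys.stderr)"]
    code, out, err = run_split(demo)
    print(code, out, err)
```

The exit code comes back alongside the two streams, so the caller can
decide afterwards what to save with save() and what to mail.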
Note that when indexing, swish-e shows a progress report and uses \r to
overwrite its percent-complete messages. Those will look ugly in a file
or an email, so you will probably want to filter swish-e's output.
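
One possible filter, sketched in Python: for each line, keep only the
text after the last \r, which is roughly what a terminal would leave on
screen. The function name strip_progress is made up here, and the exact
layout of swish-e's progress messages is an assumption, so check it
against real output before relying on it.

```python
def strip_progress(text):
    """Drop text that was overwritten with carriage returns."""
    cleaned = []
    for line in text.split("\n"):
        # Ignore a trailing \r (CRLF line endings), then keep only the
        # segment after the last remaining \r on the line.
        cleaned.append(line.rstrip("\r").rsplit("\r", 1)[-1])
    return "\n".join(cleaned)
```

Unlike a real terminal, this does not keep leftover characters from a
longer message that was overwritten by a shorter one, which is usually
what you want in a log.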
What I'd do is save the output to a file while indexing. If swish-e
exits with a non-zero exit code, then email the entire file (just cat it
in the cron job and it should be sent automatically). If swish-e exits
with a zero exit code, then use grep to extract just the data you want
emailed, or write a little script to append the interesting data to a
file (comma-separated values?) for later processing.
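
A minimal sketch of such a script in Python, assuming the summary lines
look like the ones quoted above. The names PATTERNS, parse_summary and
append_stats, and the column order, are made up for this example.

```python
import csv
import re
from datetime import date

# One pattern per figure, matched against the swish-e summary text.
PATTERNS = {
    "unique_words": re.compile(r"(\d+) unique words indexed"),
    "files": re.compile(r"(\d+) files indexed"),
    "total_bytes": re.compile(r"(\d+) total bytes"),
    "total_words": re.compile(r"(\d+) total words"),
}


def parse_summary(text):
    """Return the figures found in text as a dict of ints."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            stats[name] = int(match.group(1))
    return stats


def append_stats(csv_path, stats, when=None):
    """Append one dated row to csv_path; missing figures are left blank."""
    when = when or date.today()
    with open(csv_path, "a", newline="") as handle:
        csv.writer(handle).writerow(
            [when.isoformat()] + [stats.get(name, "") for name in PATTERNS]
        )
```

Run weekly after a successful index, append_stats() adds one row per
run, which gives a simple record of how the site grows.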
--
Bill Moseley
moseley@hank.org
Received on Thu May 29 13:04:21 2003