This post was supposed to talk about setting up a web server and configuring Webalizer, but things have changed in the lovey-stats landscape since I last posted.
It is now a two-pronged approach, with the log downloading split off into its own cron job. I wrote a quick one-liner script using aws-cli's sync command to pull down the logs every hour; lovey-stats then just runs its log aggregation on the synced log directory.
#!/bin/bash
# s3-sync: mirror the S3 log bucket into a local directory
export AWS_CONFIG_FILE="/path/to/.aws/config" # use this aws config
/usr/local/bin/aws s3 sync s3://BUCKET/logs/ /path/to/sync/log/ --delete
Run this hourly from cron to keep a tidy local copy of your logs. On a Pi Model 1, it takes about 10 minutes on a directory with six months' worth of logs.
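For reference, the hourly schedule might look like the following crontab fragment. The script path and log destination here are hypothetical placeholders, not from the post, so adjust them to wherever you saved the sync script:

```
# m h dom mon dow  command
# Run the S3 sync five minutes past every hour, logging output for debugging.
5 * * * * /path/to/s3-sync.sh >> /var/log/s3-sync.log 2>&1
```

Offsetting the run a few minutes past the hour also gives S3 time to finish delivering the previous hour's log objects before the sync starts.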
For lovey-stats, I changed the default to always run an entire month's worth of stats. Before, it only processed the current day, which wasn't very robust: if a run errored out or didn't run at all, I would often have to go in and rerun the script manually to make sure I was getting the correct statistics.
This also eliminates the need for complex logic to rotate the access.log from which Webalizer generates its charts. At one point, certain log entries were getting duplicated, which skewed the download numbers; that problem is gone now.
This makes the stats run quite a bit less efficiently, but the robustness is appreciated. Instead of having 24 hours to rectify an issue before having to go manual, I now have 28-31 days. Here's how I set the date now:
HOUR="$(date +%-H)"
if [ -n "$1" ]; then
  DATE="$1"                               # allow a manual month override
elif [ "$HOUR" -lt 10 ]; then
  DATE="$(date --date=yesterday +%Y-%m)"  # early in the day: still on yesterday's month
else
  DATE="$(date +%Y-%m)"                   # four-digit year, e.g. 2016-03
fi
Just realized that this is hard-coded for Japan Standard Time… I should file a bug report to make it TZ-agnostic!
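One possible fix, sketched below under the assumption that S3 stamps its log objects in UTC: compute the month in UTC directly with `date -u`, instead of offsetting the local JST hour by a magic threshold. This is just an idea, not what the script currently does:

```shell
#!/bin/bash
# Sketch: pick the month to aggregate in a TZ-agnostic way.
# Assumption: the S3 log timestamps are UTC, so we read the clock in UTC too.
if [ -n "$1" ]; then
  DATE="$1"              # manual override, e.g. "2016-03"
else
  DATE="$(date -u +%Y-%m)"  # current month in UTC, regardless of the host's TZ
fi
echo "$DATE"
```

`date -u` is portable across GNU coreutils and BSD date, so the same sketch would run on both the Pi and a Mac.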
Local web serving and Webalizer configuration will come in another post. Just a hint: get yourself a Raspberry Pi to handle this stuff for you. Even a Model 1 works.