This post was supposed to talk about setting up a web server and configuring Webalizer, but things have changed in the lovey-stats landscape since I last posted.

It is now a two-pronged approach, with the log downloading split off into its own cron job. I wrote a quick one-liner script using aws-cli’s sync command to pull down logs every hour; lovey-stats then just runs its log aggregation on the synced log directory.

#!/bin/bash
# S3-SYNC
export AWS_CONFIG_FILE="/path/to/.aws/config" # use this AWS config
/usr/local/bin/aws s3 sync s3://BUCKET/logs/ /path/to/sync/log/ --delete

Run this hourly from cron to keep a tidy local copy of your logs. On a Pi Model 1, a run over a directory with six months’ worth of logs takes about 10 minutes.
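For reference, the crontab entry could look something like this. The script path and log redirection are my assumptions, not part of the original setup:

```shell
# Hypothetical crontab entry: run the sync script at the top of every hour.
# /path/to/s3-sync.sh is wherever you saved the script above.
0 * * * * /path/to/s3-sync.sh >> /var/log/s3-sync.log 2>&1
```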

For lovey-stats, I changed the default to always process an entire month’s worth of stats. Before, it only processed the current day, but that wasn’t very robust: if a run errored out or didn’t run properly, I often had to go in and re-run the script manually to make sure I was getting the correct statistics. Processing the full month also eliminates the need for complex logic to rotate the access.log from which Webalizer generates its charts. At one point certain log entries were getting repeated, which skewed the download numbers, so that problem is gone now.
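To make the monthly run concrete, here is a hypothetical sketch of the aggregation step: select every synced log whose name starts with the target YYYY-MM and concatenate them into the single access.log Webalizer reads. The function name, paths, and the filename convention (S3 server access log objects are named by delivery date, e.g. `2024-05-01-00-00-00-HASH`) are my assumptions, not lovey-stats’ actual code.

```shell
#!/bin/bash
# Hypothetical sketch of the monthly aggregation step: concatenate one
# month of synced S3 access logs into a single file for Webalizer.
aggregate_month() {
    local sync_dir="$1" out="$2" month="$3"
    # S3 access log object names begin with the delivery date
    # (YYYY-MM-DD-HH-MM-SS-HASH), so a YYYY-MM glob selects one month.
    cat "$sync_dir"/"$month"-* > "$out"
}
```

It would be called as, say, `aggregate_month /path/to/sync/log /var/log/webalizer/access.log "$DATE"` (the output path is an assumption).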

This makes the stats run quite a bit less efficiently, but the robustness is appreciated. Now, instead of having 24 hours to rectify an issue before having to go manual, I have 28–31 days. Here’s how I set the date now:

# Allow a month override as $1; otherwise, before 10:00, assume we may
# still be finishing yesterday's logs and use yesterday's month.
HOUR="$(date +%-H)"
if [ -n "$1" ]; then
    DATE="$1"
elif [ "$HOUR" -lt 10 ]; then
    DATE="$(date --date=yesterday +%Y-%m)"
else
    DATE="$(date +%Y-%m)"
fi

Just realized that this is hard-coded for the Japan Standard Time zone… I should file a bug report to make it TZ-agnostic!
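A sketch of what that fix could look like: compute the date in an explicitly pinned timezone instead of trusting the host clock. Pinning to UTC and keeping the 10 o’clock cutoff are my assumptions here, not the actual patch.

```shell
#!/bin/bash
# Sketch: pin the timezone explicitly so the cutoff no longer depends on
# the host happening to be set to JST. LOG_TZ=UTC is an assumption about
# how the logs are timestamped.
LOG_TZ="UTC"
HOUR="$(TZ="$LOG_TZ" date +%-H)"
if [ -n "$1" ]; then
    DATE="$1"
elif [ "$HOUR" -lt 10 ]; then
    DATE="$(TZ="$LOG_TZ" date --date=yesterday +%Y-%m)"
else
    DATE="$(TZ="$LOG_TZ" date +%Y-%m)"
fi
echo "$DATE"
```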

Local web serving and Webalizer configuration will come in another post. Just a hint: get yourself a Raspberry Pi to handle this stuff for you. Even a Model 1 works.