Python Download Counter
Objective
Set up an automatic download counter for ISO files hosted on downloads.linuxvillage.org, with:
- Daily parsing of Apache logs
- Deduplication by IP + file + day (to avoid false duplicates from HTTP 206 Partial Content requests in resumed downloads)
- Month-by-month counter aggregation in a JSON file
- Automatic generation of a public page listing available ISOs
- Automatic generation of a private monthly statistics page
Prerequisites
- Python 3 installed (/usr/bin/python3)
- Root access to the server
- Apache logs available at /var/log/apache2/downloads-ssl-access.log*
- cron package installed: apt install cron
Script Installation
Create the directory and place the script in it (the script is attached to this page; remove the trailing .txt from its name):
mkdir -p /opt/download_stats
# Place download_stats.py in this directory
chmod 750 /opt/download_stats/download_stats.py
.stats Directory Structure
The script automatically creates and manages the hidden .stats directory at the root of the download space:
/var/www/file-server/.stats/
This directory will contain:
- counts.json : cumulative history of monthly counters
- index.html : private statistics page
The statistics page is accessible at the URL:
https://downloads.linuxvillage.org/.stats/
This URL is reachable by anyone who knows it; it is simply not linked from any public page.
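To make the role of counts.json concrete, here is a minimal sketch of reading it, writing it back, and computing the totals shown on the statistics page. The layout {file name: {"YYYY-MM": count}} and the helper names load_counts, save_counts and totals are assumptions made for illustration; the attached script may organise this differently.

```python
import json
from pathlib import Path


def load_counts(path):
    """Return the stored counters, or an empty dict if the file does not exist yet."""
    p = Path(path)
    if not p.exists():
        return {}
    with p.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_counts(path, counts):
    """Write to a temporary file, then rename it over the target.

    The rename keeps a half-written counts.json from ever being visible.
    """
    p = Path(path)
    tmp = p.with_name(p.name + ".tmp")
    tmp.write_text(json.dumps(counts, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(p)


def totals(counts):
    """Return (per-file totals, grand total) across all months."""
    per_file = {name: sum(months.values()) for name, months in counts.items()}
    return per_file, sum(per_file.values())
```

With this assumed layout, the per-file totals and the grand total of the statistics table are plain sums over the stored months.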
Cron Configuration
As root:
crontab -e
Add the following line (daily execution at 01:00 UTC):
0 1 * * * /usr/bin/python3 /opt/download_stats/download_stats.py
JSON Reset
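If needed (structure change, counter reset), delete counts.json and rerun the script to regenerate it. Counts older than the log retention window cannot be rebuilt this way.
rm /var/www/file-server/.stats/counts.json
/usr/bin/python3 /opt/download_stats/download_stats.py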
Generated Pages
The script generates two HTML pages on each run:
- Public page
- /var/www/file-server/index.html → lists available ISOs with their size, upload date, and links to checksums (.md5, .sha512). The list is built by scanning the directory, not maintained by hand. The page follows the visual identity of https://linuxvillage.org (colors, fonts, layout).
- Statistics page (private)
- /var/www/file-server/.stats/index.html → table of downloads month by month, with per-file totals and grand total.
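The directory scan behind the public page could look roughly like the sketch below. It rests on several assumptions that the page does not confirm: ISOs sit directly in the scanned directory, checksum files are named by appending the suffix to the ISO name (for example name.iso.sha512), and the "upload date" is taken from the file's modification time. The function names scan_isos and human_size are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

CHECKSUM_SUFFIXES = (".md5", ".sha512")


def human_size(n):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024 or unit == "GiB":
            return f"{n} B" if unit == "B" else f"{n:.1f} {unit}"
        n /= 1024


def scan_isos(root):
    """Return one dict per .iso file found directly under root, sorted by name."""
    entries = []
    for iso in sorted(Path(root).glob("*.iso")):
        stat = iso.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        entries.append({
            "name": iso.name,
            "size_bytes": stat.st_size,
            "size": human_size(stat.st_size),
            # Assumption: modification time stands in for the upload date.
            "date": modified.date().isoformat(),
            # Only checksum files that actually exist are linked.
            "checksums": [
                iso.name + suffix
                for suffix in CHECKSUM_SUFFIXES
                if iso.with_name(iso.name + suffix).is_file()
            ],
        })
    return entries
```

Calling scan_isos("/var/www/file-server") would then give the rows to render into the HTML list.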
Script Logic
- Log parsing
- The script scans all downloads-ssl-access.log* files (including gzipped files from log rotation). Only GET and HEAD requests for .iso files with HTTP status codes 200 or 206 are processed.
- Deduplication
- For each retained line, a key (ip, file, day) is constructed. If this key has already been seen in the same run, the line is ignored. This prevents counting a resumed download multiple times when it generates multiple 206 requests.
- Monthly JSON structure
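- Counters are stored by file and by month (YYYY-MM). On each run, the script merges the freshly computed counters with the historical data by keeping the maximum of the two values for each file and month. Because every run recounts all log files still on disk, summing would count the same downloads again each day; taking the maximum avoids that double-counting, while months that have already left the log retention window (14 days) keep their stored values.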
- Directory scanning
- The list of files displayed on the HTML pages is built dynamically on each run. Adding a new ISO version requires no manual intervention.
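The parsing, deduplication and merge steps above can be sketched as follows. This is an illustration under stated assumptions, not the attached script: the regular expression assumes Apache's common/combined log format, the "day" is the calendar date as written in the log timestamp, and all names (LINE_RE, open_log, iter_log_lines, count_downloads, merge_max) are hypothetical.

```python
import gzip
import re
from datetime import datetime
from pathlib import Path

# Assumed log layout: ip ident user [timestamp] "METHOD path PROTO" status ...
# Only GET/HEAD requests for .iso paths with status 200 or 206 match.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?:GET|HEAD) (?P<path>[^\s?]+\.iso)(?:\?\S*)? [^"]*" '
    r'(?:200|206) '
)


def open_log(path):
    """Open a plain or gzip-compressed (rotated) log file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")


def iter_log_lines(log_dir, pattern="downloads-ssl-access.log*"):
    """Yield every line of every matching log file."""
    for path in sorted(Path(log_dir).glob(pattern)):
        with open_log(path) as fh:
            yield from fh


def count_downloads(lines):
    """Return {file: {"YYYY-MM": n}}, counting each (ip, file, day) once."""
    seen = set()
    counts = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        when = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        name = m["path"].rsplit("/", 1)[-1]
        key = (m["ip"], name, when.date())
        if key in seen:
            continue  # e.g. a further 206 chunk of the same resumed download
        seen.add(key)
        by_month = counts.setdefault(name, {})
        month = when.strftime("%Y-%m")
        by_month[month] = by_month.get(month, 0) + 1
    return counts


def merge_max(history, fresh):
    """Merge two counter dicts, keeping the larger value per file and month."""
    merged = {name: dict(months) for name, months in history.items()}
    for name, months in fresh.items():
        target = merged.setdefault(name, {})
        for month, n in months.items():
            target[month] = max(target.get(month, 0), n)
    return merged
```

In this sketch a daily run would call merge_max(stored_counts, count_downloads(iter_log_lines("/var/log/apache2"))) and write the result back. Months that no longer appear in the logs are absent from the fresh counters, so their stored values pass through unchanged.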
Attachment
The download_stats.py.txt file attached to this page is the source Python script.
Rename to download_stats.py after download, then: