Python download counter (Compteur de téléchargements Python)
Last modified by Mélodie on 2026/05/07 13:56
== Objective ==

Set up an automatic download counter for the ISO files hosted on ##downloads.linuxvillage.org##, with:

* Daily parsing of the Apache logs
* Deduplication by IP + file + day, so that a resumed download, which produces several HTTP 206 Partial Content requests, is counted only once
* Month-by-month aggregation of the counters in a JSON file
* Automatic generation of a public page listing the available ISOs
* Automatic generation of a private monthly statistics page

== Prerequisites ==

* Python 3 installed (##/usr/bin/python3##)
* Root access to the server
* Apache logs available at ##/var/log/apache2/downloads-ssl-access.log*##
* ##cron## package installed: ##apt install cron##

== Script Installation ==

Create the directory and place the script in it. The script is the file attached to this page; remove the final ##.txt## from its name after downloading it:

{{code language="bash"}}
mkdir -p /opt/download_stats
# Place download_stats.py in this directory
chmod 750 /opt/download_stats/download_stats.py
{{/code}}

== .stats Directory Structure ==

The script manages the hidden ##.stats## directory at the root of the download space. To create it by hand, for example before the first run (##-p## makes the command harmless if the directory already exists):

{{code language="bash"}}
mkdir -p /var/www/file-server/.stats
{{/code}}

This directory contains:

* ##counts.json##: cumulative history of the monthly counters
* ##index.html##: private statistics page

The statistics page is available at:
##https:~/~/downloads.linuxvillage.org/.stats/##

This URL is not linked from the public page, but it is not access-restricted either: anyone who knows it can open the statistics.

== Cron Configuration ==

As root:

{{code language="bash"}}
crontab -e
{{/code}}

Add the following line to run the script every day at 01:00. Cron uses the server's local time zone, so this means 01:00 UTC only if the server clock is set to UTC:

{{code language="bash"}}
0 1 * * * /usr/bin/python3 /opt/download_stats/download_stats.py
{{/code}}

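== JSON Reset ==

If needed (change of the JSON structure, counter reset), start again from an empty history:

{{code language="bash"}}
# Keep a copy of the current history before wiping it
cp /var/www/file-server/.stats/counts.json /var/www/file-server/.stats/counts.json.bak
echo '{}' > /var/www/file-server/.stats/counts.json
/usr/bin/python3 /opt/download_stats/download_stats.py
{{/code}}

After a reset, the counters are rebuilt only from the Apache logs still on disk (about the last 14 days): any older history is lost unless it is restored from the backup.
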
== Generated Pages ==

The script generates two HTML pages on each run:

; Public page
: ##/var/www/file-server/index.html## → lists the available ISOs with their size, upload date and links to their checksum files (##.md5##, ##.sha512##). The list is built by scanning the directory. The page follows the visual identity of https://linuxvillage.org (colors, fonts, layout).

; Statistics page (private)
: ##/var/www/file-server/.stats/index.html## → a month-by-month table of downloads, with per-file totals and a grand total.

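The statistics table can be sketched as below. This is an illustration, not the code of the attached script: it assumes that ##counts.json## holds, for each file name, one counter per month (##YYYY-MM##), and the function name ##render_stats_table## is hypothetical.

{{code language="python"}}
from __future__ import annotations

from html import escape


def render_stats_table(counts: dict[str, dict[str, int]]) -> str:
    """Render counts (file name -> month "YYYY-MM" -> downloads) as HTML.

    Hypothetical helper: one row per file, one column per month, a per-file
    total column, and a last row with the monthly totals and grand total.
    """
    months = sorted({m for per_month in counts.values() for m in per_month})
    header = "".join(f"<th>{escape(m)}</th>" for m in months)
    rows = [f"<tr><th>File</th>{header}<th>Total</th></tr>"]
    for name in sorted(counts):
        per_month = counts[name]
        # A month without downloads for this file is shown as 0
        cells = "".join(f"<td>{per_month.get(m, 0)}</td>" for m in months)
        rows.append(
            f"<tr><td>{escape(name)}</td>{cells}"
            f"<td>{sum(per_month.values())}</td></tr>"
        )
    # Footer: total per month across all files, then the grand total
    month_totals = "".join(
        f"<td>{sum(c.get(m, 0) for c in counts.values())}</td>" for m in months
    )
    grand_total = sum(sum(c.values()) for c in counts.values())
    rows.append(f"<tr><th>Total</th>{month_totals}<td>{grand_total}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"
{{/code}}

Filling missing months with 0 keeps every row the same width, so the columns stay aligned even when a new ISO only has recent downloads.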
== Script Logic ==

; Log parsing
: The script scans all ##downloads-ssl-access.log*## files, including the gzip-compressed files produced by log rotation. It keeps only GET and HEAD requests for ##.iso## files that returned HTTP status 200 or 206.

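A minimal sketch of this filtering step, assuming the virtual host logs in Apache's ##combined## (or ##common##) format. The names ##LOG_LINE##, ##MONTHS##, ##open_log##, ##parse_line## and ##iter_hits## are hypothetical, and using the last path component as the file name is an assumption, not necessarily what the attached script does.

{{code language="python"}}
from __future__ import annotations

import glob
import gzip
import re
from datetime import date

# Assumption: Apache "combined" or "common" log format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)
# Month abbreviations as written in Apache's default timestamp; a fixed table
# avoids depending on the system locale (strptime's %b does).
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def open_log(path: str):
    """Open a current or rotated (.gz) log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")


def parse_line(line: str) -> tuple[str, str, date] | None:
    """Return (ip, file, day) for a counted ISO request, else None."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    if m["method"] not in ("GET", "HEAD") or m["status"] not in ("200", "206"):
        return None
    path = m["path"].split("?", 1)[0]  # ignore any query string
    if not path.endswith(".iso"):
        return None
    # "07/May/2026:13:56:01 +0000" -> date(2026, 5, 7)
    dd, mon, yyyy = m["time"][:11].split("/")
    day = date(int(yyyy), MONTHS[mon], int(dd))
    return m["ip"], path.rsplit("/", 1)[-1], day


def iter_hits(pattern: str = "/var/log/apache2/downloads-ssl-access.log*"):
    """Yield (ip, file, day) for every counted request in all matching logs."""
    for path in sorted(glob.glob(pattern)):
        with open_log(path) as fh:
            for line in fh:
                hit = parse_line(line)
                if hit is not None:
                    yield hit
{{/code}}

Opening the ##.gz## files with ##gzip.open## in text mode (##"rt"##) lets the rotated logs go through the same loop as the current one.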
; Deduplication
: For each retained line, the script builds a key ##(ip, file, day)##. If this key has already been seen during the current run, the line is ignored. A resumed download, which produces several 206 requests from the same IP on the same day, is therefore counted once. Conversely, several complete downloads of the same file from the same IP on the same day also count as one.

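The deduplication and the monthly aggregation could look like this. The sketch takes the ##(ip, file, day)## keys extracted from the retained log lines and returns counters per file and per month, the layout kept in ##counts.json##; ##count_by_month## is a hypothetical name, not the attached script's function.

{{code language="python"}}
from __future__ import annotations

from collections import defaultdict
from datetime import date
from typing import Iterable


def count_by_month(hits: Iterable[tuple[str, str, date]]) -> dict[str, dict[str, int]]:
    """Count unique (ip, file, day) keys per file and per month ("YYYY-MM")."""
    seen: set[tuple[str, str, date]] = set()
    counts: defaultdict[str, defaultdict[str, int]] = defaultdict(lambda: defaultdict(int))
    for ip, name, day in hits:
        key = (ip, name, day)
        if key in seen:
            # Same client, same file, same day: e.g. another 206 request
            # of a resumed download. Counted only once.
            continue
        seen.add(key)
        counts[name][day.strftime("%Y-%m")] += 1
    # Convert to plain dicts so the result can go straight to json.dump
    return {name: dict(months) for name, months in counts.items()}
{{/code}}

A ##set## gives constant-time membership tests, so the cost stays linear in the number of log lines even over the whole retention window.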
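; Monthly JSON structure
: Counters are stored per file and per month (##YYYY-MM##). Each run recounts everything in the logs still on disk, so the new figures overlap with the stored ones and adding them would count the same downloads twice. Instead, for each file and month, the script keeps the larger of the stored value and the new one. Months whose logs have left the 14-day retention window therefore keep their stored value, since their recount is lower or zero. Limitation of this approach: a run never sees more than 14 days of logs, so for a month longer than that the stored value is the highest count seen by a single run, not the full monthly total. Storing the counters per day and summing them per month would remove this limitation.
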
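A sketch of the max merge, assuming the same layout (file name → ##YYYY-MM## → count). ##merge_max##, ##load_counts## and ##save_counts## are hypothetical helpers, not the attached script's functions.

{{code language="python"}}
from __future__ import annotations

import json


def merge_max(history: dict[str, dict[str, int]],
              fresh: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Merge a fresh recount into the stored history, month by month.

    For each (file, month) the larger value wins: a month still covered by
    the logs gets its recount if it is higher, while a month whose logs
    were rotated away keeps its stored value.
    """
    merged = {name: dict(months) for name, months in history.items()}  # copy, do not mutate
    for name, months in fresh.items():
        target = merged.setdefault(name, {})
        for month, n in months.items():
            target[month] = max(target.get(month, 0), n)
    return merged


def load_counts(path: str) -> dict[str, dict[str, int]]:
    """Read counts.json; a missing file is treated as an empty history."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_counts(path: str, counts: dict[str, dict[str, int]]) -> None:
    """Write counts.json with sorted keys so it is easy to read and diff."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(counts, fh, indent=2, sort_keys=True)
{{/code}}

Since ##max## is idempotent, merging the same recount twice leaves the history unchanged: a daily rerun over logs that overlap the previous run does not inflate the figures.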
; Directory scanning
: The list of files shown on both HTML pages is rebuilt on each run by scanning the download directory, so a new ISO version appears on the next run without any manual intervention.

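A sketch of the directory scan. It assumes that checksum files sit next to each ISO as ##<name>.iso.md5## and ##<name>.iso.sha512##, that the file modification time stands for the upload date, and that hidden directories such as ##.stats## are skipped; ##IsoEntry## and ##scan_isos## are hypothetical names.

{{code language="python"}}
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class IsoEntry:
    name: str
    size: int             # bytes
    modified: datetime    # modification time, used here as the "upload date"
    checksums: list[str]  # names of the sidecar checksum files that exist


def scan_isos(root: str = "/var/www/file-server") -> list[IsoEntry]:
    """List the .iso files under root, newest first, with their checksum files."""
    entries = []
    for iso in Path(root).rglob("*.iso"):
        if any(part.startswith(".") for part in iso.relative_to(root).parts):
            continue  # skip hidden directories such as .stats
        st = iso.stat()
        # Assumption: checksums are named <file>.iso.md5 / <file>.iso.sha512
        sums = [iso.name + ext for ext in (".md5", ".sha512")
                if iso.with_name(iso.name + ext).exists()]
        entries.append(IsoEntry(
            iso.name, st.st_size,
            datetime.fromtimestamp(st.st_mtime, tz=timezone.utc), sums))
    return sorted(entries, key=lambda e: e.modified, reverse=True)
{{/code}}

Because the list is rebuilt from the file system on every run, removing an old ISO also removes it from the public page, while its history stays in ##counts.json##.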
== Attachment ==

The ##download_stats_en.py.txt## file attached to this page is the Python source of the script.
After downloading it, save it as ##/opt/download_stats/download_stats.py## (without the final ##.txt##, and under the name used by the installation commands and the cron entry above), then:

{{code language="bash"}}
chmod 750 /opt/download_stats/download_stats.py
{{/code}}