Last modified by Mélodie on 2026/05/07 13:56

== Objective ==

Set up an automatic download counter for ISO files hosted on ##downloads.linuxvillage.org##, with:

* Daily parsing of Apache logs
* Deduplication by IP + file + day (to avoid false duplicates from HTTP 206 Partial Content requests in resumed downloads)
* Month-by-month counter aggregation in a JSON file
* Automatic generation of a public page listing available ISOs
* Automatic generation of a private monthly statistics page

== Prerequisites ==

* Python 3 installed (##/usr/bin/python3##)
* Root access to the server
* Apache logs available at ##/var/log/apache2/downloads-ssl-access.log*##
* ##cron## package installed: ##apt install cron##

== Script Installation ==

Create the directory and place the script in it (the script is attached to this page; remove the trailing ##.txt## from its name):

{{code language="bash"}}
mkdir -p /opt/download_stats
# Place download_stats.py in this directory
chmod 750 /opt/download_stats/download_stats.py
{{/code}}

== .stats Directory Structure ==

The script automatically creates and manages the hidden ##.stats## directory at the root of the download space. To create it by hand before the first run:

{{code language="bash"}}
mkdir -p /var/www/file-server/.stats
{{/code}}

This directory will contain:

* ##counts.json##: cumulative history of monthly counters
* ##index.html##: private statistics page

The statistics page is available at:
##https:~/~/downloads.linuxvillage.org/.stats/##

The page is not linked from anywhere, but the URL itself is reachable: anyone who knows it can open the page.

== Cron Configuration ==

As root, edit the crontab:

{{code language="bash"}}
crontab -e
{{/code}}

Add the following line to run the script daily at 01:00 (cron uses the server's local time, so this is 01:00 UTC only if the server clock is set to UTC):

{{code language="bash"}}
0 1 * * * /usr/bin/python3 /opt/download_stats/download_stats.py
{{/code}}

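== JSON Reset ==

If the counters need to be reset (for example after a change to the JSON structure), empty the file and run the script again. Only the logs still on disk can be recounted, so counts older than the log retention window are lost:

{{code language="bash"}}
echo '{}' > /var/www/file-server/.stats/counts.json
/usr/bin/python3 /opt/download_stats/download_stats.py
{{/code}}
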
== Generated Pages ==

The script generates two HTML pages on each run:

; Public page
: ##/var/www/file-server/index.html## lists the available ISOs with their size, upload date, and links to their checksums (##.md5##, ##.sha512##). The content is detected automatically by scanning the directory. The page follows the visual identity of https://linuxvillage.org (colors, fonts, layout).

; Statistics page (private)
: ##/var/www/file-server/.stats/index.html## shows a month-by-month table of downloads, with per-file totals and a grand total.
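
The totals shown on the statistics page can be derived from the stored counters in a few lines. The sketch below is illustrative only and is not taken from the attached script: it assumes that ##counts.json## has the shape ##{file: {"YYYY-MM": count}}##, and the function name ##compute_totals## is hypothetical.

```python
def compute_totals(counts):
    """Return (per_file_totals, grand_total).

    `counts` is assumed to look like {"file.iso": {"2026-05": 9, ...}, ...};
    the real layout of counts.json may differ.
    """
    # Sum every monthly value for each file.
    per_file = {name: sum(months.values()) for name, months in counts.items()}
    # The grand total is the sum of the per-file totals.
    return per_file, sum(per_file.values())
```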

== Script Logic ==

; Log parsing
: The script scans all ##downloads-ssl-access.log*## files, including the gzipped files produced by log rotation. Only GET and HEAD requests for ##.iso## files with HTTP status 200 or 206 are counted.
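
The filtering rule can be sketched as follows. This is a minimal illustration, not the attached script: it assumes that the virtual host writes Apache's standard "combined" log format and that the file is identified by its base name. The names ##LOG_LINE##, ##open_log##, and ##parse_line## are hypothetical.

```python
import gzip
import re
from datetime import datetime

# Assumed Apache "combined" layout; adjust if the vhost uses a custom LogFormat.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def parse_line(line):
    """Return (ip, file, day) for a countable ISO request, else None."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    # Keep only GET/HEAD requests answered with 200 or 206.
    if m["method"] not in ("GET", "HEAD") or m["status"] not in ("200", "206"):
        return None
    path = m["path"].split("?", 1)[0]  # drop any query string
    if not path.endswith(".iso"):
        return None
    # Apache timestamps look like 07/May/2026:13:56:01 +0000.
    day = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z").date().isoformat()
    return m["ip"], path.rsplit("/", 1)[-1], day
```

Each log file would be read with ##open_log## and every line passed to ##parse_line##; lines that return ##None## are skipped.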

; Deduplication
: For each retained line, the script builds a key ##(ip, file, day)##. If the key has already been seen during the same run, the line is ignored. This prevents a resumed download that generates several 206 requests from being counted more than once.
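
A set of already-seen keys is enough to express this rule. The following sketch is an assumption about how such a step could look, not the script's actual code; ##count_downloads## is a hypothetical name, and ##day## is assumed to be a ##YYYY-MM-DD## string.

```python
from collections import Counter


def count_downloads(records):
    """Count one download per unique (ip, file, day), tallied by (file, month)."""
    seen = set()
    counts = Counter()
    for ip, filename, day in records:
        key = (ip, filename, day)
        if key in seen:
            continue  # repeated request, e.g. another 206 chunk of the same download
        seen.add(key)
        # day is "YYYY-MM-DD", so its first seven characters are "YYYY-MM".
        counts[(filename, day[:7])] += 1
    return counts
```

With this rule, the same IP fetching the same file on two different days counts as two downloads, while any number of requests on a single day counts as one.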

; Monthly JSON structure
: Counters are stored per file and per month (##YYYY-MM##). On each run, the script merges the new counters with the historical data by keeping the larger of the two values. This preserves data older than the log retention window (14 days) without double-counting.
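
The merge-by-maximum step might look like the sketch below. The nested layout ##{file: {"YYYY-MM": count}}## and the name ##merge_counts## are assumptions made for illustration; the attached script may organize the data differently.

```python
def merge_counts(history, fresh):
    """Merge two {file: {"YYYY-MM": count}} dicts, keeping the larger value per cell."""
    # Copy the history so the caller's data is left untouched.
    merged = {name: dict(months) for name, months in history.items()}
    for name, months in fresh.items():
        target = merged.setdefault(name, {})
        for month, count in months.items():
            target[month] = max(target.get(month, 0), count)
    return merged
```

Taking the maximum rather than the sum matters because every run recounts all logs still on disk: adding the fresh value to the stored one would count the same requests again on each run. Months that have rotated out of the logs are simply absent from the fresh counters, so their stored values are kept.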

; Directory scanning
: The list of files shown on the HTML pages is built dynamically on each run, so adding a new ISO version requires no manual intervention.
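
A directory scan of this kind can be sketched with ##pathlib##. The example is hypothetical: it assumes that checksum files sit next to each ISO as ##<name>.md5## and ##<name>.sha512##, and it uses the file modification time as a stand-in for the upload date. The name ##list_isos## is not from the attached script.

```python
from datetime import datetime, timezone
from pathlib import Path


def list_isos(root):
    """Return one dict per .iso file found under root, sorted by path."""
    entries = []
    for iso in sorted(Path(root).rglob("*.iso")):
        if not iso.is_file():
            continue  # skip directories that happen to end in .iso
        info = iso.stat()
        entries.append({
            "name": iso.name,
            "size_bytes": info.st_size,
            # Modification time stands in for the upload date (assumption).
            "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).date().isoformat(),
            # Checksums are assumed to be stored as <name>.md5 / <name>.sha512.
            "checksums": [ext for ext in (".md5", ".sha512")
                          if iso.with_name(iso.name + ext).is_file()],
        })
    return entries
```

Calling ##list_isos("/var/www/file-server")## on each run would pick up a newly uploaded ISO without any change to the script.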

== Attachment ==

The ##download_stats_en.py.txt## file attached to this page is the Python source of the script.
After download, rename it to ##download_stats.py## so that it matches the path used in the commands on this page, then:

{{code language="bash"}}
chmod 750 /opt/download_stats/download_stats.py
{{/code}}
