summaryrefslogtreecommitdiffstatsabout
path: root/README
diff options
context:
space:
mode:
Diffstat (limited to 'README')
-rw-r--r--README204
1 files changed, 204 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..23b8a91
--- /dev/null
+++ b/README
@@ -0,0 +1,204 @@
1$Id: README,v 1.1 2001/11/28 05:26:54 helios Exp $
2
3
4Homepage
5--------
6http://www.grubbybaby.com/mod_log_mysql/
7
8
9
10Approach
11--------
12
13In order to save speed and overhead, links are kept alive in between
14queries. This module uses one SQL link per httpd process. Among other
15things, this means that this module supports logging into only one
16MySQL server, and for now, also, only one SQL database (although the
17latter limitation can be relatively easily removed).
18
19Different data can be sent to different tables. i.e., it's possible to
20define one table for TransferLog, one for RefererLog, and a 3rd for
21AgentLog. [ Note: this is now deprecated behavior. Please consider
22logging Agent and Referer to the same table as your transfers. ]
23
24Virtual hosts are supported in the same manner they are in the regular
25logging modules. If you specify a different table for a virtual
26host it will be used, otherwise the 'general' would be used. Note:
27since all 3 types of logs are implemented within the same module, if
28you specify an overriding table for a virtual host for one type of log,
29it'll ignore any previous 'general' defaults (see the example in the
30end).
31
32SQL links are opened on demand (i.e., the first time each httpd needs
33to log something to SQL, the link is opened). In case the SQL server
34is down when trying to connect to it, the module remains silent and
35logs no error (I didn't want thousands of error messages in the
36logfile). In case the SQL link is broken ("mysql server has gone
37away") a proper error message is kept to the error log (textual :), and
38the module tries to reestablish the concact (and reports whether it
39succeeded or not in the error log). If the link cannot be
40reestablished, the module will, again, remain silent. Technical note:
41The SQL link is registered using apache's pool mechanism, so SQL links
42are properly closed on any normal shutdown, kill -HUP or kill -TERM.
43This also means that if you restart the MySQL daemon for any reason you
44should restart Apache.
45
46
47
48Supported directives
49--------------------
50
51Please see the web-based documentation for full explanation of all
52supported run-time directives.
53
54http://www.grubbybaby.com/mod_log_mysql/directives.html
55
56
57
58What gets logged by default?
59----------------------------
60
61All the data that would be contained in the "Combined Log Format"
62is logged by default, plus a little extra. Your best bet is to
63accept this default and employ the enclosed access_log.sql to
64format your table. Customize your logging format after you've
65had a chance to experiment with the default first.
66
67The MySQL table looks like this if you use the enclosed access_log.sql:
68
69+------------------+------------------+
70| Field | Type |
71+------------------+------------------+
72| remote_host | varchar(50) |
73| remote_user | varchar(50) |
74| request_uri | varchar(50) |
75| request_duration | smallint(6) |
76| virtual_host | varchar(50) |
77| time_stamp | int(10) unsigned |
78| status | smallint(6) |
79| bytes_sent | int(11) |
80| referer | varchar(255) |
81| agent | varchar(255) |
82+------------------+------------------+
83
84remote_host: corresponds to the Apache %h directive. Contains the remote
85 hostname or IP of the machine accessing your server.
86 Example: si4002.inktomi.com
87
88remote_user: corresponds to the Apache %u directive. Contains the
89 userid of people who have authenticated to your server, if applicable.
90 Example: freddy
91
92request_uri: corresponds to the Apache %U directive. Contains the
93 URL path requested, excluding any query string. This is different than
94 the %r information you might be used to seeing:
95
96 %r: GET /cgi-bin/neomail.pl?sessionid=freddy-session-0.742143231719&sort=date_rev HTTP/1.1
97 %U: /cgi-bin/neomail.pl
98
99 We log %U because it contains the real meat of the information that is
100 needed for log analysis, and saves the database a LOT of wasted growth
101 on unneeded bytes.
102
103request_duration: corresponds to the Apache %T directive. Contains the
104 time in seconds that it took to serve the request.
105 Example: 2
106
107virtual_host: contains the VirtualHost that is making the log entry. This
108 allows you to log multiple VirtualHosts to a single MySQL database and
109 yet still be able to extract them for separate analysis.
110 Example: www.grubbybaby.com
111
112time_stamp: contains the time that the request was logged. Please see
113 "Notes" below to get a better understanding of this.
114 Example: 1014249231
115
116status: corresponds to the Apache %t directive. Contains the HTTP status
117 of the request.
118 Example: 404
119
120bytes_sent: corresponds to the Apache %b directive. Contains the number
121 of bytes sent to service the request.
122 Example: 23123
123
124referer: corresponds to the Apache "%{Referer}i" directive. Contains the
125 referring HTML page's URL, if applicable.
126 Example: http://www.foobar.com/links.html
127
128agent: corresponds to the Apache "%{User-Agent}" directive. Contains the
129 broswer type (user agent) of the software that made the request.
130 Example: Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
131
132
133Notes
134-----
135
136* The 'time_stamp' field is stored in an UNSIGNED INTEGER column, in the
137 standard unix "seconds since 1/1/1970 12:00:00" format. This is
138 superior to storing the access time as a string due to size
139 requirements: an UNSIGNED INT type fits in 4 bytes. The Apache date
140 string (e.g. "18/Nov/2001:13:59:52 -0800") requires 26 bytes --
141 significantly larger, and those extra 22 bytes will add up over the
142 thousands of accesses that a busy server will experience. Besides,
143 an INT type is far more flexible for comparisons, etc.
144
145 In MySQL 3.21 and above you can easily convert this to a human
146 readable format using from_unixtime(), e.g.:
147
148 select remote_host,request_uri,from_unixtime(time_stamp) from access_log;
149
150 The enclosed perl program make_combined_log.pl shows how you can
151 extract your access records in a format that is completely Combined
152 Log Format compliant. You can then feed this to your favorite web
153 log analysis tool.
154
155
156* The table's string values can be CHAR or VARCHAR, at a length of your choice.
157 VARCHAR is superior because it truncates long strings; CHAR types are
158 fixed-length and will be padded with spaces. Just like the
159 time_stamp described above, that kind of space waste will add up over
160 thousands of records.
161
162
163* Most fields should probably be set to NOT NULL. The only ones that
164 shouldn't are extra fields that you don't intend the logging module
165 to update. (You can have other fields in the logging tables if you'd
166 like, but if they're set to NOT NULL then the logging module won't be
167 able to insert rows to these tables.)
168
169
170* Apache normally logs numeric fields with a '-' character to mean "not
171 applicable," e.g. bytes_sent on a request with a 304 response code.
172 Since '-' is an illegal character in an SQL numeric field, such
173 fields are assigned the value 0 instead of '-' which, of course,
174 makes perfect sense anyway.
175
176
177Disclaimer
178----------
179
180It works for me (I've tested it on my '2 hits/busy day' home Linux box,
181and afterwards on our pretty busy tucows mirror (>100K hits a day) and
182it appears to be working fine.
183
184If it doesn't, and causes you damage of any sort, including but not
185limited to losing logs, losing money or your girlfriend leaving you
186(read 'boyfriend' where applicable), I'm not liable to anything. Bug
187reports and constructive flame mail are ok, though (both about the code
188and this quickly-written README file).
189
190
191Author / Maintainer
192-------------------
193
194The actual logging code was taken from the already existing flat file
195text modules, so all that credit goes to the Apache Server group.
196
197The MySQL routines and directives was added in by Zeev Suraski
198<bourbon@netvision.net.il>
199
200Changes from 1.06 on and the new documentation were added by
201Chris Powell <chris@grubbybaby.com>. It seems that the module had fallen
202into the "unmaintained" category -- it hadn't been updated since 1998 --
203so I've adopted it as the new maintainer.
204