diff options
author | Christopher Powell | 2001-11-28 05:26:53 +0000 |
---|---|---|
committer | Christopher Powell | 2001-11-28 05:26:53 +0000 |
commit | 92d85f793b1a41bbbde1811004ae2708a47a44aa (patch) | |
tree | 69ecadaf47ae83197b8c92ff3f8b7c2002b15b19 /README |
Initial revision1.09
Diffstat (limited to 'README')
-rw-r--r-- | README | 204 |
1 files changed, 204 insertions, 0 deletions
@@ -0,0 +1,204 @@ | |||
1 | $Id: README,v 1.1 2001/11/28 05:26:54 helios Exp $ | ||
2 | |||
3 | |||
4 | Homepage | ||
5 | -------- | ||
6 | http://www.grubbybaby.com/mod_log_mysql/ | ||
7 | |||
8 | |||
9 | |||
10 | Approach | ||
11 | -------- | ||
12 | |||
13 | In order to save speed and overhead, links are kept alive in between | ||
14 | queries. This module uses one SQL link per httpd process. Among other | ||
15 | things, this means that this module supports logging into only one | ||
16 | MySQL server, and for now, also, only one SQL database (although the | ||
17 | latter limitation can be relatively easily removed). | ||
18 | |||
19 | Different data can be sent to different tables. i.e., it's possible to | ||
20 | define one table for TransferLog, one for RefererLog, and a 3rd for | ||
21 | AgentLog. [ Note: this is now deprecated behavior. Please consider | ||
22 | logging Agent and Referer to the same table as your transfers. ] | ||
23 | |||
24 | Virtual hosts are supported in the same manner they are in the regular | ||
25 | logging modules. If you specify a different table for a virtual | ||
26 | host it will be used, otherwise the 'general' would be used. Note: | ||
27 | since all 3 types of logs are implemented within the same module, if | ||
28 | you specify an overriding table for a virtual host for one type of log, | ||
29 | it'll ignore any previous 'general' defaults (see the example in the | ||
30 | end). | ||
31 | |||
32 | SQL links are opened on demand (i.e., the first time each httpd needs | ||
33 | to log something to SQL, the link is opened). In case the SQL server | ||
34 | is down when trying to connect to it, the module remains silent and | ||
35 | logs no error (I didn't want thousands of error messages in the | ||
36 | logfile). In case the SQL link is broken ("mysql server has gone | ||
37 | away") a proper error message is kept to the error log (textual :), and | ||
38 | the module tries to reestablish the concact (and reports whether it | ||
39 | succeeded or not in the error log). If the link cannot be | ||
40 | reestablished, the module will, again, remain silent. Technical note: | ||
41 | The SQL link is registered using apache's pool mechanism, so SQL links | ||
42 | are properly closed on any normal shutdown, kill -HUP or kill -TERM. | ||
43 | This also means that if you restart the MySQL daemon for any reason you | ||
44 | should restart Apache. | ||
45 | |||
46 | |||
47 | |||
48 | Supported directives | ||
49 | -------------------- | ||
50 | |||
51 | Please see the web-based documentation for full explanation of all | ||
52 | supported run-time directives. | ||
53 | |||
54 | http://www.grubbybaby.com/mod_log_mysql/directives.html | ||
55 | |||
56 | |||
57 | |||
58 | What gets logged by default? | ||
59 | ---------------------------- | ||
60 | |||
61 | All the data that would be contained in the "Combined Log Format" | ||
62 | is logged by default, plus a little extra. Your best bet is to | ||
63 | accept this default and employ the enclosed access_log.sql to | ||
64 | format your table. Customize your logging format after you've | ||
65 | had a chance to experiment with the default first. | ||
66 | |||
67 | The MySQL table looks like this if you use the enclosed access_log.sql: | ||
68 | |||
69 | +------------------+------------------+ | ||
70 | | Field | Type | | ||
71 | +------------------+------------------+ | ||
72 | | remote_host | varchar(50) | | ||
73 | | remote_user | varchar(50) | | ||
74 | | request_uri | varchar(50) | | ||
75 | | request_duration | smallint(6) | | ||
76 | | virtual_host | varchar(50) | | ||
77 | | time_stamp | int(10) unsigned | | ||
78 | | status | smallint(6) | | ||
79 | | bytes_sent | int(11) | | ||
80 | | referer | varchar(255) | | ||
81 | | agent | varchar(255) | | ||
82 | +------------------+------------------+ | ||
83 | |||
84 | remote_host: corresponds to the Apache %h directive. Contains the remote | ||
85 | hostname or IP of the machine accessing your server. | ||
86 | Example: si4002.inktomi.com | ||
87 | |||
88 | remote_user: corresponds to the Apache %u directive. Contains the | ||
89 | userid of people who have authenticated to your server, if applicable. | ||
90 | Example: freddy | ||
91 | |||
92 | request_uri: corresponds to the Apache %U directive. Contains the | ||
93 | URL path requested, excluding any query string. This is different than | ||
94 | the %r information you might be used to seeing: | ||
95 | |||
96 | %r: GET /cgi-bin/neomail.pl?sessionid=freddy-session-0.742143231719&sort=date_rev HTTP/1.1 | ||
97 | %U: /cgi-bin/neomail.pl | ||
98 | |||
99 | We log %U because it contains the real meat of the information that is | ||
100 | needed for log analysis, and saves the database a LOT of wasted growth | ||
101 | on unneeded bytes. | ||
102 | |||
103 | request_duration: corresponds to the Apache %T directive. Contains the | ||
104 | time in seconds that it took to serve the request. | ||
105 | Example: 2 | ||
106 | |||
107 | virtual_host: contains the VirtualHost that is making the log entry. This | ||
108 | allows you to log multiple VirtualHosts to a single MySQL database and | ||
109 | yet still be able to extract them for separate analysis. | ||
110 | Example: www.grubbybaby.com | ||
111 | |||
112 | time_stamp: contains the time that the request was logged. Please see | ||
113 | "Notes" below to get a better understanding of this. | ||
114 | Example: 1014249231 | ||
115 | |||
116 | status: corresponds to the Apache %t directive. Contains the HTTP status | ||
117 | of the request. | ||
118 | Example: 404 | ||
119 | |||
120 | bytes_sent: corresponds to the Apache %b directive. Contains the number | ||
121 | of bytes sent to service the request. | ||
122 | Example: 23123 | ||
123 | |||
124 | referer: corresponds to the Apache "%{Referer}i" directive. Contains the | ||
125 | referring HTML page's URL, if applicable. | ||
126 | Example: http://www.foobar.com/links.html | ||
127 | |||
128 | agent: corresponds to the Apache "%{User-Agent}" directive. Contains the | ||
129 | broswer type (user agent) of the software that made the request. | ||
130 | Example: Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html) | ||
131 | |||
132 | |||
133 | Notes | ||
134 | ----- | ||
135 | |||
136 | * The 'time_stamp' field is stored in an UNSIGNED INTEGER column, in the | ||
137 | standard unix "seconds since 1/1/1970 12:00:00" format. This is | ||
138 | superior to storing the access time as a string due to size | ||
139 | requirements: an UNSIGNED INT type fits in 4 bytes. The Apache date | ||
140 | string (e.g. "18/Nov/2001:13:59:52 -0800") requires 26 bytes -- | ||
141 | significantly larger, and those extra 22 bytes will add up over the | ||
142 | thousands of accesses that a busy server will experience. Besides, | ||
143 | an INT type is far more flexible for comparisons, etc. | ||
144 | |||
145 | In MySQL 3.21 and above you can easily convert this to a human | ||
146 | readable format using from_unixtime(), e.g.: | ||
147 | |||
148 | select remote_host,request_uri,from_unixtime(time_stamp) from access_log; | ||
149 | |||
150 | The enclosed perl program make_combined_log.pl shows how you can | ||
151 | extract your access records in a format that is completely Combined | ||
152 | Log Format compliant. You can then feed this to your favorite web | ||
153 | log analysis tool. | ||
154 | |||
155 | |||
156 | * The table's string values can be CHAR or VARCHAR, at a length of your choice. | ||
157 | VARCHAR is superior because it truncates long strings; CHAR types are | ||
158 | fixed-length and will be padded with spaces. Just like the | ||
159 | time_stamp described above, that kind of space waste will add up over | ||
160 | thousands of records. | ||
161 | |||
162 | |||
163 | * Most fields should probably be set to NOT NULL. The only ones that | ||
164 | shouldn't are extra fields that you don't intend the logging module | ||
165 | to update. (You can have other fields in the logging tables if you'd | ||
166 | like, but if they're set to NOT NULL then the logging module won't be | ||
167 | able to insert rows to these tables.) | ||
168 | |||
169 | |||
170 | * Apache normally logs numeric fields with a '-' character to mean "not | ||
171 | applicable," e.g. bytes_sent on a request with a 304 response code. | ||
172 | Since '-' is an illegal character in an SQL numeric field, such | ||
173 | fields are assigned the value 0 instead of '-' which, of course, | ||
174 | makes perfect sense anyway. | ||
175 | |||
176 | |||
177 | Disclaimer | ||
178 | ---------- | ||
179 | |||
180 | It works for me (I've tested it on my '2 hits/busy day' home Linux box, | ||
181 | and afterwards on our pretty busy tucows mirror (>100K hits a day) and | ||
182 | it appears to be working fine. | ||
183 | |||
184 | If it doesn't, and causes you damage of any sort, including but not | ||
185 | limited to losing logs, losing money or your girlfriend leaving you | ||
186 | (read 'boyfriend' where applicable), I'm not liable to anything. Bug | ||
187 | reports and constructive flame mail are ok, though (both about the code | ||
188 | and this quickly-written README file). | ||
189 | |||
190 | |||
191 | Author / Maintainer | ||
192 | ------------------- | ||
193 | |||
194 | The actual logging code was taken from the already existing flat file | ||
195 | text modules, so all that credit goes to the Apache Server group. | ||
196 | |||
197 | The MySQL routines and directives was added in by Zeev Suraski | ||
198 | <bourbon@netvision.net.il> | ||
199 | |||
200 | Changes from 1.06 on and the new documentation were added by | ||
201 | Chris Powell <chris@grubbybaby.com>. It seems that the module had fallen | ||
202 | into the "unmaintained" category -- it hadn't been updated since 1998 -- | ||
203 | so I've adopted it as the new maintainer. | ||
204 | |||