$Id: README,v 1.4 2002/04/08 07:06:20 helios Exp $


Homepage
--------
http://www.grubbybaby.com/mod_log_mysql/



Approach
--------

To reduce connection overhead, SQL links are kept alive between
queries.  This module uses one SQL link per httpd process.  Among other
things, this means that the module supports logging to only one
MySQL server and, for now, only one SQL database (although the
latter limitation could be removed relatively easily).

Different data can be sent to different tables.  For example, it's possible to
define one table for TransferLog, one for RefererLog, and a third for
AgentLog.  [ Note: this is now deprecated behavior.  Please consider
logging Agent and Referer to the same table as your transfers. ]

Virtual hosts are supported in the same manner as in the regular
logging modules.  If you specify a different table for a virtual
host, it will be used; otherwise the 'general' table is used.  Note:
since all 3 types of logs are implemented within the same module, if
you specify an overriding table for a virtual host for one type of log,
it will ignore any previous 'general' defaults (see the example at the
end).
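
One way to read the override rule above is sketched below in Python.
This is an illustration only, not the module's C code: the dictionary
layout, the effective_tables name and the table names are all
hypothetical, and the "any override replaces every default" behavior
is an assumed reading of the note above.

```python
def effective_tables(general, vhost):
    """Return the log-type -> table mapping in effect for one virtual host.

    Assumed reading of the rule: a virtual host that sets a table for any
    log type stops inheriting the 'general' tables altogether.
    """
    if vhost:                # any override at all replaces the defaults
        return dict(vhost)
    return dict(general)     # no override: fall back to the 'general' tables


# Hypothetical table names, for illustration only.
general = {"transfer": "access_log", "referer": "referer_log", "agent": "agent_log"}

# This host overrides only the transfer table, so under this reading its
# referer and agent data no longer go to the general tables.
print(effective_tables(general, {"transfer": "vhost_access_log"}))
print(effective_tables(general, {}))
```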

SQL links are opened on demand (i.e., the first time each httpd needs
to log something to SQL, the link is opened).  If the SQL server
is down when the module tries to connect, the module remains silent and
logs no error (I didn't want thousands of error messages in the
logfile).  If an established SQL link is broken ("mysql server has gone
away"), an error message is written to the (textual) error log, and
the module tries to reestablish the connection (and reports whether it
succeeded or not in the error log).  If the link cannot be
reestablished, the module will, again, remain silent.  Technical note:
the SQL link is registered using Apache's pool mechanism, so SQL links
are properly closed on any normal shutdown, kill -HUP or kill -TERM.
This also means that if you restart the MySQL daemon for any reason you
should restart Apache.
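
The connection behavior described above can be modelled roughly as
follows.  This is a toy Python sketch, not the module's implementation:
LazyLink, FakeConnection, the connect callable and the message texts
are all made up for illustration, and retrying the query after a
successful reconnect is an assumption.

```python
class LazyLink:
    """Toy model of one persistent SQL link per process."""

    def __init__(self, connect, error_log):
        self._connect = connect      # hypothetical callable returning a connection
        self._error_log = error_log  # a list standing in for Apache's error log
        self._conn = None            # nothing is opened until the first query

    def _open(self):
        try:
            self._conn = self._connect()
        except ConnectionError:
            self._conn = None        # server down: stay silent
        return self._conn is not None

    def log(self, query):
        # Open the link on demand, the first time something must be logged.
        if self._conn is None and not self._open():
            return False
        try:
            self._conn.execute(query)
            return True
        except ConnectionError as exc:
            # An established link broke: this case is reported.
            self._error_log.append("SQL link lost: %s" % exc)
        if self._open():
            self._error_log.append("SQL link reestablished")
            try:
                self._conn.execute(query)   # assumed: retry the query once
                return True
            except ConnectionError:
                self._conn = None
                return False
        self._error_log.append("could not reestablish SQL link")
        return False


class FakeConnection:
    """Stand-in for a MySQL connection; `alive` simulates the server state."""

    def __init__(self):
        self.alive = True
        self.queries = []

    def execute(self, query):
        if not self.alive:
            raise ConnectionError("mysql server has gone away")
        self.queries.append(query)


error_log = []
first, second = FakeConnection(), FakeConnection()
pool = iter([first, second])
link = LazyLink(lambda: next(pool), error_log)
link.log("INSERT 1")    # opens the link on first use
first.alive = False     # simulate the server going away
link.log("INSERT 2")    # reported, reopened, retried on the new link
print(error_log)
```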



Supported directives
--------------------

Please see the web-based documentation for a full explanation of all
supported run-time directives.

 http://www.grubbybaby.com/mod_log_mysql/directives.html

See the FAQ for some handy examples:

 http://www.grubbybaby.com/mod_log_mysql/faq.html


What gets logged by default?
----------------------------

All the data that would be contained in the "Combined Log Format" 
is logged by default, plus a little extra.  Your best bet is to
accept this default and employ the enclosed access_log.sql to
format your table.  Customize your logging format after you've
had a chance to experiment with the default first.

If you just want to log enough data to be able to reconstruct
a Combined Log Format log, log these:

+------------------+------------------+
| Field            | Type             |
+------------------+------------------+
| remote_host      | varchar(50)      |
| remote_user      | varchar(50)      |
| request_uri      | varchar(50)      |
| virtual_host     | varchar(50)      |
| time_stamp       | int(10) unsigned |
| status           | smallint(6)      |
| bytes_sent       | int(11)          |
| referer          | varchar(255)     |
| agent            | varchar(255)     |
| request_method   | varchar(6)       |
| request_protocol | varchar(10)      |
+------------------+------------------+

remote_host: corresponds to the Apache %h directive.  Contains the remote
  hostname or IP of the machine accessing your server.
  Example:  si4002.inktomi.com

remote_user: corresponds to the Apache %u directive.  Contains the 
  userid of people who have authenticated to your server, if applicable.
  Example:  freddy

request_uri: corresponds to the Apache %U directive.  Contains the
  URL path requested, excluding any query string.  This is different from
  the %r information you might be used to seeing:

  %r: GET /cgi-bin/neomail.pl?sessionid=freddy-session-0.742143231719&sort=date_rev HTTP/1.1
  %U: /cgi-bin/neomail.pl

  We log %U because it contains the real meat of the information that is
  needed for log analysis, and saves the database a LOT of wasted growth
  on unneeded bytes.

virtual_host: contains the VirtualHost that is making the log entry. This
  allows you to log multiple VirtualHosts to a single MySQL database and
  yet still be able to extract them for separate analysis.
  Example: www.grubbybaby.com

time_stamp: contains the time that the request was logged.  Please see
  "Notes" below to get a better understanding of this.
  Example: 1014249231

status: corresponds to the Apache %s directive.  Contains the HTTP status
  of the request.
  Example: 404

bytes_sent: corresponds to the Apache %b directive.  Contains the number
  of bytes sent to service the request.
  Example: 23123

referer: corresponds to the Apache "%{Referer}i" directive.  Contains the
  referring HTML page's URL, if applicable.
  Example: http://www.foobar.com/links.html

agent: corresponds to the Apache "%{User-Agent}i" directive.  Contains the
  browser type (user agent) of the software that made the request.
  Example: Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)

request_method: corresponds to the Apache %m directive.  Contains the type
  of request sent: GET, PUT, etc.
  Example: GET
  
request_protocol: corresponds to the Apache %H directive.  Contains the HTTP
  protocol that was used.
  Example: HTTP/1.1
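
To make the relationship between %r and the separate request_method,
request_uri and request_protocol columns concrete, here is a small
Python sketch.  The split_request_line helper is hypothetical (the
module itself gets these values from Apache, not by parsing %r); it
only shows which part of the request line ends up in which column.

```python
def split_request_line(request_line):
    """Split an Apache %r request line into the three logged columns."""
    method, target, protocol = request_line.split(" ", 2)
    # %U keeps only the path; everything from '?' on is the query string.
    path, _, _query = target.partition("?")
    return {
        "request_method": method,
        "request_uri": path,
        "request_protocol": protocol,
    }


line = ("GET /cgi-bin/neomail.pl?sessionid=freddy-session-0.742143231719"
        "&sort=date_rev HTTP/1.1")
print(split_request_line(line))
```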
  
  
Notes
-----

* The 'time_stamp' field is stored in an UNSIGNED INTEGER column, in the
  standard Unix "seconds since 1/1/1970 00:00:00 UTC" format.  This is
  superior to storing the access time as a string due to size
  requirements: an UNSIGNED INT type fits in 4 bytes, whereas the Apache date
  string (e.g. "18/Nov/2001:13:59:52 -0800") requires 26 bytes --
  significantly larger, and those extra 22 bytes will add up over the
  thousands of accesses that a busy server will experience.  Besides,
  an INT type is far more flexible for comparisons, etc.

  In MySQL 3.21 and above you can easily convert this to a human
  readable format using from_unixtime(), e.g.:

  select remote_host,request_uri,from_unixtime(time_stamp) from access_log;

  The enclosed perl program make_combined_log.pl shows how you can
  extract your access records in a format that is completely Combined
  Log Format compliant.  You can then feed this to your favorite web
  log analysis tool.
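
  As a rough illustration of the size argument and of rebuilding a
  Combined Log Format line from one row, consider the Python sketch
  below.  It is not the enclosed make_combined_log.pl: the function
  names and the row dictionary are assumptions, the time is rendered in
  UTC rather than the server's local zone, and the original query
  string cannot be recovered because only %U is stored.

```python
import struct
from datetime import datetime, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def apache_time(time_stamp):
    """Render a Unix time_stamp as an Apache-style date string (in UTC)."""
    t = datetime.fromtimestamp(time_stamp, tz=timezone.utc)
    return "%02d/%s/%04d:%02d:%02d:%02d +0000" % (
        t.day, MONTHS[t.month - 1], t.year, t.hour, t.minute, t.second)


def combined_log_line(row):
    """Build a Combined Log Format line from one row (a dict of columns)."""
    request = "%s %s %s" % (row["request_method"], row["request_uri"],
                            row["request_protocol"])
    return '%s - %s [%s] "%s" %d %d "%s" "%s"' % (
        row["remote_host"], row["remote_user"] or "-",
        apache_time(row["time_stamp"]), request,
        row["status"], row["bytes_sent"],
        row["referer"] or "-", row["agent"] or "-")


# Example row assembled from the sample values used in this README.
row = {
    "remote_host": "si4002.inktomi.com", "remote_user": "freddy",
    "request_uri": "/cgi-bin/neomail.pl", "time_stamp": 1014249231,
    "status": 404, "bytes_sent": 23123,
    "referer": "http://www.foobar.com/links.html",
    "agent": "Mozilla/3.0 (Slurp/si; slurp@inktomi.com; "
             "http://www.inktomi.com/slurp.html)",
    "request_method": "GET", "request_protocol": "HTTP/1.1",
}

# Storage comparison: a 4-byte unsigned integer versus the date string.
print(len(struct.pack("<I", row["time_stamp"])), len(apache_time(row["time_stamp"])))
print(combined_log_line(row))
```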


* The table's string values can be CHAR or VARCHAR, at a length of your choice.
  VARCHAR is superior because it stores only as many characters as each
  string actually needs (plus a length byte); CHAR types are
  fixed-length and will be padded with spaces.  Just like the
  time_stamp described above, that kind of space waste will add up over
  thousands of records.
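
  The arithmetic behind that claim can be sketched in a few lines of
  Python.  This is a back-of-the-envelope estimate, assuming a
  single-byte character set and a declared length of at most 255 (so
  VARCHAR needs one length byte); the helper names and the row count
  are illustrative only.

```python
def char_bytes(declared_length):
    """A CHAR(n) column always occupies n bytes (single-byte charset assumed)."""
    return declared_length


def varchar_bytes(value, declared_length=255):
    """A VARCHAR(n <= 255) value occupies its own length plus one length byte."""
    stored = value[:declared_length]   # longer values are cut to the declared length
    return len(stored.encode("latin-1")) + 1


referer = "http://www.foobar.com/links.html"
rows = 100000                          # illustrative number of log records

per_row_saving = char_bytes(255) - varchar_bytes(referer)
print("bytes saved per row:", per_row_saving)
print("bytes saved over %d rows:" % rows, per_row_saving * rows)
```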


* Most fields should probably be set to NOT NULL.  The only ones that
  shouldn't are extra fields that you don't intend the logging module
  to update.  (You can have other fields in the logging tables if you'd
  like, but if they're set to NOT NULL then the logging module won't be
  able to insert rows to these tables.)


* Apache normally logs numeric fields with a '-' character to mean "not
  applicable," e.g. bytes_sent on a request with a 304 response code. 
  Since '-' is an illegal character in an SQL numeric field, such
  fields are assigned the value 0 instead of '-' which, of course,
  makes perfect sense anyway.
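
  In Python terms the substitution amounts to something like the
  following.  The numeric_field helper is hypothetical and only mirrors
  the rule described above; it is not taken from the module's source.

```python
def numeric_field(value):
    """Map Apache's '-' ("not applicable") to 0, otherwise parse the number."""
    return 0 if value == "-" else int(value)


print(numeric_field("-"))       # e.g. bytes_sent on a 304 response
print(numeric_field("23123"))
```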


* If your database goes offline and Apache cannot log to it, mod_log_mysql
  saves the queries it could not run to a local text file.  (By
  default the file is /tmp/mysql-preserve.)  This will allow you to not
  miss those entries; when you bring your database back online it is a
  simple matter to import the contents of this preserve file.  To do
  this simply copy the file to your MySQL server and run an import
  as follows:
  # mysql -uadminuser -p mydbname < mysql-preserve


Author / Maintainer
-------------------

The actual logging code was taken from the already existing flat file
text modules, so all that credit goes to the Apache Server group.  

The MySQL routines and directives were added by Zeev Suraski
<bourbon@netvision.net.il>.

Changes from 1.06 on and the new documentation were added by 
Chris Powell <chris@grubbybaby.com>.  It seems that the module had fallen
into the "unmaintained" category -- it hadn't been updated since 1998 --
so Chris adopted it as the new maintainer.