diff options
Diffstat (limited to 'README')
| -rw-r--r-- | README | 204 |
1 files changed, 204 insertions, 0 deletions
| @@ -0,0 +1,204 @@ | |||
| 1 | $Id: README,v 1.1 2001/11/28 05:26:54 helios Exp $ | ||
| 2 | |||
| 3 | |||
| 4 | Homepage | ||
| 5 | -------- | ||
| 6 | http://www.grubbybaby.com/mod_log_mysql/ | ||
| 7 | |||
| 8 | |||
| 9 | |||
| 10 | Approach | ||
| 11 | -------- | ||
| 12 | |||
| 13 | In order to save speed and overhead, links are kept alive in between | ||
| 14 | queries. This module uses one SQL link per httpd process. Among other | ||
| 15 | things, this means that this module supports logging into only one | ||
| 16 | MySQL server, and for now, also, only one SQL database (although the | ||
| 17 | latter limitation can be relatively easily removed). | ||
| 18 | |||
| 19 | Different data can be sent to different tables. i.e., it's possible to | ||
| 20 | define one table for TransferLog, one for RefererLog, and a 3rd for | ||
| 21 | AgentLog. [ Note: this is now deprecated behavior. Please consider | ||
| 22 | logging Agent and Referer to the same table as your transfers. ] | ||
| 23 | |||
| 24 | Virtual hosts are supported in the same manner they are in the regular | ||
| 25 | logging modules. If you specify a different table for a virtual | ||
| 26 | host it will be used, otherwise the 'general' would be used. Note: | ||
| 27 | since all 3 types of logs are implemented within the same module, if | ||
| 28 | you specify an overriding table for a virtual host for one type of log, | ||
| 29 | it'll ignore any previous 'general' defaults (see the example in the | ||
| 30 | end). | ||
| 31 | |||
| 32 | SQL links are opened on demand (i.e., the first time each httpd needs | ||
| 33 | to log something to SQL, the link is opened). In case the SQL server | ||
| 34 | is down when trying to connect to it, the module remains silent and | ||
| 35 | logs no error (I didn't want thousands of error messages in the | ||
| 36 | logfile). In case the SQL link is broken ("mysql server has gone | ||
| 37 | away") a proper error message is kept to the error log (textual :), and | ||
| 38 | the module tries to reestablish the concact (and reports whether it | ||
| 39 | succeeded or not in the error log). If the link cannot be | ||
| 40 | reestablished, the module will, again, remain silent. Technical note: | ||
| 41 | The SQL link is registered using apache's pool mechanism, so SQL links | ||
| 42 | are properly closed on any normal shutdown, kill -HUP or kill -TERM. | ||
| 43 | This also means that if you restart the MySQL daemon for any reason you | ||
| 44 | should restart Apache. | ||
| 45 | |||
| 46 | |||
| 47 | |||
| 48 | Supported directives | ||
| 49 | -------------------- | ||
| 50 | |||
| 51 | Please see the web-based documentation for full explanation of all | ||
| 52 | supported run-time directives. | ||
| 53 | |||
| 54 | http://www.grubbybaby.com/mod_log_mysql/directives.html | ||
| 55 | |||
| 56 | |||
| 57 | |||
| 58 | What gets logged by default? | ||
| 59 | ---------------------------- | ||
| 60 | |||
| 61 | All the data that would be contained in the "Combined Log Format" | ||
| 62 | is logged by default, plus a little extra. Your best bet is to | ||
| 63 | accept this default and employ the enclosed access_log.sql to | ||
| 64 | format your table. Customize your logging format after you've | ||
| 65 | had a chance to experiment with the default first. | ||
| 66 | |||
| 67 | The MySQL table looks like this if you use the enclosed access_log.sql: | ||
| 68 | |||
| 69 | +------------------+------------------+ | ||
| 70 | | Field | Type | | ||
| 71 | +------------------+------------------+ | ||
| 72 | | remote_host | varchar(50) | | ||
| 73 | | remote_user | varchar(50) | | ||
| 74 | | request_uri | varchar(50) | | ||
| 75 | | request_duration | smallint(6) | | ||
| 76 | | virtual_host | varchar(50) | | ||
| 77 | | time_stamp | int(10) unsigned | | ||
| 78 | | status | smallint(6) | | ||
| 79 | | bytes_sent | int(11) | | ||
| 80 | | referer | varchar(255) | | ||
| 81 | | agent | varchar(255) | | ||
| 82 | +------------------+------------------+ | ||
| 83 | |||
| 84 | remote_host: corresponds to the Apache %h directive. Contains the remote | ||
| 85 | hostname or IP of the machine accessing your server. | ||
| 86 | Example: si4002.inktomi.com | ||
| 87 | |||
| 88 | remote_user: corresponds to the Apache %u directive. Contains the | ||
| 89 | userid of people who have authenticated to your server, if applicable. | ||
| 90 | Example: freddy | ||
| 91 | |||
| 92 | request_uri: corresponds to the Apache %U directive. Contains the | ||
| 93 | URL path requested, excluding any query string. This is different than | ||
| 94 | the %r information you might be used to seeing: | ||
| 95 | |||
| 96 | %r: GET /cgi-bin/neomail.pl?sessionid=freddy-session-0.742143231719&sort=date_rev HTTP/1.1 | ||
| 97 | %U: /cgi-bin/neomail.pl | ||
| 98 | |||
| 99 | We log %U because it contains the real meat of the information that is | ||
| 100 | needed for log analysis, and saves the database a LOT of wasted growth | ||
| 101 | on unneeded bytes. | ||
| 102 | |||
| 103 | request_duration: corresponds to the Apache %T directive. Contains the | ||
| 104 | time in seconds that it took to serve the request. | ||
| 105 | Example: 2 | ||
| 106 | |||
| 107 | virtual_host: contains the VirtualHost that is making the log entry. This | ||
| 108 | allows you to log multiple VirtualHosts to a single MySQL database and | ||
| 109 | yet still be able to extract them for separate analysis. | ||
| 110 | Example: www.grubbybaby.com | ||
| 111 | |||
| 112 | time_stamp: contains the time that the request was logged. Please see | ||
| 113 | "Notes" below to get a better understanding of this. | ||
| 114 | Example: 1014249231 | ||
| 115 | |||
| 116 | status: corresponds to the Apache %t directive. Contains the HTTP status | ||
| 117 | of the request. | ||
| 118 | Example: 404 | ||
| 119 | |||
| 120 | bytes_sent: corresponds to the Apache %b directive. Contains the number | ||
| 121 | of bytes sent to service the request. | ||
| 122 | Example: 23123 | ||
| 123 | |||
| 124 | referer: corresponds to the Apache "%{Referer}i" directive. Contains the | ||
| 125 | referring HTML page's URL, if applicable. | ||
| 126 | Example: http://www.foobar.com/links.html | ||
| 127 | |||
| 128 | agent: corresponds to the Apache "%{User-Agent}" directive. Contains the | ||
| 129 | broswer type (user agent) of the software that made the request. | ||
| 130 | Example: Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html) | ||
| 131 | |||
| 132 | |||
| 133 | Notes | ||
| 134 | ----- | ||
| 135 | |||
| 136 | * The 'time_stamp' field is stored in an UNSIGNED INTEGER column, in the | ||
| 137 | standard unix "seconds since 1/1/1970 12:00:00" format. This is | ||
| 138 | superior to storing the access time as a string due to size | ||
| 139 | requirements: an UNSIGNED INT type fits in 4 bytes. The Apache date | ||
| 140 | string (e.g. "18/Nov/2001:13:59:52 -0800") requires 26 bytes -- | ||
| 141 | significantly larger, and those extra 22 bytes will add up over the | ||
| 142 | thousands of accesses that a busy server will experience. Besides, | ||
| 143 | an INT type is far more flexible for comparisons, etc. | ||
| 144 | |||
| 145 | In MySQL 3.21 and above you can easily convert this to a human | ||
| 146 | readable format using from_unixtime(), e.g.: | ||
| 147 | |||
| 148 | select remote_host,request_uri,from_unixtime(time_stamp) from access_log; | ||
| 149 | |||
| 150 | The enclosed perl program make_combined_log.pl shows how you can | ||
| 151 | extract your access records in a format that is completely Combined | ||
| 152 | Log Format compliant. You can then feed this to your favorite web | ||
| 153 | log analysis tool. | ||
| 154 | |||
| 155 | |||
| 156 | * The table's string values can be CHAR or VARCHAR, at a length of your choice. | ||
| 157 | VARCHAR is superior because it truncates long strings; CHAR types are | ||
| 158 | fixed-length and will be padded with spaces. Just like the | ||
| 159 | time_stamp described above, that kind of space waste will add up over | ||
| 160 | thousands of records. | ||
| 161 | |||
| 162 | |||
| 163 | * Most fields should probably be set to NOT NULL. The only ones that | ||
| 164 | shouldn't are extra fields that you don't intend the logging module | ||
| 165 | to update. (You can have other fields in the logging tables if you'd | ||
| 166 | like, but if they're set to NOT NULL then the logging module won't be | ||
| 167 | able to insert rows to these tables.) | ||
| 168 | |||
| 169 | |||
| 170 | * Apache normally logs numeric fields with a '-' character to mean "not | ||
| 171 | applicable," e.g. bytes_sent on a request with a 304 response code. | ||
| 172 | Since '-' is an illegal character in an SQL numeric field, such | ||
| 173 | fields are assigned the value 0 instead of '-' which, of course, | ||
| 174 | makes perfect sense anyway. | ||
| 175 | |||
| 176 | |||
| 177 | Disclaimer | ||
| 178 | ---------- | ||
| 179 | |||
| 180 | It works for me (I've tested it on my '2 hits/busy day' home Linux box, | ||
| 181 | and afterwards on our pretty busy tucows mirror (>100K hits a day) and | ||
| 182 | it appears to be working fine. | ||
| 183 | |||
| 184 | If it doesn't, and causes you damage of any sort, including but not | ||
| 185 | limited to losing logs, losing money or your girlfriend leaving you | ||
| 186 | (read 'boyfriend' where applicable), I'm not liable to anything. Bug | ||
| 187 | reports and constructive flame mail are ok, though (both about the code | ||
| 188 | and this quickly-written README file). | ||
| 189 | |||
| 190 | |||
| 191 | Author / Maintainer | ||
| 192 | ------------------- | ||
| 193 | |||
| 194 | The actual logging code was taken from the already existing flat file | ||
| 195 | text modules, so all that credit goes to the Apache Server group. | ||
| 196 | |||
| 197 | The MySQL routines and directives was added in by Zeev Suraski | ||
| 198 | <bourbon@netvision.net.il> | ||
| 199 | |||
| 200 | Changes from 1.06 on and the new documentation were added by | ||
| 201 | Chris Powell <chris@grubbybaby.com>. It seems that the module had fallen | ||
| 202 | into the "unmaintained" category -- it hadn't been updated since 1998 -- | ||
| 203 | so I've adopted it as the new maintainer. | ||
| 204 | |||
