2 Apache2
elB4RTO edited this page 2022-10-15 14:35:58 +00:00

Access logs format string


Configuration file


The configuration file should be located at:

/etc/apache2/apache2.conf

The line that configures the access logs is the one starting with "LogFormat", followed by the list of field codes.
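To check which format strings a configuration file defines, the "LogFormat" lines can be pulled out with a few lines of Python. This is only a minimal sketch: `find_log_formats` and the sample text are hypothetical, and it assumes each directive sits on a single line with a double-quoted format string followed by an optional nickname.

```python
import re

# One LogFormat directive: a double-quoted format string, then an optional nickname.
# Directive names are matched case-insensitively; commented-out lines do not match.
LOGFORMAT = re.compile(
    r'^\s*LogFormat\s+"((?:[^"\\]|\\.)*)"(?:\s+(\S+))?\s*$',
    re.MULTILINE | re.IGNORECASE,
)

def find_log_formats(config_text):
    """Return (nickname, format_string) pairs in the order they appear."""
    return [(m.group(2), m.group(1)) for m in LOGFORMAT.finditer(config_text)]

# Hypothetical excerpt in the style of /etc/apache2/apache2.conf
sample = r'''
LogFormat "%h %l %u %t \"%r\" %>s %O" common
#LogFormat "%h" disabled
'''

for nickname, fmt in find_log_formats(sample):
    print(nickname, "->", fmt)
```

To run it against a real file, read the file's content into a string and pass it to `find_log_formats`.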



Common logs formats


The most commonly used format strings are:


  • Common log format (CLF)
  • LogFormat "%h %l %u %t \"%r\" %>s %O" common

  • Combined log format (NCSA standard)
  • LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-agent}i\"" combined


Suggested logs formats


To use the complete set of LogDoctor's functionalities, the suggested format string is:

LogFormat "%{%F %T}t %H %m %U %q %>s %I %O %D \"%{Referer}i\" \"%{Cookie}i\" \"%{User-agent}i\" %{c}h" combined


The string above should be preferred, but alternatives can be used as well, like:

LogFormat "%{sec}t \"%r\" %q %<s %I %O %D \"%{Referer}i\" \"%{Cookie}i\" \"%{User-agent}i\" %h" combined


Note on custom format strings


If you're using your own custom string, please keep in mind that parsing is not magic. When you define your own string, think about which characters can appear in each field and choose separators that cannot conflict with the field itself.
As an example: a URI (%U) can't contain whitespace, so it is safe to use a space to separate this field from the previous and the next one. The User-Agent (%{User-agent}i), instead, may contain spaces, as well as parentheses, brackets, dashes, etc., so it's better to pick an appropriate separator (double-quotes are a good choice, since they get escaped while logging).
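The effect of the separator choice can be seen with a small Python sketch. It is not LogDoctor's parser: `split_fields`, the format string in the comment and the log line are all made up for illustration. It compares a naive split on spaces with a split that treats a double-quoted run as one field.

```python
import re

# A field is either a double-quoted run (backslash escapes allowed inside)
# or a bare run of non-space characters.
FIELD = re.compile(r'"((?:[^"\\]|\\.)*)"|(\S+)')

def split_fields(line):
    """Split a log line into fields, keeping each quoted field in one piece."""
    return [bare or quoted for quoted, bare in FIELD.findall(line)]

# Hypothetical line, as if logged with: LogFormat "%U \"%{User-agent}i\" %>s"
line = r'/index.php "Mozilla/5.0 (X11; Linux x86_64) \"Test\"" 200'

print(line.split(" "))     # naive split: the User-Agent is broken into pieces
print(split_fields(line))  # quote-aware split: URI, User-Agent, status
```

The escaped quotes inside the User-Agent are left as logged (`\"`), which is what lets the closing quote of the field be found reliably.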



Note on control-characters


Although Apache2 does support some control-characters (aka escape sequences), it is recommended not to use them inside format strings.
In particular, the carriage return will most likely overwrite the data of previous fields, making it very difficult to understand where the current field ends (especially for fields like URIs, queries, user-agents, etc.) and nearly impossible to retrieve the overwritten data. This leads to a spoiled database, unrealistic statistics and/or crashes during execution.
As for the new line character, there is no reason to use it other than for testing purposes. The same is true for the horizontal tab: a simple whitespace is a better choice.
The only control-characters supported by Apache2 are \n, \t and \r. Any other escape sequence is not interpreted and is treated as plain text.
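A format string can be checked for these sequences before it is deployed. The following is a rough sketch, with `control_sequences` being a hypothetical helper name; it reports the \n, \t and \r escapes it finds and skips escaped backslashes and escaped quotes.

```python
import re

def control_sequences(format_string):
    # Collect every backslash escape, then keep only the control characters.
    # An escaped backslash is consumed as a pair, so the letter after it is
    # not mistaken for a control character.
    found = []
    for match in re.finditer(r'\\(.)', format_string):
        if match.group(1) in "ntr":
            found.append(match.group(0))
    return found

# Hypothetical format string using a tab and a carriage return
print(control_sequences(r'%h\t%U \"%{User-agent}i\"\r'))
```

An empty result means that the string contains none of the three sequences.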





Access logs format fields


Fields considered by LogDoctor


Only the following fields will be considered, meaning that only these fields' data will be stored and used for the statistics.


Code  Information
%% The percent sign character. It results in a single percent sign and is treated as normal text (by both Apache and LogDoctor).
%t Time the request was received, in the format [DD/Mon/YYYY:hh:mm:ss ±TZ]. The last number (TZ) indicates the timezone offset from GMT.
%{FORMAT}t Time the request was received, in the form given by FORMAT, which should be in an extended strftime format.
The following format tokens are supported by LogDoctor (any other token will be discarded, even if valid):
    Format  Description
    sec     time since epoch, in seconds
    msec    time since epoch, in milliseconds
    usec    time since epoch, in microseconds
    %b      month name, abbreviated (same as %h)
    %B      month name
    %c      date and time representation
    %d      day number, zero padded
    %D      date, in the form of MM/DD/YY
    %e      day number, space padded
    %F      date, in the form of YYYY-MM-DD
    %h      month name, abbreviated (same as %b)
    %H      hour, in 24h format, zero padded
    %m      month number, zero padded
    %M      minute
    %r      time of the day, in 12h format, in the form of HH:MM:SS AM/PM
    %R      time of the day, in HH:MM format
    %S      second
    %T      ISO 8601 time, in the form of HH:MM:SS
    %x      date representation
    %X      time representation
    %y      year, last two digits (YY)
    %Y      year
Note: the time formats sec, msec and usec can't be mixed together or with other formats.
%r First line of request, equivalent to: %m %U%q %H.
%H The request protocol (HTTP/v, HTTPS/v).
%m The request method (GET, POST, HEAD, ...).
%U The URI path requested, not including any query string.
%q Query string, prepended with a ? if present, otherwise an empty string.
%s HTTP status code of the original request (for internally redirected requests, this is the status before the redirection).
%>s Final HTTP status code (for requests that have been internally redirected).
%I Bytes received, including request and headers (you need to enable mod_logio to use this).
%O Bytes sent, including headers (you need to enable mod_logio to use this).
%T The time taken to serve the request, in seconds.
%{UNIT}T The time taken to serve the request, in a time unit given by UNIT (only available in 2.4.13 and later).
Valid units are:
    Unit  Description
    s     seconds
    ms    milliseconds
    us    microseconds
%D The time taken to serve the request, in microseconds.
%h IP Address of the client (remote hostname).
%{c}h Like %h, but always reports on the hostname of the underlying TCP connection and not any modifications to the remote hostname by modules like mod_remoteip.
%{VARNAME}i The contents of VARNAME: header line(s) in the request sent to the server.
Supported varnames (by LogDoctor) are:
    VarName     Description
    Cookie      cookie of the request
    Refererreferrer host
    User-agent  web-browser or bot identification string
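To make a couple of the fields above more concrete, here is a minimal Python sketch of how their logged values could be interpreted. The helper names and sample values are hypothetical and this is not LogDoctor's code; it assumes well-formed input, and the epoch values are converted to UTC while the date and time value is returned without timezone information.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Microseconds per unit, for the sec / msec / usec time formats.
MICROSECONDS = {"sec": 1_000_000, "msec": 1_000, "usec": 1}

def parse_epoch(text, unit="sec"):
    """Convert a %{sec}t, %{msec}t or %{usec}t value to a UTC datetime."""
    return EPOCH + timedelta(microseconds=int(text) * MICROSECONDS[unit])

def parse_date_time(text):
    """Parse a %{%F %T}t value; %F %T is shorthand for %Y-%m-%d %H:%M:%S."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

def split_request_line(request):
    """Split a well-formed %r value into (method, uri, query, protocol)."""
    method, target, protocol = request.split(" ")
    uri, _, query = target.partition("?")
    return method, uri, query, protocol

print(parse_epoch("1665844558"))
print(parse_date_time("2022-10-15 14:35:58"))
print(split_request_line("GET /index.php?id=1 HTTP/1.1"))
```

A malformed request line (for example one with extra spaces) makes `split_request_line` raise a `ValueError`, which is one more reason to log %m, %U, %q and %H as separate fields when possible.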


Fields discarded by LogDoctor


Any field other than the ones above won't be considered by LogDoctor.
When generating a log sample, these fields will appear as 'DISCARDED'.
If you aren't using the logs for any other purpose, please remove the unnecessary fields to make the process faster and reduce the possibility of errors.
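As a rough way to spot the fields that would be discarded, a format string can be scanned against the list of considered fields. The sketch below is a simplified approximation written for this page, not LogDoctor's own check: `discarded_fields` is a hypothetical helper, header names are compared case-insensitively as an assumption, the arguments of fields other than %{VARNAME}i are not validated, and status-code conditions are not handled.

```python
import re

# A percent sign, an optional < or > modifier, an optional {argument},
# then the field letter (or a second percent sign).
FIELD = re.compile(r'%[<>]?(?:\{([^}]*)\})?([A-Za-z%])')

# Field letters and request headers listed as considered on this page.
CONSIDERED_LETTERS = set("%trHmUqsIOTDh")
CONSIDERED_HEADERS = {"cookie", "referer", "user-agent"}

def discarded_fields(format_string):
    """Return the fields of a format string that are not in the considered list."""
    discarded = []
    for match in FIELD.finditer(format_string):
        argument, letter = match.group(1), match.group(2)
        if letter == "i":
            keep = (argument or "").lower() in CONSIDERED_HEADERS
        else:
            keep = letter in CONSIDERED_LETTERS
        if not keep:
            discarded.append(match.group(0))
    return discarded

# The common log format contains two fields outside the considered list.
print(discarded_fields(r'%h %l %u %t \"%r\" %>s %O'))
```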




