Mastering Nginx Logs: A Comprehensive Guide to Custom Log Formatting
Nginx is renowned for its high performance, stability, and rich feature set, making it one of the most popular web servers and reverse proxies in the world. Behind the scenes of serving web pages, handling API requests, or balancing load, Nginx diligently records crucial information about its operations in log files. These logs are invaluable resources for debugging issues, monitoring performance, understanding user behavior, ensuring security, and performing audits.
By default, Nginx provides useful logging capabilities, but the true power lies in its flexibility to customize what gets logged and how it’s formatted. Standard log formats might suffice for basic needs, but as applications grow in complexity and monitoring requirements become more sophisticated, tailoring log formats becomes essential. Custom formatting allows you to capture specific data points relevant to your application, structure logs for easier parsing by automated tools, and optimize logging for performance and storage.
This comprehensive guide will walk you through the intricacies of Nginx log formatting. We’ll start with the basics of default Nginx logs, delve into the core directives (`log_format` and `access_log`), explore the vast array of available Nginx variables, demonstrate how to create custom formats (including the increasingly popular JSON format), discuss conditional logging, touch upon error log specifics, and cover best practices and performance considerations. By the end of this article, you’ll have the knowledge and tools to effectively configure Nginx logging to meet your specific requirements.
1. Understanding Nginx’s Default Logging Behavior
Before customizing, it’s crucial to understand what Nginx logs by default. Nginx primarily maintains two types of logs:
- Access Logs: Record information about client requests processed by the server. Every time a client (like a browser or an API client) requests a resource from Nginx, an entry is typically added to the access log.
- Error Logs: Record diagnostic information about problems encountered during Nginx’s operation, ranging from minor warnings to critical errors that might prevent the server from functioning correctly.
1.1 Default Access Log
By default, Nginx usually comes pre-configured with an access log directive, often pointing to a file like `/var/log/nginx/access.log` (the exact path can vary depending on the operating system distribution and installation method). It typically uses a predefined format named `combined`.
A typical default `access_log` directive in the main `nginx.conf` or a specific virtual host configuration might look like this:
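```nginx
http {
    # ... other http settings ...

    # "combined" is predefined by Nginx and is shown here for reference only.
    # Declaring a format with this name yourself causes a configuration error.
    # log_format combined '$remote_addr - $remote_user [$time_local] "$request" '
    #                     '$status $body_bytes_sent "$http_referer" '
    #                     '"$http_user_agent"';

    access_log /var/log/nginx/access.log combined;

    # ... server blocks etc ...
}
```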
Let’s break down the fields in the default `combined` format:
- `$remote_addr`: The IP address of the client making the request. If Nginx is behind a load balancer or reverse proxy, this might be the IP of the proxy, not the original client (we’ll discuss how to handle this later).
- `-`: A literal hyphen, often used as a placeholder.
- `$remote_user`: The username provided for HTTP Basic Authentication. If authentication is not used or fails, this will be a hyphen (`-`).
- `[$time_local]`: The local server time when the request was processed, enclosed in square brackets. The format is `dd/MMM/yyyy:HH:mm:ss +ZZZZ` (e.g., `15/Oct/2023:10:30:05 +0000`).
- `"$request"`: The full original request line, enclosed in double quotes. This includes the HTTP method, the requested URI (including arguments), and the HTTP protocol version (e.g., `"GET /images/logo.png HTTP/1.1"`).
- `$status`: The HTTP status code returned to the client (e.g., `200`, `404`, `500`).
- `$body_bytes_sent`: The number of bytes sent to the client in the response body, excluding the response headers.
- `"$http_referer"`: The URL of the page from which the request originated (the content of the `Referer` HTTP header), enclosed in double quotes. If the header is missing or empty, it’s logged as `"-"`.
- `"$http_user_agent"`: The client’s user agent identification string (the content of the `User-Agent` HTTP header), enclosed in double quotes. This usually identifies the browser, operating system, or client application.
A sample log entry in the `combined` format might look like:
192.168.1.101 - - [15/Oct/2023:10:30:05 +0000] "GET /index.html HTTP/1.1" 200 1543 "http://example.com/previous_page" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
While `combined` provides a good baseline, it often lacks information crucial for modern web application monitoring, such as request processing time, upstream response times (if using Nginx as a reverse proxy), or custom application-specific headers.
1.2 Default Error Log
The error log captures issues encountered by Nginx itself. Its default location is often `/var/log/nginx/error.log`. Unlike the access log, its format is less customizable through simple directives, but its verbosity can be controlled.
A typical default `error_log` directive might look like:
```nginx
error_log /var/log/nginx/error.log warn;
```
The format usually includes:
- Timestamp: Date and time of the event.
- Severity Level: Indicates the seriousness of the event.
- Process ID (PID) and Thread ID (TID): Identifies the Nginx worker process and thread handling the request.
- Client Information (optional): Sometimes includes the client IP address and request details if the error is associated with a specific client request.
- Error Message: A descriptive message detailing the problem.
The severity levels, from lowest to highest, are:
- `debug`: Detailed debugging information (requires Nginx to be compiled with debugging support).
- `info`: Informational messages about normal operations.
- `notice`: Notable events that are part of normal operation but might be worth noting.
- `warn`: Warnings about potential issues or non-critical errors. Some packaged configurations set this level explicitly.
- `error`: Errors encountered during request processing. This is the level Nginx uses when the directive does not specify one.
- `crit`: Critical conditions requiring immediate attention.
- `alert`: Alerts requiring prompt action.
- `emerg`: Emergency conditions causing the server to be unusable.
Setting a lower level (like `info` or `debug`) will include messages from all higher levels as well, potentially generating very large log files.
While error log formatting is limited compared to access logs, understanding its structure and controlling its level is vital for troubleshooting.
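As a rough illustration of controlling verbosity per context, the sketch below keeps a quiet global error log and a more verbose one for a single virtual host. The file paths and the `debug.example.com` server name are assumptions made for this example, not values from a default installation.

```nginx
# Global error log: warnings and anything more severe.
error_log /var/log/nginx/error.log warn;

events {}

http {
    server {
        listen 80;
        server_name debug.example.com;   # hypothetical host

        # More verbose logging for this host only, e.g. while troubleshooting.
        # (The debug level would additionally need a debug-enabled build.)
        error_log /var/log/nginx/debug.example.com_error.log info;

        location / {
            return 200 "OK";
        }
    }
}
```

A directive set at the `server` level replaces the inherited one for that host, so the global log stays small while the host under investigation gets the extra detail.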
2. The Core Directives: `log_format` and `access_log`
Customizing access logs revolves around two primary directives: `log_format` and `access_log`.
2.1 The `log_format` Directive
This directive defines a named log format template. It specifies what information should be included in log entries and how it should be arranged.
- Syntax: `log_format <format_name> [escape=default|json|none] '<string_containing_variables>';`
- Context: `http`
- Purpose: Creates a reusable template for log entries.
Breakdown:
- `<format_name>`: A unique name you give to this format (e.g., `my_custom_format`, `json_format`). This name will be referenced by the `access_log` directive. Nginx predefines the `combined` format.
- `[escape=...]`: An optional parameter specifying how characters should be escaped in the variables. Possible values are:
  - `default`: Escapes characters `"`, `\`, and other characters with values less than 32 or above 126 using hexadecimal notation (`\xXX`). This is the default behavior.
  - `json`: Escapes characters according to JSON string rules (e.g., `"` becomes `\"`, `\` becomes `\\`, newline becomes `\n`, etc.). Essential for creating valid JSON log entries.
  - `none`: Disables all escaping. Use with extreme caution, as unescaped special characters can corrupt log parsing.
- `'<string_containing_variables>'`: A string defining the structure of the log entry. This string can contain:
  - Plain text (e.g., spaces, brackets, labels like `RespTime:`).
  - Nginx variables (prefixed with `$`, e.g., `$remote_addr`, `$status`, `$request_time`). These variables are replaced with their actual values when a request is logged.
Example:
```nginx
http {
    # Define a custom format named 'detailed'
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        'rt=$request_time urt=$upstream_response_time';

    # ... rest of http config ...
}
```
This defines a new format called `detailed` which is similar to `combined` but adds two new fields at the end: `rt=` followed by the request processing time (`$request_time`) and `urt=` followed by the upstream response time (`$upstream_response_time`).
The `log_format` directive must be placed within the `http` context, making the defined format available globally within that context and its child contexts (like `server` and `location`). You cannot define `log_format` inside a `server` or `location` block.
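To make the effect of the `escape` parameter more concrete, here is a small sketch that logs the same request header with two of the escape modes. The format names, log paths, and the `X-Debug-Note` header are hypothetical, chosen only for illustration.

```nginx
events {}

http {
    # Default escaping: a double quote in the header value is written as \x22.
    log_format note_default '$remote_addr "$http_x_debug_note"';

    # JSON escaping: the same double quote is written as \".
    log_format note_json escape=json
        '{"client": "$remote_addr", "note": "$http_x_debug_note"}';

    server {
        listen 80;

        access_log /var/log/nginx/note_default.log note_default;
        access_log /var/log/nginx/note_json.log    note_json;
    }
}
```

Both formats are declared once in `http` and then referenced by name from the `server` block, which is the only arrangement Nginx accepts for `log_format`.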
2.2 The `access_log` Directive
This directive enables logging of requests to a specified file using a defined format. It tells Nginx where to log and which format to use.
- Syntax:
  `access_log <path> [<format_name> [buffer=<size>] [gzip[=<level>]] [flush=<time>] [if=<condition>]];`
  `access_log off;`
- Context: `http`, `server`, `location`, `if in location`, `limit_except`
- Purpose: Specifies the log file path and the format to use for logging requests within its context.
Breakdown:
- `<path>`: The path to the log file. This can be a regular file path (e.g., `/var/log/nginx/my_app.log`) or special values like `syslog:` for logging via syslog, or `/dev/stdout`, `/dev/stderr`. Ensure the Nginx worker processes have write permissions to the specified file or directory.
- `<format_name>`: (Optional) The name of the log format to use, as defined by a `log_format` directive. If omitted, the default `combined` format is used.
- `buffer=<size>`: (Optional) Enables buffering of log entries. Instead of writing each entry immediately, Nginx stores them in memory and writes them to the file in batches when the buffer is full. This can significantly improve performance by reducing disk I/O operations. `<size>` specifies the buffer size (e.g., `32k`, `1m`). Buffered entries are also written when the worker process shuts down or re-opens its log files.
- `gzip[=<level>]`: (Optional, requires Nginx built with the zlib library) Compresses buffered log entries before writing them to the file. `<level>` specifies the compression level (1-9, default is 1). Using `gzip` implies buffering; if `buffer` is not given, a 64K buffer is used. Saves disk space but consumes CPU resources for compression.
- `flush=<time>`: (Optional) If buffering is enabled, this parameter specifies the maximum time log entries can stay in the buffer before being flushed to disk (e.g., `5s`, `1m`). This ensures logs are written reasonably promptly even under low traffic.
- `if=<condition>`: (Optional) Enables conditional logging. The request will only be logged if the specified `<condition>` evaluates to true (non-empty string and not “0”). This is powerful for excluding certain requests (like health checks) from logs. The condition often uses variables and potentially the `map` directive for more complex logic.
- `off`: Special value to disable access logging within the current context. This overrides any `access_log` directives inherited from higher levels.
Example Usage:
```nginx
http {
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        'rt=$request_time urt=$upstream_response_time';

    log_format healthcheck_format '$remote_addr [$time_local] "$request" $status';

    # Define a map to identify health check requests
    map $request_uri $is_health_check {
        /healthz 0;
        /status  0;
        default  1; # Log everything else
    }

    server {
        listen 80;
        server_name example.com;

        # Log most requests using 'detailed' format with buffering
        # but only if it's NOT a health check (map result is 1)
        access_log /var/log/nginx/example.com_access.log detailed buffer=64k flush=1m if=$is_health_check;

        # Log health checks separately with a simpler format and no buffering
        location /healthz {
            # Note: access_log in location overrides server level for this location
            access_log /var/log/nginx/health_checks.log healthcheck_format;
            return 200 "OK";
        }

        location /status {
            access_log /var/log/nginx/health_checks.log healthcheck_format;
            # Or disable logging completely for this location
            # access_log off;
            stub_status; # Example Nginx status module
        }

        location / {
            # Requests here inherit the server-level access_log directive
            proxy_pass http://backend_app;
        }
    }
}
```
This example demonstrates:
* Defining multiple formats (`detailed`, `healthcheck_format`).
* Using buffering (`buffer=64k`) and flushing (`flush=1m`).
* Using conditional logging (`if=$is_health_check`) based on a `map` to exclude `/healthz` and `/status` requests from the main log.
* Overriding the server-level `access_log` within specific `location` blocks to log health checks differently (or disable logging).
Understanding the interplay between `log_format` (defining what) and `access_log` (defining where and when) is fundamental to mastering Nginx logging.
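The remaining destination types and parameters from section 2.2 can be sketched in the same way. The collector address `192.0.2.10:514`, the syslog tag, and the file names below are placeholders assumed for the example.

```nginx
events {}

http {
    server {
        listen 80;

        # Send entries to a remote syslog collector (placeholder address).
        access_log syslog:server=192.0.2.10:514,facility=local7,tag=nginx,severity=info combined;

        # Write gzip-compressed batches and flush at least every five minutes.
        access_log /var/log/nginx/access.log.gz combined gzip=4 flush=5m;

        # Write to standard output, a common choice in containers.
        access_log /dev/stdout combined;
    }
}
```

Because several `access_log` directives at the same level all apply, each request here would be written to all three destinations.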
3. Nginx Variables: The Building Blocks of Custom Formats
The real power of custom log formats comes from Nginx variables. These variables provide access to a wealth of information about the client request, the server’s response, connection details, timing, and Nginx’s internal state. You embed these variables (prefixed with `$`) into your `log_format` string.
Nginx offers a vast number of built-in variables. Here’s a categorized list of some of the most commonly used and useful ones for log formatting:
Request-Related Variables:
- `$args`: Arguments in the request line (e.g., `foo=123&bar=abc`).
- `$arg_<name>`: Value of a specific argument `<name>` in the request line (e.g., `$arg_foo` would be `123`).
- `$request`: Full original request line (e.g., `"GET /page?id=10 HTTP/1.1"`).
- `$request_method`: Request method (e.g., `GET`, `POST`, `PUT`).
- `$request_uri`: Full original request URI including arguments (e.g., `/page?id=10`).
- `$uri`: Current URI in request, potentially normalized or rewritten by internal redirects. Excludes arguments (e.g., `/page`).
- `$document_uri`: Same as `$uri`.
- `$scheme`: Request scheme (`http` or `https`).
- `$query_string`: Same as `$args`.
- `$host`: The hostname from the request line or the `Host` header. Order of preference: host name from request line, host name from “Host” request header field, or server name matching the request.
- `$hostname`: The server’s hostname (from the `gethostname()` syscall).
- `$content_length`: Value of the `Content-Length` request header.
- `$content_type`: Value of the `Content-Type` request header.
- `$is_args`: `?` if the request line has arguments, otherwise an empty string.
- `$request_length`: Full request length including headers and body (bytes).
- `$request_body`: The client request body (available if processed, might require `client_body_in_single_buffer on;`). Be cautious logging this due to size and potential sensitivity.
- `$request_body_file`: Path to the temporary file holding the request body.
HTTP Headers & Cookies:
- `$http_<header_name>`: Value of the specified request header `<header_name>`. Hyphens in the header name are replaced with underscores, and the name is lowercased (e.g., `User-Agent` becomes `$http_user_agent`, `X-Forwarded-For` becomes `$http_x_forwarded_for`).
- `$cookie_<cookie_name>`: Value of the specified cookie `<cookie_name>` (e.g., `$cookie_sessionid`).
Response-Related Variables:
- `$status`: Response status code (e.g., `200`, `404`, `503`).
- `$body_bytes_sent`: Number of bytes sent to the client in the response body (excluding headers).
- `$bytes_sent`: Total number of bytes sent to the client (including headers).
- `$sent_http_<header_name>`: Value of the specified response header `<header_name>`. Similar transformation rules as `$http_` (underscores, lowercasing) apply (e.g., `$sent_http_content_type`, `$sent_http_location`).
- `$sent_trailer_<name>`: Value of a trailer field sent in the response.
Connection & Client Variables:
- `$remote_addr`: Client IP address.
- `$remote_port`: Client port number.
- `$remote_user`: Username supplied for HTTP Basic Authentication.
- `$server_addr`: Address of the server which accepted the request.
- `$server_port`: Port of the server which accepted the request.
- `$server_protocol`: Protocol used for the request (e.g., `HTTP/1.1`, `HTTP/2.0`).
- `$connection`: Connection serial number.
- `$connection_requests`: Number of requests made through the current connection.
- `$binary_remote_addr`: Client address in binary form (useful for certain modules).
Time-Related Variables:
- `$time_local`: Local server time in the common log format (e.g., `15/Oct/2023:14:55:02 +0000`).
- `$time_iso8601`: Local server time in ISO 8601 standard format (e.g., `2023-10-15T14:55:02+00:00`). Often preferred for machine parsing.
- `$msec`: Current time in seconds with millisecond resolution (epoch time, e.g., `1697381702.123`). Useful as a precise timestamp.
- `$request_time`: Request processing time in seconds with millisecond resolution. Measured from the first byte received from the client to the last byte sent to the client. Crucial for performance monitoring.
Nginx Internal & Location Variables:
- `$nginx_version`: Nginx version string.
- `$pid`: Process ID (PID) of the Nginx worker process handling the request.
- `$pipe`: `p` if the request was pipelined, `.` otherwise.
- `$limit_rate`: Current rate limit set for the connection.
- `$server_name`: The `server_name` of the virtual host that processed the request.
Upstream Variables (when using Nginx as a reverse proxy):
- `$upstream_addr`: IP address and port (or socket path) of the upstream server(s) contacted. Multiple servers are comma-separated.
- `$upstream_status`: Status code returned by the upstream server(s). Multiple codes are comma-separated.
- `$upstream_response_time`: Time spent receiving the response from the upstream server(s), measured in seconds with millisecond resolution. For multiple upstreams, times are comma-separated. Crucial for diagnosing backend performance.
- `$upstream_response_length`: Length of the response obtained from the upstream server(s).
- `$upstream_connect_time`: Time spent establishing a connection with the upstream server(s).
- `$upstream_header_time`: Time spent receiving the response header from the upstream server(s).
- `$upstream_cache_status`: Shows whether the response was served from Nginx’s cache (`HIT`, `MISS`, `EXPIRED`, `STALE`, `UPDATING`, `REVALIDATED`, `BYPASS`).
SSL/TLS Variables (when using HTTPS):
- `$ssl_protocol`: Protocol of the established SSL/TLS connection (e.g., `TLSv1.2`, `TLSv1.3`).
- `$ssl_cipher`: Cipher suite used for the established SSL/TLS connection.
- `$ssl_client_s_dn`: Subject DN (Distinguished Name) of the client certificate for the established SSL connection.
- `$ssl_client_verify`: Result of client certificate verification (`SUCCESS`, `FAILED:reason`, `NONE`).
GeoIP Module Variables (if GeoIP module is enabled and configured):
- `$geoip_country_code`: Two-letter country code (e.g., `US`, `GB`).
- `$geoip_city`: City name.
- `$geoip_region_name`: Region or state name.
- (Many others depending on the GeoIP database used)
Gzip Module Variables (if Gzip module is enabled):
- `$gzip_ratio`: Achieved compression ratio (original size / compressed size).
This list is not exhaustive, but it covers many of the variables you’ll frequently use. The official Nginx documentation for the `ngx_http_log_module` and other modules (like `ngx_http_upstream_module`, `ngx_http_ssl_module`) provides the full list.
When constructing your `log_format` string, you simply include the desired variables prefixed with `$`. Nginx will substitute their values at request processing time.
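As a quick tour of how variables from the different categories combine, here is a sketch of a single format that takes one or two from each group. The format name, the log path, and the `X-Request-ID` header are assumptions for the example.

```nginx
events {}

http {
    log_format variable_tour
        '$remote_addr:$remote_port '                         # connection and client
        '[$time_iso8601] $msec '                             # time
        '"$request_method $scheme://$host$request_uri" '     # request
        '$status $bytes_sent '                               # response
        'req_id="$http_x_request_id" '                       # request header (hypothetical)
        'ctype="$sent_http_content_type" '                   # response header
        'conn=$connection/$connection_requests pipe=$pipe';  # connection reuse

    server {
        listen 80;
        access_log /var/log/nginx/variable_tour.log variable_tour;
    }
}
```

Splitting the format string into several quoted pieces, one per line, has no effect on the output; Nginx concatenates them into a single log line.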
Handling Missing Headers/Variables: Many variables (especially those derived from optional request headers like `$http_referer` or `$http_user_agent`, or authentication like `$remote_user`) are logged as a hyphen (`-`) if the corresponding data is not present in the request. This is standard behavior in log formats like `combined`. With `escape=json`, a missing value is written as an empty string instead.
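If a placeholder other than the hyphen is preferred, one possible approach is a `map` that substitutes a fixed marker when a header is absent. The sketch below assumes a hypothetical `X-Request-ID` header and an arbitrary marker value of `none`.

```nginx
events {}

http {
    # Use "none" when the (hypothetical) X-Request-ID header is missing or empty.
    map $http_x_request_id $request_id_or_none {
        ""      "none";
        default $http_x_request_id;
    }

    log_format with_request_id '$remote_addr [$time_local] "$request" $status '
                               'req_id=$request_id_or_none';

    server {
        listen 80;
        access_log /var/log/nginx/access.log with_request_id;
    }
}
```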
Getting the Real Client IP Behind a Proxy:
A very common requirement is logging the original client IP address when Nginx is behind a load balancer or reverse proxy. These proxies often add headers like `X-Forwarded-For` or `X-Real-IP` containing the original client IP.
You can log these directly:
```nginx
log_format proxy_aware '$http_x_forwarded_for - $remote_user [$time_local] "$request" '
                       '$status $body_bytes_sent "$http_referer" "$http_user_agent"';
```
However, the `X-Forwarded-For` header can contain a list of IPs (`client, proxy1, proxy2`). A more robust approach is to use the `ngx_http_realip_module`:
```nginx
http {
    # Define trusted proxy IPs/networks
    set_real_ip_from 192.168.1.0/24;
    set_real_ip_from 10.0.0.1;

    # Specify which header contains the real IP
    real_ip_header X-Forwarded-For;

    # By default the last (rightmost) address in X-Forwarded-For is used.
    # Optionally, skip trusted proxies and use the last non-trusted address:
    # real_ip_recursive on;

    log_format realip_combined '$remote_addr - $remote_user [$time_local] "$request" '
                               '$status $body_bytes_sent "$http_referer" '
                               '"$http_user_agent"';

    # ...
    access_log /var/log/nginx/access.log realip_combined;
}
```
When the `realip` module is configured correctly, the `$remote_addr` variable itself will be updated to contain the actual client IP extracted from the specified header (provided the request came from a trusted proxy). This allows you to keep using `$remote_addr` in your log formats while getting the correct value.
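When auditing a setup like this, it can help to log both the rewritten address and the address of the peer that actually connected. The sketch below uses the `$realip_remote_addr` variable provided by the realip module for that purpose; the trusted network `10.0.0.0/8`, the format name, and the log path are assumptions, and the module must be included in your Nginx build.

```nginx
events {}

http {
    set_real_ip_from  10.0.0.0/8;        # assumed trusted proxy network
    real_ip_header    X-Forwarded-For;
    real_ip_recursive on;

    # $remote_addr        -> client address after realip processing
    # $realip_remote_addr -> address of the peer that opened the connection
    log_format realip_audit '$remote_addr (via $realip_remote_addr) '
                            '[$time_local] "$request" $status';

    server {
        listen 80;
        access_log /var/log/nginx/realip_audit.log realip_audit;
    }
}
```

If the two values are identical for proxied traffic, the request most likely did not arrive from an address covered by `set_real_ip_from`.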
4. Creating Your First Custom Log Format
Let’s create a practical custom log format that enhances the default `combined` format by adding request processing time and upstream response time, and uses the ISO 8601 timestamp format.
Step 1: Define the Format using `log_format`
Place this within the `http` block of your `nginx.conf` or a file included within the `http` block:
```nginx
http {
    # ... other http settings ...

    # Define our custom format named 'enhanced'
    log_format enhanced '$remote_addr - $remote_user [$time_iso8601] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        'rt=$request_time urt=$upstream_response_time uct=$upstream_connect_time uht=$upstream_header_time';

    # If behind a proxy, configure realip module (optional but recommended)
    # set_real_ip_from 192.168.1.1;
    # real_ip_header X-Forwarded-For;

    # ... server blocks etc ...
}
```
We named our format `enhanced`. Compared to `combined`, we changed `$time_local` to `$time_iso8601` and added fields prefixed with descriptive labels:
* `rt=` for `$request_time` (total request time)
* `urt=` for `$upstream_response_time` (time waiting for upstream response)
* `uct=` for `$upstream_connect_time` (time connecting to upstream)
* `uht=` for `$upstream_header_time` (time waiting for upstream headers)
Using labels like `rt=` makes the log entries more self-descriptive, although it increases log size slightly. The upstream variables (`$upstream_*`) will log a hyphen (`-`) if the request wasn’t proxied (e.g., serving static files directly).
Step 2: Apply the Format using access_log
Now, apply this format within a `server` or `location` block where you want to use it. You can replace the default `access_log` directive or add a new one.
```nginx
http {
    # ... log_format definition from Step 1 ...

    server {
        listen 80;
        server_name api.example.com;

        # Apply the 'enhanced' format to this server block
        access_log /var/log/nginx/api.example.com_access.log enhanced;

        location / {
            proxy_pass http://backend_api_server;
            # Standard proxy headers often needed by backends
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }

    server {
        listen 80;
        server_name static.example.com;

        # Maybe use a simpler format or the default 'combined' for static files
        access_log /var/log/nginx/static.example.com_access.log combined;

        location / {
            root /var/www/static_files;
            try_files $uri $uri/ =404;
        }
    }
}
```
In this setup:
* Requests to `api.example.com` will be logged to `/var/log/nginx/api.example.com_access.log` using our `enhanced` format, including the valuable upstream timing information.
* Requests to `static.example.com` will be logged to `/var/log/nginx/static.example.com_access.log` using the standard `combined` format.
Step 3: Test and Reload Configuration
Before applying changes to a production server, always test your Nginx configuration syntax:
```bash
sudo nginx -t
```
If the syntax is okay, reload Nginx gracefully to apply the changes without dropping connections:
```bash
sudo systemctl reload nginx
# or
sudo service nginx reload
# or directly:
sudo nginx -s reload
```
Now, make some requests to `api.example.com` and `static.example.com` and check the contents of their respective log files (`/var/log/nginx/*.log`). You should see entries formatted according to the specified `log_format`.
An entry in `api.example.com_access.log` might look like:
1.2.3.4 - - [2023-10-15T15:10:30+00:00] "POST /v1/users HTTP/1.1" 201 55 "-" "MyAppClient/1.0" rt=0.152 urt=0.135 uct=0.002 uht=0.134
This clearly shows the request took 152ms total (`rt=0.152`), and most of that time (135ms) was spent waiting for the upstream backend (`urt=0.135`). The connection to upstream was fast (`uct=0.002`), and receiving upstream headers took 134ms (`uht=0.134`). This level of detail is invaluable for performance analysis.
5. Advanced Log Formatting Techniques
Beyond basic custom formats, Nginx offers more advanced capabilities.
5.1 Logging in JSON Format
Traditional space-separated or custom-delimited log formats can be challenging for automated tools to parse reliably, especially if fields like user agents or referrers contain unexpected characters or delimiters. JSON (JavaScript Object Notation) provides a structured, widely supported format that is much easier for machines to ingest and process. Log aggregation and analysis platforms like the ELK Stack (Elasticsearch, Logstash, Kibana), Graylog, Splunk, Datadog, etc., work exceptionally well with JSON logs.
To create JSON logs in Nginx, you use the `log_format` directive with the `escape=json` parameter and structure your string as a JSON object.
Step 1: Define the JSON Format
```nginx
http {
    # ... other settings ...

    log_format json_log escape=json
        '{'
            '"timestamp": "$time_iso8601", '
            '"client_ip": "$remote_addr", '
            '"real_ip": "$http_x_forwarded_for", '   # Log proxy header separately if needed
            '"request": "$request", '
            '"status": $status, '                     # Status is numeric, no quotes needed in JSON
            '"body_bytes_sent": $body_bytes_sent, '   # Numeric
            '"request_time": $request_time, '         # Numeric
            '"http_referer": "$http_referer", '
            '"http_user_agent": "$http_user_agent", '
            '"upstream_addr": "$upstream_addr", '
            '"upstream_status": "$upstream_status", ' # Often numeric, but can be list, so keep as string
            '"upstream_response_time": "$upstream_response_time", ' # Can be list, keep as string or parse later
            '"server_name": "$server_name", '
            '"request_method": "$request_method", '
            '"uri": "$uri", '
            '"args": "$args"'
        '}';

    # Configure realip module if necessary, $remote_addr will be correct then
    # set_real_ip_from ...
    # real_ip_header ...

    # ... server blocks ...
}
```
Key Points for JSON Formatting:
- `escape=json`: This is crucial. It ensures that special characters within variable values (like quotes in the User-Agent) are properly escaped (`"` becomes `\"`, etc.) to maintain valid JSON structure.
- JSON Structure: The format string itself must be a valid JSON object structure. Each Nginx variable should correspond to a JSON value.
- Quotes:
  - JSON keys (`"timestamp"`, `"client_ip"`, etc.) must be enclosed in double quotes.
  - Nginx variables that produce string values (`$time_iso8601`, `$remote_addr`, `$request`, etc.) should also be enclosed in double quotes within the format string. The `escape=json` parameter will handle escaping any quotes inside the variable’s value.
  - Nginx variables that produce numeric values (`$status`, `$body_bytes_sent`, `$request_time`) can be included without surrounding double quotes in the format string. This allows log analysis tools to treat them directly as numbers. Be careful though: if a variable might sometimes be non-numeric (e.g., `$upstream_status` could be empty, `-`, or a list), it’s safer to quote it as a string in the JSON format and handle parsing later.
- Commas: Separate key-value pairs with commas, but ensure there’s no trailing comma after the last pair.
- Whitespace: Whitespace (spaces, newlines) outside of quoted strings is ignored by JSON parsers but can make the `log_format` definition more readable.
Step 2: Apply the JSON Format
```nginx
server {
    listen 80;
    server_name service.example.com;

    # Use the json_log format defined in the http block
    access_log /var/log/nginx/service_access.json json_log;

    # ... rest of server config ...
}
```
Resulting Log Entry:
A request logged using this format would look like this in `/var/log/nginx/service_access.json` (with `escape=json`, an absent value such as a missing Referer header is written as an empty string rather than `-`):
```json
{"timestamp": "2023-10-15T15:30:45+00:00", "client_ip": "10.0.0.50", "real_ip": "1.2.3.4, 10.0.0.50", "request": "GET /api/data?limit=10 HTTP/1.1", "status": 200, "body_bytes_sent": 1024, "request_time": 0.045, "http_referer": "", "http_user_agent": "curl/7.81.0", "upstream_addr": "192.168.10.10:8080", "upstream_status": "200", "upstream_response_time": "0.040", "server_name": "service.example.com", "request_method": "GET", "uri": "/api/data", "args": "limit=10"}
```
This structured format is immediately ready for ingestion into various log management systems.
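If a downstream tool needs the upstream timing as a real JSON number rather than a string, one possible approach is to route the variable through a `map` that emits `null` when no upstream was involved. This is only a sketch: the variable name `$upstream_rt_json`, the format name, and the log path are made up for the example, and it assumes a single upstream attempt per request, since retries produce a comma-separated list that would not be valid as an unquoted JSON value.

```nginx
events {}

http {
    # Emit JSON null when there is no usable upstream time, otherwise the raw number.
    # Assumption: one upstream attempt per request (no comma-separated list).
    map $upstream_response_time $upstream_rt_json {
        ""      null;
        "-"     null;
        default $upstream_response_time;
    }

    log_format json_timing escape=json
        '{'
            '"timestamp": "$time_iso8601", '
            '"status": $status, '
            '"request_time": $request_time, '
            '"upstream_response_time": $upstream_rt_json'
        '}';

    server {
        listen 80;
        access_log /var/log/nginx/timing.json json_timing;
    }
}
```

Quoting the value as a string, as in the main example, remains the safer default when upstream retries are possible.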
5.2 Conditional Logging
Sometimes, you don’t want to log every single request. Common scenarios include:
- Excluding health checks from main application logs.
- Logging only specific error responses (e.g., 5xx errors).
- Logging requests from specific IP addresses or user agents for debugging.
The `if=<condition>` parameter of the `access_log` directive allows for this. The condition is evaluated for each request, and logging only occurs if the condition is true (evaluates to a non-empty string that is not “0”).
Simple Example: Don’t Log Health Checks
```nginx
server {
    listen 80;
    server_name app.example.com;

    # Don't log requests to /healthz
    location = /healthz {
        access_log off; # Disable logging specifically for this location
        return 200 "OK";
    }

    location / {
        # Log other requests normally
        access_log /var/log/nginx/app_access.log combined;
        proxy_pass http://app_backend;
    }
}
```
This uses `access_log off;` within the specific location, which is often the simplest way for exact path matches.
Using `if=` for More Complex Conditions:
The `if=` parameter provides more flexibility. However, using the `if` directive directly inside `location` blocks is generally discouraged (“if is evil”). A more robust and recommended approach for conditional logic affecting logging (and other directives) is to use the `map` directive.
Using `map` for Conditional Logging:
The `map` directive allows you to create a new variable whose value depends on the value of another variable. It’s typically defined in the `http` context.
Example: Don’t Log Static Assets or Health Checks
```nginx
http {
    # ... log formats ...

    # Map request URI to a variable $loggable. 1 means log, 0 means don't log.
    map $request_uri $loggable {
        ~*\.(jpg|jpeg|gif|png|css|js)$ 0; # Don't log common static files
        /healthz                       0; # Don't log health check endpoint
        /status                        0; # Don't log status endpoint
        default                        1; # Log everything else
    }

    server {
        listen 80;
        server_name www.example.com;

        # Apply the condition using the $loggable variable
        access_log /var/log/nginx/www_access.log combined if=$loggable;

        location / {
            # ... serve dynamic content or proxy ...
            proxy_pass http://main_app;
        }

        location ~* \.(jpg|jpeg|gif|png|css|js)$ {
            root /var/www/html/static;
            expires 1d;
            # No access_log needed here, it inherits the conditional one from server level
        }

        location = /healthz {
            # No access_log needed here either
            return 200 "OK";
        }

        location = /status {
            # No access_log needed here either
            stub_status;
        }
    }
}
```
In this example:
1. The `map` block checks `$request_uri` (the original request URI, including any query string).
2. If it matches the regular expression for common static file extensions (`~*\.(jpg|...|js)$`) or is exactly `/healthz` or `/status`, the `$loggable` variable is set to `0`.
3. For any other URI (`default`), `$loggable` is set to `1`.
4. The `access_log` directive in the `server` block uses `if=$loggable`. Only requests where `$loggable` is `1` will be written to the log file.
This `map`-based approach is generally considered cleaner and more performant than complex `if` structures within `location` blocks for conditional logging.
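To reason about which requests the `map` above would suppress, it can help to replay its rules outside Nginx. The following Python sketch is an approximation, not Nginx's implementation: the function names `loggable` and `should_log` are made up for illustration, and it assumes the documented lookup order for `map` (plain strings first, compared case-insensitively, then regular expressions, then `default`).

```python
import re

# Mirrors the map block above: exact strings first, then the regex, then default.
EXACT = {"/healthz": "0", "/status": "0"}
# "~*" in nginx marks a case-insensitive regular expression.
STATIC_RE = re.compile(r"\.(jpg|jpeg|gif|png|css|js)$", re.IGNORECASE)


def loggable(request_uri: str) -> str:
    """Return "0" (skip) or "1" (log) for a raw request URI, query string included."""
    # Assumption: plain map strings are matched ignoring case, hence lower().
    key = request_uri.lower()
    if key in EXACT:
        return EXACT[key]
    if STATIC_RE.search(request_uri):
        return "0"
    return "1"  # the "default" entry


def should_log(request_uri: str) -> bool:
    """access_log ... if=$var skips a request when the value is "0" or empty."""
    return loggable(request_uri) not in ("", "0")
```

Calling `should_log("/app.js?v=2")` returns `True`: with `$request_uri` as the source variable, a query string defeats both the `$`-anchored regex and the exact-match entries, so such requests would still be logged.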
Example: Logging Only Errors
```nginx
http {
    # Map non-2xx/3xx status codes to $log_error. 1 means log, 0 means don't.
    map $status $log_error {
        ~^[23]..  0;  # Don't log 2xx or 3xx status codes
        default   1;  # Log everything else (1xx, 4xx, 5xx)
    }

    server {
        listen 80;
        server_name api.example.com;

        # Log only if $log_error is 1 (i.e., status is not 2xx/3xx)
        access_log /var/log/nginx/api_errors.log combined if=$log_error;

        # We probably still want a main log for all requests
        access_log /var/log/nginx/api_access.log combined;

        # ...
    }
}
```
This uses two `access_log` directives: one conditional log for errors, and one unconditional log for all requests.
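If you want to check what the error-only log should contain, the same status rule can be applied to an existing `combined` log after the fact. The sketch below is one possible way to do that in Python; the regular expression is a hand-written approximation of the `combined` layout, and the sample lines and the helper name `error_lines` are invented for illustration.

```python
import re

# Approximates the predefined "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_RE = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)
# Same pattern as the map block: 2xx and 3xx are not errors.
NOT_ERROR_RE = re.compile(r"^[23]..")

# Illustrative lines, not real traffic.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /v1/items HTTP/1.1" 200 512 "-" "curl/8.5.0"',
    '203.0.113.7 - - [10/Oct/2024:13:55:37 +0000] "GET /v1/missing HTTP/1.1" 404 153 "-" "curl/8.5.0"',
]


def error_lines(lines):
    """Return the lines whose status would set $log_error to 1."""
    selected = []
    for line in lines:
        match = COMBINED_RE.match(line)
        if match is None:
            continue  # not in combined format; ignore rather than guess
        if NOT_ERROR_RE.match(match["status"]) is None:
            selected.append(line)
    return selected
```

With the two sample lines, only the 404 entry is returned, which is what `api_errors.log` would be expected to receive.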
5.3 Multiple Access Logs
You can specify multiple `access_log` directives within the same context (`http`, `server`, `location`). This causes Nginx to write log entries to all specified destinations for matching requests.
Use Cases:
- Different Formats: Log the same request using different formats to different files (e.g., a detailed JSON log for machine analysis and a simpler `combined` format for quick human checks).
- Filtered Logs: Send all logs to a main file, and conditionally send a subset (e.g., errors, specific paths) to another file or system (like syslog).
- Centralized and Local Logging: Log locally for immediate access and also send logs via syslog to a central collector.
Example:
```nginx
http {
    # "combined" is a predefined format name, not a variable, so it cannot be
    # referenced as $combined. Spell out its fields and append the request ID.
    log_format combined_extra '$remote_addr - $remote_user [$time_local] '
                              '"$request" $status $body_bytes_sent '
                              '"$http_referer" "$http_user_agent" '
                              '"$http_x_request_id"';
    log_format json_detailed escape=json '{ … }'; # Your detailed JSON format

    map $status $is_error {
        ~^[45]..  1;  # 4xx or 5xx status
        default   0;
    }

    server {
        listen 80;
        server_name critical.example.com;

        # Log 1: Main log file using the custom 'combined_extra' format
        access_log /var/log/nginx/critical_access.log combined_extra buffer=128k flush=30s;

        # Log 2: Detailed JSON log file
        access_log /var/log/nginx/critical_access.json json_detailed buffer=1m flush=1m;

        # Log 3: Send only errors (4xx/5xx) via syslog using 'combined_extra' format
        access_log syslog:server=10.1.1.10:514,tag=nginx_errors combined_extra if=$is_error;

        location / {
            # Pass a request ID to the backend. $request_id is built in since Nginx 1.11.0:
            # proxy_set_header X-Request-ID $request_id;
            # On older versions, approximate one:
            # set $reqid $pid-$msec-$connection;
            # proxy_set_header X-Request-ID $reqid;
            proxy_pass http://critical_backend;
        }
    }
}
```
In this setup, for every request to `critical.example.com`:
1. An entry is written to `/var/log/nginx/critical_access.log` using the `combined_extra` format, with buffering.
2. An entry is written to `/var/log/nginx/critical_access.json` using the `json_detailed` format, with a larger buffer.
3. If the response status is 4xx or 5xx (`$is_error` is 1), an entry is also sent via UDP syslog to `10.1.1.10:514` with the tag `nginx_errors`, using the `combined_extra` format.
Be mindful that each `access_log` directive adds overhead. Using multiple logs increases disk I/O and potentially CPU usage (especially with JSON formatting or compression).
5.4 Buffering and Flushing (`buffer`, `flush`, `gzip`)
As mentioned earlier, the `buffer`, `flush`, and `gzip` parameters on the `access_log` directive can significantly impact performance and resource usage.
- `buffer=<size>`:
  - Pros: Reduces the frequency of disk write operations, lowering disk I/O load and potentially improving overall Nginx performance, especially under high traffic. Disk writes happen in larger chunks.
  - Cons: Log entries are delayed. If Nginx crashes unexpectedly, log entries still in the buffer might be lost. Requires extra memory for the buffer (per worker process).
  - Recommendation: Generally recommended for high-traffic sites. Start with a moderate buffer size (e.g., `64k` or `128k`) and monitor performance and memory usage.
- `flush=<time>`:
  - Pros: Mitigates the log delay introduced by buffering. Ensures that logs are written periodically even if the buffer isn’t full, making recent entries available sooner for monitoring.
  - Cons: Slightly increases the number of write operations compared to only flushing when the buffer is full.
  - Recommendation: Use in conjunction with `buffer`. Choose a time that balances promptness with I/O efficiency (e.g., `5s` to `1m`).
- `gzip[=<level>]`:
  - Pros: Significantly reduces the size of log files on disk, saving storage space. Especially effective with text-based logs (including JSON).
  - Cons: Consumes CPU cycles for compression. Adds a slight delay to log writing due to the compression step. Always buffers writes (a 64K buffer is used if `buffer` is not set). Requires Nginx to be built with the zlib library.
  - Recommendation: Consider if disk space is a major constraint and you have spare CPU capacity. Test the impact on performance. The default level, `gzip=1`, is the fastest and usually offers a good balance between compression ratio and CPU usage. Note that external tools like `logrotate` can also handle log compression, which is often preferred because it runs separately from the Nginx worker processes.
Example with Buffering, Flushing, and Gzip:
```nginx
access_log /var/log/nginx/compressed_access.log.gz combined buffer=1m flush=5s gzip=1;
```
This logs using the `combined` format to a gzipped file, buffering up to 1 MB of data, writing out buffered data once it is older than 5 seconds even if the buffer is not full, and using gzip compression level 1. The resulting file can be read with `zcat`.
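When such a compressed log needs to be processed by a script rather than with `zcat`, Python's standard `gzip` module is one option. The sketch below assumes the file is a sequence of gzip streams written one after another (one per flushed buffer), which the module reads transparently; the function name `iter_gzip_log` and the sample data are illustrative only.

```python
import gzip
import io
from typing import BinaryIO, Iterator, Union


def iter_gzip_log(source: Union[str, BinaryIO]) -> Iterator[str]:
    """Yield decoded lines from a gzip log, given a path or a binary file object."""
    # errors="replace" keeps the loop going if a line contains invalid UTF-8.
    with gzip.open(source, mode="rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")


# Illustrative data: two gzip streams back to back, as if two buffers were flushed.
SAMPLE = gzip.compress(b"line one\nline two\n") + gzip.compress(b"line three\n")

if __name__ == "__main__":
    for entry in iter_gzip_log(io.BytesIO(SAMPLE)):
        print(entry)
```

For a real file, pass the path instead, for example `iter_gzip_log("/var/log/nginx/compressed_access.log.gz")`.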
6. Error Log Customization (Limited)
Compared to the access log, the `error_log` directive offers less flexibility in formatting. You primarily control:
- File Path: Where the errors are logged (file, stderr, syslog, memory).
- Severity Level: Which messages get logged (`debug`, `info`, `notice`, `warn`, `error`, `crit`, `alert`, `emerg`).
Syntax: `error_log <file> | stderr | syslog:server=<address>[,parameter=value] | memory:<size> [<level>];`
Context: `main`, `http`, `mail`, `stream`, `server`, `location`
- `<file>`: Path to the log file.
- `stderr`: Log to standard error. Often used when running Nginx in containers.
- `syslog:server=...`: Log to a syslog server.
- `memory:<size>`: Log to an in-memory cyclic buffer (useful for debugging, accessible via gdb).
- `<level>`: Minimum severity level to log (default is `error`).
Example:
```nginx
# Global default error log
error_log /var/log/nginx/error.log warn;

http {
    # …
    server {
        listen 443 ssl;
        server_name secure.example.com;

        # Override error log for this server, log info and higher
        error_log /var/log/nginx/secure_error.log info;

        location /debug-section {
            # Even more verbose logging for a specific location (requires a build with --with-debug)
            error_log /var/log/nginx/debug_section_error.log debug;
            # ...
        }
    }
}
```
While you cannot easily change the structure of error log messages like you can with `log_format` for access logs, you can control their destination and verbosity effectively for different parts of your configuration.
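Because the error log's structure is fixed, filtering by severity is usually done after the fact. The Python sketch below shows one way to keep only entries at or above a chosen level. It assumes the commonly seen line layout of `YYYY/MM/DD HH:MM:SS [level] ...`; the sample lines and the helper name `at_least` are invented for illustration.

```python
import re

# Severity levels in increasing order, as listed for the error_log directive.
LEVELS = ["debug", "info", "notice", "warn", "error", "crit", "alert", "emerg"]

# Assumed layout: "2024/10/10 13:55:40 [error] 1234#1234: message"
LINE_RE = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>[a-z]+)\] (?P<rest>.*)$"
)

# Illustrative lines, not real output.
SAMPLE_LINES = [
    '2024/10/10 13:55:36 [notice] 1234#1234: signal process started',
    '2024/10/10 13:55:40 [error] 1234#1234: *42 open() "/var/www/html/missing" failed '
    '(2: No such file or directory), client: 203.0.113.7, server: secure.example.com',
    '2024/10/10 13:56:02 [crit] 1234#1234: *43 illustrative critical message',
]


def at_least(lines, minimum="error"):
    """Return the lines whose severity is `minimum` or higher."""
    threshold = LEVELS.index(minimum)
    kept = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or match["level"] not in LEVELS:
            continue  # continuation line or unexpected layout
        if LEVELS.index(match["level"]) >= threshold:
            kept.append(line)
    return kept
```

For the sample above, `at_least(SAMPLE_LINES, "error")` keeps the `error` and `crit` entries and drops the `notice` line.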
7. Performance Considerations and Best Practices
Configuring logging effectively involves balancing the need for detailed information with performance and resource management.
- Complexity vs. Performance: Very complex `log_format` strings involving many variables, especially those requiring significant computation (like regex captures within `map` directives used in the format, though less common), can add minor overhead to each request. Keep formats as simple as your needs allow.
- Disk I/O: Logging is inherently I/O-bound.
  - Use buffering (`buffer` parameter) on high-traffic sites to reduce disk write frequency.
  - Ensure Nginx logs are written to fast storage (SSDs are highly recommended).
  - Consider using syslog to send logs off-box, potentially reducing local disk I/O, but introducing network latency and reliance on the syslog collector.
  - Avoid logging excessive data: Don’t log full request/response bodies unless absolutely necessary for debugging specific issues, and only temporarily.
- JSON Logging Overhead: While incredibly useful, JSON formatting can slightly increase log volume compared to compact formats like `combined` due to keys and quotes. Compression (`gzip` or external `logrotate`) can mitigate this. The benefits of structured data usually outweigh the slight size increase.
- Conditional Logging: Use `map` + `if=` effectively to filter out noise (health checks, static assets) from main application logs, reducing log volume and I/O.
- Log Rotation: Nginx itself doesn’t rotate logs. Log files will grow indefinitely unless managed. Use standard system tools like `logrotate` (common on Linux) to periodically archive, compress, and delete old log files. Ensure `logrotate` is configured correctly to signal Nginx (`USR1` signal) to reopen log files after rotation.
  - A typical `logrotate` configuration for Nginx might include `copytruncate` or, preferably, a `postrotate` script that sends `nginx -s reopen` or `kill -USR1 $(cat /var/run/nginx.pid)`. The `postrotate` approach is better because it avoids potential data loss during truncation.
- Security and Privacy:
  - Be extremely cautious about logging sensitive information like passwords, API keys, session tokens, or personally identifiable information (PII). Review your formats carefully.
  - Avoid logging the full `$request_body` unless strictly necessary and ensure access to logs is tightly controlled.
  - Scrub sensitive data if possible before logging, perhaps using Lua scripting with `log_by_lua_block` (from the third-party lua-nginx-module) if complex filtering is needed (advanced topic).
- Consistency: Use consistent log formats (especially JSON schemas) across your Nginx instances and applications where possible. This simplifies log aggregation, parsing, and analysis.
- Monitoring Log Volume: Keep an eye on log generation rates and disk space usage. Adjust log levels, conditional logging, or sampling if necessary.
- Testing: Always test configuration changes (`nginx -t`) before reloading (`nginx -s reload`). After reloading, monitor logs to ensure they are being written correctly and check Nginx error logs for any issues related to the new configuration.
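A quick back-of-the-envelope calculation can make the volume and memory points above concrete. The Python sketch below is only an estimate under stated assumptions: the traffic figures are invented, the function names are hypothetical, and the memory figure assumes one buffer per worker process for each buffered `access_log` directive.

```python
SECONDS_PER_DAY = 86_400


def daily_log_bytes(requests_per_second: float, avg_line_bytes: int) -> int:
    """Uncompressed bytes written per day for one access log."""
    return int(requests_per_second * avg_line_bytes * SECONDS_PER_DAY)


def buffer_memory_bytes(buffer_bytes: int, workers: int, buffered_logs: int) -> int:
    """Upper bound on memory held by log buffers (assumed: one per worker per log)."""
    return buffer_bytes * workers * buffered_logs


if __name__ == "__main__":
    # Assumed figures: 500 requests/s and about 300 bytes per log line.
    per_day = daily_log_bytes(500, 300)
    print(f"{per_day / 1e9:.2f} GB per day")  # 12.96 GB per day

    # Assumed figures: buffer=128k, 4 worker processes, 2 buffered logs.
    print(buffer_memory_bytes(128 * 1024, 4, 2), "bytes of buffer memory")  # 1048576
```

Numbers like these help when choosing a rotation schedule or deciding whether conditional logging is worth the extra configuration.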
8. Troubleshooting Common Issues
- Syntax Errors: Running `nginx -t` is your first step. It will point out typos, incorrect directive placements, or invalid parameter values in your `log_format` or `access_log` lines. Pay close attention to quotes, semicolons, and valid variable names.
- Permission Denied: Nginx worker processes (usually run as a non-root user like `nginx` or `www-data`) need write permissions on the log file and execute permissions on all parent directories leading to it. Check file and directory ownership and permissions (`ls -l`, `namei -om /path/to/your/log/file.log`).
- Logs Not Appearing:
  - Check permissions (see above).
  - Verify the `access_log` path is correct.
  - If using conditional logging (`if=`), ensure the condition evaluates correctly for the requests you expect to be logged. Add the condition variable temporarily to the log format to debug its value.
  - If using buffering (`buffer=`), there might be a delay. Wait for the buffer to fill or the `flush` interval to pass. Try temporarily removing buffering to see if logs appear immediately.
  - Check if `access_log off;` is specified in a higher-level or conflicting block.
  - Ensure the Nginx configuration was actually reloaded (`nginx -s reload`).
- Incorrect Data / Hyphens (`-`) in Logs:
  - A hyphen often means the variable had no value (e.g., `$http_referer` when no referer header was sent, `$upstream_response_time` for non-proxied requests, `$remote_user` when no auth). This is often normal.
  - Check for typos in variable names (`$request_time` vs `$request_ime`). An unknown variable name makes `nginx -t` fail with an `unknown "request_ime" variable` error. The exception is prefixed families such as `$http_*`, `$arg_*`, and `$cookie_*`, where any name is accepted and a misspelled one is simply logged as empty (`-`).
  - Ensure the necessary modules are active (e.g., `realip` module for correct `$remote_addr` behind proxy, `upstream` module context for `$upstream_*` variables).
- JSON Logs Invalid:
  - Ensure you used `escape=json` in the `log_format` definition.
  - Double-check the JSON structure: proper quoting of keys and string values, correct comma placement, balanced braces `{}`. Validate a sample log entry using a JSON validator.
- High CPU Usage after Changes:
  - If you enabled `gzip` logging, this might be the cause. Try lowering the level or disabling it.
  - Very complex `map` directives or regexes used in conditions or formats could contribute, but this is less common.
Always check the Nginx `error_log` for clues when troubleshooting access log issues. It might contain messages about file permission problems or configuration errors encountered during request processing.
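For the "JSON Logs Invalid" case, checking a whole file is often quicker than pasting single entries into a validator. The Python sketch below reports which lines fail to parse. It assumes one JSON object per line; the helper name `find_invalid_json_lines`, the field names, and the sample entries are made up for illustration. The second sample mimics a numeric field that was left unquoted in the format while its variable was empty.

```python
import json

# Illustrative entries, not real output.
SAMPLE_LINES = [
    '{"status": 200, "uri": "/", "upstream_response_time": 0.012}',
    '{"status": 502, "uri": "/api", "upstream_response_time": }',
    '',
    '["not", "an", "object"]',
]


def find_invalid_json_lines(lines):
    """Return (line_number, reason) pairs for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            value = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems
```

Run against a real log with `find_invalid_json_lines(open(path, encoding="utf-8"))`; quoting every value in the `log_format` string is a common way to avoid the failure shown in the second sample.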
9. Integration with Log Analysis Tools
The primary reason for investing time in custom log formatting, especially JSON, is to facilitate easier integration with log management and analysis platforms.
- ELK Stack (Elasticsearch, Logstash, Kibana): Logstash’s `json` filter plugin can directly parse Nginx JSON logs. Filebeat (an Elastic agent) can harvest log files and forward them to Logstash or directly to Elasticsearch. Kibana provides powerful visualization and analysis capabilities.
- Graylog: Has built-in JSON parsers and can easily ingest structured Nginx logs sent via Filebeat, syslog, or other collectors.
- Splunk: Uses forwarders to collect log data. Properly formatted JSON logs are easily parsed by Splunk, mapping JSON fields to Splunk index fields.
- Datadog, Grafana Loki, Fluentd, etc.: Most modern observability and log aggregation tools have robust support for JSON log ingestion.
Using a consistent, structured format like JSON significantly reduces the complexity of setting up parsing rules in these tools, making data readily available for dashboards, alerting, and ad-hoc analysis. Key metrics like request rates, error rates (4xx/5xx counts), request latency (`$request_time`), upstream latency (`$upstream_response_time`), and traffic volume (`$body_bytes_sent`) become trivial to track and visualize.
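To illustrate how little parsing structured logs need, here is a small Python sketch that derives an error rate and an average latency directly from JSON log lines. The field names `status` and `request_time` are assumptions about your `log_format` (name them however your format does), and `summarize` and the sample entries are hypothetical.

```python
import json
from collections import Counter

# Illustrative entries; field names are assumed, not prescribed.
SAMPLE_LINES = [
    '{"status": "200", "request_time": "0.010"}',
    '{"status": "404", "request_time": "0.020"}',
    '{"status": "500", "request_time": "0.030"}',
    '{"status": "200", "request_time": "0.040"}',
]


def summarize(lines):
    """Count requests per status class and compute error rate and mean latency."""
    classes = Counter()
    latencies = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        entry = json.loads(text)
        status = int(entry["status"])
        classes[f"{status // 100}xx"] += 1
        latencies.append(float(entry["request_time"]))
    total = sum(classes.values())
    errors = classes["4xx"] + classes["5xx"]
    return {
        "requests": total,
        "error_rate": errors / total if total else 0.0,
        "avg_request_time": sum(latencies) / len(latencies) if latencies else 0.0,
    }
```

For the four sample entries this reports 4 requests with an error rate of 0.5. A log platform performs the same kind of aggregation at scale, but the per-line work is just a JSON decode.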
Conclusion
Nginx logging is a powerful feature that transforms raw server activity into actionable insights. While the default `combined` format provides a starting point, mastering the `log_format` and `access_log` directives unlocks the ability to tailor logs precisely to your operational needs.
We’ve journeyed from understanding the default logs to defining custom formats using a wide array of Nginx variables. We explored the significant advantages of structured JSON logging for automated analysis and the techniques for conditional logging to reduce noise. We also touched upon performance implications, best practices like log rotation and security awareness, and common troubleshooting steps.
By leveraging custom Nginx log formats, you can:
- Enhance Debugging: Capture specific headers, timing details, or upstream information crucial for diagnosing problems.
- Improve Monitoring: Track key performance indicators like request latency, upstream performance, and error rates accurately.
- Streamline Analysis: Produce logs in structured formats (like JSON) that are easily ingested and queried by log management platforms.
- Optimize Resources: Use buffering, compression, and conditional logging to manage disk I/O, storage space, and CPU usage effectively.
Investing time in configuring your Nginx logs thoughtfully is an investment in the observability, stability, and performance of your web applications. Start with small additions to the default format, experiment with JSON, filter out noise, and build a logging strategy that provides clear visibility into how your Nginx instances are serving your users. The detailed information captured in well-formatted logs is often the key to quickly resolving issues and optimizing performance in complex web environments.