Mastering Nginx Logs: A Comprehensive Guide to Custom Log Formatting

Nginx is renowned for its high performance, stability, and rich feature set, making it one of the most popular web servers and reverse proxies in the world. Behind the scenes of serving web pages, handling API requests, or balancing load, Nginx diligently records crucial information about its operations in log files. These logs are invaluable resources for debugging issues, monitoring performance, understanding user behavior, ensuring security, and performing audits.

By default, Nginx provides useful logging capabilities, but the true power lies in its flexibility to customize what gets logged and how it’s formatted. Standard log formats might suffice for basic needs, but as applications grow in complexity and monitoring requirements become more sophisticated, tailoring log formats becomes essential. Custom formatting allows you to capture specific data points relevant to your application, structure logs for easier parsing by automated tools, and optimize logging for performance and storage.

This comprehensive guide will walk you through the intricacies of Nginx log formatting. We’ll start with the basics of default Nginx logs, delve into the core directives (log_format and access_log), explore the vast array of available Nginx variables, demonstrate how to create custom formats (including the increasingly popular JSON format), discuss conditional logging, touch upon error log specifics, and cover best practices and performance considerations. By the end of this article, you’ll have the knowledge and tools to effectively configure Nginx logging to meet your specific requirements.

1. Understanding Nginx’s Default Logging Behavior

Before customizing, it’s crucial to understand what Nginx logs by default. Nginx primarily maintains two types of logs:

Access Logs: Record information about client requests processed by the server. Every time a client (like a browser or an API client) requests a resource from Nginx, an entry is typically added to the access log.
Error Logs: Record diagnostic information about problems encountered during Nginx’s operation, ranging from minor warnings to critical errors that might prevent the server from functioning correctly.

1.1 Default Access Log

By default, Nginx usually comes pre-configured with an access log directive, often pointing to a file like /var/log/nginx/access.log (the exact path can vary depending on the operating system distribution and installation method). It typically uses a predefined format named combined.

A typical default access_log directive in the main nginx.conf or a specific virtual host configuration might look like this:

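```nginx
http {
    # ... other http settings ...

    # "combined" is built in and must not be redefined (Nginx rejects a
    # duplicate log_format name). Its definition is equivalent to:
    #
    # log_format combined '$remote_addr - $remote_user [$time_local] "$request" '
    #                     '$status $body_bytes_sent "$http_referer" '
    #                     '"$http_user_agent"';

    access_log /var/log/nginx/access.log combined;

    # ... server blocks etc ...
}
```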

Let’s break down the fields in the default combined format:

$remote_addr: The IP address of the client making the request. If Nginx is behind a load balancer or reverse proxy, this might be the IP of the proxy, not the original client (we’ll discuss how to handle this later).
-: A literal hyphen, often used as a placeholder.
$remote_user: The username provided for HTTP Basic Authentication. If authentication is not used or fails, this will be a hyphen (-).
[$time_local]: The local server time when the request was processed, enclosed in square brackets. The format is usually like dd/MMM/yyyy:HH:mm:ss +ZZZZ (e.g., 15/Oct/2023:10:30:05 +0000).
"$request": The full original request line, enclosed in double quotes. This includes the HTTP method, the requested URI (including arguments), and the HTTP protocol version (e.g., "GET /images/logo.png HTTP/1.1").
$status: The HTTP status code returned to the client (e.g., 200, 404, 500).
$body_bytes_sent: The number of bytes sent to the client in the response body, excluding the response headers.
"$http_referer": The URL of the page from which the request originated (the content of the Referer HTTP header), enclosed in double quotes. If the header is missing or empty, it’s logged as "-".
"$http_user_agent": The client’s user agent identification string (the content of the User-Agent HTTP header), enclosed in double quotes. This usually identifies the browser, operating system, or client application.

A sample log entry in the combined format might look like:

192.168.1.101 - - [15/Oct/2023:10:30:05 +0000] "GET /index.html HTTP/1.1" 200 1543 "http://example.com/previous_page" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"

While combined provides a good baseline, it often lacks information crucial for modern web application monitoring, such as request processing time, upstream response times (if using Nginx as a reverse proxy), or custom application-specific headers.

1.2 Default Error Log

The error log captures issues encountered by Nginx itself. Its default location is often /var/log/nginx/error.log. Unlike the access log, its line format is fixed and cannot be changed with a directive, but its verbosity can be controlled.

A typical default error_log directive might look like:

error_log /var/log/nginx/error.log warn;

The format usually includes:

Timestamp: Date and time of the event.
Severity Level: Indicates the seriousness of the event.
Process ID (PID) and Thread ID (TID): Identifies the Nginx worker process and thread handling the request.
Client Information (optional): Sometimes includes the client IP address and request details if the error is associated with a specific client request.
Error Message: A descriptive message detailing the problem.

The severity levels, from lowest to highest, are:

debug: Detailed debugging information (requires Nginx to be compiled with debugging support).
info: Informational messages about normal operations.
notice: Notable events that are part of normal operation but might be worth noting.
warn: Warnings about potential issues or non-critical errors. Some packaged configurations set this level explicitly.
error: Errors encountered during request processing. This is the built-in default when no level is given.
crit: Critical conditions requiring immediate attention.
alert: Alerts requiring prompt action.
emerg: Emergency conditions causing the server to be unusable.

Setting a lower level (like info or debug) will include messages from all higher levels as well, potentially generating very large log files.

While error log formatting is limited compared to access logs, understanding its structure and controlling its level is vital for troubleshooting.
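As a rough sketch of controlling verbosity, error_log can be set at more than one level of the configuration, with the inner setting taking precedence for its scope. The file names and the app.example.com host below are assumptions chosen for illustration.

```nginx
# Main context: applies everywhere unless a narrower context overrides it.
error_log /var/log/nginx/error.log warn;

events {}

http {
    server {
        listen 80;
        server_name app.example.com;   # hypothetical host

        # More verbose error logging for this virtual host only.
        error_log /var/log/nginx/app.example.com_error.log notice;
    }
}
```

Keeping the global level conservative and raising verbosity only where you are troubleshooting limits how quickly the error log grows.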

2. The Core Directives: `log_format` and `access_log`

Customizing access logs revolves around two primary directives: log_format and access_log.

2.1 The `log_format` Directive

This directive defines a named log format template. It specifies what information should be included in log entries and how it should be arranged.

Syntax: log_format <format_name> [escape=default|json|none] '<string_containing_variables>' ...;
Context: http
Purpose: Creates a reusable template for log entries.

Breakdown:

<format_name>: A unique name you give to this format (e.g., my_custom_format, json_format). This name will be referenced by the access_log directive. Nginx has a built-in format named combined.
[escape=<type>]: An optional parameter specifying how characters should be escaped in the variables. Possible values are:
- default: Escapes characters ", \, and other characters with values less than 32 or above 126 using hexadecimal notation (\xXX). This is the behavior when the parameter is omitted.
- json: Escapes characters according to JSON string rules (e.g., " becomes \", \ becomes \\, newline becomes \n, etc.). Essential for creating valid JSON log entries.
- none: Disables all escaping. Use with extreme caution, as unescaped special characters can corrupt log parsing.
'<string_containing_variables>': A string defining the structure of the log entry. This string can contain:
- Plain text (e.g., spaces, brackets, labels like RespTime:).
- Nginx variables (prefixed with $, e.g., $remote_addr, $status, $request_time). These variables are replaced with their actual values when a request is logged.

Example:

```nginx
http {
    # Define a custom format named 'detailed'
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        'rt=$request_time urt=$upstream_response_time';

    # ... rest of http config ...
}
```

This defines a new format called detailed which is similar to combined but adds two new fields at the end: rt= followed by the request processing time ($request_time) and urt= followed by the upstream response time ($upstream_response_time).

The log_format directive must be placed within the http context, making the defined format available globally within that context and its child contexts (like server and location). You cannot define log_format inside a server or location block.
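To make the escaping parameter concrete, here is a minimal sketch that logs the same User-Agent header under two escaping modes. The format names ua_plain and ua_json are hypothetical.

```nginx
http {
    # Default escaping: a '"' or '\' inside a value is written as \x22 or \x5C.
    log_format ua_plain escape=default '$remote_addr "$http_user_agent"';

    # JSON escaping: a '"' inside a value is written as \" so the object stays valid.
    log_format ua_json escape=json
        '{"client_ip": "$remote_addr", "user_agent": "$http_user_agent"}';
}
```

The parameter goes between the format name and the format string, and it applies to every variable in that format.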

2.2 The `access_log` Directive

This directive enables logging of requests to a specified file using a defined format. It tells Nginx where to log and which format to use.

Syntax: access_log <path> [<format_name> [buffer=<size>] [gzip[=<level>]] [flush=<time>] [if=<condition>]];
access_log off;
Context: http, server, location, if in location, limit_except
Purpose: Specifies the log file path and the format to use for logging requests within its context.

Breakdown:

<path>: The path to the log file. This can be a regular file path (e.g., /var/log/nginx/my_app.log) or special values like syslog: for logging via syslog, or /dev/stdout, /dev/stderr. Ensure the Nginx worker processes have write permissions to the specified file or directory.
<format_name>: (Optional) The name of the log format to use, as defined by a log_format directive. If omitted, the default combined format is used.
buffer=<size>: (Optional) Enables buffering of log entries. Instead of writing each entry immediately, Nginx stores them in memory and writes them to the file in batches when the buffer is full. This can significantly improve performance by reducing disk I/O operations. <size> specifies the buffer size (e.g., 32k, 1m). Buffered entries are also written when the worker process exits or when the configuration is reloaded.
gzip[=<level>]: (Optional, requires Nginx built with the zlib library) Compresses buffered log entries before writing them to the file. <level> specifies the compression level (1-9, default is 1). Using gzip turns buffering on by itself; if buffer is not given, the buffer size defaults to 64k. Saves disk space but consumes CPU resources for compression.
flush=<time>: (Optional) If buffering is enabled (buffer parameter is set), this parameter specifies the maximum time log entries can stay in the buffer before being flushed to disk (e.g., 5s, 1m). This ensures logs are written reasonably promptly even under low traffic.
if=<condition>: (Optional) Enables conditional logging. The request will only be logged if the specified <condition> evaluates to true (non-empty string and not “0”). This is powerful for excluding certain requests (like health checks) from logs. The condition often uses variables and potentially the map directive for more complex logic.
off: Special value to disable access logging within the current context. This overrides any access_log directives inherited from higher levels.

Example Usage:

```nginx
http {
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        'rt=$request_time urt=$upstream_response_time';

    log_format healthcheck_format '$remote_addr [$time_local] "$request" $status';

    # Define a map to identify health check requests
    # (0 = health check, do not log; 1 = log)
    map $request_uri $is_health_check {
        /healthz 0;
        /status  0;
        default  1; # Log everything else
    }

    server {
        listen 80;
        server_name example.com;

        # Log most requests using 'detailed' format with buffering
        # but only if it's NOT a health check (map result is 1)
        access_log /var/log/nginx/example.com_access.log detailed buffer=64k flush=1m if=$is_health_check;

        # Log health checks separately with a simpler format and no buffering
        location /healthz {
            access_log /var/log/nginx/health_checks.log healthcheck_format;
            return 200 "OK";
            # Note: access_log in location overrides server level for this location
        }

        location /status {
            access_log /var/log/nginx/health_checks.log healthcheck_format;
            # Or disable logging completely for this location
            # access_log off;
            stub_status; # Example Nginx status module
        }

        location / {
            # Requests here inherit the server-level access_log directive
            proxy_pass http://backend_app;
        }
    }
}
```

This example demonstrates:
* Defining multiple formats (detailed, healthcheck_format).
* Using buffering (buffer=64k) and flushing (flush=1m).
* Using conditional logging (if=$is_health_check) based on a map to exclude /healthz and /status requests from the main log.
* Overriding the server-level access_log within specific location blocks to log health checks differently (or disable logging).

Understanding the interplay between log_format (defining what) and access_log (defining where and when) is fundamental to mastering Nginx logging.
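The syslog destination and the gzip parameter described in the breakdown are not covered by the example above, so here is a hedged sketch of both. The collector address 192.0.2.10, the local7 facility, and the file names are placeholders rather than recommendations.

```nginx
http {
    server {
        listen 80;
        server_name example.com;

        # Send entries to a remote syslog collector (placeholder address).
        access_log syslog:server=192.0.2.10:514,facility=local7,tag=nginx,severity=info combined;

        # Write gzip-compressed batches, flushed at least every five minutes.
        access_log /var/log/nginx/example.com_access.log.gz combined buffer=32k gzip=5 flush=5m;
    }
}
```

A compressed log file can be read with zcat. Because entries sit in the buffer until it fills or the flush interval passes, this variant suits archival logs better than logs you tail in real time.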

3. Nginx Variables: The Building Blocks of Custom Formats

The real power of custom log formats comes from Nginx variables. These variables provide access to a wealth of information about the client request, the server’s response, connection details, timing, and Nginx’s internal state. You embed these variables (prefixed with $) into your log_format string.

Nginx offers a vast number of built-in variables. Here’s a categorized list of some of the most commonly used and useful ones for log formatting:

Request-Related Variables:

$args: Arguments in the request line (e.g., foo=123&bar=abc).
$arg_<name>: Value of a specific argument <name> in the request line (e.g., $arg_foo would be 123).
$request: Full original request line (e.g., "GET /page?id=10 HTTP/1.1").
$request_method: Request method (e.g., GET, POST, PUT).
$request_uri: Full original request URI including arguments (e.g., /page?id=10).
$uri: Current URI in request, potentially normalized or rewritten by internal redirects. Excludes arguments. (e.g., /page).
$document_uri: Same as $uri.
$scheme: Request scheme (http or https).
$query_string: Same as $args.
$host: The hostname from the request line or the Host header. Order of preference: host name from request line, host name from “Host” request header field, or server name matching the request.
$hostname: The server’s hostname (from gethostname() syscall).
$content_length: Value of the Content-Length request header.
$content_type: Value of the Content-Type request header.
$is_args: ? if the request line has arguments, otherwise an empty string.
$request_length: Full request length including headers and body (bytes).
$request_body: The client request body (available if processed, might require client_body_in_single_buffer on;). Be cautious logging this due to size and potential sensitivity.
$request_body_file: Path to the temporary file holding the request body.

HTTP Headers & Cookies:

$http_<header_name>: Value of the specified request header <header_name>. Hyphens in the header name are replaced with underscores, and the name is lowercased (e.g., User-Agent becomes $http_user_agent, X-Forwarded-For becomes $http_x_forwarded_for).
$cookie_<cookie_name>: Value of the specified cookie <cookie_name> (e.g., $cookie_sessionid).

Response-Related Variables:

$status: Response status code (e.g., 200, 404, 503).
$body_bytes_sent: Number of bytes sent to the client in the response body (excluding headers).
$bytes_sent: Total number of bytes sent to the client (including headers).
$sent_http_<header_name>: Value of the specified response header <header_name>. Similar transformation rules as $http_ (underscores, lowercasing) apply (e.g., $sent_http_content_type, $sent_http_location).
$sent_trailer_<name>: Value of a trailer field sent in the response.

Connection & Client Variables:

$remote_addr: Client IP address.
$remote_port: Client port number.
$remote_user: Username supplied for HTTP Basic Authentication.
$server_addr: Address of the server which accepted the request.
$server_port: Port of the server which accepted the request.
$server_protocol: Protocol used for the request (e.g., HTTP/1.1, HTTP/2.0).
$connection: Connection serial number.
$connection_requests: Number of requests made through the current connection.
$binary_remote_addr: Client address in binary form (useful for certain modules).

Time-Related Variables:

$time_local: Local server time in the common log format (e.g., 15/Oct/2023:14:55:02 +0000).
$time_iso8601: Local server time in ISO 8601 standard format (e.g., 2023-10-15T14:55:02+00:00). Often preferred for machine parsing.
$msec: Current time in seconds with millisecond resolution (epoch time, e.g., 1697381702.123). Useful as a precise timestamp.
$request_time: Request processing time in seconds with millisecond resolution. Measured from the first byte received from the client to the last byte sent to the client. Crucial for performance monitoring.

Nginx Internal & Location Variables:

$nginx_version: Nginx version string.
$pid: Process ID (PID) of the Nginx worker process handling the request.
$pipe: p if the request was pipelined, . otherwise.
$limit_rate: Current rate limit set for the connection.
$server_name: The server_name of the virtual host that processed the request.

Upstream Variables (when using Nginx as a reverse proxy):

$upstream_addr: IP address and port (or socket path) of the upstream server(s) contacted. Multiple servers are comma-separated.
$upstream_status: Status code returned by the upstream server(s). Multiple codes are comma-separated.
$upstream_response_time: Time spent receiving the response from the upstream server(s), measured in seconds with millisecond resolution. For multiple upstreams, times are comma-separated. Crucial for diagnosing backend performance.
$upstream_response_length: Length of the response obtained from the upstream server(s).
$upstream_connect_time: Time spent establishing a connection with the upstream server(s).
$upstream_header_time: Time spent receiving the response header from the upstream server(s).
$upstream_cache_status: Shows whether the response was served from Nginx’s cache (HIT, MISS, EXPIRED, STALE, UPDATING, REVALIDATED, BYPASS).

SSL/TLS Variables (when using HTTPS):

$ssl_protocol: Protocol of the established SSL/TLS connection (e.g., TLSv1.2, TLSv1.3).
$ssl_cipher: Cipher suite used for the established SSL/TLS connection.
$ssl_client_s_dn: Subject DN (Distinguished Name) of the client certificate for established SSL connection.
$ssl_client_verify: Result of client certificate verification (SUCCESS, FAILED:reason, NONE).

GeoIP Module Variables (if GeoIP module is enabled and configured):

$geoip_country_code: Two-letter country code (e.g., US, GB).
$geoip_city: City name.
$geoip_region_name: Region or state name.
(Many others depending on the GeoIP database used)

Gzip Module Variables (if Gzip module is enabled):

$gzip_ratio: Achieved compression ratio (original size / compressed size).

This list is not exhaustive, but it covers many of the variables you’ll frequently use. The official Nginx documentation for the ngx_http_log_module and other modules (like ngx_http_upstream_module, ngx_http_ssl_module) provides the full list.

When constructing your log_format string, you simply include the desired variables prefixed with $. Nginx will substitute their values at request processing time.
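As an illustration of mixing variables from several of the categories above, the following sketch builds a compact tracing format. The format name trace is hypothetical, and it assumes clients or an upstream proxy send an X-Request-ID header.

```nginx
http {
    # 'trace' is a hypothetical name; X-Request-ID is an assumed request header.
    log_format trace '$time_iso8601 $remote_addr:$remote_port '
                     '$request_method $scheme://$host$request_uri '
                     '$status $bytes_sent '
                     'rid="$http_x_request_id" '
                     'ctype="$sent_http_content_type" '
                     'conn=$connection reqs=$connection_requests';

    server {
        listen 80;
        access_log /var/log/nginx/trace.log trace;
    }
}
```

Values that may contain spaces are quoted so that a whitespace-based parser can still split the line reliably.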

Handling Missing Headers/Variables: Many variables (especially those derived from optional request headers like $http_referer or $http_user_agent, or authentication like $remote_user) have no value if the corresponding data is not present in the request. With the default escaping, the log module writes such a missing value as a hyphen (-), which is the standard behavior in formats like combined. With escape=json or escape=none, a missing value is written as an empty string instead.

Getting the Real Client IP Behind a Proxy:
A very common requirement is logging the original client IP address when Nginx is behind a load balancer or reverse proxy. These proxies often add headers like X-Forwarded-For or X-Real-IP containing the original client IP.

You can log these directly:

log_format proxy_aware '$http_x_forwarded_for - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" "$http_user_agent"';

However, the X-Forwarded-For header can contain a list of IPs (client, proxy1, proxy2). A more robust approach is to use the ngx_http_realip_module:

```nginx
http {
    # Define trusted proxy IPs/networks
    set_real_ip_from 192.168.1.0/24;
    set_real_ip_from 10.0.0.1;
    # Specify which header contains the real IP
    real_ip_header X-Forwarded-For;
    # By default the last (rightmost) address in X-Forwarded-For is used.
    # With real_ip_recursive on, the last address that is NOT a trusted
    # proxy is used instead.
    # real_ip_recursive on;

    log_format realip_combined '$remote_addr - $remote_user [$time_local] "$request" '
                               '$status $body_bytes_sent "$http_referer" '
                               '"$http_user_agent"';

    # ...
    access_log /var/log/nginx/access.log realip_combined;
}
```

When the realip module is configured correctly, the $remote_addr variable itself will be updated to contain the actual client IP extracted from the specified header (provided the request came from a trusted proxy). This allows you to keep using $remote_addr in your log formats while getting the correct value.
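When auditing a proxy chain it can help to record both the resolved client address and the address of the peer that actually connected. The sketch below assumes Nginx is built with ngx_http_realip_module and that 10.0.0.0/8 is your trusted proxy range; the format name realip_audit is hypothetical.

```nginx
http {
    set_real_ip_from 10.0.0.0/8;       # assumed trusted proxy range
    real_ip_header X-Forwarded-For;
    real_ip_recursive on;              # skip trusted addresses, starting from the right

    # $realip_remote_addr keeps the original address of the connecting peer.
    log_format realip_audit '$remote_addr via=$realip_remote_addr '
                            'xff="$http_x_forwarded_for" "$request" $status';

    access_log /var/log/nginx/access.log realip_audit;
}
```

Logging the raw header alongside the resolved address makes it easier to spot requests whose X-Forwarded-For value was supplied by an untrusted client.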

4. Creating Your First Custom Log Format

Let’s create a practical custom log format that enhances the default combined format by adding request processing time and upstream response time, and uses the ISO 8601 timestamp format.

Step 1: Define the Format using log_format

Place this within the http block of your nginx.conf or a file included within the http block:

```nginx
http {
    # ... other http settings ...

    # Define our custom format named 'enhanced'
    log_format enhanced '$remote_addr - $remote_user [$time_iso8601] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        'rt=$request_time urt=$upstream_response_time uct=$upstream_connect_time uht=$upstream_header_time';

    # If behind a proxy, configure realip module (optional but recommended)
    # set_real_ip_from 192.168.1.1;
    # real_ip_header X-Forwarded-For;

    # ... server blocks etc ...
}
```

We named our format enhanced. Compared to combined, we changed $time_local to $time_iso8601 and added fields prefixed with descriptive labels:
* rt= for $request_time (total request time)
* urt= for $upstream_response_time (time waiting for upstream response)
* uct= for $upstream_connect_time (time connecting to upstream)
* uht= for $upstream_header_time (time waiting for upstream headers)

Using labels like rt= makes the log entries more self-descriptive, although it increases log size slightly. The upstream variables ($upstream_*) will log a hyphen (-) if the request wasn’t proxied (e.g., serving static files directly).

Step 2: Apply the Format using access_log

Now, apply this format within a server or location block where you want to use it. You can replace the default access_log directive or add a new one.

```nginx
http {
    # ... log_format definition from Step 1 ...

    server {
        listen 80;
        server_name api.example.com;

        # Apply the 'enhanced' format to this server block
        access_log /var/log/nginx/api.example.com_access.log enhanced;

        location / {
            proxy_pass http://backend_api_server;
            # Standard proxy headers often needed by backends
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }

    server {
        listen 80;
        server_name static.example.com;

        # Maybe use a simpler format or the default 'combined' for static files
        access_log /var/log/nginx/static.example.com_access.log combined;

        location / {
            root /var/www/static_files;
            try_files $uri $uri/ =404;
        }
    }
}
```

In this setup:
* Requests to api.example.com will be logged to /var/log/nginx/api.example.com_access.log using our enhanced format, including the valuable upstream timing information.
* Requests to static.example.com will be logged to /var/log/nginx/static.example.com_access.log using the standard combined format.

Step 3: Test and Reload Configuration

Before applying changes to a production server, always test your Nginx configuration syntax:

sudo nginx -t

If the syntax is okay, reload Nginx gracefully to apply the changes without dropping connections:

```bash
sudo systemctl reload nginx

# or
sudo service nginx reload

# or directly:
sudo nginx -s reload
```

Now, make some requests to api.example.com and static.example.com and check the contents of their respective log files (/var/log/nginx/*.log). You should see entries formatted according to the specified log_format.

An entry in api.example.com_access.log might look like:

1.2.3.4 - - [2023-10-15T15:10:30+00:00] "POST /v1/users HTTP/1.1" 201 55 "-" "MyAppClient/1.0" rt=0.152 urt=0.135 uct=0.002 uht=0.134

This clearly shows the request took 152ms total (rt=0.152), and most of that time (135ms) was spent waiting for the upstream backend (urt=0.135). The connection to upstream was fast (uct=0.002), and receiving upstream headers took 134ms (uht=0.134). This level of detail is invaluable for performance analysis.

5. Advanced Log Formatting Techniques

Beyond basic custom formats, Nginx offers more advanced capabilities.

5.1 Logging in JSON Format

Traditional space-separated or custom-delimited log formats can be challenging for automated tools to parse reliably, especially if fields like user agents or referrers contain unexpected characters or delimiters. JSON (JavaScript Object Notation) provides a structured, widely supported format that is much easier for machines to ingest and process. Log aggregation and analysis platforms like the ELK Stack (Elasticsearch, Logstash, Kibana), Graylog, Splunk, Datadog, etc., work exceptionally well with JSON logs.

To create JSON logs in Nginx, you use the log_format directive with the json escape type and structure your string as a JSON object.

Step 1: Define the JSON Format

```nginx
http {
    # ... other settings ...

    log_format json_log escape=json
      '{'
        '"timestamp": "$time_iso8601", '
        '"client_ip": "$remote_addr", '
        '"real_ip": "$http_x_forwarded_for", ' # Log proxy header separately if needed
        '"request": "$request", '
        '"status": $status, ' # Status is numeric, no quotes needed in JSON
        '"body_bytes_sent": $body_bytes_sent, ' # Numeric
        '"request_time": $request_time, ' # Numeric
        '"http_referer": "$http_referer", '
        '"http_user_agent": "$http_user_agent", '
        '"upstream_addr": "$upstream_addr", '
        '"upstream_status": "$upstream_status", ' # Often numeric, but can be list, so keep as string
        '"upstream_response_time": "$upstream_response_time", ' # Can be list, keep as string or parse later
        '"server_name": "$server_name", '
        '"request_method": "$request_method", '
        '"uri": "$uri", '
        '"args": "$args"'
      '}';

    # Configure realip module if necessary, $remote_addr will be correct then
    # set_real_ip_from ...
    # real_ip_header ...

    # ... server blocks ...
}
```

Key Points for JSON Formatting:

escape=json: This is crucial. It ensures that special characters within variable values (like quotes in the User-Agent) are properly escaped (" becomes \", etc.) to maintain valid JSON structure.
JSON Structure: The format string itself must be a valid JSON object structure. Each Nginx variable should correspond to a JSON value.
Quotes:
- JSON keys ("timestamp", "client_ip", etc.) must be enclosed in double quotes.
- Nginx variables that produce string values ($time_iso8601, $remote_addr, $request, etc.) should also be enclosed in double quotes within the format string. The escape=json will handle escaping any quotes inside the variable’s value.
- Nginx variables that produce numeric values ($status, $body_bytes_sent, $request_time) can be included without surrounding double quotes in the format string. This allows log analysis tools to treat them directly as numbers. Be careful though: if a variable might sometimes be non-numeric (e.g., $upstream_status is empty for requests that were not proxied and becomes a list when several upstream servers were tried), an unquoted value would produce invalid JSON, so it’s safer to quote it as a string and handle parsing later.
Commas: Separate key-value pairs with commas, but ensure there’s no trailing comma after the last pair.
Whitespace: Whitespace (spaces, newlines) outside of quoted strings is ignored by JSON parsers but can make the log_format definition more readable.
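If you would rather emit a sometimes-missing value as a real JSON number, one possible approach is to pass it through a map that falls back to null. This is a sketch under assumptions, not an established recipe: the names $json_upstream_time and json_timing are hypothetical, and it relies on the map being evaluated when the log line is written.

```nginx
http {
    # Emit the value only when it is a single plain number; otherwise emit null
    # (covers non-proxied requests and comma-separated lists).
    map $upstream_response_time $json_upstream_time {
        ~^[0-9.]+$  $upstream_response_time;
        default     null;
    }

    log_format json_timing escape=json
        '{"request_time": $request_time, '
        '"upstream_response_time": $json_upstream_time}';
}
```

The trade-off is that list values are reduced to null, so keep the quoted string form as well if you need to see every upstream attempt.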

Step 2: Apply the JSON Format

```nginx
server {
    listen 80;
    server_name service.example.com;

    # Use the json_log format defined in the http block
    access_log /var/log/nginx/service_access.json json_log;

    # ... rest of server config ...
}
```

Resulting Log Entry:

A request logged using this format would look like this in /var/log/nginx/service_access.json:

{"timestamp": "2023-10-15T15:30:45+00:00", "client_ip": "10.0.0.50", "real_ip": "1.2.3.4, 10.0.0.50", "request": "GET /api/data?limit=10 HTTP/1.1", "status": 200, "body_bytes_sent": 1024, "request_time": 0.045, "http_referer": "", "http_user_agent": "curl/7.81.0", "upstream_addr": "192.168.10.10:8080", "upstream_status": "200", "upstream_response_time": "0.040", "server_name": "service.example.com", "request_method": "GET", "uri": "/api/data", "args": "limit=10"}

This structured format is immediately ready for ingestion into various log management systems.

5.2 Conditional Logging

Sometimes, you don’t want to log every single request. Common scenarios include:

Excluding health checks from main application logs.
Logging only specific error responses (e.g., 5xx errors).
Logging requests from specific IP addresses or user agents for debugging.

The if=<condition> parameter of the access_log directive allows for this. The condition is evaluated for each request, and logging only occurs if the condition is true (evaluates to a non-empty string that is not “0”).

Simple Example: Don’t Log Health Checks

```nginx
server {
    listen 80;
    server_name app.example.com;

    # Don't log requests to /healthz
    location = /healthz {
        access_log off; # Disable logging specifically for this location
        return 200 "OK";
    }

    location / {
        # Log other requests normally
        access_log /var/log/nginx/app_access.log combined;
        proxy_pass http://app_backend;
    }
}
```

This uses access_log off; within the specific location, which is often the simplest way for exact path matches.

Using if= for More Complex Conditions:

The if= parameter provides more flexibility. However, using the if directive directly inside location blocks is generally discouraged (“if is evil”). A more robust and recommended approach for conditional logic affecting logging (and other directives) is to use the map directive.

Using map for Conditional Logging:

The map directive allows you to create a new variable whose value depends on the value of another variable. It’s typically defined in the http context.

Example: Don’t Log Static Assets or Health Checks

```nginx
http {
    # ... log formats ...

    # Map request URI to a variable $loggable. 1 means log, 0 means don't log.
    map $request_uri $loggable {
        ~*\.(jpg|jpeg|gif|png|css|js)$ 0; # Don't log common static files
        /healthz                      0; # Don't log health check endpoint
        /status                       0; # Don't log status endpoint
        default                       1; # Log everything else
    }

    server {
        listen 80;
        server_name www.example.com;

        # Apply the condition using the $loggable variable
        access_log /var/log/nginx/www_access.log combined if=$loggable;

        location / {
            # ... serve dynamic content or proxy ...
            proxy_pass http://main_app;
        }

        location ~* \.(jpg|jpeg|gif|png|css|js)$ {
            root /var/www/html/static;
            expires 1d;
            # No access_log needed here, it inherits the conditional one from server level
        }

        location = /healthz {
            # No access_log needed here either
            return 200 "OK";
        }

        location = /status {
            # No access_log needed here either
            stub_status;
        }
    }
}
```

In this example:
1. The map block checks the $request_uri.
2. If it matches the regular expression for common static file extensions (~*\.(jpg|...|js)$) or is exactly /healthz or /status, the $loggable variable is set to 0.
3. For any other URI (default), $loggable is set to 1.
4. The access_log directive in the server block uses if=$loggable. Only requests where $loggable is 1 will be written to the log file.

This map-based approach is generally considered cleaner and more performant than complex if structures within location blocks for conditional logging.
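
Several conditions can be combined by chaining maps. The following is a sketch under stated assumptions: the variable names ($uri_loggable, $ua_loggable, $log_request) are made up for this example, and the User-Agent prefixes stand in for whatever health-check clients appear in your own traffic.

```nginx
# Fragment for the http context. Variable names and User-Agent patterns are illustrative.
map $request_uri $uri_loggable {
    /healthz   0;
    default    1;
}

map $http_user_agent $ua_loggable {
    ~*^kube-probe          0;   # assumed probe User-Agent prefix
    ~*^ELB-HealthChecker   0;   # assumed load balancer health checker prefix
    default                1;
}

# Combine both results: log only when neither map voted against it
map "$uri_loggable:$ua_loggable" $log_request {
    "1:1"    1;
    default  0;
}

server {
    listen 80;
    server_name www.example.com;

    access_log /var/log/nginx/www_access.log combined if=$log_request;
}
```

The third map takes a string built from the first two variables as its source, which keeps each individual rule small and readable.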

Example: Logging Only Errors

```nginx
http {
    # Map non-2xx/3xx status codes to $log_error. 1 means log, 0 means don't.
    map $status $log_error {
        ~^[23]..  0; # Don't log 2xx or 3xx status codes
        default   1; # Log everything else (1xx, 4xx, 5xx)
    }

    server {
        listen 80;
        server_name api.example.com;

        # Log only if $log_error is 1 (i.e., status is not 2xx/3xx)
        access_log /var/log/nginx/api_errors.log combined if=$log_error;

        # We probably still want a main log for all requests
        access_log /var/log/nginx/api_access.log combined;

        # ...
    }
}
```

This uses two `access_log` directives: one conditional log for errors, and one unconditional log for all requests.
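
The same pattern can filter on values other than the status code. As a hedged sketch, the fragment below keeps a separate log of requests that took one second or longer. The variable name $slow_request and the file paths are hypothetical, and the example assumes $slow_request is not referenced anywhere else in the configuration, so it is first evaluated when the log entry is written.

```nginx
# Fragment for the http context. $slow_request is a hypothetical variable name.
# $request_time is formatted as seconds.milliseconds, for example 0.123 or 2.045.
map $request_time $slow_request {
    ~^[1-9]   1;   # integer part does not start with 0, i.e. one second or more
    default   0;
}

server {
    listen 80;
    server_name api.example.com;

    # Main log for all requests
    access_log /var/log/nginx/api_access.log combined;

    # Additional log containing only slow requests
    access_log /var/log/nginx/api_slow.log combined if=$slow_request;
}
```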

5.3 Multiple Access Logs

You can specify multiple access_log directives within the same context (http, server, location). This causes Nginx to write log entries to all specified destinations for matching requests.

Use Cases:

Different Formats: Log the same request using different formats to different files (e.g., a detailed JSON log for machine analysis and a simpler combined format for quick human checks).
Filtered Logs: Send all logs to a main file, and conditionally send a subset (e.g., errors, specific paths) to another file or system (like syslog).
Centralized and Local Logging: Log locally for immediate access and also send logs via syslog to a central collector.

Example:

```nginx
http {
    # A format cannot be referenced as a variable, so the combined layout is
    # spelled out here with the incoming X-Request-ID header appended
    log_format combined_extra '$remote_addr - $remote_user [$time_local] '
                              '"$request" $status $body_bytes_sent '
                              '"$http_referer" "$http_user_agent" '
                              '"$http_x_request_id"';
    log_format json_detailed escape=json '{ … }'; # Your detailed JSON format

    map $status $is_error {
        ~^[45].. 1; # 4xx or 5xx status
        default   0;
    }

    server {
        listen 80;
        server_name critical.example.com;

        # Log 1: Main log file using a custom 'combined_extra' format
        access_log /var/log/nginx/critical_access.log combined_extra buffer=128k flush=30s;

        # Log 2: Detailed JSON log file
        access_log /var/log/nginx/critical_access.json json_detailed buffer=1m flush=1m;

        # Log 3: Send only errors (4xx/5xx) via syslog using 'combined_extra' format
        access_log syslog:server=10.1.1.10:514,tag=nginx_errors combined_extra if=$is_error;

        location / {
            # Forward a request ID to the backend.
            # $request_id is built in since Nginx 1.11.0:
            # proxy_set_header X-Request-ID $request_id;
            # On older versions, an approximation:
            # set $reqid $pid-$msec-$connection;
            # proxy_set_header X-Request-ID $reqid;

            proxy_pass http://critical_backend;
        }
    }
}
```

In this setup, for every request to critical.example.com:
1. An entry is written to /var/log/nginx/critical_access.log using the combined_extra format, with buffering.
2. An entry is written to /var/log/nginx/critical_access.json using the json_detailed format, with potentially larger buffering.
3. If the response status is 4xx or 5xx ($is_error is 1), an entry is also sent via UDP syslog to 10.1.1.10:514 with the tag nginx_errors, using the combined_extra format.

Be mindful that each access_log directive adds overhead. Using multiple logs increases disk I/O and potentially CPU usage (especially with JSON formatting or compression).

5.4 Buffering and Flushing (`buffer`, `flush`, `gzip`)

As mentioned earlier, the buffer, flush, and gzip parameters on the access_log directive can significantly impact performance and resource usage.

buffer=<size>:
- Pros: Reduces the frequency of disk write operations, lowering disk I/O load and potentially improving overall Nginx performance, especially under high traffic. Disk writes happen in larger chunks.
- Cons: Log entries are delayed. If Nginx crashes unexpectedly, log entries still in the buffer might be lost. Requires extra memory for the buffer (per worker process).
- Recommendation: Generally recommended for high-traffic sites. Start with a moderate buffer size (e.g., 64k or 128k) and monitor performance and memory usage.
flush=<time>:
- Pros: Mitigates the log delay introduced by buffering. Ensures that logs are written periodically even if the buffer isn’t full, making recent entries available sooner for monitoring.
- Cons: Slightly increases the number of write operations compared to only flushing when the buffer is full.
- Recommendation: Use in conjunction with buffer. Choose a time that balances promptness with I/O efficiency (e.g., 5s to 1m).
gzip[=<level>]:
- Pros: Significantly reduces the size of log files on disk, saving storage space. Especially effective with text-based logs (including JSON).
- Cons: Consumes CPU cycles for compression. Adds a slight delay to log writing due to the compression step. Always buffered: if buffer is not set explicitly, gzip uses a 64k buffer by default. Requires Nginx built with the zlib library.
- Recommendation: Consider if disk space is a major constraint and you have spare CPU capacity. Test the impact on performance. Level 1 (gzip=1, which is also the default when no level is given) offers a good balance between compression ratio and CPU usage. Note that external tools like logrotate can also handle log compression, which is often preferred because it runs separately from Nginx worker processes.

Example with Buffering, Flushing, and Gzip:

```nginx
access_log /var/log/nginx/compressed_access.log.gz combined buffer=1m flush=5s gzip=1;
```

This logs using the combined format to a gzipped file, buffering up to 1MB of data, flushing buffered data every 5 seconds if not full, and using gzip compression level 1.

6. Error Log Customization (Limited)

Compared to the access log, the error_log directive offers less flexibility in formatting. You primarily control:

File Path: Where the errors are logged (file, stderr, syslog, memory).
Severity Level: Which messages get logged (debug, info, notice, warn, error, crit, alert, emerg).

Syntax: error_log <file> | stderr | syslog:server=<address>[,parameter=value] | memory:<size> [<level>];
Context: main, http, mail, stream, server, location

<file>: Path to the log file.
stderr: Log to standard error. Often used when running Nginx in containers.
syslog:server=...: Log to a syslog server.
memory:<size>: Log to an in-memory cyclic buffer (useful for debugging, accessible via gdb).
<level>: Minimum severity level to log (default is error).

Example:

```nginx
# Global default error log
error_log /var/log/nginx/error.log warn;

http {
    # …
    server {
        listen 443 ssl;
        server_name secure.example.com;

        # Override error log for this server, log info and higher
        error_log /var/log/nginx/secure_error.log info;

        location /debug-section {
            # Even more verbose logging for a specific location (requires debug build)
            error_log /var/log/nginx/debug_section_error.log debug;
            # ...
        }
    }
}
```

While you cannot easily change the structure of error log messages like you can with log_format for access logs, you can control their destination and verbosity effectively for different parts of your configuration.
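
The non-file destinations listed above can be sketched in the same way. This fragment is illustrative only: the syslog address and tag are placeholders, and the memory buffer line is left commented out because it assumes an Nginx build with debug support.

```nginx
# Main-context fragment; pick the destinations that suit the deployment.

# Containers: write to standard error so the runtime captures the messages
error_log stderr warn;

# Additionally forward errors to a syslog collector (address and tag are placeholders)
error_log syslog:server=10.1.1.10:514,tag=nginx_error error;

# In-memory cyclic buffer for debugging; assumes a build with --with-debug
# error_log memory:32m debug;
```

Several error_log directives can be given at the same level, in which case each message is written to every destination whose level it meets.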

7. Performance Considerations and Best Practices

Configuring logging effectively involves balancing the need for detailed information with performance and resource management.

Complexity vs. Performance: Very complex log_format strings with many variables, especially variables that are expensive to compute (for example, map variables that rely on regex matching), add a small overhead to each request. Keep formats as simple as your needs allow.
Disk I/O: Logging is inherently I/O-bound.
- Use buffering (buffer parameter) on high-traffic sites to reduce disk write frequency.
- Ensure Nginx logs are written to fast storage (SSDs are highly recommended).
- Consider using syslog to send logs off-box, potentially reducing local disk I/O, but introducing network latency and reliance on the syslog collector.
- Avoid logging excessive data: Don’t log full request/response bodies unless absolutely necessary for debugging specific issues, and only temporarily.
JSON Logging Overhead: While incredibly useful, JSON formatting can slightly increase log volume compared to compact formats like combined due to keys and quotes. Compression (gzip or external logrotate) can mitigate this. The benefits of structured data usually outweigh the slight size increase.
Conditional Logging: Use map + if= effectively to filter out noise (health checks, static assets) from main application logs, reducing log volume and I/O.
Log Rotation: Nginx itself doesn’t rotate logs. Log files will grow indefinitely unless managed. Use standard system tools like logrotate (common on Linux) to periodically archive, compress, and delete old log files. Ensure logrotate is configured correctly to signal Nginx (USR1 signal) to reopen log files after rotation.
- A typical logrotate configuration for Nginx might use copytruncate or, preferably, a postrotate script that runs nginx -s reopen or kill -USR1 $(cat /var/run/nginx.pid). The postrotate approach is better because it avoids losing entries written while the file is being copied and truncated.
Security and Privacy:
- Be extremely cautious about logging sensitive information like passwords, API keys, session tokens, or personally identifiable information (PII). Review your formats carefully.
- Avoid logging the full $request_body unless strictly necessary and ensure access to logs is tightly controlled.
- Scrub sensitive data if possible before logging, perhaps using Lua scripting with log_by_lua_block (provided by the third-party lua-nginx-module, which is bundled with OpenResty) if complex filtering is needed (advanced topic).
Consistency: Use consistent log formats (especially JSON schemas) across your Nginx instances and applications where possible. This simplifies log aggregation, parsing, and analysis.
Monitoring Log Volume: Keep an eye on log generation rates and disk space usage. Adjust log levels, conditional logging, or sampling if necessary.
Testing: Always test configuration changes (nginx -t) before reloading (nginx -s reload). After reloading, monitor logs to ensure they are being written correctly and check Nginx error logs for any issues related to the new configuration.
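
One simple way to act on the privacy advice above is to keep query strings out of the access log, since they sometimes carry tokens or personal data. The sketch below is only one possible approach; the format name no_query, the server name, and the paths are assumptions made for this example.

```nginx
# Fragment for the http context. "no_query" is a hypothetical format name.
# Same layout as combined, but "$request" is replaced with the method, the
# normalized path ($uri), and the protocol, so the query string is not logged.
log_format no_query '$remote_addr - $remote_user [$time_local] '
                    '"$request_method $uri $server_protocol" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

server {
    listen 80;
    server_name app.example.com;

    access_log /var/log/nginx/app_access.log no_query buffer=64k flush=5s;
}
```

Two caveats apply: $uri reflects the path after any internal rewrites rather than the exact original request line, and $http_referer can still contain a query string from the referring page.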

8. Troubleshooting Common Issues

Syntax Errors: Running nginx -t is your first step. It will point out typos, incorrect directive placements, or invalid parameter values in your log_format or access_log lines. Pay close attention to quotes, semicolons, and valid variable names.
Permission Denied: Nginx worker processes (usually run as a non-root user like nginx or www-data) need write permissions on the log file and execute permissions on all parent directories leading to it. Check file and directory ownership and permissions (ls -l, namei -om /path/to/your/log/file.log).
Logs Not Appearing:
- Check permissions (see above).
- Verify the access_log path is correct.
- If using conditional logging (if=), ensure the condition evaluates correctly for the requests you expect to be logged. Add the condition variable temporarily to the log format to debug its value.
- If using buffering (buffer=), there might be a delay. Wait for the buffer to fill or the flush interval to pass. Try temporarily removing buffering to see if logs appear immediately.
- Check if access_log off; is specified in a higher-level or conflicting block.
- Ensure the Nginx configuration was actually reloaded (nginx -s reload).
Incorrect Data / Hyphens (-) in Logs:
- A hyphen often means the variable had no value (e.g., $http_referer when no referer header was sent, $upstream_response_time for non-proxied requests, $remote_user when no auth). This is often normal.
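- Check for typos in variable names ($request_time vs $request_ime). A misspelled built-in variable makes nginx -t fail with an unknown "request_ime" variable error, so it is caught before the reload. A typo after a dynamic prefix such as $http_, $arg_, or $cookie_ is still a valid variable name; it simply has no value and is logged as a hyphen (or as an empty string with escape=json).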
- Ensure the necessary modules are active (e.g., realip module for correct $remote_addr behind proxy, upstream module context for $upstream_* variables).
JSON Logs Invalid:
- Ensure you used escape=json in the log_format definition.
- Double-check the JSON structure: proper quoting of keys and string values, correct comma placement, balanced braces {}. Validate a sample log entry using a JSON validator.
High CPU Usage after Changes:
- If you enabled gzip logging, this might be the cause. Try lowering the level or disabling it.
- Very complex map directives or regexes used in conditions or formats could contribute, but this is less common.

Always check the Nginx error_log for clues when troubleshooting access log issues. It might contain messages about file permission problems or configuration errors encountered during request processing.
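
The tip above about printing the condition variable can be sketched as follows. The format name cond_debug, the variable $loggable, and the file paths are illustrative, and the second access_log is meant to be temporary.

```nginx
# Fragment for the http context. "cond_debug" and $loggable are illustrative names.
map $request_uri $loggable {
    /healthz  0;
    default   1;
}

# Temporary format that prints the condition variable next to each request
log_format cond_debug '$remote_addr "$request" $status loggable="$loggable"';

server {
    listen 80;
    server_name www.example.com;

    # The log under investigation
    access_log /var/log/nginx/www_access.log combined if=$loggable;

    # Temporary unconditional log; remove once the condition behaves as expected
    access_log /var/log/nginx/cond_debug.log cond_debug;
}
```

Comparing the two files shows which requests were filtered out and what value the condition had for each of them.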

9. Integration with Log Analysis Tools

The primary reason for investing time in custom log formatting, especially JSON, is to facilitate easier integration with log management and analysis platforms.

ELK Stack (Elasticsearch, Logstash, Kibana): Logstash’s json filter plugin can directly parse Nginx JSON logs. Filebeat (an Elastic agent) can harvest log files and forward them to Logstash or directly to Elasticsearch. Kibana provides powerful visualization and analysis capabilities.
Graylog: Has built-in JSON parsers and can easily ingest structured Nginx logs sent via Filebeat, syslog, or other collectors.
Splunk: Uses forwarders to collect log data. Properly formatted JSON logs are easily parsed by Splunk, mapping JSON fields to Splunk index fields.
Datadog, Grafana Loki, Fluentd, etc.: Most modern observability and log aggregation tools have robust support for JSON log ingestion.

Using a consistent, structured format like JSON significantly reduces the complexity of setting up parsing rules in these tools, making data readily available for dashboards, alerting, and ad-hoc analysis. Key metrics like request rates, error rates (4xx/5xx counts), request latency ($request_time), upstream latency ($upstream_response_time), and traffic volume ($body_bytes_sent) become trivial to track and visualize.
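
As a rough sketch, a JSON format that exposes the metrics just mentioned might look like the following. The format name json_metrics and the key names are assumptions for this example, not a schema required by any of the tools above.

```nginx
# Fragment for the http context. "json_metrics" and the key names are illustrative.
log_format json_metrics escape=json
    '{'
        '"time":"$time_iso8601",'
        '"remote_addr":"$remote_addr",'
        '"method":"$request_method",'
        '"uri":"$request_uri",'
        '"status":$status,'
        '"body_bytes_sent":$body_bytes_sent,'
        '"request_time":$request_time,'
        '"upstream_response_time":"$upstream_response_time"'
    '}';

server {
    listen 80;
    server_name api.example.com;

    access_log /var/log/nginx/api_access.json json_metrics;
}
```

Numeric fields are left unquoted so that analysis tools can aggregate them directly. $upstream_response_time is kept as a string because it can be empty for requests that were not proxied and can hold several comma-separated values when more than one upstream server was tried.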

Conclusion

Nginx logging is a powerful feature that transforms raw server activity into actionable insights. While the default combined format provides a starting point, mastering the log_format and access_log directives unlocks the ability to tailor logs precisely to your operational needs.

We’ve journeyed from understanding the default logs to defining custom formats using a wide array of Nginx variables. We explored the significant advantages of structured JSON logging for automated analysis and the techniques for conditional logging to reduce noise. We also touched upon performance implications, best practices like log rotation and security awareness, and common troubleshooting steps.

By leveraging custom Nginx log formats, you can:

Enhance Debugging: Capture specific headers, timing details, or upstream information crucial for diagnosing problems.
Improve Monitoring: Track key performance indicators like request latency, upstream performance, and error rates accurately.
Streamline Analysis: Produce logs in structured formats (like JSON) that are easily ingested and queried by log management platforms.
Optimize Resources: Use buffering, compression, and conditional logging to manage disk I/O, storage space, and CPU usage effectively.

Investing time in configuring your Nginx logs thoughtfully is an investment in the observability, stability, and performance of your web applications. Start with small additions to the default format, experiment with JSON, filter out noise, and build a logging strategy that provides clear visibility into how your Nginx instances are serving your users. The detailed information captured in well-formatted logs is often the key to quickly resolving issues and optimizing performance in complex web environments.
