This is how you should write and collect Docker container logs

25 Aug 2025 - 5 min read

Traditionally, applications write logs to one or more files on a persistent disk, like /var/log/app.log. This approach is simple and effective for applications running on a stable, long-lived server.

However, Docker fundamentally changes this paradigm. Containers are designed to be ephemeral and immutable. They can be stopped, destroyed, and replaced at any moment. Storing log files inside a container is a bad practice because the logs will be lost forever when the container is removed.

The modern best practice, aligned with the Twelve-Factor App methodology, is for containerized applications to treat logs as event streams. Instead of writing to a file, the application writes its logs to the standard output (stdout) and standard error (stderr) streams. The containerization platform, Docker, is then responsible for capturing these streams.
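
If an application stubbornly writes to a log file, a common workaround is to symlink that file to the container's standard streams in the Dockerfile. The official nginx image does exactly this:

```dockerfile
# Redirect nginx's log files to the container's standard streams
# so Docker can capture them (taken from the official nginx image)
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
    && ln -sf /dev/stderr /var/log/nginx/error.log
```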

Where does Docker store the logs?

Since Docker captures the log streams, where do they go?

By default, they are stored on the host machine using the json-file driver. You can view these logs with docker logs <container-id>.
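
You can also ask Docker where that file lives. A quick sketch (the container name is just an example):

```bash
# Run a container that logs to stdout
docker run -d --name web nginx

# Read the captured stream
docker logs web

# Locate the JSON log file on the host
docker inspect --format '{{.LogPath}}' web
# e.g. /var/lib/docker/containers/<id>/<id>-json.log
```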

But the json-file driver is not ideal for production systems because:

  • Logs are siloed on individual host machines, making it difficult to get a unified view of a distributed application.
  • Storage is not managed by default, which can lead to disks filling up (see the rotation sketch after this list).
  • Analysis is difficult without a centralized query and visualization tool.
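
The storage problem in particular can be contained with the json-file driver's rotation options. A minimal /etc/docker/daemon.json sketch (the values are examples; restart the daemon to apply):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```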

This is where log forwarding becomes essential. A log forwarding mechanism is responsible for collecting logs from all containers across all hosts, processing them, and sending them to a centralized logging backend for storage, analysis, and alerting. This decouples log management from the application's lifecycle and the host machine's state.

Log forwarding mechanisms for Docker

There are several common patterns for forwarding Docker logs to a central location.

1. Docker Logging Drivers

This is the most direct method. You can configure Docker to use a specific logging driver that sends logs directly from the container's stdout/stderr streams to a logging backend. The driver can be set daemon-wide as a default or per container.

Pros:
* Simple to configure.
* High performance as it's built into the Docker daemon.

Cons:
* Limited flexibility in parsing and enriching logs.
* Requires daemon-level configuration, which can be rigid.

Common drivers include fluentd (for Fluentd), gelf (for Graylog), awslogs (for AWS CloudWatch), and syslog.
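
As a sketch, here is how a single container could be pointed at a Fluentd aggregator with the fluentd driver:

```bash
# Send this container's stdout/stderr straight to a Fluentd aggregator
# (my-app is a placeholder image; the address and tag are examples)
docker run -d \
  --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag="app.{{.Name}}" \
  my-app
```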

2. The Sidecar Pattern

In this pattern, a dedicated logging agent container (a "sidecar") runs alongside each application container. The application container might write logs to a file on a shared volume, and the sidecar container's only job is to tail that file and forward its contents to the logging backend.
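
A minimal docker-compose sketch of the idea, with busybox's tail standing in for a real forwarding agent (image names and paths are illustrative):

```yaml
services:
  app:
    image: my-app              # assumed to write /logs/app.log
    volumes:
      - app-logs:/logs

  log-sidecar:
    image: busybox             # placeholder for a real logging agent
    command: tail -F /logs/app.log   # a real agent would forward, not print
    volumes:
      - app-logs:/logs

volumes:
  app-logs:
```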

Pros:
* Decouples logging logic from the application code.
* Allows for application-specific logging configurations.

Cons:
* Increases resource overhead, as you run an extra container for each application instance.
* Makes deployment configurations more complex to manage.

3. Dedicated Agent per Node

This is the most popular and efficient approach for large-scale deployments. A single logging agent container is deployed on each host machine (e.g., as a DaemonSet in Kubernetes). This agent is responsible for collecting logs from all containers running on that host. It typically does this by accessing the log files that the Docker daemon creates locally (usually in /var/lib/docker/containers/).
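
In its simplest form, the agent is just one more container with that directory mounted read-only. A sketch using Fluent Bit as an example agent:

```bash
# One agent per host, reading every container's json-file logs
# (fluent/fluent-bit is one example agent; the config file is assumed to exist)
docker run -d --name log-agent \
  -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
  -v "$(pwd)/fluent-bit.conf":/fluent-bit/etc/fluent-bit.conf:ro \
  fluent/fluent-bit
```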

Pros:
* Resource-efficient: Only one agent per host.
* Centralized configuration: Easy to manage and update the agent for the entire host.
* Automatic discovery: Automatically picks up logs from new containers started on the host.

Cons:
* Requires host-level access (mounting the Docker log directory).

How popular logging tools do it

Different open-source logging stacks leverage these mechanisms to collect container logs.

  • ELK Stack (Elasticsearch, Logstash, Kibana)
    The most common forwarder for the ELK stack is Filebeat. It is typically deployed using the dedicated agent per node pattern. Filebeat is a lightweight agent that runs on each Docker host, discovers container logs, enriches them with metadata (like container name and labels), and forwards them to either Logstash for further processing or directly to Elasticsearch for indexing.
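
    A minimal filebeat.yml sketch of that setup (the Elasticsearch host is an example):

    ```yaml
    # Read all container logs, enrich with Docker metadata, ship to Elasticsearch
    filebeat.inputs:
      - type: container
        paths:
          - /var/lib/docker/containers/*/*.log

    processors:
      - add_docker_metadata: ~

    output.elasticsearch:
      hosts: ["http://elasticsearch:9200"]
    ```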

  • Loki
    Loki uses its own purpose-built agent called Promtail. Like Filebeat, Promtail is designed to be run as a dedicated agent per node. Its primary function is to discover container logs, attach labels (especially from Kubernetes), and push the log streams to a central Loki instance. Its design is highly efficient and tightly integrated with the Prometheus ecosystem.
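
    A minimal Promtail config sketch (the Loki URL and labels are examples):

    ```yaml
    positions:
      filename: /tmp/positions.yaml    # remembers how far each file was read

    clients:
      - url: http://loki:3100/loki/api/v1/push

    scrape_configs:
      - job_name: docker
        static_configs:
          - targets: [localhost]
            labels:
              job: docker
              __path__: /var/lib/docker/containers/*/*.log
    ```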

  • Graylog
    Graylog is very flexible but has a strong affinity for the GELF (Graylog Extended Log Format). The recommended method is often to configure Docker to use the built-in gelf logging driver. This allows the Docker daemon to send logs directly to a Graylog input over the network without needing a separate agent. Alternatively, Beats agents (like Filebeat) can also be used to forward logs to Graylog.
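
    A sketch of the gelf driver applied to a single container:

    ```bash
    # Ship this container's logs straight to a Graylog GELF input
    # (my-app and the address are placeholders; 12201/udp is the conventional GELF port)
    docker run -d \
      --log-driver=gelf \
      --log-opt gelf-address=udp://graylog.example.com:12201 \
      my-app
    ```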

  • Fluentd / Fluent Bit
    Fluentd is the "Swiss Army knife" of logging and can be used in every pattern.

    • Logging Driver: Docker has a native fluentd driver, allowing the daemon to forward logs directly to a Fluentd aggregator.
    • Dedicated Agent: The lightweight version, Fluent Bit, is excellent for the dedicated agent per node pattern. It collects, parses, and forwards logs from all containers on a host (see the config sketch after this list).
    • Sidecar: Fluentd can also be used as a sidecar for complex, application-specific log processing needs.
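
    A minimal fluent-bit.conf sketch for the dedicated agent case, printing to stdout instead of a real backend (a production config would use an output such as es, loki, or forward):

    ```ini
    # Assumes the image's default parsers.conf defines the "docker" parser
    [INPUT]
        Name    tail
        Path    /var/lib/docker/containers/*/*.log
        Parser  docker

    [OUTPUT]
        Name    stdout
        Match   *
    ```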

Choosing the right forwarding mechanism depends on your architecture's complexity, performance needs, and the specific logging backend you use.

Whatever mechanism you pick, never create log files inside a Docker container. Always write logs to stdout and stderr and collect them with one of the three methods:

  1. Docker Logging Drivers
  2. The Sidecar Pattern
  3. Dedicated Agent per Node

Of the three, the dedicated agent per node pattern has emerged as the most scalable and manageable solution for most modern containerized environments.
