How to Set Up Monitoring With Prometheus and Grafana (2026)
Complete guide to setting up Prometheus and Grafana monitoring with Docker Compose in 2026. Includes prometheus.yml config, alerting rules, and Grafana dashboard setup.
I've been on the receiving end of 3am pages for production incidents, and every single time I wished I had better metrics. Once you've spent an hour trying to work out whether a memory leak or a traffic spike killed your app, with nothing but logs and gut feeling, you get motivated to set up proper monitoring.
Prometheus and Grafana have become the default open-source monitoring stack for good reasons. Prometheus has excellent operational characteristics, an expressive query language (PromQL), and a huge exporter ecosystem. Grafana turns raw metrics into dashboards that actually communicate information.
This guide sets up the full stack with Docker Compose. You'll have metrics flowing and dashboards working by the end.
Architecture Overview
Before touching any config, understand how data flows:
Your App → exposes /metrics endpoint
    ↓
Prometheus → scrapes /metrics every N seconds, stores in TSDB, evaluates alert rules
    ↓
Grafana → queries Prometheus via PromQL, renders dashboards

Alertmanager → receives firing alerts from Prometheus, routes them to Slack/PagerDuty/email
Prometheus pulls metrics from targets on a schedule (every 15 seconds in this guide's config; the built-in default is 1 minute). This "pull" model differs from traditional monitoring, where agents push metrics to a central server. Pull makes it easy to know when a target is down: if Prometheus can't scrape it, the scrape fails and the target's up metric drops to 0.
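The pull model can be sketched in a few lines. This is a simplified illustration, not Prometheus's actual implementation: `scrapeOnce` and its injectable `fetchImpl` parameter are hypothetical names, and the sketch assumes Node.js 18+ for the global `fetch` and `AbortSignal.timeout`.

```javascript
// Simplified sketch of one pull-style scrape (not Prometheus's real code).
// A failed or timed-out request is recorded as up = 0, which is how a
// pull-based system notices that a target is down.
async function scrapeOnce(url, { timeoutMs = 10000, fetchImpl = fetch } = {}) {
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    if (!res.ok) return { up: 0, body: null };
    return { up: 1, body: await res.text() };
  } catch (err) {
    // Connection refused, DNS failure, and timeouts all end up here.
    return { up: 0, body: null };
  }
}
```

Prometheus records the real equivalent as an `up` series per target, so a query such as `up == 0` lists every target whose last scrape failed.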
Project Structure
monitoring/
├── docker-compose.yml
├── prometheus/
│   ├── prometheus.yml
│   └── alert-rules.yml
├── grafana/
│   ├── provisioning/
│   │   ├── datasources/
│   │   │   └── prometheus.yml
│   │   └── dashboards/
│   │       └── dashboard.yml
│   └── dashboards/
│       └── app-overview.json
└── app/
    ├── index.js
    └── package.json
Step 1: The Sample Application With Metrics
Let's build a Node.js app that exposes Prometheus metrics. Install the official Prometheus client:
mkdir -p monitoring/app && cd monitoring/app
npm init -y
npm install express prom-client
app/index.js:
const express = require('express');
const promClient = require('prom-client');

const app = express();

// Collect default Node.js metrics (memory, CPU, event loop, etc.)
const register = new promClient.Registry();
promClient.collectDefaultMetrics({ register });

// Custom metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5],
  registers: [register],
});

const httpRequestsTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register],
});

const activeConnections = new promClient.Gauge({
  name: 'active_connections',
  help: 'Current number of active connections',
  registers: [register],
});

// Middleware to track request metrics
app.use((req, res, next) => {
  const start = Date.now();
  activeConnections.inc();

  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const labels = {
      method: req.method,
      route: req.path,
      status_code: res.statusCode,
    };
    httpRequestDuration.observe(labels, duration);
    httpRequestsTotal.inc(labels);
    activeConnections.dec();
  });

  next();
});

// Application routes
app.get('/', (req, res) => {
  res.json({ status: 'ok', message: 'Hello World' });
});

app.get('/slow', async (req, res) => {
  // Simulate slow endpoint
  await new Promise(resolve => setTimeout(resolve, Math.random() * 2000));
  res.json({ status: 'ok', message: 'Slow response' });
});

// Prometheus metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

app.listen(3000, () => console.log('App running on :3000'));
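One thing to watch in the middleware above: it uses `req.path` as the `route` label, so URLs that contain IDs would create a new time series for every distinct path. Here is a sketch of one way to bound that, assuming a hypothetical helper `normalizeRoute` (not part of Express or prom-client) that swaps numeric and UUID path segments for placeholders.

```javascript
// Hypothetical helper: collapse variable path segments so the `route`
// label keeps a bounded number of distinct values.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function normalizeRoute(path) {
  return path
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id';
      if (UUID_RE.test(segment)) return ':uuid';
      return segment;
    })
    .join('/');
}
```

In the middleware this would read `route: normalizeRoute(req.path)`. Express also sets `req.route.path` to the matched pattern once a route handler has run, which is another option for matched routes.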
The three metric types matter:
- Counter: only goes up (request count, error count)
- Gauge: goes up and down (active connections, memory usage, temperature)
- Histogram: samples observations in buckets (request duration, response size)
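To make the three types concrete, here is a plain-JavaScript model of their semantics. It is an illustration only and not how prom-client implements them; the class shapes are assumptions chosen for brevity.

```javascript
// Plain-JavaScript model of the three metric types (illustrative only).
class Counter {
  constructor() { this.value = 0; }
  inc(amount = 1) {
    if (amount < 0) throw new Error('counters can only increase');
    this.value += amount;
  }
}

class Gauge {
  constructor() { this.value = 0; }
  inc(amount = 1) { this.value += amount; }
  dec(amount = 1) { this.value -= amount; }
  set(value) { this.value = value; }
}

class Histogram {
  constructor(upperBounds) {
    // Buckets are cumulative: each one counts observations <= its bound.
    this.upperBounds = [...upperBounds, Infinity];
    this.bucketCounts = this.upperBounds.map(() => 0);
    this.sum = 0;
    this.count = 0;
  }
  observe(value) {
    this.upperBounds.forEach((bound, i) => {
      if (value <= bound) this.bucketCounts[i] += 1;
    });
    this.sum += value;
    this.count += 1;
  }
}
```

The cumulative buckets are what show up on the /metrics endpoint as `http_request_duration_seconds_bucket{le="..."}` series, alongside `_sum` and `_count`.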
Test that the metrics endpoint works:
curl http://localhost:3000/metrics
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/",status_code="200"} 3
...
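Each sample in that output is a metric name, an optional set of labels, and a value. A rough sketch of how one such line breaks down, using a hypothetical `parseSampleLine` helper (a simplification written for this guide, not a full parser for the text format):

```javascript
// Parse one sample line of the Prometheus text format into its parts.
// Simplified: does not handle optional timestamps, escaped characters in
// label values, or special values such as +Inf.
function parseSampleLine(line) {
  const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$/.exec(line.trim());
  if (!match) return null; // "# HELP" and "# TYPE" comments and blank lines land here
  const [, name, rawLabels = '', rawValue] = match;
  const labels = {};
  for (const [, key, value] of rawLabels.matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
    labels[key] = value;
  }
  return { name, labels, value: Number(rawValue) };
}
```

Every distinct combination of name and label values is stored as its own time series, which is why label choice matters so much for storage.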
Step 2: Prometheus Configuration
prometheus/prometheus.yml:
global:
  scrape_interval: 15s      # How often to scrape targets
  evaluation_interval: 15s  # How often to evaluate alert rules
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "/etc/prometheus/alert-rules.yml"

scrape_configs:
  # Prometheus monitors itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Our Node.js application
  - job_name: 'node-app'
    static_configs:
      - targets: ['app:3000']
    metrics_path: '/metrics'

  # System metrics from the Docker host
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Docker container metrics
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
prometheus/alert-rules.yml:
groups:
  - name: application
    rules:
      # Alert if request error rate exceeds 5%
      # sum() aggregates across labels so 5xx traffic is divided by all traffic;
      # without it, each 5xx series is only divided by itself.
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"

      # Alert if 95th percentile response time exceeds 1 second
      - alert: SlowRequests
        expr: |
          histogram_quantile(0.95,
            rate(http_request_duration_seconds_bucket[5m])
          ) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow request latency"
          description: "p95 latency is {{ $value }}s"

  - name: infrastructure
    rules:
      # Alert if memory usage exceeds 85%
      - alert: HighMemoryUsage
        expr: |
          (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value | humanizePercentage }}"

      # Alert if CPU usage exceeds 80% for 10 minutes
      - alert: HighCpuUsage
        expr: |
          100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage"
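The arithmetic behind an error-ratio alert like HighErrorRate, aggregated across all label combinations, can be sketched as follows. This is a simplification: `simpleRate` and `errorRatio` are hypothetical helpers, and PromQL's real `rate()` also handles counter resets and extrapolation to the window edges.

```javascript
// Simplified per-second rate between two counter samples.
// Real PromQL rate() also handles counter resets and extrapolation.
function simpleRate(first, last) {
  return (last.value - first.value) / (last.timestamp - first.timestamp);
}

// Error ratio across all series: sum the 5xx rates, sum all rates, divide.
// Each series is { statusCode, first: { timestamp, value }, last: { ... } }.
function errorRatio(series) {
  let errorRate = 0;
  let totalRate = 0;
  for (const s of series) {
    const r = simpleRate(s.first, s.last);
    totalRate += r;
    if (/^5\d\d$/.test(String(s.statusCode))) errorRate += r;
  }
  // PromQL would yield NaN for 0/0; returning 0 is a choice made for this sketch.
  return totalRate === 0 ? 0 : errorRate / totalRate;
}
```

Summing before dividing matters: dividing each 5xx series by the series with the same labels would always give 1, regardless of how much healthy traffic there is.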
Step 3: Docker Compose Setup
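docker-compose.yml (the app service expects a Dockerfile in ./app, and the alertmanager service expects ./alertmanager/config.yml; neither file is shown in this guide):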
version: '3.9'

services:
  app:
    build: ./app
    ports:
      - "3000:3000"
    restart: unless-stopped
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:v2.51.0
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
      - '--web.enable-lifecycle'  # Allows config reload via API
      - '--web.enable-admin-api'  # Enables TSDB admin endpoints (delete, snapshot); drop if unused
    ports:
      - "9090:9090"
    restart: unless-stopped
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:10.4.0
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
      - ./grafana/dashboards:/var/lib/grafana/dashboards
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-changeme}
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://localhost:3001
    ports:
      - "3001:3000"
    restart: unless-stopped
    networks:
      - monitoring
    depends_on:
      - prometheus

  alertmanager:
    image: prom/alertmanager:v0.27.0
    volumes:
      - ./alertmanager:/etc/alertmanager
    command:
      - '--config.file=/etc/alertmanager/config.yml'
    ports:
      - "9093:9093"
    restart: unless-stopped
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:v1.8.0
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    restart: unless-stopped
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
    restart: unless-stopped
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:

networks:
  monitoring:
    driver: bridge
Step 4: Grafana Data Source Provisioning
grafana/provisioning/datasources/prometheus.yml:
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: "15s"
      queryTimeout: "60s"
grafana/provisioning/dashboards/dashboard.yml:
apiVersion: 1

providers:
  - name: default
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /var/lib/grafana/dashboards
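The provider above loads any dashboard JSON found under /var/lib/grafana/dashboards, which the Compose file maps to ./grafana/dashboards. The project structure lists app-overview.json there, but its contents aren't shown, so here is one possible way to generate a minimal version. Treat it as a sketch: the `buildDashboard` helper, the uid, titles, layout, and the two queries are illustrative assumptions, and only a small subset of Grafana's dashboard JSON model is used.

```javascript
// Build a minimal dashboard object for grafana/dashboards/app-overview.json.
// Field names follow Grafana's dashboard JSON model; the uid, titles, and
// layout values are illustrative assumptions.
function buildDashboard(panels) {
  return {
    uid: 'app-overview',
    title: 'App Overview',
    refresh: '15s',
    time: { from: 'now-1h', to: 'now' },
    panels: panels.map((panel, index) => ({
      id: index + 1,
      type: 'timeseries',
      title: panel.title,
      // Two panels per row on Grafana's 24-column grid.
      gridPos: { h: 8, w: 12, x: (index % 2) * 12, y: Math.floor(index / 2) * 8 },
      targets: [{ refId: 'A', expr: panel.expr }],
    })),
  };
}

const dashboard = buildDashboard([
  { title: 'Request rate', expr: 'sum(rate(http_requests_total[5m]))' },
  {
    title: 'p95 latency',
    expr: 'histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))',
  },
]);

console.log(JSON.stringify(dashboard, null, 2));
```

Saved as, say, build-dashboard.js (a hypothetical filename), the output can be redirected into grafana/dashboards/app-overview.json. The panels leave the data source unset so that they fall back to the default one provisioned above.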
Step 5: Starting Everything
cd monitoring
docker compose up -d
# Check all services are healthy
docker compose ps
Access:
- Grafana: http://localhost:3001 (admin/changeme)
- Prometheus: http://localhost:9090
- Alertmanager: http://localhost:9093
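Beyond `docker compose ps`, each of these services has an HTTP health endpoint that can be polled from a script. A small sketch, assuming Node.js 18+ for the global `fetch`; `checkServices` and its injectable `fetchImpl` parameter are hypothetical names, and the URLs use the host ports published in this guide's Compose file.

```javascript
// Health endpoints: /-/healthy (Prometheus, Alertmanager), /api/health (Grafana).
const SERVICES = {
  prometheus: 'http://localhost:9090/-/healthy',
  grafana: 'http://localhost:3001/api/health',
  alertmanager: 'http://localhost:9093/-/healthy',
};

// Returns an object mapping each service name to true (healthy) or false.
async function checkServices(services = SERVICES, fetchImpl = fetch) {
  const results = {};
  for (const [name, url] of Object.entries(services)) {
    try {
      const res = await fetchImpl(url);
      results[name] = res.ok;
    } catch (err) {
      results[name] = false; // unreachable counts as unhealthy
    }
  }
  return results;
}
```

Calling `checkServices().then(console.log)` once the stack is up should report `true` for each service that is reachable and healthy.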
Step 6: Grafana Dashboard
In Grafana, go to Dashboards → New → Import. Enter dashboard ID 1860 for the Node Exporter Full dashboard. It's one of the most downloaded Prometheus dashboards and gives you comprehensive system metrics immediately.
For application metrics, create a new dashboard with these PromQL queries:
Request rate:
rate(http_requests_total[5m])
Error rate percentage:
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m])) * 100
p95 response time:
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
Memory usage:
process_resident_memory_bytes / 1024 / 1024
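The p95 query deserves a closer look, because `histogram_quantile` estimates the value from bucket counts rather than from raw observations. A sketch of the linear interpolation involved, using a hypothetical `histogramQuantile` helper that mirrors the general approach for classic histograms but skips the edge cases the real function handles:

```javascript
// Sketch of classic-histogram quantile estimation by linear interpolation.
// `buckets` are cumulative counts sorted by upper bound `le`, ending with
// Infinity. Assumes 0 < q <= 1 and a positive first upper bound.
function histogramQuantile(q, buckets) {
  if (buckets.length < 2) return NaN;
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  // Quantile falls in the +Inf bucket: report the highest finite bound.
  if (i === buckets.length - 1) return buckets[buckets.length - 2].le;
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}
```

Two practical consequences: the result is only as precise as the bucket boundaries around it, and it can never exceed the highest finite bucket, which is 5 seconds with the buckets defined in the sample app.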
Monitoring Tool Comparison
| Tool | Type | Cost | Setup | Scale | Best For |
|---|---|---|---|---|---|
| Prometheus + Grafana | OSS | Free | Medium | Large | Full control, multi-source |
| Datadog | SaaS | $15-23/host/mo | Easy | Very large | Enterprise, turnkey |
| New Relic | SaaS | Free tier / $25+/mo | Easy | Large | Full observability suite |
| Grafana Cloud | SaaS (managed) | Free tier / $8+/mo | Easy | Large | Managed Prometheus |
| CloudWatch | AWS native | Pay per metric | Easy (AWS) | Large | AWS-only stacks |
| VictoriaMetrics | OSS | Free | Medium | Very large | High-cardinality metrics |
I've run Prometheus in production at scale. The operational overhead is real but manageable if your team understands it. For teams that don't want to manage storage and retention, Grafana Cloud's free tier is surprisingly generous: 10,000 series, 14 days retention, and it accepts remote write from your Prometheus instance.
Reload Config Without Restart
One Prometheus feature I appreciate: you can reload the configuration without restarting, which avoids a metrics gap:
curl -X POST http://localhost:9090/-/reload
This works because we started Prometheus with --web.enable-lifecycle.
Connecting Monitoring to Your Stack
Monitoring doesn't exist in isolation. If you're running microservices, centralized logging with ELK and Loki pairs naturally with Prometheus metrics: metrics tell you something is wrong, logs tell you why.
For the infrastructure that runs your monitored applications, Terraform vs Pulumi vs CloudFormation covers setting up that infrastructure as code. And if your CI/CD pipeline deploys the app, CI/CD best practices explains how to wire monitoring health checks into deployment gates.
For backend developers building the applications being monitored, Node.js vs Go vs Python compares how each language exposes metrics and handles the observability instrumentation.
Conclusion
You now have a working Prometheus and Grafana monitoring setup with real application metrics, system metrics, container metrics, alerting rules, and automatic dashboard provisioning. That's production-grade observability from a single docker compose up command.
The setup takes maybe 30 minutes to deploy. The payoff is significant: when something breaks at 3am, you'll have the data to diagnose it in minutes rather than hours. Invest in monitoring before you need it, not after.
FAQ
How much data does Prometheus store and how long does it keep it? By default, Prometheus keeps 15 days of data and uses roughly 1-2 bytes per sample. A setup scraping 10,000 series every 15 seconds ingests about 1.7 billion samples in 30 days, so roughly 2 to 3.5 GB of sample data per month at that rate, with the write-ahead log and index adding overhead on top. For long-term storage, Prometheus supports remote write to Thanos, Cortex, VictoriaMetrics, or Grafana Mimir. Most production setups keep 2 weeks in Prometheus and send to remote storage for 6 to 24 months.
What is the difference between Prometheus and Grafana? Prometheus is a time-series database and metrics collection system. It scrapes metrics from targets (your applications and infrastructure), stores them, and provides a query language (PromQL) to query them. Grafana is a visualization tool ā it queries data sources like Prometheus and displays the data as dashboards with graphs, charts, and alerts. You need both: Prometheus collects and stores, Grafana visualises.
Can Prometheus monitor non-containerised applications? Absolutely. Prometheus was designed before containers were common. Any application that exposes an HTTP endpoint returning metrics in the Prometheus text format can be scraped. For applications you can't modify, exporters bridge the gap: node_exporter for system metrics, postgres_exporter for PostgreSQL, redis_exporter for Redis, etc. There are hundreds of official and community exporters available.
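The storage question in the first answer comes down to a simple product: number of series, samples per series over the period, and bytes per sample. A sketch of that arithmetic using the FAQ's own inputs, with a hypothetical `estimateStorageBytes` helper. It covers raw sample data only, so real disk usage will be somewhat higher once the write-ahead log and index are included.

```javascript
// Rough disk estimate for raw sample data: series x samples x bytes per sample.
// Excludes write-ahead log and index overhead.
function estimateStorageBytes({ series, scrapeIntervalSeconds, retentionDays, bytesPerSample }) {
  const samplesPerSeries = (retentionDays * 86400) / scrapeIntervalSeconds;
  return series * samplesPerSeries * bytesPerSample;
}

const GB = 1e9;
const base = { series: 10000, scrapeIntervalSeconds: 15, retentionDays: 30 };
const low = estimateStorageBytes({ ...base, bytesPerSample: 1 }) / GB;
const high = estimateStorageBytes({ ...base, bytesPerSample: 2 }) / GB;

console.log(`${low.toFixed(1)} to ${high.toFixed(1)} GB per 30 days`);
// prints "1.7 to 3.5 GB per 30 days"
```

Doubling the series count or halving the scrape interval doubles the figure, which is usually the quickest way to sanity-check a retention setting before changing it.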
AiTechWorlds Team