Health Check API

Health check endpoints for monitoring and Kubernetes probes.

Endpoints

Liveness Probe

Check if the application is alive and running.

GET /health
GET /health/live

Authentication: None (public endpoint)

Response (200 OK):

{
  "status": "ok",
  "timestamp": "2024-01-01T00:00:00.000Z"
}

Use Case: Kubernetes liveness probe

Kubernetes Configuration:

livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3

When to Use:

Pod health monitoring
Load balancer health checks
Basic uptime monitoring

Example:

curl http://localhost:3000/health

Readiness Probe

Check if the application is ready to accept traffic.

GET /health/ready

Authentication: None (public endpoint)

Response (200 OK - All services healthy):

{
  "status": "ok",
  "timestamp": "2024-01-01T00:00:00.000Z",
  "info": {
    "database": {
      "status": "up"
    },
    "redis": {
      "status": "up"
    },
    "clickhouse": {
      "status": "up"
    }
  },
  "error": {},
  "details": {
    "database": {
      "status": "up"
    },
    "redis": {
      "status": "up"
    },
    "clickhouse": {
      "status": "up"
    }
  }
}

Response (503 Service Unavailable - One or more services down):

{
  "status": "error",
  "timestamp": "2024-01-01T00:00:00.000Z",
  "info": {
    "database": {
      "status": "up"
    },
    "redis": {
      "status": "down",
      "message": "Connection refused"
    },
    "clickhouse": {
      "status": "up"
    }
  },
  "error": {
    "redis": {
      "status": "down",
      "message": "Connection refused"
    }
  },
  "details": {
    "database": {
      "status": "up"
    },
    "redis": {
      "status": "down",
      "message": "Connection refused"
    },
    "clickhouse": {
      "status": "up"
    }
  }
}

Checks:

Service	Check	Impact if Down
Database (PostgreSQL)	Connection + ping	Cannot process payments, queries fail
Redis	Connection + ping	No caching, distributed locking fails
ClickHouse	Connection + ping	Audit logging fails (non-critical)

Use Case: Kubernetes readiness probe

Kubernetes Configuration:

readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 15
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

When to Use:

Kubernetes pod readiness
Rolling deployment health checks
Pre-production smoke tests
Dependency validation

Example:

curl http://localhost:3000/health/ready

Health Check Behavior

Startup Sequence

Application starts
- /health/live returns 200 OK (process is running)
- /health/ready returns 503 Service Unavailable (dependencies not checked yet)
Dependencies initializing
- Database connecting...
- Redis connecting...
- ClickHouse connecting...
All dependencies up
- /health/live returns 200 OK
- /health/ready returns 200 OK (ready to accept traffic)

Dependency Failure

If any dependency goes down during runtime:

Redis fails
- /health/live: 200 OK (application still running)
- /health/ready: 503 Service Unavailable
- Kubernetes removes pod from service load balancer
- No new traffic routed to this pod
- Existing requests may fail
Application auto-recovery
- Redis reconnects automatically (retry logic)
- /health/ready: 200 OK
- Kubernetes adds pod back to service

Monitoring Integration

Prometheus

Health check metrics are exported at /metrics:

# Application uptime
http_health_check_total{endpoint="/health/live",status="ok"} 1500
http_health_check_total{endpoint="/health/ready",status="ok"} 1498
http_health_check_total{endpoint="/health/ready",status="error"} 2

# Dependency status
dependency_health{service="database",status="up"} 1
dependency_health{service="redis",status="up"} 1
dependency_health{service="clickhouse",status="up"} 1

Grafana Dashboard

Sample queries:

Application Uptime:

up{job="billing-backend"}

Dependency Health:

dependency_health{service="database"}
dependency_health{service="redis"}
dependency_health{service="clickhouse"}

Health Check Failures:

rate(http_health_check_total{status="error"}[5m])

Troubleshooting

Readiness probe failing

Symptom: /health/ready returns 503

Possible Causes:

Database connection issues

# Check database connectivity
docker-compose logs postgres

# Verify connection string
echo $DATABASE_URL

# Test connection
psql $DATABASE_URL -c "SELECT 1"

Redis connection issues

# Check Redis status
docker-compose logs redis

# Test connection
redis-cli -h localhost -p 6379 ping

ClickHouse connection issues

# Check ClickHouse status
docker-compose logs clickhouse

# Test connection
curl http://localhost:8123/ping

Liveness probe failing

Symptom: /health/live not responding or returns error

Possible Causes:

Application crashed

# Check application logs
docker-compose logs billing-backend

# Restart application
docker-compose restart billing-backend

Port not accessible

# Check if port is listening
netstat -an | grep 3000

# Check firewall rules
sudo iptables -L

High load/deadlock

# Check CPU/memory usage
docker stats

# Check for deadlocks in logs
docker-compose logs billing-backend | grep -i deadlock

Best Practices

For Development

Quick health check:

curl -f http://localhost:3000/health || echo "Service is down!"

Detailed dependency check:

curl -s http://localhost:3000/health/ready | jq

Watch health status:

watch -n 5 'curl -s http://localhost:3000/health/ready | jq .status'

For Production

Liveness Probe:
- Set initialDelaySeconds to allow app startup (10-30s)
- Use short periodSeconds (10-30s)
- Allow 2-3 failures before restart
Readiness Probe:
- Set higher initialDelaySeconds (20-60s for dependencies)
- Use short periodSeconds (5-10s)
- Remove from load balancer quickly (1-2 failures)
Monitoring:
- Alert on liveness failures (immediate)
- Alert on readiness failures (> 5 minutes)
- Track health check response times
- Monitor dependency health trends

Example Scripts

Health Check Script

#!/bin/bash
# health-check.sh

API_URL="${API_URL:-http://localhost:3000}"

echo "Checking liveness..."
if curl -f -s "${API_URL}/health/live" > /dev/null; then
  echo "Liveness check passed"
else
  echo "Liveness check failed"
  exit 1
fi

echo "Checking readiness..."
READY_RESPONSE=$(curl -s "${API_URL}/health/ready")
READY_STATUS=$(echo "$READY_RESPONSE" | jq -r '.status')

if [ "$READY_STATUS" = "ok" ]; then
  echo "Readiness check passed"
  echo "$READY_RESPONSE" | jq '.info'
else
  echo "Readiness check failed"
  echo "$READY_RESPONSE" | jq '.error'
  exit 1
fi

echo "All health checks passed"

Dependency Monitor

#!/bin/bash
# monitor-dependencies.sh

API_URL="${API_URL:-http://localhost:3000}"

while true; do
  RESPONSE=$(curl -s "${API_URL}/health/ready")

  DB_STATUS=$(echo "$RESPONSE" | jq -r '.details.database.status')
  REDIS_STATUS=$(echo "$RESPONSE" | jq -r '.details.redis.status')
  CH_STATUS=$(echo "$RESPONSE" | jq -r '.details.clickhouse.status')

  clear
  echo "=== Dependency Health Monitor ==="
  echo "Database:   $DB_STATUS"
  echo "Redis:      $REDIS_STATUS"
  echo "ClickHouse: $CH_STATUS"
  echo ""
  echo "Last check: $(date)"

  sleep 5
done

Endpoints​

Liveness Probe​

Readiness Probe​

Health Check Behavior​

Startup Sequence​

Dependency Failure​

Monitoring Integration​

Prometheus​

Grafana Dashboard​

Troubleshooting​

Readiness probe failing​

Liveness probe failing​

Best Practices​

For Development​

For Production​

Example Scripts​

Health Check Script​

Dependency Monitor​