Scaling Microservices With Docker and Kubernetes on Production

This is a guide to scaling FastAPI microservices with Docker and K3s on Azure, covering service discovery, load balancing, observability, and optimized pipelines.

By Ravi Teja Thutari · May 23, 2025 · Tutorial

I am building a fleet of Python FastAPI microservices and aiming to scale them efficiently in a production environment. Each service runs inside a Docker container, which keeps it isolated and portable. I’m orchestrating these containers with Kubernetes, specifically using the lightweight K3s distribution on the Azure cloud. 

In this article, I share my hands-on experience optimizing this setup for high performance and reliability in production.

Containerizing FastAPI Microservices

Containerization is a cornerstone of my microservices strategy. I package each FastAPI service with Docker, bundling its code and dependencies into an image. This ensures a modular, conflict-free architecture — each service effectively “lives” in its own environment, preventing dependency clashes and simplifying deployment. For example, a typical Dockerfile for one of my FastAPI microservices looks like:

Dockerfile
 
FROM python:3.10-slim  
WORKDIR /app  
COPY requirements.txt .  
RUN pip install --no-cache-dir -r requirements.txt  
COPY . .  
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]  


This produces a lean image: I start from a slim Python base, install only the required packages, copy in the application code, and launch Uvicorn (the ASGI server for FastAPI). Running each service as a container means I can deploy many instances in parallel without worrying that one service’s dependencies will interfere with another’s.

Why K3s on Azure?

Kubernetes is my go-to for managing containers at scale, but instead of using a heavyweight managed cluster, I chose K3s. K3s is a CNCF-certified Kubernetes distribution known for its lightweight footprint (a single binary under 100 MB) and minimal resource overhead. 

By deploying K3s on Azure virtual machines, I get the benefits of Kubernetes (like self-healing, service discovery, and scaling) without the full complexity of a large cluster. K3s’s slim design means the control plane uses less memory and CPU, leaving more room for my FastAPI workloads.
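
For reference, K3s can also be configured declaratively through an optional file at /etc/rancher/k3s/config.yaml, which the server reads at startup. A minimal sketch (not my exact config; the IP is illustrative, and a given setup may not need either option):

YAML
 
# /etc/rancher/k3s/config.yaml, read by the K3s server at startup
write-kubeconfig-mode: "0644"   # make the generated kubeconfig readable for tooling
tls-san:
  - "20.50.100.10"              # illustrative: the Azure load balancer's public IP,
                                # so the API server cert is valid when reached through it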

Architecture Overview

Figure: High-level architecture of the microservices on K3s

An Azure load balancer directs traffic through an ingress (Traefik) to various FastAPI service pods running in the K3s cluster. Each service can scale horizontally, and services communicate internally with a shared database for persistence.

I have an Azure Load Balancer in front of the cluster that directs external traffic to an ingress controller within K3s. Out of the box, K3s comes with Traefik as the default ingress, which I use to route requests to the appropriate FastAPI service. Each microservice is deployed as a Kubernetes Deployment with multiple replicas (pods). 

This architecture allows any service to scale independently based on demand. The diagram above illustrates how incoming requests hit the load balancer and then flow through Traefik to the appropriate service pods.

Kubernetes Deployments and Health Checks

Deploying a FastAPI service on K3s involves writing a Kubernetes manifest. Here’s a minimal Deployment manifest (with health checks) for one service:

YAML
 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 3
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
      - name: service-a
        image: myregistry/service-a:1.0
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet: { path: /health, port: 80 }
          initialDelaySeconds: 30
          periodSeconds: 15
        readinessProbe:
          httpGet: { path: /health, port: 80 }
          initialDelaySeconds: 5
          periodSeconds: 5


I start with replicas: 3 to run three instances of the service. The livenessProbe and readinessProbe are crucial for a robust system. The liveness probe periodically hits the /health endpoint — if a service fails this check, Kubernetes automatically restarts that container to recover it. The readiness probe ensures a new pod only starts receiving traffic after it reports healthy. These probes prevent sending requests to unready or hung instances, improving overall resilience.

Autoscaling and Performance

One big advantage of Kubernetes (including K3s) is easy horizontal scaling. I leverage the Horizontal Pod Autoscaler (HPA) to adjust the number of pod replicas based on load. I configure an HPA to keep CPU usage around 70%. If traffic spikes and CPU usage goes above that threshold, Kubernetes spawns additional pods to handle the load, then scales down when the load subsides. 

FastAPI’s asynchronous model lets each instance handle many requests concurrently, but when that isn’t enough, scaling out with more pods keeps latency low. I also tuned the autoscaling behavior: too aggressive and it would spin up pods unnecessarily; too conservative and it might lag behind traffic bursts. Through trial and error, I achieved smooth scaling during peak hours without over-provisioning.
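
For reference, an HPA manifest implementing that 70% CPU target looks like the following; the name and the replica bounds are illustrative rather than my exact production values:

YAML
 
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: service-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service-a              # the Deployment shown earlier
  minReplicas: 3                 # match the baseline replica count
  maxReplicas: 10                # illustrative ceiling to cap over-provisioning
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods when average CPU crosses ~70%

Two prerequisites worth remembering: utilization is computed against the container’s CPU request, so the Deployment must declare one, and the cluster needs a metrics server to supply the numbers (K3s ships one by default).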

Observability: Logging and Monitoring

Running many microservice instances means observability is vital. I set up centralized logging, which allows a unified search across logs when debugging issues. For metrics, I integrated Prometheus to scrape cluster and application metrics (CPU, memory, request latencies, etc.). 

In Kubernetes, observability revolves around gathering metrics, logs, and traces to understand the system’s internal state. Grafana dashboards provide real-time visibility into performance (e.g., response times, error rates per service). I also use distributed tracing to follow a request’s path through multiple services, helping pinpoint bottlenecks across service boundaries. These observability tools let me quickly detect misbehaving services and fine-tune the system’s performance.
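
As a concrete example of the metrics side, one common pattern is to annotate the pod template so a suitably configured Prometheus discovers it automatically. This sketch assumes a scrape config that honors the prometheus.io/* annotation convention (it is a convention, not a Kubernetes built-in) and an app that exposes a /metrics endpoint:

YAML
 
# Added to the Deployment's pod template from earlier
template:
  metadata:
    annotations:
      prometheus.io/scrape: "true"   # opt this pod in to scraping
      prometheus.io/path: "/metrics" # assumes the FastAPI app serves metrics here
      prometheus.io/port: "80"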

Load Balancing and Service Discovery

In this K3s-based platform, load balancing happens at multiple levels. The Azure Load Balancer directs traffic to the K3s ingress (Traefik), which routes incoming requests to the correct service based on URL paths. Each microservice also has an internal Kubernetes Service for service discovery. This means services can call each other by name, and Kubernetes will load-balance the requests across the pods. I don’t need to hard-code any addresses; the combination of Traefik ingress and Kubernetes service discovery provides robust load balancing for both external and inter-service traffic.
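
To make that concrete, here is a sketch of the two objects involved for one service. The names and the /service-a path prefix are illustrative, and since K3s’s bundled Traefik watches standard Ingress resources, nothing Traefik-specific is required:

YAML
 
apiVersion: v1
kind: Service
metadata:
  name: service-a            # other pods can reach this service at http://service-a
spec:
  selector:
    app: service-a           # matches the Deployment's pod labels
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  rules:
  - http:
      paths:
      - path: /service-a     # illustrative path-based routing rule
        pathType: Prefix
        backend:
          service:
            name: service-a
            port:
              number: 80

With the Service in place, another pod can simply call http://service-a/, and cluster DNS plus the Service’s virtual IP handle both discovery and load balancing across replicas.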

CI/CD and Deployment Workflow

To manage deployments at scale, I rely on an automated CI/CD pipeline. My code is hosted on GitHub, and each commit triggers a GitHub Actions workflow. The pipeline builds a Docker image for each microservice (using the Dockerfile shown earlier), tags it with a version, and pushes it to a container registry. 

After that, the pipeline applies the updated Kubernetes manifests to the K3s cluster using kubectl apply, which performs a rolling update with zero downtime. In this way, going from a git push to running code in production is largely automated. This workflow makes releases consistent and requires minimal manual intervention.
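
A trimmed-down sketch of such a workflow is below. The registry name, tag scheme, secret name, and manifest directory are placeholders; registry login is omitted, and the step that stamps the new tag into the manifests is elided for brevity:

YAML
 
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push the image
        run: |
          # placeholder registry; a docker login step would normally precede this
          docker build -t myregistry/service-a:${GITHUB_SHA::8} .
          docker push myregistry/service-a:${GITHUB_SHA::8}
      - name: Apply manifests to the K3s cluster
        run: |
          # assumes the cluster's kubeconfig is stored as a base64-encoded secret
          echo "${{ secrets.KUBECONFIG }}" | base64 -d > kubeconfig
          kubectl --kubeconfig kubeconfig apply -f k8s/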

Lessons Learned in Production

Building and scaling microservices with Docker and K3s has taught me several valuable lessons:

  • Graceful shutdowns: Ensure each FastAPI service handles termination signals. Kubernetes sends a SIGTERM to pods during shutdown, and the service should finish in-flight requests before exiting (see the sketch after this list).
  • Rollout timing: Tweak readiness probes and pod termination grace periods so new pods receive traffic only when ready, and old pods stay alive until finishing work. This prevents blips in availability during rolling deployments.
  • Monitoring overhead: Balance observability with performance. It’s easy to overload the cluster with too many monitoring agents or excessive metrics scrapes. I learned to monitor what matters and adjust intervals to reduce overhead.
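
On the graceful-shutdown point, most of the fix lives in pod configuration. Uvicorn already drains in-flight requests when it receives SIGTERM, so the sketch below simply gives it room to do that; the short preStop sleep is a common (if blunt) way to let endpoint removal propagate before the signal arrives:

YAML
 
# Pod spec fragment for the Deployment shown earlier
spec:
  terminationGracePeriodSeconds: 30   # time allowed for draining before SIGKILL
  containers:
  - name: service-a
    lifecycle:
      preStop:
        exec:
          # pause so the pod is removed from Service endpoints before SIGTERM
          command: ["sleep", "5"]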

Conclusion

By using Docker and Kubernetes (K3s) together, I’m running a production-grade microservices platform that scales on demand, remains resilient, and is observable. FastAPI provides speed and flexibility in development, Docker ensures consistency across environments, and K3s on Azure gives me the power of Kubernetes with a leaner footprint. 

With health checks, autoscaling, ingress routing, and thorough logging in place, the system can handle real-world traffic and recover from issues with minimal manual intervention. This journey has reinforced that careful architecture planning and a focus on observability are key when deploying microservices at scale. This approach has so far delivered a smooth, high-performance experience.
