Spring Boot on EKS: Optimizing Health Probes for Efficient Deployments

When deploying Spring Boot applications on Amazon Elastic Kubernetes Service (EKS), managing application health is crucial for reliable service operations. Kubernetes provides three types of health probes—startup, liveness, and readiness—that work together to ensure your applications start correctly, remain responsive, and handle traffic appropriately.

However, one significant challenge is probe timing configuration. If probe settings are too conservative, deployment times become unnecessarily long, while overly aggressive settings can restart containers that are still starting or send traffic to the application before it's fully ready.

This article focuses on configuring Spring Boot Actuator with basic probe settings and optimizing probe timing to match your application's actual startup time, minimizing deployment delays while maintaining service stability.

Understanding Kubernetes Probes

Kubernetes uses three distinct probe types to determine the health and availability of your containers:

Probe Types

Startup Probe: Determines whether the application has started successfully. Disables liveness and readiness checks until it succeeds.
Liveness Probe: Verifies if the application is running. If it fails, the container is restarted.
Readiness Probe: Checks if the container is ready to receive traffic. If it fails, the pod is removed from service endpoints.

Each probe can be configured with the following parameters:

initialDelaySeconds: Seconds to wait after the container starts before the first probe runs (default: 0)
periodSeconds: How often the probe runs (default: 10 seconds)
timeoutSeconds: Seconds after which a single probe attempt times out (default: 1 second)
successThreshold: Minimum consecutive successes for the probe to be considered successful after a failure (default: 1; must be 1 for startup and liveness probes)
failureThreshold: Number of consecutive failures before the probe is considered failed (default: 3)

Spring Boot Actuator Setup

Spring Boot's Actuator provides health endpoints that map directly onto Kubernetes probes:

1. Adding Actuator Dependency (Maven)

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Gradle Configuration

implementation("org.springframework.boot:spring-boot-starter-actuator")

2. Configuring Health Endpoints

management:
  endpoint:
    health:
      probes:
        enabled: true
      show-details: always
      group:
        readiness:
          include: readinessState, db, redis, diskSpace
        liveness:
          include: livenessState
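        startup:
          # Spring Boot ships livenessState and readinessState but no startupState indicator;
          # reusing readinessState here is an assumption that fits "initialization complete"
          include: readinessState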
  endpoints:
    web:
      exposure:
        include: health

With this configuration, Spring Boot provides the following endpoints:

/actuator/health/readiness: Checks if the application and dependent services are ready to handle traffic
/actuator/health/liveness: Verifies the application is responsive
/actuator/health/startup: Confirms application initialization is complete
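As a quick way to check these endpoints by hand, the sketch below pulls the top-level status out of an Actuator health response. The helper name health_status is hypothetical, and the sed pattern assumes the compact JSON layout in which "status" is the first field; a JSON tool such as jq would be more robust.

```shell
# Extract the top-level "status" value from an Actuator health response on stdin.
# Assumption: compact JSON with "status" as the first field, e.g. {"status":"UP",...}
health_status() {
  sed -n 's/^{"status":"\([A-Z_]*\)".*/\1/p'
}

# Example with a canned response instead of a live application:
echo '{"status":"UP"}' | health_status
# prints: UP
```

Against a running pod, the same helper could be fed from curl, for example by running kubectl port-forward deployment/spring-boot-service 8080:8080 -n production in one terminal and curl -s http://localhost:8080/actuator/health/readiness | health_status in another.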

Optimizing Probe Timing for Actual Startup Time

When your application has an actual startup time of around 20 seconds, it's important to optimize probe settings accordingly. Overly conservative settings unnecessarily extend deployment time, while overly aggressive settings can restart the container before it has finished starting.

Measuring Actual Startup Time

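Spring Boot logs its startup time once the application has started. Check the pod logs for a line like this:

Started Application in 19.329 seconds (JVM running for 20.412)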

This information serves as the baseline for your probe timing configuration.
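To collect this number from several pods or runs, a small filter over the logs can help. The following is a minimal sketch; the function name startup_seconds is hypothetical, and it assumes the log line keeps the "Started ... in N seconds" wording shown above.

```shell
# Print the startup time in seconds from Spring Boot "Started ... in N seconds"
# log lines read on stdin. Lines that do not match are ignored.
startup_seconds() {
  sed -n 's/.*Started .* in \([0-9][0-9.]*\) seconds.*/\1/p'
}

# Example using the log line from this article:
echo 'Started Application in 19.329 seconds (JVM running for 20.412)' | startup_seconds
# prints: 19.329
```

With a live deployment, the input could come from kubectl logs deployment/spring-boot-service -n production | startup_seconds.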

Startup Probe Configuration for ~20s Startup Time

startupProbe:
  httpGet:
    path: /actuator/health/startup
    port: 8080
  initialDelaySeconds: 5    # Start checking 5 seconds after container starts
  periodSeconds: 5          # Check every 5 seconds
  failureThreshold: 6       # Allow up to 6 failures (6 × 5s = 30s of probing after the initial delay)
  timeoutSeconds: 1         # 1 second timeout for each check

This configuration allows up to 35 seconds for startup (5s initial delay + 5s × 6 failures), providing a reasonable buffer over the typical 20-second startup time to accommodate occasional slower starts.
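The arithmetic is easy to script when comparing candidate settings. This sketch uses the same approximation as the text (initial delay plus period times failure threshold); the function name startup_budget is hypothetical, and the real cutoff also depends on probe scheduling and timeoutSeconds.

```shell
# Approximate upper bound (in seconds) that a startup probe allows before the
# container is restarted: initialDelaySeconds + periodSeconds * failureThreshold.
startup_budget() {
  local initial_delay=$1 period=$2 failure_threshold=$3
  echo $(( initial_delay + period * failure_threshold ))
}

startup_budget 5 5 6   # prints: 35
```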

Optimizing Liveness & Readiness Probes

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 20         # Check every 20 seconds
  failureThreshold: 3       # Restart after 3 consecutive failures
  timeoutSeconds: 3         # 3 second timeout

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 10         # Check every 10 seconds
  failureThreshold: 3       # Remove from service after 3 failures
  timeoutSeconds: 3         # 3 second timeout

Note the absence of initialDelaySeconds. Since these probes only activate after the Startup Probe succeeds, additional delays are unnecessary.

Deployment Timing Issues & Solutions

Several common issues can affect your deployment timing when using health probes:

Issue 1: Overly Conservative Probe Settings

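Problem: When an application starts in 20 seconds but the startup allowance is stretched to 5 minutes, for example through a long initialDelaySeconds or periodSeconds, pods are marked ready later than necessary and a failing rollout takes longer to surface, so deployment times are unnecessarily extended.

Solution: Set the startup allowance (initialDelaySeconds + periodSeconds × failureThreshold) based on actual startup time plus a reasonable buffer (e.g., 50%). For a 20-second startup time, around 30 seconds is appropriate.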

Optimized Startup Probe Settings

startupProbe:
  httpGet:
    path: /actuator/health/startup
    port: 8080
  # 20s startup time + 10s buffer = 30s total allowance
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 5  # 5 × 5s = 25s (plus 5s initial delay for 30s total)
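Going the other way, from a measured startup time to a failureThreshold, can be sketched as follows. The function name failure_threshold_for and its argument order are hypothetical; it works in whole seconds and rounds up so the allowance never falls below the target.

```shell
# Smallest failureThreshold whose allowance covers startup time plus a buffer.
# Arguments: startup seconds, buffer percent, initialDelaySeconds, periodSeconds.
failure_threshold_for() {
  local startup=$1 buffer_pct=$2 initial_delay=$3 period=$4
  # Target allowance = startup time plus buffer, rounded up to a whole second
  local target=$(( (startup * (100 + buffer_pct) + 99) / 100 ))
  local remaining=$(( target - initial_delay ))
  if [ "$remaining" -le 0 ]; then
    echo 1   # the initial delay alone already covers the target
    return
  fi
  # Round up to the next whole probe period
  echo $(( (remaining + period - 1) / period ))
}

failure_threshold_for 20 50 5 5   # prints: 5
```

For the 20-second example with a 50% buffer, this reproduces the failureThreshold of 5 used in the settings above.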

Issue 2: Startup Time Variations Under Load

Problem: Application startup time may vary with system load or resource constraints.

Solution:

Test startup times under conditions similar to production
Monitor and collect statistics on application startup times
Set probe timeouts based on 95th percentile startup time
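One way to turn collected startup times into a 95th percentile is the nearest-rank method, sketched below. The function name percentile is hypothetical, and the sample values are made up for illustration rather than measured.

```shell
# Nearest-rank percentile of numbers read from stdin, one per line.
# Argument: integer percentile between 1 and 100.
percentile() {
  local pct=$1
  LC_ALL=C sort -n | awk -v p="$pct" '
    { values[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p/100 * NR) for integer p
      if (rank < 1) rank = 1
      print values[rank]
    }'
}

# Hypothetical startup times (seconds) from ten pod starts:
printf '%s\n' 18.9 19.3 19.8 20.4 21.0 19.1 22.7 19.5 20.1 27.6 | percentile 95
# prints: 27.6
```

In this made-up sample, a single slow start pushes the 95th percentile well above the typical 20 seconds, which is the kind of outlier the probe allowance needs to absorb.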

Issue 3: Dependency Initialization Delays

Problem: External dependencies like databases and caches can affect application startup time.

Solution:

Check external dependency states in Readiness Probes
Parallelize application startup and dependency initialization where possible
Use asynchronous initialization patterns to reduce startup time

Hikari Connection Pool Configuration

# application.yml
spring:
  datasource:
    hikari:
      # 0 = attempt an initial connection, but start the pool even if none can be
      # obtained, so an unreachable database does not block application startup
      initialization-fail-timeout: 0

Optimized Deployment Example

Here's a complete Kubernetes deployment manifest optimized for a Spring Boot application with approximately 20 seconds startup time:

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-boot-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spring-boot-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # Set to 0 for zero-downtime deployments
  template:
    metadata:
      labels:
        app: spring-boot-service
    spec:
      containers:
      - name: spring-boot-service
        image: ${ECR_REPO}/spring-boot-service:${IMAGE_TAG}
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "1000m"
            memory: "1Gi"
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: "production"
        # Startup Probe - optimized for ~20s startup time
        startupProbe:
          httpGet:
            path: /actuator/health/startup
            port: 8080
          initialDelaySeconds: 5    # Wait 5s after container starts
          periodSeconds: 5          # Check every 5s
          failureThreshold: 6       # Allow up to 6 failures (6 × 5s = 30s after the initial delay)
          timeoutSeconds: 1

        # Liveness Probe - checks application responsiveness
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          periodSeconds: 20
          failureThreshold: 3
          timeoutSeconds: 3

        # Readiness Probe - checks traffic readiness
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          periodSeconds: 10
          failureThreshold: 3
          timeoutSeconds: 3

        # Graceful shutdown handling
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 5"]  # Time for request draining

      terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: spring-boot-service
  namespace: production
spec:
  selector:
    app: spring-boot-service
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP

Monitoring & Tuning Deployment Times

After optimizing probe settings, it's important to monitor and continuously tune deployment times:

1. Measuring Deployment Time

# Start deployment
DEPLOY_START=$(date +%s)
kubectl apply -f deployment.yaml

# Wait for deployment completion
kubectl rollout status deployment/spring-boot-service -n production

# Deployment complete
DEPLOY_END=$(date +%s)
echo "Deployment took $((DEPLOY_END-DEPLOY_START)) seconds"

2. Monitoring Probe Failures

# Check for container restarts and probe failures
kubectl get events -n production | grep -E 'Unhealthy|Killing|Pulled|Started'

3. Incrementally Adjusting Probe Settings

Gradually adjust probe settings based on your measurements. Finding the right balance between deployment speed and stability is key.

If deployments are consistently successful and fast, you might reduce timeouts slightly
If occasional failures occur, increase timeouts incrementally
Document the correlation between startup times and probe settings for future reference

Conclusion

Optimizing probe timing for Spring Boot applications on EKS is crucial for balancing deployment speed and service stability. By accurately measuring your application's actual startup time (around 20 seconds in our example) and configuring probes accordingly, you can minimize unnecessary deployment delays while maintaining reliable service operations.

The basic Spring Boot Actuator health endpoints integrate seamlessly with Kubernetes probes, and with proper timing settings, you can build an efficient deployment pipeline.

Key takeaways:

Configure probe settings based on actual startup time plus a small buffer
Use Startup Probes to protect application initialization
Continuously monitor and optimize deployment times and probe settings
Find the right balance between speed and stability for your specific application
