Comparing Spring Boot WAS Thread Models & Undertow Optimization in EKS

How do Web Application Servers collapse when requests outnumber threads? Exploring WAS defense strategies and survival tactics in Cloud Native environments.

1. Overview: How Servers Defend Against Traffic Spikes

When traffic surges, Web Application Servers (WAS) use very different strategies to queue incoming requests and protect themselves from collapse. To prevent cascading failures and run a resilient service, you need to understand each WAS's I/O thread and worker thread architecture, and its queueing behavior.

🍅 Tomcat (NIO)

  • Architecture: Strictly segregated into Acceptor, Poller (I/O), and a Worker Thread Pool.
  • Queueing: Fills the worker queue → holds additional connections up to maxConnections → lets the rest pile up in the OS TCP backlog (acceptCount). Only when all three layers are saturated does the client see Connection Refused.
  • Characteristics: A tenacious 'multi-layered buffering' architecture that holds onto connections as long as possible rather than severing them.
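These three buffering layers map directly onto Spring Boot configuration properties. The values below are the framework defaults, shown for illustration rather than as recommendations:

```yaml
server:
  tomcat:
    threads:
      max: 200             # layer 1: the worker thread pool
    max-connections: 8192  # layer 2: connections held open beyond busy workers
    accept-count: 100      # layer 3: OS TCP backlog once max-connections is reached
```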

Jetty

  • Architecture: Does not strictly separate I/O and worker threads; both share a single QueuedThreadPool using the EWYK (Eat What You Kill) strategy.
  • Queueing: Can switch into 'Low Resources Mode' (via Jetty's LowResourceMonitor) when threads or connections run low.
  • Characteristics: Protects memory aggressively by closing idle connections early or quickly rejecting new requests.
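Notably, Spring Boot does expose a queue bound for Jetty (unlike Undertow, as discussed later). A sketch with illustrative values:

```yaml
server:
  jetty:
    threads:
      min: 10
      max: 200
      max-queue-capacity: 1000  # bounds the shared QueuedThreadPool's request queue
```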

🌊 Undertow (XNIO)

  • Architecture: Built on the highly lightweight XNIO framework. Non-blocking operations are handled directly by CPU-bound I/O Threads, while blocking tasks are offloaded to a designated Worker Pool.
  • Queueing: Once worker threads are exhausted, requests stack up in the worker task queue, which is effectively unbounded by default.
  • Characteristics: Exceptionally lightweight and fast, but the missing queue bound demands careful attention.

2. Why Undertow Shines in MSA (EKS) Environments

In dynamic Kubernetes (EKS) environments where many Pods are constantly scaled out and in, Undertow is often a better fit than the heavier, connection-hoarding Tomcat.

  1. Lightweight Memory Footprint: Its lower baseline heap usage lets EKS nodes (EC2 instances) host more Pods per node with limited resources.
  2. Fast Startup: When the Horizontal Pod Autoscaler (HPA) triggers, the cold-start time before a new Pod can accept traffic is noticeably shorter.
  3. High I/O Throughput: MSA environments generate heavy East-West (service-to-service) traffic, and Undertow's non-blocking XNIO foundation helps avoid network bottlenecks.
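Pod density and HPA behavior are ultimately governed by the resource requests and limits on the Deployment. A minimal fragment (names and values are illustrative, and required fields such as selectors are omitted):

```yaml
# Illustrative fragment: the requests/limits the scheduler and HPA act on
spec:
  containers:
  - name: app
    resources:
      requests:
        cpu: "500m"        # fractional vCPU: the JVM will report 1 processor
        memory: "512Mi"
      limits:
        cpu: "1"
        memory: "768Mi"    # crossing this triggers OOMKilled, not a graceful stop
```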

3. Undertow's Fatal Flaw: The Unbounded Queue & Cascading OOM Failures

While Undertow seems tailor-made for EKS, there is a dangerous trap when it runs on default settings: Undertow's worker-thread task queue is effectively unbounded.

  • Memory Spikes: When traffic spikes or a downstream DB bottleneck exhausts your worker threads, the non-blocking I/O threads keep accepting requests and stacking them into the queue.
  • OOMKilled: The ever-growing queue of pending HTTP request objects fills the JVM heap. When the container crosses its hard memory limit, the Linux OOM killer terminates it and Kubernetes reports the Pod as OOMKilled.
  • The Cascading Blackhole: Once one Pod dies, the ALB re-routes its share of traffic onto the surviving Pods. Those Pods inherit an even heavier load and collapse one after another in a catastrophic Cascading Failure.

4. The Solution: Fail-Fast Strategies & Dynamic Spec-Based Thread Provisioning

As a Cloud Native architect, it is far better to reject an impossible load promptly (Fail-fast, returning a 503 Service Unavailable) than to let the request queue kill the Pod. Preserving each Pod's memory boundary guarantees survival; let the load balancer or client-side retry configuration handle recovery.
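The fail-fast principle itself can be demonstrated with a plain JDK thread pool: a bounded queue plus an AbortPolicy rejects excess work immediately instead of hoarding it. This is a standalone sketch of the idea, not Undertow's actual internals:

```kotlin
import java.util.concurrent.*

// Submit `tasks` long-running jobs to a pool with `threads` workers and a
// queue bounded at `queueCapacity`; return how many were rejected outright.
fun boundedPoolRejections(tasks: Int, threads: Int, queueCapacity: Int): Int {
    val gate = CountDownLatch(1)
    val pool = ThreadPoolExecutor(
        threads, threads, 0L, TimeUnit.MILLISECONDS,
        ArrayBlockingQueue(queueCapacity),
        ThreadPoolExecutor.AbortPolicy() // reject instead of queueing forever
    )
    var rejected = 0
    repeat(tasks) {
        try {
            pool.execute { gate.await() } // simulates a stalled downstream call
        } catch (e: RejectedExecutionException) {
            rejected++ // in a WAS, this is the moment a 503 is returned
        }
    }
    gate.countDown()
    pool.shutdown()
    return rejected
}

fun main() {
    // 2 threads busy + 2 tasks queued -> the remaining 2 of 6 are rejected
    println("rejected=" + boundedPoolRejections(6, 2, 2))
}
```

The excess submissions fail in microseconds while heap usage stays flat, which is exactly the trade the Fail-fast strategy makes.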

❌ The Anti-Pattern: Static application.yml

Many developers unconsciously set static limits in their YAML configurations:

server:
  undertow:
    threads:
      io: 4
      worker: 40

In a static monolithic environment this might suffice. But in Kubernetes, the same service may be deployed across node groups with very different capacities (e.g., 0.5 vCPUs vs. 4.0 vCPUs). A static configuration will either underutilize a large node or overwhelm a small node with thread context switching. Worse, Spring Boot exposes no property at all for bounding Undertow's task queue.

✅ The Kubernetes-Native Pattern: Dynamic Bootstrap

To survive elastic scaling, we must probe the container's CPU allotment at bootstrap and cap the task queue explicitly in code. (Since JDK 10, and backported to 8u191, UseContainerSupport is enabled by default, so the JVM sees the container's cgroup CPU quota; a fractional 0.5 vCPU limit reports 1 available processor.)

import org.springframework.boot.web.embedded.undertow.UndertowServletWebServerFactory
import org.springframework.boot.web.server.WebServerFactoryCustomizer
import org.springframework.context.annotation.Configuration
import org.xnio.Options
import kotlin.math.max

@Configuration
class UndertowConfig : WebServerFactoryCustomizer<UndertowServletWebServerFactory> {

    override fun customize(factory: UndertowServletWebServerFactory) {
        // 1. Read the CPU cores actually granted to this Pod (cgroup-aware since JDK 10)
        val availableProcessors = Runtime.getRuntime().availableProcessors()

        // 2. Derive thread counts from the detected cores
        // I/O threads handle pure non-blocking networking; one per core, minimum 2
        val ioThreads = max(availableProcessors, 2)

        // Worker threads handle blocking work (servlet calls, JDBC);
        // a common heuristic is 10x-20x the I/O thread count
        val workerThreads = ioThreads * 10

        // Hard ceiling on queued tasks before new work is rejected
        val maxTaskQueueSize = 1000

        factory.addBuilderCustomizers({ builder ->
            // Thread counts are builder-level settings on Undertow itself
            builder.setIoThreads(ioThreads)
            builder.setWorkerThreads(workerThreads)

            // 3. The safeguard against unbounded-queue OOM. WORKER_TASK_LIMIT is an
            // XNIO *worker* option, so it must go through setWorkerOption, not
            // setServerOption. Instead of queueing to death, Undertow rejects
            // overflowing requests immediately.
            builder.setWorkerOption(Options.WORKER_TASK_LIMIT, maxTaskQueueSize)
        })
    }
}
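If you ever need to override the detected core count, for example to reproduce small-node behavior on a large development machine, the standard JVM flag -XX:ActiveProcessorCount can pin the value. The env-var approach below is one illustrative way to wire it into a container:

```yaml
# Illustrative fragment: force the JVM to report 2 processors regardless of cgroup quota
env:
- name: JAVA_TOOL_OPTIONS
  value: "-XX:ActiveProcessorCount=2"
```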

5. Surviving the Aftermath: Handling the 503 Rejections

Once we enforce a strict queue limit via our UndertowConfig, Undertow will reject overflow requests with a 503 Service Unavailable error. This prevents the JVM from bleeding into an OOM state, but we cannot simply let these raw failures surface to end users.

The true power of the Fail-Fast mechanism is fully realized when paired with an intelligent load balancer or proxy layer (e.g., Envoy, Istio, or AWS ALB) capable of intercepting the 503 error and transparently retrying the payload onto an adjacent, healthy replica.

# Example: Istio VirtualService retry policy covering 503 errors
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: spring-service
spec:
  hosts:
  - spring-service
  http:
  - route:
    - destination:
        host: spring-service
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,connect-failure

Because the 503 comes back immediately, Istio treats it as a signal that this Pod is over capacity and re-routes the retry to another replica through the Service Mesh. This interplay keeps the client experience seamless while keeping every Pod's memory safely within bounds.

💡 The Architect's Realization

In elastic environments, the architectural question is not primarily "How many requests can we sustain?" but "How gracefully and safely can we Fail-fast when the load is simply impossible?" Dynamic thread sizing combined with a firmly bounded queue is the most reliable shield against destructive Cascading Failures.
