We are running Keycloak in Kubernetes, deployed via the Bitnami Helm chart with a PostgreSQL database. Our ingress path is an Istio Ingress Gateway with a VirtualService, fronted by an AWS ALB (60-second idle timeout).
Issue:
Despite tuning Keycloak settings, token requests start failing once we exceed roughly 100 concurrent users: authentication slows down and requests eventually time out.
Current Configurations:
PostgreSQL max_connections: 5000
Keycloak Environment Variables:
- name: KC_DB_POOL_MAX_SIZE
  value: "50000"
- name: KC_HTTP_POOL_MAX_THREADS
  value: "5000"
- name: KC_TRANSACTION_TIMEOUT
  value: "1500"
- name: KC_DB_POOL_INITIAL_SIZE
  value: "1000"
- name: KC_DB_POOL_MIN_SIZE
  value: "1000"
- name: KC_HTTP_MAX_QUEUED_REQUESTS
  value: "5000"
- name: KC_CACHE
  value: "ispn"
- name: KC_CACHE_STACK
  value: "kubernetes"
- name: KC_THREADS
  value: "500"
- name: KC_CACHE_EMBEDDED_CLIENT_SESSIONS_MAX_COUNT
  value: "5000"
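One thing we noticed while reviewing these values: the configured JDBC pool ceiling is an order of magnitude larger than what PostgreSQL will actually accept. A quick arithmetic check, using only the numbers above:

```python
# Compare the configured limits against each other (values from our config above).
limits = {
    "KC_DB_POOL_MAX_SIZE": 50000,       # JDBC connections Keycloak may try to open
    "postgres_max_connections": 5000,   # connections PostgreSQL will actually accept
}

# Headroom left in PostgreSQL if Keycloak ever grows its pool to the maximum.
# A negative number means the pool ceiling exceeds what the database allows,
# so connection attempts past max_connections would be refused under load.
headroom = limits["postgres_max_connections"] - limits["KC_DB_POOL_MAX_SIZE"]
print(headroom)
```

We are not sure whether this mismatch is related to the failures, since PostgreSQL connections do not appear to max out in practice, but it seemed worth flagging.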
Observations:
CPU and memory usage remain within normal limits.
PostgreSQL connections do not max out.
Istio and AWS ALB logs show no apparent errors.
The failures begin once we exceed roughly 100 concurrent user sessions.
Are these configurations optimal for handling 1000+ concurrent users?
Is Istio’s TLS termination or VirtualService causing overhead?
Should we tune Infinispan cache settings differently?
Would it help to adjust AWS ALB idle timeout or Istio timeouts?
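Regarding the last question, this is the kind of change we are considering on the Istio side: a per-route timeout on the VirtualService covering the token endpoint. A sketch of what we mean (host, gateway, and service names below are placeholders, not our actual config):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: keycloak            # placeholder name
spec:
  hosts:
    - keycloak.example.com  # placeholder host
  gateways:
    - istio-ingressgateway  # placeholder gateway
  http:
    - route:
        - destination:
            host: keycloak.keycloak.svc.cluster.local  # placeholder service
            port:
              number: 8080
      timeout: 120s         # raise the per-request timeout above the ALB's 60s
```

Would raising the VirtualService timeout like this (together with the ALB idle timeout) make sense, or does the bottleneck described above point elsewhere?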
Any insights or recommendations would be greatly appreciated!