Keycloak in HA mode inside EKS Cluster

Right now we are using a single-instance StatefulSet for deploying Keycloak in our staging environment. The setup is pretty straightforward:
TLS is terminated at the load balancer, and we use plain HTTP within the cluster. We don't need to expose Keycloak to the outside; it is only accessed by services deployed within the cluster, so all traffic stays internal.

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: keycloak
  namespace: keycloak-staging
  labels:
    app: keycloak
spec:
  serviceName: staging-keycloak-headless
  replicas: 1
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:26.4
          args: ["start", "--optimized"]
          env:
            - name: KC_BOOTSTRAP_ADMIN_USERNAME
              value: "user"
            - name: KC_BOOTSTRAP_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: keycloak-admin-secret
                  key: admin-password
            - name: KC_PROXY_HEADERS
              value: "xforwarded"
            - name: KC_HTTP_ENABLED
              value: "true"
            - name: KC_HOSTNAME_STRICT
              value: "false"
            - name: KC_HEALTH_ENABLED
              value: "true"
            - name: 'KC_CACHE'
              value: 'ispn'
            - name: 'KC_LOG'
              value: "console"
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: JAVA_OPTS_APPEND
              value: '-Djgroups.bind_addr=$(POD_IP)'
            - name: 'KC_DB_URL_DATABASE'
              valueFrom:
                secretKeyRef:
                  name: keycloak-external-db-secret
                  key: db-name
            - name: 'KC_DB_URL_HOST'
              valueFrom:
                secretKeyRef:
                  name: keycloak-external-db-secret
                  key: db-host
            - name: 'KC_DB'
              value: 'postgres'
            - name: 'KC_DB_PASSWORD'
              valueFrom:
                secretKeyRef:
                  name: keycloak-external-db-secret
                  key: db-password
            - name: 'KC_DB_USERNAME'
              valueFrom:
                secretKeyRef:
                  name: keycloak-external-db-secret
                  key: db-user
          ports:
            - name: http
              containerPort: 8080
          startupProbe:
            httpGet:
              path: /health/started
              port: 9000
            periodSeconds: 1
            failureThreshold: 600
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 9000
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: 9000
            periodSeconds: 10
            failureThreshold: 3
          resources:
            limits:
              memory: 800Mi
            requests:
              memory: 700Mi  

Last week I suddenly ran into the following issue:

Appending additional Java properties to JAVA_OPTS
Changes detected in configuration. Updating the server image.
Updating the configuration and installing your custom providers, if any. Please wait.
2026-02-12 13:40:43,359 INFO  [io.quarkus.deployment.QuarkusAugmentor] (main) Quarkus augmentation completed in 27839ms
Server configuration updated and persisted. Run the following command to review the configuration:

        kc.sh show-config

Next time you run the server, just run:

        kc.sh start --optimized

2026-02-12 13:40:57,533 INFO  [org.hibernate.orm.jdbc.batch] (JPA Startup Thread) HHH100501: Automatic JDBC statement batching enabled (maximum batch size 32)
2026-02-12 13:40:59,413 INFO  [org.keycloak.spi.infinispan.impl.embedded.JGroupsConfigurator] (main) JGroups JDBC_PING discovery enabled.
2026-02-12 13:40:59,494 INFO  [org.keycloak.spi.infinispan.impl.embedded.JGroupsConfigurator] (main) JGroups Encryption enabled (mTLS).
2026-02-12 13:40:59,904 INFO  [org.infinispan.CONTAINER] (main) Virtual threads support enabled
2026-02-12 13:41:00,255 INFO  [org.keycloak.jgroups.certificates.CertificateReloadManager] (main) Starting JGroups certificate reload manager
2026-02-12 13:41:00,651 INFO  [org.infinispan.CONTAINER] (main) ISPN000556: Starting user marshaller 'org.infinispan.commons.marshall.ImmutableProtoStreamMarshaller'
2026-02-12 13:41:01,005 INFO  [org.infinispan.CLUSTER] (main) ISPN000078: Starting JGroups channel `ISPN` with stack `jdbc-ping`
2026-02-12 13:41:01,007 INFO  [org.jgroups.JChannel] (main) local_addr: 00000000-0000-0000-0000-00000000001a, name: keycloak-0-11675
2026-02-12 13:41:01,033 INFO  [org.jgroups.protocols.FD_SOCK2] (main) server listening on *:57800
2026-02-12 13:41:03,084 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: JOIN(keycloak-0-11675) sent to keycloak-0-22350 timed out (after 2000 ms), on try 0
2026-02-12 13:41:05,098 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: JOIN(keycloak-0-11675) sent to keycloak-0-22350 timed out (after 2000 ms), on try 1
2026-02-12 13:41:07,112 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: JOIN(keycloak-0-11675) sent to keycloak-0-22350 timed out (after 2000 ms), on try 2
2026-02-12 13:41:09,125 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: JOIN(keycloak-0-11675) sent to keycloak-0-22350 timed out (after 2000 ms), on try 3
2026-02-12 13:41:11,137 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: JOIN(keycloak-0-11675) sent to keycloak-0-22350 timed out (after 2000 ms), on try 4
2026-02-12 13:41:13,148 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: JOIN(keycloak-0-11675) sent to keycloak-0-22350 timed out (after 2000 ms), on try 5
2026-02-12 13:41:15,161 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: JOIN(keycloak-0-11675) sent to keycloak-0-22350 timed out (after 2000 ms), on try 6
2026-02-12 13:41:17,176 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: JOIN(keycloak-0-11675) sent to keycloak-0-22350 timed out (after 2000 ms), on try 7
2026-02-12 13:41:19,195 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: JOIN(keycloak-0-11675) sent to keycloak-0-22350 timed out (after 2000 ms), on try 8
2026-02-12 13:41:21,221 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: JOIN(keycloak-0-11675) sent to keycloak-0-22350 timed out (after 2000 ms), on try 9
2026-02-12 13:41:21,221 WARN  [org.jgroups.protocols.pbcast.GMS] (main) keycloak-0-11675: too many JOIN attempts (10): becoming singleton

This issue is also resolved: I deleted the stale entries from the jgroups_ping table and changed KC_CACHE from ispn to local.
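For reference, the cleanup was roughly the following (assuming the default JDBC_PING table name JGROUPS_PING; verify the actual table name against your schema before running anything):

```sql
-- Remove stale JGroups discovery entries left behind by the
-- non-graceful shutdown; run this only while Keycloak is stopped,
-- otherwise the running instance will re-register immediately.
DELETE FROM JGROUPS_PING;
```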

Now I want to implement an HA setup; the minimum requirement is running 3 replicas. From the docs, I need to change the following env vars:

KC_CACHE = ispn
KC_CACHE_STACK = jdbc-ping (since the kubernetes stack is already deprecated)
JAVA_OPTS_APPEND = "-Djgroups.bind_addr=$(POD_IP) -Djgroups.external_addr=$(POD_IP)"

Are these updates the best I can do with my setup? Also, how would these changes avoid the singleton issue?
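As a sketch, the HA-relevant part of the StatefulSet change would look roughly like this (the POD_IP fieldRef mirrors the one already in my manifest; only the changed fields are shown):

```yaml
spec:
  replicas: 3            # minimum HA requirement
  template:
    spec:
      containers:
        - name: keycloak
          env:
            - name: KC_CACHE
              value: "ispn"
            - name: KC_CACHE_STACK
              value: "jdbc-ping"
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            # Bind JGroups to the pod IP so members advertise reachable addresses
            - name: JAVA_OPTS_APPEND
              value: "-Djgroups.bind_addr=$(POD_IP) -Djgroups.external_addr=$(POD_IP)"
```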

Thanks.

First of all, your WARN log entries come from a non-graceful shutdown of your Keycloak instance. You had one instance running in regular mode, which means it formed a cluster of one. After the non-graceful shutdown, there were leftover entries for the old instance in the discovery registry, and the newly started instance tried to contact the old one before removing it from the registry. That's why the warnings eventually stopped. If your instance shuts down gracefully, these warnings won't appear at the next start.
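You can see those leftovers directly in the database; with the JDBC_PING discovery shown in your logs they live in a table conventionally named JGROUPS_PING (check your schema for the exact name):

```sql
-- Each row is a registered cluster member. Rows whose address/name refer
-- to a pod that no longer exists are the stale entries a freshly started
-- instance tries (and fails) to JOIN, producing the GMS timeout warnings.
SELECT * FROM JGROUPS_PING;
```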


For a working cluster, it's pretty simple nowadays (it used to be hard):
You don't need to set KC_CACHE=ispn and KC_CACHE_STACK=jdbc-ping, as these are the defaults.
You only need to set jgroups.bind_addr and jgroups.external_addr if your pods are in different networks. If your pods run in the same K8s cluster network and can see/reach each other via their internal IP addresses, you don't need these values. Remove the whole JAVA_OPTS_APPEND env var.
Make sure that ports 7800 and 57800 are open between your pods.
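A sketch of a headless Service exposing both JGroups ports between the pods (service and namespace names taken from the StatefulSet above; 7800 is the JGroups transport port, and 57800 is the FD_SOCK2 failure-detection port visible in your logs):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: staging-keycloak-headless
  namespace: keycloak-staging
spec:
  clusterIP: None        # headless: DNS resolves directly to pod IPs
  selector:
    app: keycloak
  ports:
    - name: jgroups
      port: 7800
    - name: fd-sock2
      port: 57800
```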

Now the pods should find each other and exchange cache data with each other.


Thank you for your help.

Yes, I can confirm this was a non-graceful shutdown; a noisy neighbor caused the issue.

I have moved Keycloak to different node groups as per the requirement, with enough headroom.

Okay, I will make the necessary changes and additionally open port 57800 (I already had 7800 exposed in the headless service).
