Keycloak in a cluster behaves "weird"

I’m currently using Keycloak in a 2-node cluster setup, and while it generally works, the performance has been inconsistent.

Here are the main issues I’m encountering:

  • Often, when logging into the master realm, I receive the message, “Your login attempt timed out. Login will start from the beginning,” even if I’ve just opened the browser.
  • Occasionally, I’m unable to change user settings, or I receive random logout notifications.

I suspect these issues might be due to an error with the ISPN cache, though I haven’t been able to identify any specific error messages.

Below is the current configuration (identical on both nodes):

~/keycloak/conf % ../bin/kc.sh show-config
Current Mode: production
Current Configuration:
        kc.cache =  ispn (keycloak.conf)
        kc.cache-config-file =  cache-ispn.xml (keycloak.conf)
        kc.cache-stack =  tcp (keycloak.conf)
        kc.config.built =  true (SysPropConfigSource)
        kc.db =  oracle (keycloak.conf)
        kc.db-password =  ******* (keycloak.conf)
        kc.db-url =  [the url]
        kc.db-username =  keycloak (keycloak.conf)
        kc.features =  persistent-user-sessions (keycloak.conf)
        kc.health-enabled =  true (keycloak.conf)
        kc.hostname =  hostname (keycloak.conf)
        kc.http-enabled =  false (keycloak.conf)
        kc.http-host =  xx.xx.xx.xx (keycloak.conf)
        kc.https-key-store-file =  /opt/app/bpm/keycloak/conf/serverKeyStore.p12 (keycloak.conf)
        kc.https-key-store-password =  ******* (keycloak.conf)
        kc.https-port =  10443 (keycloak.conf)
        kc.log =  console,file (keycloak.conf)
        kc.log-console-output =  default (classpath keycloak.conf)
        kc.log-file =  /opt/app/bpm/keycloak/log/keycloak.log (keycloak.conf)
        kc.log-level =  info,org.keycloak.truststore:debug,org.keycloak.events:debug,org.infinispan:debug (keycloak.conf)
        kc.optimized =  true (Persisted)
        kc.provider.file.ojdbc10-19.24.0.0.jar.last-modified =  1727256467645 (Persisted)
        kc.spi-connections-infinispan-quarkus-config-file =  cache-ispn.xml (keycloak.conf)
        kc.spi-connections-infinispan-quarkus-stack =  tcp (keycloak.conf)
        kc.spi-hostname-v2-hostname =  [hostname] (keycloak.conf)
        kc.spi-truststore-file-hostname-verification-policy =  ANY (keycloak.conf)
        kc.tls-hostname-verifier =  ANY (keycloak.conf)
        kc.truststore-paths =  ${kc.home.dir}/conf/ldapserver.pem (keycloak.conf)
        kc.version =  25.0.4 (SysPropConfigSource)

My cache-ispn.xml

<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:infinispan:config:15.0 http://www.infinispan.org/schemas/infinispan-config-15.0.xsd" xmlns="urn:infinispan:config:15.0">
	<jgroups>
		<stack name="tcpping" extends="tcp">
			<TCP external_addr="xx.xx.xx.xx" bind_addr="xx.xx.xx.xx" bind_port="7800"/>
			<TCPPING initial_hosts="xx.xx.xx.xx[7800],yy.yy.yy.yy[7800]" port_range="0" max_dynamic_hosts="0" stack.combine="REPLACE" stack.position="MPING" num_initial_members="1"/>
		</stack>
	</jgroups>
	<cache-container name="keycloak">
		<transport lock-timeout="60000" stack="tcpping"/>
		<local-cache name="realms" simple-cache="true">
			<encoding>
				<key media-type="application/x-java-object"/>
				<value media-type="application/x-java-object"/>
			</encoding>
			<memory max-count="10000"/>
		</local-cache>
		<local-cache name="users" simple-cache="true">
			<encoding>
				<key media-type="application/x-java-object"/>
				<value media-type="application/x-java-object"/>
			</encoding>
			<memory max-count="10000"/>
		</local-cache>
		<distributed-cache name="sessions" owners="2">
			<expiration lifespan="-1"/>
		</distributed-cache>
		<distributed-cache name="authenticationSessions" owners="2">
			<expiration lifespan="-1"/>
		</distributed-cache>
		<distributed-cache name="offlineSessions" owners="2">
			<expiration lifespan="-1"/>
		</distributed-cache>
		<distributed-cache name="clientSessions" owners="2">
			<expiration lifespan="-1"/>
		</distributed-cache>
		<distributed-cache name="offlineClientSessions" owners="2">
			<expiration lifespan="-1"/>
		</distributed-cache>
		<distributed-cache name="loginFailures" owners="2">
			<expiration lifespan="-1"/>
		</distributed-cache>
		<local-cache name="authorization" simple-cache="true">
			<encoding>
				<key media-type="application/x-java-object"/>
				<value media-type="application/x-java-object"/>
			</encoding>
			<memory max-count="10000"/>
		</local-cache>
		<replicated-cache name="work">
			<expiration lifespan="-1"/>
		</replicated-cache>
		<local-cache name="keys" simple-cache="true">
			<encoding>
				<key media-type="application/x-java-object"/>
				<value media-type="application/x-java-object"/>
			</encoding>
			<expiration max-idle="3600000"/>
			<memory max-count="1000"/>
		</local-cache>
		<distributed-cache name="actionTokens" owners="2">
			<encoding>
				<key media-type="application/x-java-object"/>
				<value media-type="application/x-java-object"/>
			</encoding>
			<expiration max-idle="-1" lifespan="-1" interval="300000"/>
			<memory max-count="-1"/>
		</distributed-cache>
	</cache-container>
</infinispan>

The log looks fine, with no errors. Maybe someone has more insight into the matter.

Your description of “timed out” authentications and “random logout notifications” most probably points to an improperly configured cluster. It seems that the nodes can’t talk to each other. But from only a brief look at your posted config, I don’t see anything obvious.

Thank you for your reply.

Is there a way to check if the nodes can communicate with each other? It could be a network issue, but I don’t see anything unusual in the DEBUG logs on either node. Does Keycloak indicate anywhere if a connection is successfully established?

I’m not an Infinispan expert, it’s a complex product…
Usually, if everything works properly, you should see messages in the log:

  • ...received new cluster view..., which indicates that the nodes see each other if the number shown in parentheses matches the expected node count, e.g. ...(3)..., followed by the node hostnames.
  • messages mentioning something like “starting rebalancing” and “finished rebalancing” (or similar) if the nodes are able to communicate (exchange data) with each other (IIRC the default port is 7800?)

HTH

Thank you. In fact, I only see:

Received new cluster view for channel ISPN: [hostname-node1-39890|0] (1) [hostname-node1-39890]

So it seems the nodes are seeing each other. So now I can go from here…

There is only 1 host in your cluster, so this host does not see any other hosts.
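To make the member count easier to read off, here is a minimal Python sketch that parses view lines of the shape quoted above. The regular expression is derived only from that one quoted line, so treat the pattern as an assumption; the function names `parse_cluster_view` and `last_cluster_view` are made up for illustration and are not part of Keycloak or Infinispan.

```python
import re

# Pattern based on the single log line quoted in this thread, e.g.
# "Received new cluster view for channel ISPN: [node1-39890|0] (1) [node1-39890]"
# Assumption: other view lines follow the same layout.
VIEW_RE = re.compile(
    r"Received new cluster view for channel (?P<channel>[^:\s]+): "
    r"\[(?P<coordinator>[^|\]]+)\|(?P<view_id>\d+)\] "
    r"\((?P<count>\d+)\) "
    r"\[(?P<members>[^\]]*)\]"
)


def parse_cluster_view(line):
    """Return the view described by a log line, or None if the line is not a view line."""
    match = VIEW_RE.search(line)
    if match is None:
        return None
    members = [m.strip() for m in match.group("members").split(",") if m.strip()]
    return {
        "channel": match.group("channel"),
        "coordinator": match.group("coordinator"),
        "view_id": int(match.group("view_id")),
        "count": int(match.group("count")),  # the number in parentheses
        "members": members,
    }


def last_cluster_view(lines):
    """Return the most recent view found in an iterable of log lines, or None."""
    latest = None
    for line in lines:
        view = parse_cluster_view(line)
        if view is not None:
            latest = view
    return latest
```

Feeding it the lines of keycloak.log on each node and comparing `count` with the expected number of nodes (2 in this setup) shows whether the node ever joined the other one; a count of 1 means the node only sees itself.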

Some considerations:

  • are the external_addr and bind_addr values correct?
  • is port 7800 open on all nodes and also on possible firewalls/networks in between?
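The port question can be checked from each node with a plain TCP connection attempt. The following is a rough sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and the addresses in the usage comment are the placeholders from the posted config. A successful connection only shows that something is listening and reachable on that port, not that JGroups itself is healthy.

```python
import socket


def can_connect(host, port=7800, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


# Example usage (run on node 1 against node 2, and the other way round),
# with the placeholder addresses replaced by the real ones:
#   print(can_connect("yy.yy.yy.yy", 7800))
#   print(can_connect("xx.xx.xx.xx", 7800))
```

Running the check in both directions matters, because a firewall rule can allow traffic one way and block it the other.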

In fact, there were two errors in the cache-ispn.xml:

  • Keycloak didn’t like the num_initial_members attribute, so I removed it. I’m not sure why that’s an issue or whether it was the root cause.
  • On my second node, I still had instead of - So yeah, that’s why my cluster did not work.