I have run into a problem after upgrading Keycloak with the embedded cache from version 16 to 23.0.6.
My setup is an EKS (Kubernetes on AWS) cluster with 8 nodes and a Keycloak StatefulSet with 8 replicas, configured with an Infinispan cache that persists to a PVC; each pod has its own PVC. When we start the cluster, all the pods join it and everything works fine.
But as EKS scales the cluster down and up, nodes leave and rejoin the cluster, and so do the pods. A new pod joining the cluster can’t start the “clientSessions” cache in the embedded Infinispan, and it ends up in CrashLoopBackOff with this exception:
2024-02-27 10:51:12,156 DEBUG [org.jgroups.protocols.dns.DNS_PING] (jgroups-16,keycloak-ha-v2-8-48772) keycloak-ha-v2-8-48772: sending discovery requests to hosts [10.100.69.67:0, 10.100.204.128:0, 10.100.103.231:0, 10.100.246.45:0, 10.100.237.223:0, 10.100.201.119:0, 10.100.126.130:0, 10.100.247.42:0, 10.100.252.77:0] on ports [7800 .. 7800]
2024-02-27 10:51:31,000 DEBUG [org.infinispan.persistence.sifs.FileProvider] (blocking-thread--p3-t5) openChannel(/opt/keycloak/cache-kc/data/clientSessions/data/ispn12.12)
2024-02-27 10:51:31,096 DEBUG [org.infinispan.persistence.sifs.FileProvider] (blocking-thread--p3-t2) openChannel(/opt/keycloak/cache-kc/data/clientSessions/data/ispn12.12)
2024-02-27 10:51:40,869 DEBUG [org.infinispan.statetransfer.InboundTransferTask] (jgroups-16,keycloak-ha-v2-8-48772) Finished receiving state for segments {227-231}
2024-02-27 10:51:45,017 DEBUG [org.infinispan.cache.impl.CacheImpl] (keycloak-cache-init) Stopping cache as exception encountered waiting for state transfer
2024-02-27 10:51:45,017 DEBUG [org.infinispan.CONTAINER] (keycloak-cache-init) Passivating all entries to disk
2024-02-27 10:51:46,098 DEBUG [org.infinispan.CONTAINER] (non-blocking-thread--p2-t2) Passivated 500 entries in 1.08 seconds
2024-02-27 10:51:46,098 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (keycloak-cache-init) Node keycloak-ha-v2-8-48772 leaving cache clientSessions
2024-02-27 10:51:46,555 ERROR [org.infinispan.CONFIG] (keycloak-cache-init) ISPN000660: DefaultCacheManager start failed, stopping any running components: org.infinispan.commons.CacheException
    at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:243)
    at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1013)
    at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:504)
    at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:727)
    at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:673)
    at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:562)
    at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:525)
    at org.infinispan.security.actions.GetCacheAction.run(GetCacheAction.java:26)
    at org.infinispan.security.actions.GetCacheAction.run(GetCacheAction.java:14)
    at org.infinispan.security.Security.doPrivileged(Security.java:56)
    at org.infinispan.globalstate.impl.SecurityActions.doPrivileged(SecurityActions.java:30)
    at org.infinispan.globalstate.impl.SecurityActions.getCache(SecurityActions.java:39)
    at org.infinispan.globalstate.impl.GlobalConfigurationManagerImpl.start(GlobalConfigurationManagerImpl.java:114)
    at org.infinispan.globalstate.impl.CorePackageImpl$2.start(CorePackageImpl.java:61)
    at org.infinispan.globalstate.impl.CorePackageImpl$2.start(CorePackageImpl.java:48)
    at org.infinispan.factories.impl.BasicComponentRegistryImpl.invokeStart(BasicComponentRegistryImpl.java:616)
    at org.infinispan.factories.impl.BasicComponentRegistryImpl.doStartWrapper(BasicComponentRegistryImpl.java:607)
    at org.infinispan.factories.impl.BasicComponentRegistryImpl.startWrapper(BasicComponentRegistryImpl.java:576)
    at org.infinispan.factories.impl.BasicComponentRegistryImpl$ComponentWrapper.running(BasicComponentRegistryImpl.java:807)
    at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:379)
    at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:252)
    at org.infinispan.manager.DefaultCacheManager.internalStart(DefaultCacheManager.java:779)
    at org.infinispan.manager.DefaultCacheManager.start(DefaultCacheManager.java:747)
    at org.infinispan.manager.DefaultCacheManager.<init>(DefaultCacheManager.java:411)
    at org.keycloak.quarkus.runtime.storage.legacy.infinispan.CacheManagerFactory.startCacheManager(CacheManagerFactory.java:96)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.InterruptedException
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:386)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
    at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:236)
    ... 28 more
2024-02-27 10:51:46,556 INFO [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000080: Disconnecting JGroups channel `ISPN`
2024-02-27 10:51:46,559 DEBUG [org.jgroups.protocols.TCP] (keycloak-cache-init) keycloak-ha-v2-8-48772: closing sockets and stopping threads
The configuration used for the cache is:
<infinispan
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:infinispan:config:14.0 http://www.infinispan.org/schemas/infinispan-config-14.0.xsd"
        xmlns="urn:infinispan:config:14.0">
    <cache-container name="keycloak">
        <transport lock-timeout="60000"/>
        <global-state>
            <persistent-location path="/opt/keycloak/cache-kc"/>
        </global-state>
        <local-cache name="realms" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>
        <local-cache name="users" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>
        <distributed-cache name="sessions" owners="${env.CACHE_OWNERS}">
            <expiration lifespan="-1"/>
            <persistence passivation="true">
                <file-store shared="false" purge="false" preload="false">
                    <data path="data"/>
                    <index path="index"/>
                    <write-behind modification-queue-size="2048" fail-silently="true"/>
                </file-store>
            </persistence>
            <memory max-count="500" when-full="REMOVE"/>
        </distributed-cache>
        <distributed-cache name="authenticationSessions" owners="${env.CACHE_OWNERS}">
            <expiration lifespan="-1"/>
            <persistence passivation="true">
                <file-store shared="false" purge="false" preload="false">
                    <data path="data"/>
                    <index path="index"/>
                    <write-behind modification-queue-size="2048" fail-silently="true"/>
                </file-store>
            </persistence>
            <memory max-count="500" when-full="REMOVE"/>
        </distributed-cache>
        <distributed-cache name="offlineSessions" owners="${env.CACHE_OWNERS}">
            <expiration lifespan="-1"/>
            <persistence passivation="true">
                <file-store shared="false" purge="false" preload="false">
                    <data path="data"/>
                    <index path="index"/>
                    <write-behind modification-queue-size="2048" fail-silently="true"/>
                </file-store>
            </persistence>
            <memory max-count="500" when-full="REMOVE"/>
        </distributed-cache>
        <distributed-cache name="clientSessions" owners="${env.CACHE_OWNERS}">
            <expiration lifespan="-1"/>
            <persistence passivation="true">
                <file-store shared="false" purge="false" preload="false">
                    <data path="data"/>
                    <index path="index"/>
                    <write-behind modification-queue-size="2048" fail-silently="true"/>
                </file-store>
            </persistence>
            <memory max-count="500" when-full="REMOVE"/>
        </distributed-cache>
        <distributed-cache name="offlineClientSessions" owners="${env.CACHE_OWNERS}">
            <expiration lifespan="-1"/>
            <persistence passivation="true">
                <file-store shared="false" purge="false" preload="false">
                    <data path="data"/>
                    <index path="index"/>
                    <write-behind modification-queue-size="2048" fail-silently="true"/>
                </file-store>
            </persistence>
            <memory max-count="500" when-full="REMOVE"/>
        </distributed-cache>
        <distributed-cache name="loginFailures" owners="${env.CACHE_OWNERS}">
            <expiration lifespan="-1"/>
            <persistence passivation="true">
                <file-store shared="false" purge="false" preload="false">
                    <data path="data"/>
                    <index path="index"/>
                    <write-behind modification-queue-size="2048" fail-silently="true"/>
                </file-store>
            </persistence>
            <memory max-count="500" when-full="REMOVE"/>
        </distributed-cache>
        <local-cache name="authorization" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>
        <replicated-cache name="work">
            <expiration lifespan="-1"/>
        </replicated-cache>
        <local-cache name="keys" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <expiration max-idle="3600000"/>
            <memory max-count="1000"/>
        </local-cache>
        <distributed-cache name="actionTokens" owners="${env.CACHE_OWNERS}">
            <persistence passivation="true">
                <file-store shared="false" purge="false" preload="false">
                    <data path="data"/>
                    <index path="index"/>
                    <write-behind modification-queue-size="2048" fail-silently="true"/>
                </file-store>
            </persistence>
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <expiration max-idle="-1" lifespan="-1" interval="300000"/>
            <memory max-count="-1"/>
        </distributed-cache>
    </cache-container>
</infinispan>
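None of the caches above sets a <state-transfer> element, so the initial state transfer that fails in the stack trace (waitForInitialStateTransferToComplete) runs with Infinispan's default settings. As far as I understand, this is where it would be configured. The sketch below is untested, the timeout value is only illustrative, and whether it is related to the InterruptedException at all is an assumption:

```xml
<!-- Sketch only: the clientSessions cache from the configuration above,
     with an explicit state-transfer element added.
     timeout is in milliseconds; 600000 is an illustrative, untested value. -->
<distributed-cache name="clientSessions" owners="${env.CACHE_OWNERS}">
    <state-transfer timeout="600000" await-initial-transfer="true"/>
    <expiration lifespan="-1"/>
    <persistence passivation="true">
        <file-store shared="false" purge="false" preload="false">
            <data path="data"/>
            <index path="index"/>
            <write-behind modification-queue-size="2048" fail-silently="true"/>
        </file-store>
    </persistence>
    <memory max-count="500" when-full="REMOVE"/>
</distributed-cache>
```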
I’m not an expert in Keycloak / Infinispan configuration, so I would appreciate any help with this issue.
Thank you very much.
Best regards