We have configured a Keycloak 24.0.3 HA cluster with the embedded Infinispan (ISPN) cache enabled (no external Infinispan server, just horizontally scaled Keycloak nodes) and only 2 nodes. When both Keycloak instances have started successfully, the server logs on both nodes contain the cache initialization messages showing that both nodes have joined the cluster. The issue appears after the following reproduction steps:
- Open the admin console on one of the nodes
- Create a new realm - the success message is shown
- Create a new user federation - the success message is shown
- Sync all users of the newly created user federation - the success message is shown
- Switch to the Realm settings tab, change one of the properties on the General subtab (even Display name will do) and click the Save button - an error message is shown stating that the realm cannot be saved
The log file contains the following error message:
2024-09-30 05:46:33,052 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p4-t1) ISPN000136: Error executing command PutKeyValueCommand on Cache 'work', writing keys [task::ClearExpiredEvents]: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1086 from PCADAPP02-43383 after 15 seconds
at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:179)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:88)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
2024-09-30 05:46:33,052 ERROR [org.keycloak.services.scheduled.ScheduledTaskRunner] (Timer-0) Failed to run scheduled task ClearExpiredEvents: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1086 from PCADAPP02-43383 after 15 seconds
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:258)
at org.infinispan.cache.impl.InvocationHelper.doInvoke(InvocationHelper.java:323)
at org.infinispan.cache.impl.InvocationHelper.invoke(InvocationHelper.java:111)
at org.infinispan.cache.impl.InvocationHelper.invoke(InvocationHelper.java:93)
at org.infinispan.cache.impl.CacheImpl.putIfAbsent(CacheImpl.java:1334)
at org.infinispan.cache.impl.CacheImpl.putIfAbsent(CacheImpl.java:1328)
at org.infinispan.cache.impl.CacheImpl.putIfAbsent(CacheImpl.java:1324)
at org.infinispan.cache.impl.CacheImpl.putIfAbsent(CacheImpl.java:236)
at org.infinispan.cache.impl.AbstractDelegatingCache.putIfAbsent(AbstractDelegatingCache.java:118)
at org.infinispan.cache.impl.AbstractDelegatingCache.putIfAbsent(AbstractDelegatingCache.java:118)
at org.infinispan.cache.impl.EncoderCache.putIfAbsent(EncoderCache.java:207)
at org.keycloak.cluster.infinispan.InfinispanClusterProviderFactory.lambda$putIfAbsentWithRetries$1(InfinispanClusterProviderFactory.java:149)
at org.keycloak.common.util.Retry.executeWithBackoff(Retry.java:102)
at org.keycloak.common.util.Retry.executeWithBackoff(Retry.java:89)
at org.keycloak.common.util.Retry.executeWithBackoff(Retry.java:80)
at org.keycloak.cluster.infinispan.InfinispanClusterProviderFactory.putIfAbsentWithRetries(InfinispanClusterProviderFactory.java:143)
at org.keycloak.cluster.infinispan.InfinispanClusterProvider.tryLock(InfinispanClusterProvider.java:144)
at org.keycloak.cluster.infinispan.InfinispanClusterProvider.executeIfNotExecuted(InfinispanClusterProvider.java:74)
at org.keycloak.services.scheduled.ClusterAwareScheduledTaskRunner.runTask(ClusterAwareScheduledTaskRunner.java:52)
at org.keycloak.services.scheduled.ScheduledTaskRunner.lambda$run$0(ScheduledTaskRunner.java:59)
at org.keycloak.models.utils.KeycloakModelUtils.lambda$runJobInTransaction$1(KeycloakModelUtils.java:257)
at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransactionWithResult(KeycloakModelUtils.java:379)
at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:256)
at org.keycloak.services.scheduled.ScheduledTaskRunner.run(ScheduledTaskRunner.java:53)
at org.keycloak.timer.basic.BasicTimerProvider$1.run(BasicTimerProvider.java:53)
at java.base/java.util.TimerThread.mainLoop(Timer.java:566)
at java.base/java.util.TimerThread.run(Timer.java:516)
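Reading the second trace bottom-up, the timer thread runs ClearExpiredEvents through ClusterAwareScheduledTaskRunner, which calls InfinispanClusterProvider.executeIfNotExecuted and tryLock, which in turn does a putIfAbsent on the 'work' cache with the key task::ClearExpiredEvents. That write is what times out waiting for PCADAPP02-43383. The Python sketch below only models that locking pattern so the failure mode is easier to see. It is not Keycloak or Infinispan code: WorkCacheModel, peer_responds and execute_if_not_executed are hypothetical names, and the replicated cache is replaced by a plain in-memory dict.

```python
class WorkCacheModel:
    """In-memory stand-in for the replicated 'work' cache (hypothetical model)."""

    def __init__(self, peer_responds=True):
        self._entries = {}
        # Assumption for illustration: a flag simulating whether the other node answers.
        self.peer_responds = peer_responds

    def put_if_absent(self, key, value):
        # In a real replicated cache the write must be acknowledged by the other
        # node; a missing acknowledgement surfaces as a timeout (ISPN000476 above).
        if not self.peer_responds:
            raise TimeoutError(f"timed out waiting for peer while writing {key}")
        # Returns the value already stored under key, or stores and returns value.
        return self._entries.setdefault(key, value)

    def remove(self, key):
        self._entries.pop(key, None)


def execute_if_not_executed(cache, task_key, node_id, task):
    """Run task only if this node wins the cluster-wide lock for task_key."""
    owner = cache.put_if_absent(task_key, node_id)
    if owner != node_id:
        return False, None      # another node already holds the lock
    try:
        return True, task()
    finally:
        cache.remove(task_key)  # release the lock once the task has finished


healthy = WorkCacheModel()
print(execute_if_not_executed(healthy, "task::ClearExpiredEvents", "node1", lambda: "cleared"))
```

With WorkCacheModel(peer_responds=False) the same call raises TimeoutError before the task body runs, which mirrors what the log shows: the scheduled task fails at the lock acquisition step, not inside the event cleanup itself.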
The contents of our ispn-config.xml file:
Approaches we have already tried without success:
- Changing cache-stack from UDP to TCP (known issue)
- Configuring the ISPN caches as per the instructions (some local, some distributed, 'work' replicated)
- Adding a JGroups configuration element and using it as the container stack
- Applying mode="SYNC" to all caches, and to the 'work' cache only
- Adding a 'socket-binding' element with 'MPING' and DB connection configuration
- Adding <transaction mode="NON_XA" locking="OPTIMISTIC"/>
- Changing cache owners to 2, combined with mode="SYNC" on the 'work' cache and on all caches
- Changing cache owners to 1, combined with mode="SYNC" on the 'work' cache and on all caches
- Removing the 'work' cache entirely
- Using the User Federation with "NO CACHE" configured
- Increasing the "remote timeout" to 60 sec
- Increasing the "cache timeout" to 60 sec
- Applying the simple-cache="true" parameter to all local caches (realms, users, authorization, etc.)
- Using the latest Keycloak 25.0.6 (with the latest Infinispan) on both nodes
- Using the 'work' cache as a local cache
- Decreasing 'locking-timeout' on both nodes from 60000 to 10000
- Decreasing 'locking-timeout' on one of the nodes from 60000 to 0
- Defining the 'work' cache as replicated without any additional parameters
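All of the attempts above change configuration; none of them checks whether one node can actually open a connection to the other on the clustering port. A minimal sketch of such a check follows, under stated assumptions: port 7800 is assumed to be the port the JGroups TCP stack listens on (adjust it to whatever the stack in use binds to), the host name PCADAPP02 is inferred from the node name in the log, and port_reachable is a hypothetical helper name.

```python
import socket


def port_reachable(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call, to be run on the node that logs the timeout (both values are assumptions):
# port_reachable("PCADAPP02", 7800)
```

A False result in either direction would point at a firewall or bind-address problem rather than at the cache definitions. A True result only shows that the port accepts connections, not that cluster messages are being processed.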
Could someone please point out what we should check, update, or configure to address this issue? The only workaround we have found so far is to manually stop all Keycloak instances except the one used to run the admin console and perform the actions listed above.
