Keycloak ISPN cache failure after realm is updated

We have configured a Keycloak 24.0.3 HA cluster with the ISPN cache enabled (no external ISPN server, just the horizontal model) and only 2 nodes. When both Keycloak instances have started successfully, the server logs on both sides contain cache initialization messages showing that both nodes have joined the cluster. The issue appears after performing the following reproduction steps:

  1. Open the admin console on one of the nodes
  2. Create a new realm - the success message is shown
  3. Create a new user federation - the success message is shown
  4. Sync all users of the newly created user federation - the success message is shown
  5. Switch to the Realm settings tab, change one of the properties on the General subtab (even Display name will do) and click the Save button - an error message is shown stating that the realm cannot be saved

The log file contains the following error message:
2024-09-30 05:46:33,052 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p4-t1) ISPN000136: Error executing command PutKeyValueCommand on Cache 'work', writing keys [task::ClearExpiredEvents]: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1086 from PCADAPP02-43383 after 15 seconds
at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:179)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:88)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
2024-09-30 05:46:33,052 ERROR [org.keycloak.services.scheduled.ScheduledTaskRunner] (Timer-0) Failed to run scheduled task ClearExpiredEvents: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 1086 from PCADAPP02-43383 after 15 seconds
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:258)
at org.infinispan.cache.impl.InvocationHelper.doInvoke(InvocationHelper.java:323)
at org.infinispan.cache.impl.InvocationHelper.invoke(InvocationHelper.java:111)
at org.infinispan.cache.impl.InvocationHelper.invoke(InvocationHelper.java:93)
at org.infinispan.cache.impl.CacheImpl.putIfAbsent(CacheImpl.java:1334)
at org.infinispan.cache.impl.CacheImpl.putIfAbsent(CacheImpl.java:1328)
at org.infinispan.cache.impl.CacheImpl.putIfAbsent(CacheImpl.java:1324)
at org.infinispan.cache.impl.CacheImpl.putIfAbsent(CacheImpl.java:236)
at org.infinispan.cache.impl.AbstractDelegatingCache.putIfAbsent(AbstractDelegatingCache.java:118)
at org.infinispan.cache.impl.AbstractDelegatingCache.putIfAbsent(AbstractDelegatingCache.java:118)
at org.infinispan.cache.impl.EncoderCache.putIfAbsent(EncoderCache.java:207)
at org.keycloak.cluster.infinispan.InfinispanClusterProviderFactory.lambda$putIfAbsentWithRetries$1(InfinispanClusterProviderFactory.java:149)
at org.keycloak.common.util.Retry.executeWithBackoff(Retry.java:102)
at org.keycloak.common.util.Retry.executeWithBackoff(Retry.java:89)
at org.keycloak.common.util.Retry.executeWithBackoff(Retry.java:80)
at org.keycloak.cluster.infinispan.InfinispanClusterProviderFactory.putIfAbsentWithRetries(InfinispanClusterProviderFactory.java:143)
at org.keycloak.cluster.infinispan.InfinispanClusterProvider.tryLock(InfinispanClusterProvider.java:144)
at org.keycloak.cluster.infinispan.InfinispanClusterProvider.executeIfNotExecuted(InfinispanClusterProvider.java:74)
at org.keycloak.services.scheduled.ClusterAwareScheduledTaskRunner.runTask(ClusterAwareScheduledTaskRunner.java:52)
at org.keycloak.services.scheduled.ScheduledTaskRunner.lambda$run$0(ScheduledTaskRunner.java:59)
at org.keycloak.models.utils.KeycloakModelUtils.lambda$runJobInTransaction$1(KeycloakModelUtils.java:257)
at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransactionWithResult(KeycloakModelUtils.java:379)
at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:256)
at org.keycloak.services.scheduled.ScheduledTaskRunner.run(ScheduledTaskRunner.java:53)
at org.keycloak.timer.basic.BasicTimerProvider$1.run(BasicTimerProvider.java:53)
at java.base/java.util.TimerThread.mainLoop(Timer.java:566)
at java.base/java.util.TimerThread.run(Timer.java:516)

The contents of our ispn-config.xml file:

Approaches we have already tried without success:

  1. Changing cache-stack from UDP to TCP (known issue)
  2. Configuring the ISPN caches as per the instructions (some local, some distributed, work → replicated)
  3. Adding the JGroups configuration element and using it as the container stack
  4. Applying mode="SYNC" to all caches, or to the 'work' cache only
  5. Adding a 'socket-binding' element with 'MPING' and DB connection configuration
  6. Adding <transaction mode="NON_XA" locking="OPTIMISTIC"/>
  7. Changing cache owners to 2, combined with mode="SYNC" on the 'work' cache and on all caches
  8. Changing cache owners to 1, combined with mode="SYNC" on the 'work' cache and on all caches
  9. Removing the 'work' cache entirely
  10. Using User Federation with "NO CACHE" configured
  11. Increasing the "remote timeout" to 60 sec
  12. Increasing the "cache timeout" to 60 sec
  13. Applying the simple-cache="true" parameter to all local caches (realms, users, authorization, etc.)
  14. Using the latest Keycloak 25.0.6 (with the latest Infinispan) on both nodes
  15. Using the 'work' cache as local
  16. Decreasing 'locking-timeout' on both nodes from 60000 to 10000
  17. Decreasing 'locking-timeout' on one of the nodes from 60000 to 0
  18. Configuring the 'work' cache as replicated without any additional parameters
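
For reference, a minimal sketch of roughly what attempts 4, 11 and 18 look like inside the cache container. This is an illustration only, not a copy of our ispn-config.xml. It assumes the Infinispan 14.0 schema used by Keycloak 24 and the default container name "keycloak"; the 60000 ms value corresponds to the 60 sec remote timeout from attempt 11.

```xml
<!-- Illustrative sketch only. Namespace version and container name are assumptions. -->
<infinispan xmlns="urn:infinispan:config:14.0">
    <cache-container name="keycloak">
        <!-- lock-timeout of 60000 is the starting value mentioned in attempts 16 and 17 -->
        <transport lock-timeout="60000"/>

        <!-- Attempt 18: 'work' as a plain replicated cache, no extra parameters -->
        <replicated-cache name="work"/>

        <!-- Attempts 4 and 11 (alternative to the element above, not in addition to it):
             synchronous replication with the remote timeout raised to 60 s

        <replicated-cache name="work" mode="SYNC" remote-timeout="60000"/>
        -->
    </cache-container>
</infinispan>
```

Only one definition of the 'work' cache can be active at a time, which is why the second variant is left commented out.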

Could someone please point out what we should check, update, or configure to address this issue? The only workaround we have found so far is to manually stop all Keycloak instances except the one used to run the admin console, and then perform the actions listed above.