Keycloak 26.6.0 - moving from utf8mb3 to utf8mb4 on MariaDB

Hi folks,

first-timer here :waving_hand: (that’s why I have to put some links as code block, as I cannot have more than two links in my first post).

In the “notable changes” section of the upgrade guide for Keycloak 26.6.0 [3] there is this paragraph:

For MySQL and MariaDB, the documentation now recommends using utf8mb4 as the character set. The previously documented utf8mb3 (or utf8) character set has been removed from the documentation due to its limitations in storing certain Unicode characters.

So naturally, one would go about converting one’s current data from utf8mb3 to utf8mb4 and changing the default character set for the database. Not converting the database and simply changing the default database/table character set to utf8mb4 does not seem like a good idea, as it would lead to a situation where you would possibly have a mixture of both in your database at some point.

After a little diversion [1] I now have a converted database that should work. Yet I’m still wondering how liquibase would handle the automatic conversion of TEXT columns to MEDIUMTEXT columns, which automatically happens [2] in MariaDB to preserve the effective column length. Wouldn’t this go against what “should be” the database state, or is the question of whether it is TEXT or MEDIUMTEXT abstracted away entirely and just describes something like “needs to be a text column” but would not, in fact, care about the implementation in your specific DBMS?

Also, maybe as a little side question: is there some place in the documentation or code where I could have a look into what a Keycloak database in a specific DBMS (MariaDB in this case) would look like at which release?

Just for better illustration, here’s an example of a diff of a dump after the character set conversion

CREATE TABLE `BROKER_LINK` (                                    CREATE TABLE `BROKER_LINK` (
  `IDENTITY_PROVIDER` varchar(255) NOT NULL,                      `IDENTITY_PROVIDER` varchar(255) NOT NULL,
  `STORAGE_PROVIDER_ID` varchar(255) DEFAULT NULL,                `STORAGE_PROVIDER_ID` varchar(255) DEFAULT NULL,
  `REALM_ID` varchar(36) NOT NULL,                                `REALM_ID` varchar(36) NOT NULL,
  `BROKER_USER_ID` varchar(255) DEFAULT NULL,                     `BROKER_USER_ID` varchar(255) DEFAULT NULL,
  `BROKER_USERNAME` varchar(255) DEFAULT NULL,                    `BROKER_USERNAME` varchar(255) DEFAULT NULL,
  `TOKEN` text DEFAULT NULL,                                  |   `TOKEN` mediumtext DEFAULT NULL,
  `USER_ID` varchar(255) NOT NULL,                                `USER_ID` varchar(255) NOT NULL,
  PRIMARY KEY (`IDENTITY_PROVIDER`,`USER_ID`),                    PRIMARY KEY (`IDENTITY_PROVIDER`,`USER_ID`),
  KEY `IDX_BROKER_LINK_USER_ID` (`USER_ID`),                      KEY `IDX_BROKER_LINK_USER_ID` (`USER_ID`),
  KEY `IDX_BROKER_LINK_IDENTITY_PROVIDER` (`REALM_ID`,`IDENTI     KEY `IDX_BROKER_LINK_IDENTITY_PROVIDER` (`REALM_ID`,`IDENTI
);                                                              );

Also, if I just create a new database on 26.6.0 (with the database default character set set to utf8mb4 before liquibase initiates the database, the column (BROKER_LINK.TOKEN) gets created as TEXT data type.

If you think that I have an entirely wrong understanding somewhere, please don’t hesitate to point it out :slight_smile:

Cheers & Thanks already for any insight you might be able to provide.
Thomas

[1] contrary, it seems, to historically just needing to disable foreign key checks, it requires to drop all foreign key constraints before converting data and re-add them after that.

[2]

For VARCHAR or TEXT columns, CONVERT TO CHARACTER SET changes the data type if needed to ensure the new column is long enough to store as many characters as the original column.

For example, an ascii TEXT column requires a single byte per character, so the column can hold up to 65,535 characters. If the column is converted to utf8mb4, 4 bytes can be required for each character, so the column will be converted to MEDIUMTEXT to be able to hold the same number of characters.

[3] https://www.keycloak.org/docs/latest/upgrading/#database-character-encoding-checks-during-startup