2014-12-18

UTF8MB4 character set in Amazon RDS MySQL server and SQLAlchemy client

I've recently had to move a massive dataset that includes UTF-8 strings which contains extended set code points (i.e. planes other than the Basic Multilingual Plane, including Emoji) into a MySQL database hosted using Amazon's RDS.

Even though all of the databases and tables were configured to use purely utf8 character set and the unicode_ci collation, and though SQLAlchemy was also configured to use UTF8, I quickly ran into issues:

Incorrect string value: '\xF0\x9D\x8C\x86' for column 'column_name' at row 1 

The solution was:
  1. Read Mathias Bynens' awesome tutorial: The things we do to store U+01F4A9 PILE OF POO () correctly 
  2. After you created a backup of your current database, change the MySQL server settings via the Parameter Groups section of the RDS console:
    • Click "Create DB Parameter Group" 
    • Choose the correct Group Family (probably mysql5.6)
    • Input a group name and description (probably "mysql5.6-utf8mb4" and "MySQL 5.6 using UTF8MB4 charset by default"
    • Select this new Parameter Group in the console and click "Edit Parameters". Set the following parameter values:
      character_set_client: utf8mb4
      character_set_database: utf8mb4
      character_set_results: utf8mb4
      character_set_connection: utf8mb4
      collation_connection: utf8mb4_unicode_ci
      collation_server: utf8mb4_unicode_ci
      character_set_server: utf8mb4
      and click "Save Changes".
    • Go to the "Instances" dashboard, right click your RDS instance and "Modify" it, change the "Parameter Group" option to your newly created Parameter Group and click "Continue", "Modify DB Instance".
    • You can "Reboot" the instance if you want to be extra sure the new configuration was loaded.
  3. (Optional) Change the MySQL client settings. For the CLI mysql client, edit /etc/mysql/my.cnf and under [client] add:
    [client]
    default-character-set = utf8mb4
    This is to allow proper viewing of data using the mysql tool.
  4. Modify your existing databases, tables and columns to use UTF8MB4, as explained in the tutorial in part 1.
  5. Modify your SQLAlchemy connection string from:
    mysql+mysqldb://user:password@host:port/dbname
    to:
    mysql+mysqldb://user:password@host:port/dbname?charset=utf8mb4
    (whether or not to add use_unicode=0 is left for the programmer's discretion.)
Enjoy.

12 comments:

  1. Your website is really cool and this is a great inspiring article.
    Amazon

    ReplyDelete
  2. Hi Alon,

    Can you please publish the link for part 1 of this blog ?

    ReplyDelete
  3. Tried this, somhow doesn't seem to change the character_set_client, character_set_connection, character_set_results values. The other variables got changed, just not these. Any idea why?

    ReplyDelete
  4. Congratulations on your article, it was very helpful and successful. b77e02dc4c0ae23603f8fe42ed28e7c4
    website kurma
    website kurma
    numara onay

    ReplyDelete
  5. Thank you for your explanation, very good content. b17be783c868ebad62d9ee2931b23a2f
    define dedektörü

    ReplyDelete

  6. Thanks for this for this article
    pleas share more articles about this
    my website talking about home improvement you can visite it from this url
    صيانة افران بمكة
    شركة صيانة افران بمكة

    ReplyDelete