MySQL-UTF8 Special Chaos — Jun 25, 2013, 1:32 pm
This problem is really a pain in the backside if you forget about it, especially with a fully grown database.Even if you set all your website's charsets to UTF-8 (file, header, html, tags) and selected the right collations (i.e. utf8_general_ci) and charset in your database, it still happens. And all because of MySQL's strange behaviour. It always expects it's received data to be latin1-charset.
If you know that, it's easy to prevent this behavior by simply putting the SET NAMES-command right behind your database connection. For PHP that would be:
<?php $link = mysql_connect('localhost', 'mysql_user', 'mysql_password'); $db_selected = mysql_select_db('foo', $link); mysql_query("SET NAMES 'utf8'"); // this is important, no INSERTs, UPDATES, etc before it ...
It's embarrassing, but it took me a while to get behind it and finally I had my PhpMyAdmin print out all the special characters correctly.
But again, I forgot and had to do some extra work, because what's expected to be utf8 is latin1 and you can't just switch collations. Nevertheless it's quite simple to clear the mess:
- First of all backup the affected table(s) to be on the safe side!
- Since MySQL expects the entered data to be latin1, set the collation for all the messy fields to i.e. latin1_general_ci
- After saving, set the column type to BLOB. It should be a big enough binary storage form for our transfer. With this transformation the collation gets irrelevant and automatically cleared.
- Last step: Set back both, the column type to what you had before (luckily you made your backup table to sneak a peek), and the collation (i.e. to utf8_general_ci).