MySQL Character Set
Summary: in this tutorial, you will learn about the MySQL character set. After the tutorial, you will know how to get all character sets in MySQL, how to convert strings between character sets, and how to configure proper character sets for client connections.
Introduction to MySQL character set
A MySQL character set is a set of characters that are legal in a string. For example, suppose we have an alphabet with letters from a to z and assign each letter a number, such as a = 1, b = 2, and so on. The letter a is a symbol, and the number 1 associated with it is its encoding. The combination of all letters from a to z and their corresponding encodings is a character set.
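As a concrete illustration of symbols and encodings, the following sketch asks MySQL for the numeric codes that real character sets assign to letters. It assumes an ASCII-compatible connection character set such as latin1 or utf8, in which a is encoded as 97 rather than the 1 used in the simplified alphabet example.

```sql
-- ASCII() returns the numeric code of the first byte of the string,
-- and HEX() shows the bytes of the string in hexadecimal.
SELECT ASCII('a') AS code_a,   -- 97
       ASCII('b') AS code_b,   -- 98
       HEX('a')   AS hex_a;    -- 61 (hexadecimal for 97)
```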
Each character set has one or more collations, which define the rules for comparing characters within the character set. See the MySQL collation tutorial to learn more about collations in MySQL.
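To see which collations belong to a character set, you can filter the output of the SHOW COLLATION statement. This is a minimal sketch; latin1 is used only as an example, and the list returned depends on the server version.

```sql
-- List every collation defined for the latin1 character set.
SHOW COLLATION WHERE Charset = 'latin1';
```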
MySQL supports various character sets that allow you to store almost any character in a string. To list all character sets available on the MySQL database server, use the SHOW CHARACTER SET statement as follows:
SHOW CHARACTER SET;
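In MySQL 5.7 and earlier, the default character set is latin1; MySQL 8.0 changed the default to utf8mb4. If you want to store characters from multiple languages in a single column, you can use a Unicode character set such as utf8 or ucs2. Note that in MySQL, utf8 is an alias for utf8mb3, which stores at most three bytes per character; utf8mb4 covers the full Unicode range, including supplementary characters such as emoji.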
The values in the Maxlen column specify the maximum number of bytes that a single character in a character set can occupy. Some character sets contain only single-byte characters, e.g., latin1, latin2, and cp850, whereas others contain multi-byte characters.
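The SHOW CHARACTER SET statement also accepts a LIKE pattern, and the same data is exposed through the information_schema.CHARACTER_SETS table. The queries below are a sketch of how one might list only the multi-byte character sets; the exact rows returned depend on the server version and build.

```sql
-- Show only the character sets whose names start with 'utf'.
SHOW CHARACTER SET LIKE 'utf%';

-- List character sets in which a character may take more than one byte.
SELECT CHARACTER_SET_NAME, MAXLEN
FROM information_schema.CHARACTER_SETS
WHERE MAXLEN > 1
ORDER BY MAXLEN DESC, CHARACTER_SET_NAME;
```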
MySQL provides the LENGTH function to get the length of a string in bytes and the CHAR_LENGTH function to get the length of a string in characters. If a string contains multi-byte characters, the result of the LENGTH function is greater than the result of the CHAR_LENGTH function. See the following example:
SET @str = CONVERT('MySQL Character Set' USING ucs2);
SELECT LENGTH(@str), CHAR_LENGTH(@str);
The CONVERT function converts a string into a specific character set. In this example, it converts the MySQL Character Set string into ucs2. Because every character in the ucs2 character set occupies 2 bytes, the length of the @str string in bytes (38) is twice its length in characters (19).
Notice that some character sets contain multi-byte characters, but a particular string may consist only of single-byte characters. For example, utf8 encodes ASCII characters in one byte each, so the two lengths are equal in the following statements:
SET @str = CONVERT('MySQL Character Set' USING utf8);
SELECT LENGTH(@str), CHAR_LENGTH(@str);
However, if a utf8 string contains a non-ASCII character, e.g., ü in the pingüino string, its length in bytes differs from its length in characters because ü occupies two bytes in utf8. See the following example:
SET @str = CONVERT('pingüino' USING utf8);
SELECT LENGTH(@str), CHAR_LENGTH(@str);
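To see where the extra byte comes from, the bytes of a single character can be inspected with the HEX function. The sketch below assumes that the client connection character set is configured correctly, so that the literal ü reaches the server intact; utf8mb4 is used here in place of utf8, and for this character both produce the same bytes.

```sql
-- Compare the byte representation of the same character in three character sets.
SELECT HEX(CONVERT('ü' USING latin1))  AS latin1_bytes,   -- FC   (1 byte)
       HEX(CONVERT('ü' USING utf8mb4)) AS utf8mb4_bytes,  -- C3BC (2 bytes)
       HEX(CONVERT('ü' USING ucs2))    AS ucs2_bytes;     -- 00FC (2 bytes)
```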
Converting between different character sets
MySQL provides two functions that allow you to convert strings between different character sets: CONVERT and CAST. We have used the CONVERT function several times in the examples above.
The syntax of the CONVERT function is as follows:
CONVERT(expression USING character_set_name)
The CAST function is similar to the CONVERT function. It converts a string to a different character set:
CAST(string AS character_type CHARACTER SET character_set_name)
Take a look at the following example of using the CAST function:
SELECT CAST(_latin1'MySQL character set' AS CHAR CHARACTER SET utf8);
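One way to check the result of a conversion is the CHARSET function, which returns the character set of its string argument. A minimal sketch, using latin1 and utf8mb4 as illustrative targets:

```sql
-- CHARSET() reports the character set of the converted string.
SELECT CHARSET(CONVERT('MySQL character set' USING latin1)) AS via_convert,          -- latin1
       CHARSET(CAST('MySQL character set' AS CHAR CHARACTER SET utf8mb4)) AS via_cast; -- utf8mb4
```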
Setting character sets for client connections
When an application exchanges data with a MySQL database server, the connection uses its own character set. In older MySQL versions (5.7 and earlier), the default is typically latin1. However, if the database stores Unicode strings in the utf8 character set, using latin1 in the application is not sufficient, because characters outside latin1 cannot be transferred correctly. Therefore, the application needs to specify a proper character set when it connects to the MySQL database server.
To configure a character set for a client connection, use one of the following methods:
- Issue the SET NAMES statement after the client is connected to the MySQL database server. For example, to set the Unicode character set utf8, use the following statement:
SET NAMES 'utf8';
- If the application supports the --default-character-set option, you can use it to set the character set. For example, the mysql client tool supports --default-character-set, and you can set it in the configuration file as follows:
[mysql]
default-character-set=utf8
- Some MySQL connectors allow you to set the character set. For example, if you use PHP PDO, you can set the character set in the data source name as follows:
$dsn = "mysql:host=$host;dbname=$db;charset=utf8";
Whichever method you use, make sure that the character set used by the application matches the character set of the data stored in the MySQL database server.
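To check which character sets a session is actually using, you can inspect the character set system variables before and after issuing SET NAMES. This is a sketch; the values reported depend on the server version and configuration, and utf8mb4 is used here only as an example.

```sql
-- Show the character set variables for the current session.
SHOW VARIABLES LIKE 'character_set%';

-- SET NAMES sets character_set_client, character_set_connection,
-- and character_set_results for the session.
SET NAMES 'utf8mb4';

-- Run the check again to confirm that the three variables changed.
SHOW VARIABLES LIKE 'character_set%';
```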
In this tutorial, you have learned about MySQL character sets, how to convert strings between character sets, and how to configure proper character sets for client connections.