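MySQL: a case of a failed update with a 4-byte utf8 character
Colleagues on the business side reported the following problem.
Problem
Inserting a 4-byte utf8 character (U+20676, judging by the bytes in the error message below) into a MySQL database fails with this error: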
java.sql.SQLException: Incorrect string value: '\xF0\xA0\x99\xB6' for column 'c_utf8mb4' at row 1
The column that stores the character is already defined with the utf8mb4 character set, but the related server variable character_set_server is set to utf8. Oddly, the insert works with the mysql-connector-java-5.1.15.jar driver but fails with newer drivers such as mysql-connector-java-5.1.22.jar. The following two JDBC connection parameters make no difference, whether they are set or not.
characterEncoding=utf8
useUnicode=true
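The bytes quoted in the error message are enough to identify the character: they are the UTF-8 encoding of U+20676, a code point outside the Basic Multilingual Plane, which is why MySQL's 3-byte utf8 cannot hold it. The following minimal sketch checks that claim in plain Java; the class and method names (FourByteUtf8Demo, utf8Escapes) are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class FourByteUtf8Demo {
    // Returns the UTF-8 bytes of one code point as "\xNN" escapes,
    // the same notation MySQL uses in "Incorrect string value" errors.
    static String utf8Escapes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int cp = 0x20676; // code point derived from the error message bytes
        System.out.println(utf8Escapes(cp));          // \xF0\xA0\x99\xB6
        System.out.println(Character.charCount(cp));  // 2 (a surrogate pair in Java)
    }
}
```

A BMP character such as U+4E2D encodes to three bytes and is unaffected by the problem described here.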
Cause
The JDBC driver did not issue SET NAMES utf8mb4, which led to a transcoding error.
According to the official MySQL documentation, the correct way to use 4-byte UTF8 characters with MySQL JDBC is as follows:
http://dev.mysql.com/doc/relnotes/connector-j/5.1/en/news-5-1-14.html:
Connector/J mapped both 3-byte and 4-byte UTF8 encodings to the same Java UTF8 encoding.
To use 3-byte UTF8 with Connector/J set characterEncoding=utf8 and set useUnicode=true in the connection string.
To use 4-byte UTF8 with Connector/J configure the MySQL server with character_set_server=utf8mb4. Connector/J will then use that setting as long as characterEncoding has not been set in the connection string. This is equivalent to autodetection of the character set. (Bug #58232)
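When the method from the official documentation is followed, the MySQL JDBC driver sends SET NAMES utf8mb4 to the server while establishing the connection, which ensures the characters are encoded correctly. So this problem comes down to the application not using MySQL JDBC as required. Still, it is odd that 5.1.15 can insert 4-byte characters. The change log on the mysql-connector site does not mention any related change between 5.1.15 and 5.1.22. Comparing the code, however, shows that this part of the logic did change.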
5.1.15
com\mysql\jdbc\ConnectionImpl.java:
private boolean configureClientCharacterSet(boolean dontCheckServerMatch)
        throws SQLException
{
    ...
    if(getEncoding() != null)
    {
        String mysqlEncodingName = CharsetMapping.getMysqlEncodingForJavaEncoding(getEncoding().toUpperCase(Locale.ENGLISH), this);
        if(getUseOldUTF8Behavior())
            mysqlEncodingName = "latin1";
        if(dontCheckServerMatch || !characterSetNamesMatches(mysqlEncodingName))
            execSQL(null, (new StringBuilder()).append("SET NAMES ").append(mysqlEncodingName).toString(), -1, null, 1003, 1007, false, database, null, false);
        realJavaEncoding = getEncoding();
    }
    ...
}
The argument passed to CharsetMapping.getMysqlEncodingForJavaEncoding() is UTF-8, which maps to two MySQL character sets, utf8 and utf8mb4. utf8mb4 takes precedence, so the function returns utf8mb4 and the driver then executes SET NAMES utf8mb4.
Related code:
com\mysql\jdbc\CharsetMapping.java:
public static final String getMysqlEncodingForJavaEncoding(String javaEncodingUC, Connection conn)
        throws SQLException
{
    List mysqlEncodings = (List)JAVA_UC_TO_MYSQL_CHARSET_MAP.get(javaEncodingUC);
    if(mysqlEncodings != null)
    {
        Iterator iter = mysqlEncodings.iterator();
        VersionedStringProperty versionedProp = null;
        do
        {
            if(!iter.hasNext())
                break;
            VersionedStringProperty propToCheck = (VersionedStringProperty)iter.next();
            if(conn == null)
                return propToCheck.toString();
            if(versionedProp != null && !versionedProp.preferredValue
                    && versionedProp.majorVersion == propToCheck.majorVersion
                    && versionedProp.minorVersion == propToCheck.minorVersion
                    && versionedProp.subminorVersion == propToCheck.subminorVersion)
                return versionedProp.toString();
            if(!propToCheck.isOkayForVersion(conn))
                break;
            if(propToCheck.preferredValue)
                return propToCheck.toString();
            versionedProp = propToCheck;
        } while(true);
        if(versionedProp != null)
            return versionedProp.toString();
    }
    return null;
}
...
CHARSET_CONFIG.setProperty("javaToMysqlMappings", "US-ASCII =\t\t\tusa7,US-ASCII =\t\t\t>4.1.0 ascii,...
UTF-8 = \t\tutf8,UTF-8 =\t\t\t\t*> 5.5.2 utf8mb4,...");
Note: in the definition UTF-8 =\t\t\t\t*> 5.5.2 utf8mb4 above, the * marks the encoding that is preferred when several MySQL encodings map to the same Java encoding.
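To make the effect of the * marker concrete, here is a simplified model of the selection, not the driver's actual code. It assumes that "> 5.5.2" means "server version at least 5.5.2" and that the last acceptable non-preferred candidate serves as the fallback; PreferredCharsetSketch, Candidate, atLeast and choose are hypothetical names introduced only for this sketch.

```java
import java.util.Arrays;
import java.util.List;

final class PreferredCharsetSketch {
    // One candidate MySQL charset for a Java encoding (hypothetical type).
    static final class Candidate {
        final String mysqlName;
        final int[] minVersion;  // {major, minor, sub}; {0, 0, 0} means any version
        final boolean preferred; // true when the mapping carries the '*' marker

        Candidate(String mysqlName, int[] minVersion, boolean preferred) {
            this.mysqlName = mysqlName;
            this.minVersion = minVersion;
            this.preferred = preferred;
        }
    }

    // True when the server version is greater than or equal to the minimum.
    static boolean atLeast(int[] server, int[] min) {
        for (int i = 0; i < 3; i++) {
            if (server[i] != min[i]) {
                return server[i] > min[i];
            }
        }
        return true;
    }

    // Returns the preferred candidate the server supports, else the last
    // supported non-preferred one, else null.
    static String choose(List<Candidate> candidates, int[] serverVersion) {
        String fallback = null;
        for (Candidate c : candidates) {
            if (!atLeast(serverVersion, c.minVersion)) {
                continue;
            }
            if (c.preferred) {
                return c.mysqlName;
            }
            fallback = c.mysqlName;
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<Candidate> utf8 = Arrays.asList(
                new Candidate("utf8", new int[] {0, 0, 0}, false),
                new Candidate("utf8mb4", new int[] {5, 5, 2}, true));
        System.out.println(choose(utf8, new int[] {5, 5, 8}));  // utf8mb4
        System.out.println(choose(utf8, new int[] {5, 1, 40})); // utf8
    }
}
```

Under this model, any server new enough to know utf8mb4 gets SET NAMES utf8mb4 from 5.1.15 whenever the Java encoding is UTF-8, regardless of character_set_server.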
5.1.22
com\mysql\jdbc\ConnectionImpl.java:
private boolean configureClientCharacterSet(boolean dontCheckServerMatch)
        throws SQLException
{
    ...
    if(getEncoding() != null)
    {
        String mysqlEncodingName = getServerCharacterEncoding();
        if(getUseOldUTF8Behavior())
            mysqlEncodingName = "latin1";
        boolean ucs2 = false;
        if("ucs2".equalsIgnoreCase(mysqlEncodingName) || "utf16".equalsIgnoreCase(mysqlEncodingName) || "utf32".equalsIgnoreCase(mysqlEncodingName))
        {
            mysqlEncodingName = "utf8";
            ucs2 = true;
            if(getCharacterSetResults() == null)
                setCharacterSetResults("UTF-8");
        }
        if(dontCheckServerMatch || !characterSetNamesMatches(mysqlEncodingName) || ucs2)
            execSQL(null, (new StringBuilder()).append("SET NAMES ").append(mysqlEncodingName).toString(), -1, null, 1003, 1007, false, database, null, false);
        realJavaEncoding = getEncoding();
    }
    ...
}
...
public String getServerCharacterEncoding()
{
    if(io.versionMeetsMinimum(4, 1, 0))
    {
        String charset = (String)indexToCustomMysqlCharset.get(Integer.valueOf(io.serverCharsetIndex));
        if(charset == null)
            charset = (String)CharsetMapping.STATIC_INDEX_TO_MYSQL_CHARSET_MAP.get(Integer.valueOf(io.serverCharsetIndex));
        return charset == null ? (String)serverVariables.get("character_set_server") : charset;
    } else
    {
        return (String)serverVariables.get("character_set");
    }
}
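Read side by side, the two listings differ in where the name for SET NAMES comes from: 5.1.15 derives it from the Java encoding, while 5.1.22 takes it from getServerCharacterEncoding(), that is, from the server's own character set. With character_set_server=utf8, a 5.1.22 connection therefore stays on utf8 and the 4-byte character is rejected. The following is a deliberately reduced model of that difference, not the driver's code; SetNamesSketch, names515 and names522 are hypothetical names, and only the UTF-8 case is modelled for 5.1.15.

```java
final class SetNamesSketch {
    // 5.1.15-style: the charset name follows the Java encoding.
    // For UTF-8 the preferred mapping is utf8mb4 when the server knows it.
    static String names515(String javaEncoding, boolean serverKnowsUtf8mb4) {
        if ("UTF-8".equalsIgnoreCase(javaEncoding)) {
            return serverKnowsUtf8mb4 ? "utf8mb4" : "utf8";
        }
        return null; // other encodings are outside this sketch
    }

    // 5.1.22-style: the charset name follows the server's character set;
    // ucs2, utf16 and utf32 are replaced by utf8, as in the listing above.
    static String names522(String serverCharset) {
        if ("ucs2".equalsIgnoreCase(serverCharset)
                || "utf16".equalsIgnoreCase(serverCharset)
                || "utf32".equalsIgnoreCase(serverCharset)) {
            return "utf8";
        }
        return serverCharset;
    }

    public static void main(String[] args) {
        // The situation from this article: server charset utf8, Java encoding UTF-8.
        System.out.println(names515("UTF-8", true)); // utf8mb4
        System.out.println(names522("utf8"));        // utf8
    }
}
```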
Solution
Staying on the old 5.1.15 driver indefinitely is not a good option, so when using a newer driver, take one of the following measures.
Following the official documentation, modify my.cnf:
character_set_server=utf8mb4
After obtaining a connection in the application, execute the following SQL:
stmt.executeUpdate("set names utf8mb4")
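For the second option, a slightly fuller sketch of the same call might look like the following. It only assumes a standard java.sql.Connection; the helper name Utf8mb4Session.useUtf8mb4 is invented for this example.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class Utf8mb4Session {
    // Switches the session character set of an already opened connection
    // to utf8mb4 and returns the same connection for convenience.
    static Connection useUtf8mb4(Connection conn) throws SQLException {
        // try-with-resources closes the Statement even if the update fails
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("SET NAMES utf8mb4");
        }
        return conn;
    }
}
```

A caller would wrap whatever it uses to open connections, for example Utf8mb4Session.useUtf8mb4(DriverManager.getConnection(url, user, password)), where url, user and password are placeholders. SET NAMES applies only to the current session, so with a connection pool the statement has to run on every physical connection, not just once per application.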
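Supplement
According to the code of the 5.1.22 MySQL JDBC driver, two conditions must both hold for MySQL JDBC to support utf8mb4:
1. The MySQL system variable `character_set_server` is set to utf8mb4
2. The MySQL JDBC connection parameter characterEncoding is one of the following
- null (not set)
- UTF8
- UTF-8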
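The two conditions can be written down as a small predicate, for example to validate a deployment's settings before connecting. This is only a restatement of the list above, not something taken from the driver; Utf8mb4Conditions and supportsUtf8mb4 are hypothetical names, and the case-insensitive comparison is an assumption.

```java
final class Utf8mb4Conditions {
    // characterSetServer: value of the MySQL variable character_set_server
    // characterEncoding:  value of the JDBC URL parameter, or null if not set
    static boolean supportsUtf8mb4(String characterSetServer, String characterEncoding) {
        boolean serverOk = "utf8mb4".equalsIgnoreCase(characterSetServer);
        boolean clientOk = characterEncoding == null
                || "UTF8".equalsIgnoreCase(characterEncoding)
                || "UTF-8".equalsIgnoreCase(characterEncoding);
        return serverOk && clientOk;
    }

    public static void main(String[] args) {
        System.out.println(supportsUtf8mb4("utf8mb4", null));  // true
        System.out.println(supportsUtf8mb4("utf8", "UTF-8"));  // false, the case in this article
    }
}
```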