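Handling GB2312-to-UTF-8 encoding conversion in Golang
Problem description:
If you have ever rewritten legacy PHP or Java code in Go, you have very likely run into the problem of converting GB2312 to UTF-8.
Recently a colleague was using the iconv-go library at work for character conversion and ran into the error below. They asked me about it, so I took the chance to look into it myself.
The error message was: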
invalid or incomplete multibyte or wide character
The Go conversion library in use was:
github.com/djimenez/iconv-go
The function called was:
    body, err = iconv.ConvertString(body, "GBK", "utf-8")
Solution approach:
Open github.com/djimenez/iconv-go and read the source.
First, iconv.ConvertString is implemented in iconv.go:
    func ConvertString(input string, fromEncoding string, toEncoding string) (output string, err error) {
        // create a temporary converter
        converter, err := NewConverter(fromEncoding, toEncoding)

        if err == nil {
            // convert the string
            output, err = converter.ConvertString(input)

            // close the converter
            converter.Close()
        }

        return
    }
From this we can see that it calls
    NewConverter(fromEncoding, toEncoding)
to create a Converter struct, and then calls this method on that struct:
    output, err = converter.ConvertString(input)
Following the constructor and the struct, their implementation is in converter.go:
    type Converter struct {
        context C.iconv_t
        open    bool
    }

    // Initialize a new Converter. If fromEncoding or toEncoding are not supported by
    // iconv then an EINVAL error will be returned. An ENOMEM error maybe returned if
    // there is not enough memory to initialize an iconv descriptor
    func NewConverter(fromEncoding string, toEncoding string) (converter *Converter, err error) {
        converter = new(Converter)

        // convert to C strings
        toEncodingC := C.CString(toEncoding)
        fromEncodingC := C.CString(fromEncoding)

        // open an iconv descriptor
        converter.context, err = C.iconv_open(toEncodingC, fromEncodingC)

        // free the C Strings
        C.free(unsafe.Pointer(toEncodingC))
        C.free(unsafe.Pointer(fromEncodingC))

        // check err
        if err == nil {
            // no error, mark the context as open
            converter.open = true
        }

        return
    }
As you can see, the conversion is ultimately done by calling the C library through cgo:
    converter.context, err = C.iconv_open(toEncodingC, fromEncodingC)
Looking up the C library documentation with man iconv_open, the DESCRIPTION section says:

The empty encoding name "" is equivalent to "char": it denotes the locale dependent character encoding.

When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similarly looking characters.

When the string "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set will be silently discarded.

The resulting conversion descriptor can be used with iconv any number of times. It remains valid until deallocated using iconv_close.

A conversion descriptor contains a conversion state. After creation using iconv_open, the state is in the initial state. Using iconv modifies the descriptor's conversion state. (This implies that a conversion descriptor can not be used in multiple threads simultaneously.) To bring the state back to the initial state, use iconv with NULL as inbuf argument.

The key sentence is this one:
When the string "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set will be silently discarded.
Roughly, this means that if "//IGNORE" is appended to tocode, characters that cannot be represented in tocode are silently dropped. Good, that is exactly what I want.
Given this chain of calls:
    ConvertString(input string, fromEncoding string, toEncoding string)
    NewConverter(fromEncoding string, toEncoding string) (converter *Converter, err error)
    C.iconv_open(toEncodingC, fromEncodingC)
we only need to pass "//IGNORE" through to the C library as part of the target encoding name.
So the code becomes:
    body, err = iconv.ConvertString(body, "GBK", "utf-8//IGNORE")
After testing, no err was returned. Done.
To restate the solution:
    body, err = iconv.ConvertString(body, "GBK", "utf-8//IGNORE")