
Converting GBK characters to UTF-8 with a Java program (repost)

Published: 2025/3/21 · Programming Q&A · 33 · 豆豆

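UTF-8 is a widely used encoding of Unicode, the character set that aims to cover all of the world's writing systems,
including Chinese, Japanese and Korean. UTF stands for UCS Transformation Format.
UTF-8 represents each character with a variable number of bytes. The original design allowed sequences of up to 6 bytes; since RFC 3629 (2003), UTF-8 is limited to at most 4 bytes, which covers U+0000 to U+10FFFF.
UTF-8 is backward compatible with ASCII (0-127): an ASCII character is encoded in UTF-8 as exactly the same single byte.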
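Both points can be checked with the JDK's built-in UTF-8 encoder. This is a minimal sketch; the class name Utf8LengthDemo and the helper utf8Bytes are illustrative, not from the original article. It compares the ASCII and UTF-8 bytes of plain ASCII text, then prints how many bytes a few characters need.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8LengthDemo {
    // Number of bytes the JDK's UTF-8 encoder produces for a string.
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "Hello";
        // For ASCII text the UTF-8 bytes are identical to the ASCII bytes.
        System.out.println(Arrays.equals(
                ascii.getBytes(StandardCharsets.US_ASCII),
                ascii.getBytes(StandardCharsets.UTF_8)));   // true
        System.out.println(utf8Bytes("A"));                 // 1 (U+0041)
        System.out.println(utf8Bytes("\u00A9"));            // 2 (U+00A9, copyright sign)
        System.out.println(utf8Bytes("\u4E2D"));            // 3 (U+4E2D, a CJK character)
        System.out.println(utf8Bytes("\uD83D\uDE00"));      // 4 (U+1F600, a surrogate pair in Java)
    }
}
```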
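Characters that need more than one byte use the following scheme:
the number of leading 1 bits in the first byte equals the number of bytes in the character's encoding.
For example:
Single-byte pattern: 0xxxxxxx
Two-byte pattern: 110xxxxx 10xxxxxx
Three-byte pattern: 1110xxxx 10xxxxxx 10xxxxxx
and so on, up to the six-byte pattern: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx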
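Reading the length back from a lead byte is just a count of its leading 1 bits. A minimal sketch, assuming a hypothetical helper named sequenceLength; it follows the historical patterns up to 6 bytes, although current UTF-8 (RFC 3629) only permits lead bytes for 1 to 4 byte sequences.

```java
public class LeadByte {
    // Returns the sequence length announced by a lead byte, i.e. its number
    // of leading 1 bits: 0xxxxxxx -> 1, 110xxxxx -> 2, 1110xxxx -> 3, ...
    // 1111110x -> 6. Returns -1 for a continuation byte (10xxxxxx) and for
    // 0xFE/0xFF, which no pattern uses.
    public static int sequenceLength(byte lead) {
        int v = lead & 0xFF;              // treat the byte as unsigned
        int ones = 0;
        while (ones < 8 && (v & (0x80 >> ones)) != 0) {
            ones++;
        }
        if (ones == 0) return 1;          // single-byte (ASCII) character
        if (ones == 1 || ones > 6) return -1;
        return ones;
    }

    public static void main(String[] args) {
        byte[] samples = {(byte) 0x41, (byte) 0xC2, (byte) 0xE2, (byte) 0x89};
        for (byte b : samples) {
            System.out.printf("%02X -> %d%n", b & 0xFF, sequenceLength(b));
        }
    }
}
```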
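The x positions are filled with the bits of the character's code point in binary. Only the shortest sequence that can hold the code point is allowed.
Unicode characters in the ASCII range are encoded as 1 byte, identical to their ASCII representation. Every other Unicode character needs at least 2 bytes in UTF-8. Each byte starts with a fixed prefix: the first byte's prefix is n 1 bits followed by a 0 bit, where n is the number of bytes in the sequence, and every following byte starts with 10.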
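The shortest-form rule amounts to choosing the length by how many payload bits the code point needs (7, 11, 16 or 21), and the lead-byte prefix "n ones then a zero" can be computed with a shift. A minimal sketch with hypothetical helper names encodedLength and leadPrefix; it does not reject surrogate code points (U+D800 to U+DFFF), which cannot be encoded on their own.

```java
import java.nio.charset.StandardCharsets;

public class ShortestForm {
    // Bytes in the shortest UTF-8 sequence for a code point,
    // chosen by how many payload bits it needs: 7, 11, 16 or 21.
    public static int encodedLength(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Lead-byte prefix for an n-byte sequence (n >= 2): n one bits, then a zero.
    // n = 2 -> 0xC0 (110xxxxx), n = 3 -> 0xE0 (1110xxxx), n = 4 -> 0xF0 (11110xxx).
    public static int leadPrefix(int n) {
        return (0xFF00 >> n) & 0xFF;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xA9, 0x2260, 0x1F600};
        for (int cp : samples) {
            // Cross-check against the JDK's own encoder.
            int jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8).length;
            System.out.printf("U+%04X: %d bytes (JDK: %d)%n", cp, encodedLength(cp), jdk);
        }
    }
}
```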
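No lead byte can begin with "10"; that prefix only appears at the start of the continuation bytes that follow it. This makes UTF-8 robust in storage and transmission: a decoder can always tell a lead byte from a continuation byte, so after a lost or corrupted byte it can resynchronize at the next character boundary, and the damage stays confined to one character.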
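Because continuation bytes are always recognizable as 10xxxxxx, a program can jump to any byte offset and walk back to the start of the enclosing character. A minimal sketch; the class Resync and method charStart are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class Resync {
    // A continuation byte always has the form 10xxxxxx.
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Moves backwards from an arbitrary index to the first byte of the
    // character containing it, by skipping continuation bytes.
    public static int charStart(byte[] utf8, int index) {
        int i = index;
        while (i > 0 && isContinuation(utf8[i])) {
            i--;
        }
        return i;
    }

    public static void main(String[] args) {
        // "a", U+2260 (not-equal sign), "b" -> 61 E2 89 A0 62
        byte[] bytes = "a\u2260b".getBytes(StandardCharsets.UTF_8);
        System.out.println(charStart(bytes, 3)); // 1: 0xA0 belongs to the sequence starting at index 1
    }
}
```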
For example:
Unicode character U+00A9 (copyright sign ©) = 1010 1001,
UTF-8 encoding: 11000010 10101001 = 0xC2 0xA9;
Character U+2260 (not-equal sign ≠) = 0010 0010 0110 0000,
UTF-8 encoding: 11100010 10001001 10100000 = 0xE2 0x89 0xA0
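The two hand calculations can be cross-checked against the JDK's built-in UTF-8 encoder. A minimal sketch; WorkedExamples, hex and utf8Hex are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class WorkedExamples {
    // Formats bytes as space-separated uppercase hex, e.g. "C2 A9".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // UTF-8 bytes of a single code point, as hex.
    public static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex(0x00A9)); // C2 A9
        System.out.println(utf8Hex(0x2260)); // E2 89 A0
    }
}
```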
package com.lang.string;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ConverFromGBKToUTF8 {
    public static void main(String[] args) {
        // A Java String is already UTF-16 in memory; text stored as GBK bytes
        // must first be decoded with new String(bytes, "GBK").
        String chinese = "中文字符";
        ConverFromGBKToUTF8 convert = new ConverFromGBKToUTF8();
        byte[] fullByte = convert.gbk2utf8(chinese);
        String fullStr = new String(fullByte, StandardCharsets.UTF_8);
        System.out.println("string from GBK to UTF-8 byte: " + fullStr);
    }

    // Encodes each code point by hand with the shortest UTF-8 pattern.
    public byte[] gbk2utf8(String chinese) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chinese.length(); ) {
            int cp = chinese.codePointAt(i);   // joins surrogate pairs into one code point
            i += Character.charCount(cp);
            if (cp < 0x80) {                   // 0xxxxxxx
                out.write(cp);
            } else if (cp < 0x800) {           // 110xxxxx 10xxxxxx
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {         // 1110xxxx 10xxxxxx 10xxxxxx
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                           // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
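In practice the JDK already performs this conversion. A minimal sketch, with illustrative class and method names: decode the GBK bytes into a String, then encode that String as UTF-8. GBK is looked up by name with Charset.forName("GBK"); it is available in standard JDK builds but is not one of the charsets every Java platform is required to support, so forName may throw UnsupportedCharsetException on a stripped-down runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkToUtf8 {
    // GBK is not in StandardCharsets, so it is looked up by name.
    public static final Charset GBK = Charset.forName("GBK");

    // Decode GBK bytes into a Java String (UTF-16), then re-encode as UTF-8.
    public static byte[] gbkToUtf8(byte[] gbkBytes) {
        String text = new String(gbkBytes, GBK);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "\u4E2D\u6587";          // two CJK characters
        byte[] gbk = sample.getBytes(GBK);      // 2 bytes per character in GBK
        byte[] utf8 = gbkToUtf8(gbk);           // 3 bytes per character in UTF-8
        System.out.println(gbk.length + " -> " + utf8.length);
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

Letting the JDK handle both directions avoids the bit manipulation entirely and also takes care of supplementary characters and malformed input.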

UTF-8 encoding by code point range:

U+0000 ~ U+007F    0xxxxxxx                              (7 bits)

U+0080 ~ U+07FF    110xxxxx 10xxxxxx                     (11 bits)

U+0800 ~ U+FFFF    1110xxxx 10xxxxxx 10xxxxxx            (16 bits)

U+10000 ~ U+10FFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx   (21 bits)
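Decoding runs the table in reverse: the lead byte selects the row and contributes its payload bits, and each continuation byte appends 6 more. A minimal sketch with illustrative names; it rejects bad lead bytes, bad continuation bytes and truncated input, but unlike the JDK decoder it does not reject overlong forms or surrogate code points.

```java
import java.util.Arrays;

public class Utf8Decoder {
    // Decodes UTF-8 bytes into code points following the bit patterns in the table.
    public static int[] decode(byte[] bytes) {
        int[] out = new int[bytes.length];
        int n = 0;
        int i = 0;
        while (i < bytes.length) {
            int b = bytes[i] & 0xFF;
            int len;
            int cp;
            if (b < 0x80) { len = 1; cp = b; }                        // 0xxxxxxx
            else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }  // 110xxxxx
            else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }  // 1110xxxx
            else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }  // 11110xxx
            else throw new IllegalArgumentException("invalid lead byte at " + i);
            if (i + len > bytes.length) {
                throw new IllegalArgumentException("truncated sequence at " + i);
            }
            for (int k = 1; k < len; k++) {
                int c = bytes[i + k] & 0xFF;
                if ((c & 0xC0) != 0x80) {
                    throw new IllegalArgumentException("bad continuation byte at " + (i + k));
                }
                cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
            }
            out[n++] = cp;
            i += len;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC2, (byte) 0xA9, (byte) 0xE2, (byte) 0x89, (byte) 0xA0};
        for (int cp : decode(bytes)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```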

Reposted from: https://www.cnblogs.com/Fskjb/archive/2009/08/21/1551732.html
