A Summary of Java Character Encodings

Published: 2023/12/2 (Programming Q&A, 豆豆)


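I. Overview

Java applications, and web applications in particular, frequently run into character encoding problems. To avoid garbled text, first understand how Java handles characters, so that you can add the necessary conversions at the input and output boundaries. Second, because servers handle encodings differently, test on the servers you actually deploy to and confirm that no garbling occurs.

II. Basic concepts

2.1 How Java represents characters

Java has three relevant types: char, byte and String.

char is a 16-bit unsigned integer holding one UTF-16 code unit. For characters in the Basic Multilingual Plane this is the Unicode value itself; characters beyond it take two chars (a surrogate pair).

byte is a single byte. A string must be converted to a byte array before it is sent over the network or written to storage, and a byte array received from the network or read from storage must be converted back into a String.

String is a character string and can be viewed as an array of char.

String and char are the in-memory form; byte is the serialized form used for transmission and storage.

Example, using the character 英:

    String ying = "英";
    char yingChar = ying.charAt(0);
    String yingHex = Integer.toHexString(yingChar);   // Unicode value: 82 F1
    byte[] yingGBBytes = ying.getBytes("GBK");         // GBK-encoded bytes: D3 A2

2.2 A short introduction to encodings

Serializing a String into a byte array, or deserializing it again, requires choosing the correct encoding. If the chosen encoding cannot represent a character, the encoder emits 0x3F (the character '?') in its place.

Commonly used encodings are ISO8859_1, GB2312, GBK and UTF-8/UTF-16/UTF-32.

ISO8859_1 encodes Latin text and uses a single byte (0 to 255) per character.

GB2312 and GBK encode Simplified Chinese and mix single-byte and double-byte sequences. A byte whose high bit is 1 combines with the following byte to form one Chinese character; a byte whose high bit is 0 is an ASCII character.

UTF-8, UTF-16 and UTF-32 are encodings of the international Unicode standard. UTF-8 is the most widely used, mainly because it saves space when encoding Latin text: ASCII characters take a single byte.

    Unicode value                UTF-8 encoding
    U-00000000 - U-0000007F:     0xxxxxxx
    U-00000080 - U-000007FF:     110xxxxx 10xxxxxx
    U-00000800 - U-0000FFFF:     1110xxxx 10xxxxxx 10xxxxxx
    U-00010000 - U-001FFFFF:     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    U-00200000 - U-03FFFFFF:     111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    U-04000000 - U-7FFFFFFF:     1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

RFC 3629 restricts UTF-8 to code points up to U+10FFFF, so only the one- to four-byte forms are valid today; the five- and six-byte forms in the last two rows are obsolete.

III. Related J2SE functions

    String str = "英";

    // Get the bytes in GB2312 encoding
    byte[] bytesGB2312 = str.getBytes("GB2312");

    // Get the bytes in the platform default encoding (historically ISO8859_1
    // on Solaris and GB2312 on Chinese Windows; UTF-8 by default since JDK 18)
    byte[] bytesDefault = str.getBytes();

    // Decode bytes into a string using a named encoding
    String newStrGB = new String(bytesGB2312, "GB2312");

    // Decode bytes into a string using the platform default encoding
    String newStrDefault = new String(bytesDefault);

    // Read characters from a byte stream using a named encoding
    InputStream in = new ByteArrayInputStream(bytesGB2312);   // any byte source
    InputStreamReader reader = new InputStreamReader(in, "GB2312");
    char aChar = (char) reader.read();   // read() returns an int, -1 at end of stream

IV. JSP and database encodings

4.1 Encoding in JSP

(1) Static declaration in the page itself. The CHARSET value serves two purposes:
    The encoding of the JSP file: how Chinese characters in the source JSP are decoded when the file is read and the Java class is generated.
    The encoding of the JSP output stream: how data written to the response stream is encoded when the JSP executes.

(2) Dynamic change: before writing to the response stream, call response.setContentType() to set the correct encoding.

(3) In Tomcat, parameters returned by request.getParameter() are decoded as ISO8859_1. If the user types the character 英 into a browser form field, the server therefore receives the two GB2312 bytes each widened to a separate char (0x00,0xD3,0x00,0xA2). For this reason parameters are usually converted when they are read:

    String wrongStr = request.getParameter("name");
    String correctStr = new String(wrongStr.getBytes("ISO8859_1"), "GB2312");

Under the newer Servlet specification you can instead run the following before reading any parameter:

    request.setCharacterEncoding("GB2312");

4.2 Database encoding

(1) The database uses UTF-16: if the String holds Unicode characters, no conversion is needed when writing or reading.

(2) The database uses ISO8859_1: if the String holds Unicode characters, conversion is needed in both directions.

    Write: String newStr = new String(oldStr.getBytes("GB2312"), "ISO8859_1");
    Read:  String newStr = new String(oldStr.getBytes("ISO8859_1"), "GB2312");

V. Source file encodings

5.1 Resource files

The encoding of a resource file depends on the platform it was edited on. Resource files written on Windows are encoded in GB2312. They must be converted at build time so that they behave correctly on every platform:

    native2ascii -encoding GB2312 source.properties

The strings read from the converted resource file are then correct Unicode strings.

5.2 Source files

The encoding of a source file also depends on the platform it was edited on. Source files developed on Windows are encoded in GB2312, so the source encoding must be specified when compiling:

    javac -encoding GB2312

String constants in the compiled class file are stored in (modified) UTF-8.

Notes:

① Tomcat 4.1.18, the latest version at the time of writing, supports request.setCharacterEncoding(String enc).
② After conversion the resource file contains company.name=\u82f1\u65af\u514b
③ If the database uses UTF-16, this conversion is not needed.
④ The page should perform the following conversions:

    Conversion i:   String s = new String(request.getParameter("name").getBytes("ISO8859_1"), "GB2312");
    Conversion ii:  String s = new String(name.getBytes("GB2312"), "ISO8859_1");
    Conversion iii: String s = new String(name.getBytes("ISO8859_1"), "GB2312");

I. Key technical points

1. Character encodings in common use include US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16, GBK and GB2312. GBK and GB2312 are dedicated to Chinese text.

2. String's getBytes method returns the string's bytes in a given encoding. The argument names the encoding to use; if none is given, the system default encoding is used.

3. The String constructor String(byte[] bytes, String charsetName) decodes a byte array into a String object using the named encoding.

II. Example

A Java String is always UTF-16 in memory, so the methods below do not convert the string itself. They encode it to bytes and then decode those bytes with another charset, and the result is readable only when the bytes are also valid in that charset.

    package book.String;

    import java.io.UnsupportedEncodingException;

    /**
     * Converts the encoding of a string.
     * @author joe
     */
    public class ChangeCharset {
        /** 7-bit ASCII, also called ISO646-US, the Basic Latin block of Unicode */
        public static final String US_ASCII = "US-ASCII";
        /** ISO Latin Alphabet No. 1, also called ISO-LATIN-1 */
        public static final String ISO_8859_1 = "ISO-8859-1";
        /** 8-bit UCS Transformation Format */
        public static final String UTF_8 = "UTF-8";
        /** 16-bit UCS Transformation Format, big-endian (high-order byte at the lowest address) */
        public static final String UTF_16BE = "UTF-16BE";
        /** 16-bit UCS Transformation Format, little-endian (low-order byte at the lowest address) */
        public static final String UTF_16LE = "UTF-16LE";
        /** 16-bit UCS Transformation Format, byte order given by an optional byte-order mark */
        public static final String UTF_16 = "UTF-16";
        /** Chinese extended character set */
        public static final String GBK = "GBK";
        public static final String GB2312 = "GB2312";

        /** Converts the string to US-ASCII */
        public String toASCII(String str) throws UnsupportedEncodingException {
            return this.changeCharset(str, US_ASCII);
        }

        /** Converts the string to ISO-8859-1 */
        public String toISO_8859_1(String str) throws UnsupportedEncodingException {
            return this.changeCharset(str, ISO_8859_1);
        }

        /** Converts the string to UTF-8 */
        public String toUTF_8(String str) throws UnsupportedEncodingException {
            return this.changeCharset(str, UTF_8);
        }

        /** Converts the string to UTF-16BE */
        public String toUTF_16BE(String str) throws UnsupportedEncodingException {
            return this.changeCharset(str, UTF_16BE);
        }

        /** Converts the string to UTF-16LE */
        public String toUTF_16LE(String str) throws UnsupportedEncodingException {
            return this.changeCharset(str, UTF_16LE);
        }

        /** Converts the string to UTF-16 */
        public String toUTF_16(String str) throws UnsupportedEncodingException {
            return this.changeCharset(str, UTF_16);
        }

        /** Converts the string to GBK */
        public String toGBK(String str) throws UnsupportedEncodingException {
            return this.changeCharset(str, GBK);
        }

        /** Converts the string to GB2312 */
        public String toGB2312(String str) throws UnsupportedEncodingException {
            return this.changeCharset(str, GB2312);
        }

        /**
         * Implements the conversion from the platform default encoding.
         * @param str the string to convert
         * @param newCharset the target encoding
         */
        public String changeCharset(String str, String newCharset) throws UnsupportedEncodingException {
            if (str != null) {
                // Encode with the default charset. Platform dependent;
                // the original cites GB2312 on Chinese Windows.
                byte[] bs = str.getBytes();
                // Decode the bytes with the new charset
                return new String(bs, newCharset);
            }
            return null;
        }

        /**
         * Implements the conversion between two named encodings.
         * @param str the string to convert
         * @param oldCharset the source charset
         * @param newCharset the target charset
         */
        public String changeCharset(String str, String oldCharset, String newCharset)
                throws UnsupportedEncodingException {
            if (str != null) {
                // Encode with the source charset
                byte[] bs = str.getBytes(oldCharset);
                // Decode the bytes with the target charset
                return new String(bs, newCharset);
            }
            return null;
        }

        public static void main(String[] args) throws UnsupportedEncodingException {
            ChangeCharset test = new ChangeCharset();
            String str = "This is a 中文的 String!";
            System.out.println("str: " + str);

            String gbk = test.toGBK(str);
            System.out.println("Converted to GBK: " + gbk);
            System.out.println();

            String ascii = test.toASCII(str);
            System.out.println("Converted to US-ASCII: " + ascii);
            System.out.println();

            String iso88591 = test.toISO_8859_1(str);
            System.out.println("Converted to ISO-8859-1: " + iso88591);
            System.out.println();

            gbk = test.changeCharset(iso88591, ISO_8859_1, GBK);
            System.out.println("ISO-8859-1 string converted back to GBK: " + gbk);
            System.out.println();

            String utf8 = test.toUTF_8(str);
            System.out.println();
            System.out.println("Converted to UTF-8: " + utf8);

            String utf16be = test.toUTF_16BE(str);
            System.out.println("Converted to UTF-16BE: " + utf16be);
            gbk = test.changeCharset(utf16be, UTF_16BE, GBK);
            System.out.println("UTF-16BE string converted back to GBK: " + gbk);
            System.out.println();

            String utf16le = test.toUTF_16LE(str);
            System.out.println("Converted to UTF-16LE: " + utf16le);
            gbk = test.changeCharset(utf16le, UTF_16LE, GBK);
            System.out.println("UTF-16LE string converted back to GBK: " + gbk);
            System.out.println();

            String utf16 = test.toUTF_16(str);
            System.out.println("Converted to UTF-16: " + utf16);
            String gb2312 = test.changeCharset(utf16, UTF_16, GB2312);
            System.out.println("UTF-16 string converted back to GB2312: " + gb2312);
        }
    }

Source: http://www.diybl.com/course/3_program/java/javaxl/20071126/87571.html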
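The encode and decode round trips described in the article can be checked with a small standalone program. The following is a minimal sketch, not part of the original article: the class name EncodingRoundTrip and the helpers toHex and redecode are hypothetical names chosen for illustration. It uses the java.nio.charset.StandardCharsets constants instead of charset name strings, and it assumes that GBK may be missing from a trimmed-down runtime, so that part is guarded by Charset.isSupported.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    /** Formats bytes as upper-case hex pairs separated by spaces, e.g. "D3 A2". */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /**
     * Repairs a string whose bytes were wrongly decoded as ISO-8859-1.
     * ISO-8859-1 maps every byte to exactly one char, so getBytes
     * recovers the original bytes, which are then decoded correctly.
     */
    public static String redecode(String wrong, Charset actual) {
        return new String(wrong.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    public static void main(String[] args) {
        String ying = "\u82f1"; // the character 英

        // Encoding: the same character yields different bytes per charset.
        System.out.println("UTF-8:      " + toHex(ying.getBytes(StandardCharsets.UTF_8)));
        System.out.println("UTF-16BE:   " + toHex(ying.getBytes(StandardCharsets.UTF_16BE)));
        // ISO-8859-1 cannot represent 英, so the encoder substitutes '?' (0x3F).
        System.out.println("ISO-8859-1: " + toHex(ying.getBytes(StandardCharsets.ISO_8859_1)));

        // Assumption: GBK is an optional charset, so check before using it.
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            byte[] gbkBytes = ying.getBytes(gbk);
            System.out.println("GBK:        " + toHex(gbkBytes));
            // Simulate a server that decoded GBK form data as ISO-8859-1, then repair it.
            String wrong = new String(gbkBytes, StandardCharsets.ISO_8859_1);
            System.out.println("repaired:   " + ying.equals(redecode(wrong, gbk)));
        }
    }
}
```

The first three lines should print E8 8B B1, 82 F1 and 3F, matching the UTF-8 table, the Unicode value and the 0x3F substitution discussed in the article. The redecode helper is the same ISO8859_1 repair shown for request parameters; it only works because the wrong decoding step was lossless, and it cannot recover text that has already been replaced by '?'.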
