
Python Chinese characters: handling Chinese characters in Python

Published: 2025/4/16  python  12  豆豆

This article, collected and compiled by 生活随笔, covers handling Chinese characters in Python. The editor found it useful and is sharing it here for reference.

1. Using Chinese characters in a .py file

The contents of unicode.py are as follows:

# -*- coding:utf-8 -*-

str_ch = '我们women'
uni_ch = u'我们women'

print "type:", type(str_ch), "content:", str_ch, repr(str_ch)
print "type:", type(uni_ch), "content:", uni_ch, repr(uni_ch)

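The encoding declaration "# -*- coding: utf-8 -*-" must be placed on the first line of the file (PEP 263 also accepts the second line, for example after a #! line). Without it, Python 2 rejects the file with the following exception: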

SyntaxError: Non-ASCII character '\xe6' in file unicode.py on line 3, but no encoding declared;
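Python 3 reads source files as UTF-8 by default but still honors such a declaration. As a hedged Python 3 illustration using only the standard library (declared_encoding is a hypothetical helper name), tokenize.detect_encoding applies the same PEP 263 rule and only looks for the declaration on the first two lines:

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would use to read these source bytes (hypothetical helper)."""
    # detect_encoding calls readline at most twice, i.e. it inspects only lines 1 and 2.
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


print(declared_encoding(b"# -*- coding:utf-8 -*-\nx = 1\n"))                       # utf-8
print(declared_encoding(b"#!/usr/bin/env python\n# -*- coding: gbk -*-\nx = 1\n"))  # gbk
print(declared_encoding(b"\n\n# -*- coding: gbk -*-\nx = 1\n"))                    # utf-8: line 3 is ignored
```

The third call shows why the position matters: a declaration on line 3 is silently ignored and the default applies.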

With the encoding declared, running the script under Python 2 prints:

type: <type 'str'> content: 我们women '\xe6\x88\x91\xe4\xbb\xacwomen'
type: <type 'unicode'> content: 我们women u'\u6211\u4eecwomen'
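Python 3 renames these types: Python 2's str corresponds to bytes, and unicode corresponds to str. A minimal Python 3 sketch that reproduces the two representations above, assuming the text is encoded as UTF-8 as in the file:

```python
text = "我们women"              # Python 3 str: a sequence of code points (Python 2's unicode)
data = text.encode("utf-8")     # Python 3 bytes: the UTF-8 bytes (Python 2's str)

print(type(data), repr(data))   # <class 'bytes'> b'\xe6\x88\x91\xe4\xbb\xacwomen'
print(type(text), ascii(text))  # <class 'str'> '\u6211\u4eecwomen'
```

ascii() is used instead of repr() because Python 3's repr of a str shows the Chinese characters directly, while ascii() escapes them the way Python 2's repr of a unicode object did.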

Running "od -t c unicode.py" shows the file's contents as stored on disk:

0000000 # - * - c o d i n g : u t f
0000020 - 8 - * - \n \n s t r _ c h =
0000040 ' 346 210 221 344 273 254 w o m e n ' \n u
0000060 n i _ c h = u ' 346 210 221 344 273 254
0000100 w o m e n ' \n \n p r i n t " t
0000120 y p e : " , t y p e ( s t r _
0000140 c h ) , " c o n t e n t : " ,
0000160 s t r _ c h , r e p r ( s
0000200 t r _ c h ) \n p r i n t " t y
0000220 p e : " , t y p e ( u n i _ c
0000240 h ) , " c o n t e n t : " ,
0000260 u n i _ c h , r e p r ( u n
0000300 i _ c h ) \n

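Note: od -t c prints non-printable bytes as three-digit octal numbers, so 346 is octal (0xe6 in hex, 230 in decimal).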
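To check these octal values, this short Python 3 sketch prints the UTF-8 bytes of 我们 in octal, the way od -t c shows non-ASCII bytes:

```python
encoded = "我们".encode("utf-8")
octal = [format(byte, "o") for byte in encoded]  # iterating over bytes yields ints 0-255

print(" ".join(octal))                   # 346 210 221 344 273 254
print(int("346", 8), hex(int("346", 8)))  # 230 0xe6
```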

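As the dump shows, the Chinese characters are stored on disk as UTF-8: 346 210 221 344 273 254 is e6 88 91 e4 bb ac, the UTF-8 encoding of 我们. When the interpreter reads the file into memory and meets these non-ASCII bytes, it must use the declared encoding to interpret them. A plain '...' literal keeps the (UTF-8) bytes unchanged, which is why repr(str_ch) shows '\xe6\x88\x91...', whereas a u'...' literal is decoded into code points, giving u'\u6211\u4eec...'.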
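The effect of the declaration can be sketched in Python 3, whose compile() accepts source as bytes and honors a coding declaration inside it. Here the source is assumed, purely for illustration, to be saved as GBK instead of UTF-8: with a matching declaration the literal is decoded correctly, while reading the same bytes as UTF-8 fails.

```python
# Hypothetical source file whose bytes are GBK rather than UTF-8.
source = "# -*- coding: gbk -*-\ns = '我们women'\n".encode("gbk")

namespace = {}
# The coding declaration tells the compiler how to decode the source bytes.
exec(compile(source, "<gbk-source>", "exec"), namespace)
print(namespace["s"] == "我们women")  # True

try:
    source.decode("utf-8")  # the wrong encoding cannot make sense of the GBK bytes
except UnicodeDecodeError as err:
    print("decoding as UTF-8 fails:", err.reason)
```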

2. Python's string types: str and unicode

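Unicode identifies each character by a code point, which is an integer. Code points run from 0 to 0x10FFFF, so they are not limited to 16 bits (a "narrow" Python 2 build stores code points above 0xFFFF internally as pairs of 16-bit units). A unicode string is therefore a sequence of code points.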
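A small Python 3 check of the code point range (Python 3 strings always store whole code points, so the narrow/wide build distinction no longer applies; the emoji is just an arbitrary example above 0xFFFF):

```python
import sys

print([hex(ord(ch)) for ch in "我们"])  # ['0x6211', '0x4eec']
print(hex(sys.maxunicode))              # 0x10ffff: the largest code point

emoji = "\U0001F600"                    # a code point that needs more than 16 bits
print(len(emoji))                       # 1: one code point
print(len(emoji.encode("utf-16-le")))   # 4: UTF-16 needs a surrogate pair (two 16-bit units)
```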

A unicode literal can be written in any of these ways:

uni_str = u"我们"
uni_str = u"\xac"        # 2 hex digits
uni_str = u"\u1234"      # 4 hex digits
uni_str = u"\U00008000"  # 8 hex digits
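Since each escape simply names a code point, different spellings of the same code point compare equal. A Python 3 sketch (Python 3 accepts the u prefix but does not need it):

```python
# Escapes of different lengths that name the same code points.
assert "\xac" == "\u00ac" == "\U000000ac" == chr(0xAC)
assert "\u1234" == "\U00001234" == chr(0x1234)
assert "\U00008000" == "\u8000" == chr(0x8000)
assert u"我们" == "\u6211\u4eec"  # the u prefix is allowed in Python 3 and changes nothing
print("all escape forms match")
```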

A str is a sequence of 8-bit bytes, each with a value from 0 to 255. A one-byte str can be written as:

s = '0'
s = '\x30'
s = '\060'
s = chr(48)
# ord(s) is 48 in every case
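In Python 3 the byte-string type is bytes, so the four spellings above become the following sketch; bytes([48]) stands in for Python 2's chr(48), because Python 3's chr returns a text character rather than a byte:

```python
forms = [b"0", b"\x30", b"\060", bytes([48])]

print(all(f == b"0" for f in forms))  # True
print([f[0] for f in forms])          # [48, 48, 48, 48]: indexing bytes gives the int value
print([ord(f) for f in forms])        # [48, 48, 48, 48]: ord also accepts a length-1 bytes object
```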

Encoding converts a unicode string into a sequence of bytes (0-255); decoding is the reverse, turning bytes back into a unicode string.
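A minimal Python 3 round trip; GBK is included only as an example of a second encoding that produces different bytes for the same code points:

```python
text = "我们"
utf8 = text.encode("utf-8")  # 3 bytes per character for these CJK code points
gbk = text.encode("gbk")     # 2 bytes per character

print(len(text), len(utf8), len(gbk))  # 2 6 4
print(utf8.decode("utf-8") == text)    # True: decoding reverses encoding
print(gbk.decode("gbk") == text)       # True
```

The same two code points yield different byte sequences, which is why the decoder must be told the encoding that was used to produce the bytes.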

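Python 2's default encoding, used whenever str and unicode are converted implicitly, is ASCII, so any byte or code point of 128 or above raises a UnicodeEncodeError or UnicodeDecodeError.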
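Python 3 no longer converts between bytes and str implicitly (and sys.getdefaultencoding() returns 'utf-8' there), but naming the ASCII codec explicitly reproduces the Python 2 failure. A sketch with a hypothetical helper, ascii_ok:

```python
def ascii_ok(value):
    """Return True if value can be encoded (str) or decoded (bytes) as ASCII (hypothetical helper)."""
    try:
        if isinstance(value, bytes):
            value.decode("ascii")
        else:
            value.encode("ascii")
        return True
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False


print(ascii_ok("women"))       # True
print(ascii_ok("我们"))        # False: code points 0x6211 and 0x4eec are above 127
print(ascii_ok(bytes([127])))  # True: 127 is the last ASCII value
print(ascii_ok(bytes([128])))  # False: 128 is already outside ASCII
```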


