Downloaded attachment names always garbled? Time to read the RFC documents!

Published: 2024/9/27


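What you learn from books always feels shallow; to truly understand something, you have to practice it yourself.

Anyone doing web development has run into file downloads, and the garbled Chinese file names that different browsers produce after a download have probably driven you to distraction at some point.

Search online and most answers inspect the User-Agent field of the request headers to detect the browser type, then handle each browser differently, with code like this: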

// Microsoft browsers
if (agent.contains("msie") || agent.contains("trident") || agent.contains("edge")) {
  // special handling for filename
}
// Firefox
else if (agent.contains("firefox")) {
  // special handling for filename
}
// Safari
else if (agent.contains("safari")) {
  // special handling for filename
}
// Chrome
else if (agent.contains("chrome")) {
  // special handling for filename
}
// everything else
else {
  // special handling for filename
}
// finally, put the specially handled file name into the header
response.setHeader("Content-Disposition",
                    "attachment;fileName=" + filename);

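Code like this looks like black magic, though. Why does every browser need different handling? Do we have to add a compatibility branch every time a new browser appears? Is there no common standard to keep these browsers in line?

With that question in mind, I went through the RFC documents and ended up with a clean solution: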

// percentEncodedFileName is the percent-encoded file name
response.setHeader("Content-disposition",
        "attachment;filename=" + percentEncodedFileName +
                ";filename*=utf-8''" + percentEncodedFileName);

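In my tests, this response header worked in every mainstream browser I tried (see the browser tests in section 4). Because it is a matter of the HTTP protocol, it is independent of the programming language. Set the response header according to this rule and the annoying problem of garbled Chinese attachment names is solved once and for all.

Next, let me walk you through the RFC documents step by step and reconstruct how this response header was derived.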

1. Content-Disposition

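Everything starts with RFC 6266^[1], which describes the Content-Disposition response header. Strictly speaking, this header is not part of the HTTP standard, but because it is so widely used, the document specifies it. Its grammar is as follows: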

     content-disposition = "Content-Disposition" ":"
                            disposition-type *( ";" disposition-parm )

     disposition-type    = "inline" | "attachment" | disp-ext-type
                         ; case-insensitive
     disp-ext-type       = token

     disposition-parm    = filename-parm | disp-ext-parm

     filename-parm       = "filename" "=" value
                         | "filename*" "=" ext-value

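There are two disposition-type values:

inline: default handling; the content is usually displayed in the page
attachment: the content should be saved locally; used together with filename or filename*

Note filename and filename* in disposition-parm. The document states that this information can be used as the name of the saved file.

The difference between the two is that the value of filename is not encoded, whereas filename* follows the encoding rules defined in RFC 5987^[2]: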

Producers MUST use either the "UTF-8" ([RFC3629]) or the "ISO-8859-1"
   ([ISO-8859-1]) character set.

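Because filename* was defined later, many old browsers do not support it. The document therefore specifies that when both appear in the header field, recipients should use filename* and ignore filename. A server can thus send both: browsers that understand filename* use it, and older ones fall back to filename.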
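
As a rough illustration of that precedence rule, the sketch below shows how a recipient might choose between the two parameters. The class and method names (FilenameParamChooser, parseParams, chooseRawFilename) and the sample header are made up for this example, and the parsing is deliberately naive: it assumes no ';' or '=' inside quoted values, which a real parser would have to handle.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the "filename* wins over filename" rule. */
class FilenameParamChooser {

    /** Naive parameter split: assumes no ';' or '=' inside quoted values. */
    static Map<String, String> parseParams(String headerValue) {
        Map<String, String> params = new HashMap<>();
        String[] pieces = headerValue.split(";");
        // pieces[0] is the disposition type (inline / attachment), so start at 1
        for (int i = 1; i < pieces.length; i++) {
            String[] kv = pieces[i].trim().split("=", 2);
            if (kv.length == 2) {
                // parameter names are matched case-insensitively
                params.put(kv[0].trim().toLowerCase(Locale.ROOT), kv[1].trim());
            }
        }
        return params;
    }

    /** Returns the raw filename* value if present, otherwise the raw filename value. */
    static String chooseRawFilename(String headerValue) {
        Map<String, String> params = parseParams(headerValue);
        String extended = params.get("filename*");
        return extended != null ? extended : params.get("filename");
    }

    public static void main(String[] args) {
        String header = "attachment; filename=fallback.txt; filename*=utf-8''real%20name.txt";
        System.out.println(chooseRawFilename(header)); // utf-8''real%20name.txt
    }
}
```

The returned value is still raw: a filename* value would need to be decoded before it can be used as a file name.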

At this point the skeleton of the response header is almost in view. Here is the example from [RFC 6266]:

 Content-Disposition: attachment;
                      filename="EURO rates";
                      filename*=utf-8''%e2%82%ac%20rates

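A note on filename*=utf-8''%e2%82%ac%20rates: the notation may look odd at first glance. It uses single quotes as delimiters to split the right-hand side of the equals sign into three parts. The first part is the character set (utf-8), the middle part is the language (left empty here), and the final %e2%82%ac%20rates is the actual value. This structure is described in detail in RFC 2231^[3], section 4: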

 A single quote is used to
   separate the character set, language, and actual value information in
   the parameter value string, and an percent sign is used to flag
   octets encoded in hexadecimal.
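
To make the three-part structure concrete, here is a minimal sketch of how such a value could be taken apart and decoded. The class and method names (ExtValueDemo, decodeExtValue) are invented for this illustration, and the code assumes well-formed input: every % is followed by two hex digits, and every other character is plain ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

/** Hypothetical decoder for a value such as utf-8''%e2%82%ac%20rates. */
class ExtValueDemo {

    /** Splits charset'language'value on the first two single quotes, then decodes the value. */
    static String decodeExtValue(String extValue) {
        String[] parts = extValue.split("'", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected charset'language'value: " + extValue);
        }
        Charset charset = Charset.forName(parts[0]); // e.g. utf-8
        // parts[1] is the optional language; it is empty in the RFC 6266 example
        String encoded = parts[2];

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                // "%XX" flags one octet written as two hexadecimal digits
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c); // assumed to be a plain ASCII character
            }
        }
        // Interpret the collected octets in the declared character set
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decodeExtValue("utf-8''%e2%82%ac%20rates"));
    }
}
```

For the RFC example, the three octets e2 82 ac are the UTF-8 encoding of the euro sign, so the decoded value is the euro sign followed by a space and "rates".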

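2. PercentEncode

PercentEncode is also known as percent-encoding or URL encoding.

As mentioned above, filename* follows the encoding rules defined in [RFC 5987]. Section 3.2 of [RFC 5987] defines the character sets that must be supported: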

recipients implementing this specification
MUST support the character sets "ISO-8859-1" and "UTF-8".

Section 3.2.1 of [RFC 5987] further specifies that percent-encoding follows the definition in RFC 3986^[4], section 2.1, excerpted below:

A percent-encoding mechanism is used to represent a data octet in a
component when that octet's corresponding character is outside the
allowed set or is being used as a delimiter of, or within, the
component.  A percent-encoded octet is encoded as a character
triplet, consisting of the percent character "%" followed by the two
hexadecimal digits representing that octet's numeric value.  For
example, "%20" is the percent-encoding for the binary octet
"00100000" (ABNF: %x20), which in US-ASCII corresponds to the space
character (SP).  Section 2.4 describes when percent-encoding and
decoding is applied.

Note that [RFC 3986] explicitly states that a space is percent-encoded as %20.

However, another document, RFC 1866^[5], section 8.2.1 "The form-urlencoded Media Type", specifies:

The default encoding for all forms is `application/x-www-form-urlencoded'.
   A form data set is represented in this media type as follows:
        1. The form field names and values are escaped: space
        characters are replaced by `+', and then reserved characters
        are escaped as per [URL]

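This requires that in messages of type application/x-www-form-urlencoded, spaces are replaced with + and other characters are escaped as defined in [URL]. Here [URL] refers to RFC 1738^[6], and the most recent URL-related document among its successors is precisely [RFC 3986].

This is why many documents say that the percent-encoding of a space (white space) is either + or %20, for example: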

w3schools:URL encoding normally replaces a space with a plus (+) sign or with %20.

MDN:Depending on the context, the character ' ' is translated to a '+' (like in the percent-encoding version used in an application/x-www-form-urlencoded message), or in '%20' like on URLs.

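So the question is: during development, how should we percent-encode the space character?

My advice is to follow the most recent document. The case defined in [RFC 1866] applies only to the application/x-www-form-urlencoded type; as far as the definition of percent-encoding goes, [RFC 3986] is authoritative. So wherever percent-encoding is required, a space should be encoded as %20. There is also a Stack Overflow answer supporting this view: When to encode space to plus (+) or %20?^[7]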
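
The two conventions can be seen side by side in a small sketch. It relies on the JDK's java.net.URLEncoder, which follows the application/x-www-form-urlencoded convention; the class and method names (SpaceEncodingDemo, formEncode, percentEncode) are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical demo: form-style "+" versus RFC 3986 style "%20" for a space. */
class SpaceEncodingDemo {

    /** Form-style encoding as produced by the JDK: a space becomes "+". */
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    /** Rewrites the form-style result so that a space ends up as "%20". */
    static String percentEncode(String s) {
        // A literal '+' in the input is already "%2B" at this point,
        // so every '+' that remains stands for an encoded space.
        return formEncode(s).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a+b c"));    // a%2Bb+c
        System.out.println(percentEncode("a+b c")); // a%2Bb%20c
    }
}
```

The input deliberately contains both a literal plus sign and a space, to show that the two stay distinguishable after encoding.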

3. Code in practice

With the theory in place, the code follows naturally. Here it is:

@GetMapping("/downloadFile")
public String download(String serverFileName, HttpServletRequest request, HttpServletResponse response) throws IOException {

    request.setCharacterEncoding("utf-8");
    response.setContentType("application/octet-stream");

    String clientFileName = fileService.getClientFileName(serverFileName);
    // Percent-encode the real file name
    String percentEncodedFileName = URLEncoder.encode(clientFileName, "utf-8")
            .replaceAll("\\+", "%20");

    // Assemble the Content-Disposition value
    StringBuilder contentDispositionValue = new StringBuilder();
    contentDispositionValue.append("attachment; filename=")
            .append(percentEncodedFileName)
            .append(";")
            .append("filename*=")
            .append("utf-8''")
            .append(percentEncodedFileName);
    response.setHeader("Content-disposition",
            contentDispositionValue.toString());

    // Write the file stream to the response
    try (InputStream inputStream = fileService.getInputStream(serverFileName);
         OutputStream outputStream = response.getOutputStream()
    ) {
        IOUtils.copy(inputStream, outputStream);
    }

    return "OK!";
}

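The code is simple, but two points deserve an explanation:

Why is .replaceAll("\\+", "%20") needed after URLEncoder.encode(clientFileName, "utf-8")?

As established above, wherever percent-encoding is required, a space should be encoded as %20. The documentation of the URLEncoder class, however, explicitly states that it converts a space into +: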

The space character " " is converted into a plus sign "{@code +}".

This is not really the JDK's fault, because its documentation states that it follows application/x-www-form-urlencoded (PHP has a similar function that behaves the same way):

Translates a string into {@code application/x-www-form-urlencoded} format using a specific encoding scheme. This method uses the

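So here we use .replaceAll("\\+", "%20") to rewrite the + signs, so that spaces are encoded the way [RFC 3986] describes. This is safe because a literal + in the file name has already been encoded as %2B, so any + left in the output stands for a space. All the steps are spelled out here to make the point clear. Of course, you can also implement your own PercentEncoder class; how elaborate to make it is up to you.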
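
One possible shape for such a class is sketched below. It is not the author's implementation: it encodes the UTF-8 bytes of the name and leaves unescaped only the characters that [RFC 5987] section 3.2.1 lists as attr-char (letters, digits, and ! # $ & + - . ^ _ ` | ~). The class name follows the text; everything else is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: percent-encodes every octet that is not an RFC 5987 attr-char. */
class PercentEncoder {

    // attr-char symbols from RFC 5987 section 3.2.1, in addition to ALPHA and DIGIT
    private static final String ATTR_CHAR_SYMBOLS = "!#$&+-.^_`|~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF; // treat the byte as unsigned
            if (isAttrChar(octet)) {
                out.append((char) octet);
            } else {
                // "%" followed by the two hex digits of the octet
                out.append('%').append(HEX[octet >> 4]).append(HEX[octet & 0x0F]);
            }
        }
        return out.toString();
    }

    private static boolean isAttrChar(int octet) {
        return (octet >= 'a' && octet <= 'z')
                || (octet >= 'A' && octet <= 'Z')
                || (octet >= '0' && octet <= '9')
                || ATTR_CHAR_SYMBOLS.indexOf(octet) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(encode("a b*c.txt")); // a%20b%2Ac.txt
    }
}
```

Compared with URLEncoder plus replaceAll, this sketch escapes * and leaves a literal + untouched. Both outputs are valid percent-encodings; which one suits a given browser mix is something to verify by testing.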

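In the [RFC 6266] standard, the value of filename= does not need to be encoded. Why is the value after filename= percent-encoded here?

Recall from [RFC 6266]: when filename and filename* both appear, the latter is used; browsers too old to support the new standard use the former.

Most mainstream browsers today update themselves automatically, so most of them support the new standard, with the exception of old versions of IE. Old versions of IE percent-decode the value and use the result. That is why the value of filename= is deliberately percent-encoded here: for compatibility with old versions of IE.

PS: In my tests, IE11 and Edge already support the new standard.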
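
To inspect the resulting header without a servlet container, the assembly step can be pulled out into a plain function. This is only a sketch: ContentDispositionDemo and attachmentHeaderValue are hypothetical names, and the logic simply mirrors the controller code above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical, framework-free version of the header assembly shown above. */
class ContentDispositionDemo {

    /** Puts the same percent-encoded name into both filename and filename*. */
    static String attachmentHeaderValue(String clientFileName) {
        String encoded;
        try {
            // URLEncoder produces "+" for spaces; rewrite them as "%20"
            encoded = URLEncoder.encode(clientFileName, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen
            throw new IllegalStateException(e);
        }
        return "attachment; filename=" + encoded + ";filename*=utf-8''" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(attachmentHeaderValue("down test.txt"));
        // attachment; filename=down%20test.txt;filename*=utf-8''down%20test.txt
    }
}
```

Browsers that understand filename* decode the second copy; old versions of IE percent-decode the first, so both should arrive at the same file name.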

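4. Browser tests

Based on StatCounter's 2019 browser market share statistics for China, I designed a file name containing Chinese, English, and spaces, 下載-down test .txt, to test with.

Test results: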

Browser	Version	Pass
Chrome	84.0.4147.125	true
UC	V6.2.4098.3	true
Safari	13.1.2	true
QQ Browser	10.6.1(4208)	true
IE	7-11	true
Firefox	79.0	true
Edge	44.18362.449.0	true
360 Secure Browser 12	12.2.1.362.0	true
Edge (Chromium)	84.0.522.59	true

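The results show that the header is compatible with essentially all mainstream browsers on the market.

5. Summary

Looking back, this article is really about garbled attachment names caused by browser compatibility issues. To solve the problem, I consulted two kinds of standards documents:

HTTP response header standards

[RFC 6266], [RFC 1866]

Encoding standards

[RFC 5987], [RFC 2231], [RFC 3986], [RFC 1738]

Taking [RFC 6266] as the starting point, the article cites six RFC documents in total, each with its source given. If you are interested, follow the line of reasoning here and read the original documents; you will come away with a deeper understanding of the problem. The code in this article has been uploaded to GitHub^[8].

Finally, I cannot help remarking that a specification is a wonderful thing. It is like an interface in Java: it only sets the standard and leaves the concrete implementation to everyone else.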

References

[1] RFC 6266: https://tools.ietf.org/html/rfc6266
[2] RFC 5987: https://tools.ietf.org/html/rfc5987
[3] RFC 2231: https://tools.ietf.org/html/rfc2231
[4] RFC 3986: https://tools.ietf.org/html/rfc3986
[5] RFC 1866: https://tools.ietf.org/html/rfc1866
[6] RFC 1738: https://tools.ietf.org/html/rfc1738
[7] When to encode space to plus (+) or %20?: https://stackoverflow.com/questions/2678551/when-to-encode-space-to-plus-or-20
[8] The author's GitHub: https://github.com/zhengxl5566/springboot-demo
