Removing U+FEFF from the start of a file: a quick tale about FEFF, the invisible UTF-8 character that broke our CSV file
Today, we encountered an error while trying to create some database seeds from a CSV. I had originally generated this CSV with a Ruby script, piping its output to a file and saving that file as a CSV.
The CSV was checked in to Git and had been used for a while until we had to update parts of it by adding a new column and fixing some values.
While we don’t know the exact reason yet, my theory is that somehow, Excel for Mac (we are all using Macs) added some additional metadata to it even after saving the file as a CSV.
This in turn made anyone using the seed receive the following error:
CSV::MalformedCSVError: Illegal quoting in line 1.

I opened the CSV file and nothing looked suspicious. My first thought was that some left/right quotation marks had somehow been mixed into the file instead of just the ‘normal’ double quotes: ". But upon further investigation, there was nothing out of the ordinary. This led me to wipe out the whole file and actually type out the first row again.
I saved that file again and ran the migration:
CSV::MalformedCSVError: Illegal quoting in line 1.

What?!
Okay, this was driving me nuts. I opened up a new file, typed the exact single line again, and ran the migration. It worked. So what was in that file?!
Only one way to find out:
pbcopy < companies.csv
pbpaste > temp.csv
rm companies.csv
mv temp.csv companies.csv
git diff

OS X has two very useful commands: pbcopy and pbpaste. Basically, anything piped to pbcopy goes onto your clipboard, and pbpaste writes whatever is on your clipboard to standard output (stdout). The two steps run one after the other here, so that pbpaste only reads the clipboard once pbcopy has finished filling it. Along the way, the text loses all formatting.
Very useful when you want to just copy some text from somewhere and you want to paste it into a WYSIWYG editor without all the formatting. Like when writing an email from Gmail, for example.
I then removed the original file and saved the new ‘unformatted’ file with the same file name so I could see the difference.
And we finally saw the invisible man:
A quick Google search told us that our friend U+FEFF was called a ZERO WIDTH NO-BREAK SPACE. Also, a quick trip to Wikipedia told us about the actual uses for U+FEFF, more commonly known as Byte order mark or BOM.
Our friend FEFF means different things, but it’s basically a signal for a program on how to read the text. It can be UTF-8 (more common), UTF-16, or even UTF-32.
FEFF itself is how the mark looks in UTF-16 (the bytes FE FF in big-endian order). In UTF-8 the same character is encoded as the three-byte sequence 0xEF, 0xBB, 0xBF.
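To make the byte sequences concrete, here is a small Ruby sketch that encodes the single code point U+FEFF in a few encodings and prints the resulting bytes. The variable names and the `to_hex` helper are only for illustration.

```ruby
# U+FEFF is one code point; the bytes on disk depend on the encoding.
bom = "\uFEFF"

utf8_bytes    = bom.encode("UTF-8").bytes     # three bytes
utf16be_bytes = bom.encode("UTF-16BE").bytes  # two bytes, big-endian
utf16le_bytes = bom.encode("UTF-16LE").bytes  # two bytes, little-endian

# Illustrative helper: format a byte array as hex for printing.
to_hex = ->(bytes) { bytes.map { |b| format("0x%02X", b) }.join(" ") }

puts "UTF-8:    #{to_hex.call(utf8_bytes)}"
puts "UTF-16BE: #{to_hex.call(utf16be_bytes)}"
puts "UTF-16LE: #{to_hex.call(utf16le_bytes)}"
```

The swapped byte order between the two UTF-16 variants is what lets a reader tell them apart, which is where the name "byte order mark" comes from.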
From my understanding, when the CSV file was opened in Excel and saved, Excel created a space for our invisible stowaway, U+FEFF. And in front of the file to boot!
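Excel did some magic. My first guess was that it saved the file as UTF-16 instead of UTF-8, but a file that still reads as normal text with a single stray U+FEFF in front was more likely saved as UTF-8 with a BOM prepended. Most editors render the BOM as nothing, so visually the file was okay. But Ruby’s CSV thought that there was something wrong: it assumed the file it was reading was plain UTF-8, so it did not skip Mr. U+FEFF and probably treated him as the first character of the first field, which left the opening double quote sitting in the middle of that field.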
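As a rough illustration of that failure, the sketch below parses a small CSV string that starts with U+FEFF, then reads the same bytes from a temporary file using Ruby's "bom|utf-8" encoding spec, which asks Ruby to consume a leading BOM if one is present. The sample rows are made up for the example, and whether this option fits your seed code is an assumption to check against your own setup.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data: quoted fields preceded by a BOM.
csv_with_bom = "\uFEFF\"name\",\"city\"\n\"Acme\",\"Manila\"\n"

# Parsing the raw string: the BOM sits in front of the opening quote,
# so the parser is expected to reject the first line.
raw_error = begin
  CSV.parse(csv_with_bom)
  nil
rescue CSV::MalformedCSVError => e
  e
end

# Reading the same content from disk with "bom|utf-8" lets Ruby
# drop the leading BOM before the CSV parser sees the data.
rows = Tempfile.create(["companies", ".csv"]) do |file|
  file.write(csv_with_bom)
  file.flush
  CSV.read(file.path, encoding: "bom|utf-8")
end

puts raw_error.class
p rows
```

The "bom|utf-8" spec is harmless for files without a BOM, so it can be a reasonable default when CSVs may have passed through Excel.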
So lesson learned: don’t open (and save!) a CSV file in Excel if you want to feed it to Ruby’s CSV parser.
If you do ever encounter an error like that, be sure to look for hidden characters not shown by your editor. If you still can’t see it and are using OS X, then pbcopy and pbpaste may help you out: in our case, the round trip through the clipboard dropped the hidden character along with any formatting.
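If you would rather check the bytes directly than rely on the clipboard, a minimal Ruby sketch along these lines can detect and remove a leading UTF-8 BOM. Both helper names (`utf8_bom?` and `strip_utf8_bom!`) are hypothetical, and the sketch handles only the UTF-8 form of the mark.

```ruby
# The three UTF-8 BOM bytes as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true if the file at path starts with a UTF-8 BOM.
def utf8_bom?(path)
  File.binread(path, 3) == UTF8_BOM
end

# Hypothetical helper: rewrites the file without its leading BOM.
# Returns true if a BOM was removed, false if there was none.
def strip_utf8_bom!(path)
  return false unless utf8_bom?(path)

  data = File.binread(path)
  File.binwrite(path, data.byteslice(3..-1))
  true
end
```

For example, `strip_utf8_bom!("companies.csv")` would rewrite the file in place, after which `git diff` should show only the removed U+FEFF on the first line.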
Translated from: https://www.freecodecamp.org/news/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7/