Processing CSV Files (Excel) with Shell Scripts

Published: 2025/3/19

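CSV is a very convenient data-exchange format. Business staff can easily edit it in Excel and then upload it to a business system. For developers, however, Excel is somewhat heavyweight and not particularly friendly to programmatic processing.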

This article describes one way to process CSV files in a shell script. It covers:

Extracting data rows

Counting rows

Printing selected columns

Sorting by column

Comparing two columns

The following commands are used:

cat

grep

awk

sort

vimdiff

Extracting data rows

Housing prices in Chinese cities over four years, 2013 to 2016 (apartment_prices.csv):

City,2013,2014,2015,2016

Beijing,22000,30000,35000,38000

Shanghai,20000,25000,30000,35000

Shaanxi-xi'an,6000,5800,5700,6000

Shaanxi-baoji,1000,1200,2000,2000

Extract the Shaanxi prices:

cat apartment_prices.csv | grep ^Shaanxi

# output

# Shaanxi-xi'an,6000,5800,5700,6000

# Shaanxi-baoji,1000,1200,2000,2000

Extract the non-Shaanxi prices:

cat apartment_prices.csv | grep -v ^Shaanxi

# output

# City,2013,2014,2015,2016

# Beijing,22000,30000,35000,38000

# Shanghai,20000,25000,30000,35000

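cat prints the contents of apartment_prices.csv, and grep filters the lines we need with a regular expression.

grep ^Shaanxi: extracts the rows that start with Shaanxi; in a regex, ^ anchors the match to the start of the line.

grep -v ^Shaanxi: -v is short for --invert-match and extracts the rows that do not match the regex. The header row is included in this output, because it does not start with Shaanxi either.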
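The same filters can be written without cat, since grep reads files directly. The sketch below is only an illustration: it recreates the sample data in a temporary directory, and the variable names (workdir, csv, shaanxi_rows, other_rows) are assumptions, not part of the original workflow.

```shell
# Recreate the sample data in a scratch directory (the path is illustrative).
workdir=$(mktemp -d)
csv="$workdir/apartment_prices.csv"
cat > "$csv" <<'EOF'
City,2013,2014,2015,2016
Beijing,22000,30000,35000,38000
Shanghai,20000,25000,30000,35000
Shaanxi-xi'an,6000,5800,5700,6000
Shaanxi-baoji,1000,1200,2000,2000
EOF

# grep takes the file name as an argument; quoting the pattern keeps the
# shell from interpreting any of its characters.
shaanxi_rows=$(grep '^Shaanxi' "$csv")

# -v inverts the match, so the header and the other cities remain.
other_rows=$(grep -v '^Shaanxi' "$csv")

printf '%s\n' "$shaanxi_rows"
printf '%s\n' "$other_rows"
```

Quoting the pattern is a defensive habit: it makes no difference for ^Shaanxi in bash, but it matters as soon as the pattern contains spaces, *, or $.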

Counting rows

Housing prices in Chinese cities over four years, 2013 to 2016:

City,2013,2014,2015,2016

Beijing,22000,30000,35000,38000

Shanghai,20000,25000,30000,35000

Shaanxi-xi'an,6000,5800,5700,6000

Shaanxi-baoji,1000,1200,2000,2000

Total number of lines (including the header):

cat apartment_prices.csv | wc -l

# 5

wc -l: wc (word count) reports the number of lines, words, and bytes in its input; -l prints only the line count.

Number of data rows (without the header):

cat apartment_prices.csv | tail -n +2

# Beijing,22000,30000,35000,38000

# Shanghai,20000,25000,30000,35000

# Shaanxi-xi'an,6000,5800,5700,6000

# Shaanxi-baoji,1000,1200,2000,2000

cat apartment_prices.csv | tail -n +2 | wc -l

# 4

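tail -n +2: tail normally prints the last lines of a file; -n +2 instead prints everything from line 2 through the end of the file, which skips the header. tail -f logfile.log is commonly used to follow log output.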

Number of Shaanxi rows:

cat apartment_prices.csv | grep ^Shaanxi | wc -l

# 2
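The three counts above can also be captured in variables for use later in a script. This is a minimal sketch under the assumption that the sample file is recreated in a temporary directory; the variable names are illustrative, and grep -c and the awk one-liner are alternatives to the pipelines shown above rather than the method the article uses.

```shell
workdir=$(mktemp -d)
csv="$workdir/apartment_prices.csv"
cat > "$csv" <<'EOF'
City,2013,2014,2015,2016
Beijing,22000,30000,35000,38000
Shanghai,20000,25000,30000,35000
Shaanxi-xi'an,6000,5800,5700,6000
Shaanxi-baoji,1000,1200,2000,2000
EOF

# Reading from stdin makes wc print only the number (no file name).
# The arithmetic expansion strips the padding some wc versions add.
total_lines=$(( $(wc -l < "$csv") ))

# Skip the header, then count what is left.
data_rows=$(( $(tail -n +2 "$csv" | wc -l) ))

# grep -c counts matching lines directly, replacing "grep | wc -l".
shaanxi_rows=$(grep -c '^Shaanxi' "$csv")

# awk alternative: NR is the current line number, so NR > 1 skips the header.
data_rows_awk=$(awk 'NR > 1 { n++ } END { print n + 0 }' "$csv")

echo "total=$total_lines data=$data_rows shaanxi=$shaanxi_rows awk=$data_rows_awk"
```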

Printing selected columns

Housing prices in Chinese cities over four years, 2013 to 2016:

City,2013,2014,2015,2016

Beijing,22000,30000,35000,38000

Shanghai,20000,25000,30000,35000

Shaanxi-xi'an,6000,5800,5700,6000

Shaanxi-baoji,1000,1200,2000,2000

Print the 2014 data:

cat apartment_prices.csv | awk -F, '{ print $3; }'

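awk -F, '{ print $3; }': awk is a powerful tool for processing column-oriented text on Linux; most line-based command output can be processed with it.

-F,: sets the field separator to ',' (CSV format). The default separator is whitespace (spaces and tabs).

print $3: prints the third column.

output: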

# 2014

# 30000

# 25000

# 5800

# 1200

# Print the first 3 columns

cat apartment_prices.csv | awk -F, '{ print $1","$2","$3; }'

output:

City,2013,2014

Beijing,22000,30000

Shanghai,20000,25000

Shaanxi-xi'an,6000,5800

Shaanxi-baoji,1000,1200
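Two small variations on the awk commands above may be useful. The sketch below assumes the sample file is recreated in a temporary directory; year, col, and the other variable names are hypothetical. It sets the output separator (OFS) so the commas do not have to be spelled out in print, and it looks a column up by its header name instead of hard-coding $3.

```shell
workdir=$(mktemp -d)
csv="$workdir/apartment_prices.csv"
cat > "$csv" <<'EOF'
City,2013,2014,2015,2016
Beijing,22000,30000,35000,38000
Shanghai,20000,25000,30000,35000
Shaanxi-xi'an,6000,5800,5700,6000
Shaanxi-baoji,1000,1200,2000,2000
EOF

# Setting FS and OFS together makes "print $1, $2, $3" join fields with commas.
first_three=$(awk 'BEGIN { FS = OFS = "," } { print $1, $2, $3 }' "$csv")

# Find the column whose header equals the requested year, then print that
# column for every data row. Nothing is printed if the header is not found.
col_2014=$(awk -F, -v year=2014 '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == year) col = i; next }
    col     { print $col }
' "$csv")

printf '%s\n' "$first_three"
printf '%s\n' "$col_2014"
```

Note that -F, splits on every comma, so this approach only works when no field contains a quoted comma; CSV files exported from Excel with such fields need a real CSV parser.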

Sorting by column

Housing prices in Chinese cities over four years, 2013 to 2016:

City,2013,2014,2015,2016

Beijing,22000,30000,35000,38000

Shanghai,20000,25000,30000,35000

Shaanxi-xi'an,6000,5800,5700,6000

Shaanxi-baoji,1000,1200,2000,2000

Sort the 2014 data:

# Print only the 2014 values

cat apartment_prices.csv | tail -n +2 | awk -F, '{ print $3 }' | sort -g

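sort -g: sort orders its input lines. By default it compares them as text, so 5800 would sort after 30000; -g compares by numeric value instead. For plain integers like these, the POSIX option -n works as well.

output: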

1200

5800

25000

30000

# Include the city name

cat apartment_prices.csv | tail -n +2 | awk -F, '{ print $1 "," $3 }' | sort -t ',' -k 2 -g

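sort -t ',' -k 2 -g: -t ',' sets the field separator and -k 2 sorts by the second field. (Strictly, -k 2 starts the sort key at field 2 and runs to the end of the line, which here is just the price.)

output: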

Shaanxi-baoji,1200

Shaanxi-xi'an,5800

Shanghai,25000

Beijing,30000
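When the whole file should stay intact, the header can be set aside and only the data rows sorted. The following is a sketch under assumptions: the sample data is recreated in a temporary directory, and the output file name sorted_by_2014.csv is made up for illustration.

```shell
workdir=$(mktemp -d)
csv="$workdir/apartment_prices.csv"
cat > "$csv" <<'EOF'
City,2013,2014,2015,2016
Beijing,22000,30000,35000,38000
Shanghai,20000,25000,30000,35000
Shaanxi-xi'an,6000,5800,5700,6000
Shaanxi-baoji,1000,1200,2000,2000
EOF

sorted="$workdir/sorted_by_2014.csv"
{
    # The header line is copied first so it stays on top.
    head -n 1 "$csv"
    # -k3,3n limits the key to field 3 (the 2014 column) and compares numerically.
    tail -n +2 "$csv" | sort -t, -k3,3n
} > "$sorted"

cat "$sorted"
```

Writing the key as -k3,3 rather than -k3 stops it at the end of field 3, which matters once there are more columns after the one being sorted. Adding r (for example -k3,3nr) would reverse the order.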

Comparing two columns

To compare two columns, write each one to a file and then compare the files with git diff or vim -d file1 file2 (vimdiff).

For example, compare the 2015 and 2016 data (columns 4 and 5 of the file):

# 1. Extract the 2015 data (column 4)

cat apartment_prices.csv | tail -n +2 | awk -F, '{ print $4; }' > 2015

# 2. Extract the 2016 data (column 5)

cat apartment_prices.csv | tail -n +2 | awk -F, '{ print $5; }' > 2016

# 3. Compare the two files (--no-index is needed when running inside a Git working tree)

git diff --no-index 2015 2016

output diff:

diff --git a/2015 b/2016

index 162645c..6accdc5 100644

--- a/2015

+++ b/2016

@@ -1,4 +1,4 @@

+38000

 35000

-30000

-5700

+6000

 2000
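A line-based diff shows which values differ but not which city they belong to. As an alternative sketch (not the article's method), awk can compare two columns row by row without temporary files. The example assumes the sample data is recreated in a temporary directory and compares fields 4 and 5, which the header labels 2015 and 2016; the variable names are illustrative.

```shell
workdir=$(mktemp -d)
csv="$workdir/apartment_prices.csv"
cat > "$csv" <<'EOF'
City,2013,2014,2015,2016
Beijing,22000,30000,35000,38000
Shanghai,20000,25000,30000,35000
Shaanxi-xi'an,6000,5800,5700,6000
Shaanxi-baoji,1000,1200,2000,2000
EOF

# For every data row, print the city and the change from field 4 to field 5.
changes=$(awk -F, 'NR > 1 { printf "%s,%d\n", $1, $5 - $4 }' "$csv")

# List only the cities whose two values are identical.
unchanged=$(awk -F, 'NR > 1 && $4 == $5 { print $1 }' "$csv")

printf '%s\n' "$changes"
printf 'unchanged: %s\n' "$unchanged"
```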

Practical use

Recently our business staff needed to update product data. Normally they prepare the product data in Excel and import it into the system as CSV. Because of legacy systems, however, the CSV data has to be imported into several different systems, and the data sync for the legacy systems has to be done manually by a developer.

The sync process involves extracting, comparing, and validating data. Since Excel is not particularly developer-friendly, we used the approach described in this article and processed the CSV files with shell scripts.
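The article does not show its validation step, so the following is only a guess at what a basic check could look like. It assumes a file laid out like the sample (a name column followed by integer columns). The file name upload.csv and all rows in it are made up for the test, with two rows deliberately malformed.

```shell
workdir=$(mktemp -d)
csv="$workdir/upload.csv"   # hypothetical file name
cat > "$csv" <<'EOF'
City,2013,2014,2015,2016
Example-a,100,200,300,400
Example-b,1,2
Example-c,100,n/a,300,400
EOF

# Report rows whose number of fields differs from the header's.
bad_width=$(awk -F, '
    NR == 1        { expected = NF; next }
    NF != expected { print NR ": " $0 }
' "$csv")

# Report rows in which any value column is not a plain non-negative integer.
bad_value=$(awk -F, '
    NR > 1 {
        for (i = 2; i <= NF; i++)
            if ($i !~ /^[0-9]+$/) { print NR ": " $0; break }
    }
' "$csv")

printf 'wrong field count:\n%s\n' "$bad_width"
printf 'non-numeric value:\n%s\n' "$bad_value"
```

Each reported line is prefixed with its line number in the file, which makes it easy to send the problem rows back to whoever prepared the spreadsheet.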
