當前位置：首頁 > 编程语言 > python >内容正文

python

gdal 压缩tif_Python | GDAL处理影像

發布時間：2023/12/10 python 28 豆豆

生活随笔收集整理的這篇文章主要介紹了 gdal 压缩tif_Python | GDAL处理影像小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

GDAL柵格數據處理

柵格數據介紹
柵格數據讀取
讀取部分數據集
坐標變換
重采樣

什么是柵格數據

基本上是一個大的二維或三維數組
沒有獨立的幾何對象，只有像素的集合
二維：黑白圖片
三維：彩色/假彩色，多光譜/高光譜
可以存儲幾乎任何類型的數據
體積比較大
應用場景：遙感、氣象、水文、農業、海洋……

柵格數據都能存儲什么？

高程、坡度、坡向
溫度、風速、降水、蒸發
可見光、近紅外、微波等遙感數據

柵格數據小知識

柵格數據的仿射變換與幾何校正：通過一組坐標，像素的大小和數據集的旋轉量
存儲空間：雙精度浮點數>單精度浮點數>整數>無符號整數
概視圖：遞減分辨率，用于大數據快速顯示
有損壓縮與無損壓縮：地理科學數據應使用無損壓縮

GDAL數據集的基本結構

柵格數據讀取

driver.Create(filename, xsize, ysize, [bands], [data_type], [options])
- filename: 文件名，創建一個新文件
- xsize: x方向，列數
- ysize: y方向，行數
- bands: 波段數，默認值為1，從1開始(不是從0開始)
- data_type: 數據類型，默認為GDT_Byte(8位無符號整型)
- options: 其它選項，按數據類型而有所不同
GDAL支持的數據類型

# 導入gdal，注意導入的名稱import osfrom osgeo import gdal #或者直接用import gdalfrom matplotlib import pyplot as pltimage_data_dir = 'D:\BaiduNetdiskDownload\Landsat\Washington'# Be sure to change your directory.# 切換當前路徑os.chdir(image_data_dir)band1_fn = 'p047r027_7t20000730_z10_nn10.tif'band2_fn = 'p047r027_7t20000730_z10_nn20.tif'band3_fn = 'p047r027_7t20000730_z10_nn30.tif'# 打開波段1(注意這里是從1開始數)# Open band 1.in_ds = gdal.Open(band1_fn)# 用索引1，而不是0，來獲取第一個波段in_band = in_ds.GetRasterBand(1)plt.imshow(in_band.ReadAsArray(),cmap='gray')# Create a 3-band GeoTIFF with the same dimensions, data type, projection,# and georeferencing info as band 1. This will be overwritten if it exists.# 使用驅動對象來創建數據集，因為使用的是GeoTIFF驅動，無論給它任何擴展名，輸出的文件都是GeoTIFFgtiff_driver = gdal.GetDriverByName('GTiff')out_ds = gtiff_driver.Create('nat_color.tif', in_band.XSize, in_band.YSize, 3, in_band.DataType)# 重要：獲取空間信息# 第一句：得到投影(SRS)并復制到新的數據集# 第二句：得到geotransform信息并復制到新的數據集# 兩者的信息都很重要。# 前者與矢量數據類似，包含有完整的空間參考信息；# 后者提供原始坐標、像素大小、旋轉值，是柵格數據獨有的out_ds.SetProjection(in_ds.GetProjection())out_ds.SetGeoTransform(in_ds.GetGeoTransform())0
print(out_ds.GetProjection())PROJCS["WGS 84 / UTM zone 10N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-123],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32610"]]
print(out_ds.GetGeoTransform())(343724.25, 28.5, 0.0, 5369585.25, 0.0, -28.5)
# Copy data from band 1 into the output image.# 向輸出數據源out_ds寫入數據# 先取出想要寫入的波段# 再將numpy數組in_data的數據寫入in_data = in_band.ReadAsArray()out_band = out_ds.GetRasterBand(3)out_band.WriteArray(in_data)0
# Copy data from band 2 into the output image.# 讀取波段2，直接從數據集中讀取像素句柄in_ds = gdal.Open(band2_fn)out_band = out_ds.GetRasterBand(2)out_band.WriteArray(in_ds.ReadAsArray())0
# Copy data from band 3 into the output image.# 讀取波段3，更簡潔的寫法out_ds.GetRasterBand(1).WriteArray( gdal.Open(band3_fn).ReadAsArray())0
# Compute statistics on each output band.# 計算每個波段的統計量# 注意用range(1,4)表示在波段1，2，3之間循環# 統計每個波段的：平均值、最小值、最大值、標準差# 參數取False：從現有數據直接計算，True：用概視圖估計值out_ds.FlushCache()for i in range(1, 4): out_ds.GetRasterBand(i).ComputeStatistics(False)# 建立概視圖# Build overview layers for faster display.out_ds.BuildOverviews('average', [2, 4, 8, 16, 32])# 關閉數據源，這個時候才將內存中的對象寫入硬盤# This will effectively close the file and flush data to disk.del out_ds# 打開QGIS，或者ArcGIS，看看輸出文件

讀取部分數據集

上述案例讀取了整個波段(band)的數據
但是有的時候數據太大了，內存裝不下；或者我們只用其中很小一塊，沒必要全部讀取，這時候就需要分塊讀取
分塊讀取波段數據，讀入后輸出numpy數組：
band.ReadAsArray([xoff], [yoff], [win_xsize], [win_ysize], [buf_xsize], [buf_ysize], [buf_obj])
- xoff: 列讀取的起點，默認為0
- yoff: 行讀取的起點，默認為0
- win_xsize: 要讀取的列數，默認讀取所有列
- win_ysize: 要讀取的行數，默認讀取所有行
- buf_xsize: 輸出數組里的列數，默認用win_xsize的值，如果值不同于win_xsize，則會重新采樣
- buf_ysize: 輸出數組里的行數，默認用win_ysize的值，如果值不同于win_ysize，則會重新采樣
- buf_obj: 是一個事先創建好的numpy數組，讀取的結果會存入這個數組，而不是新建一個。如果需要，數據將會重采樣以適應這個數組，值將會轉換為這種數組的類型。
讀取部分數據集舉例：
從第1400列，6000行開始，讀取6列3行，不做重采樣

注意讀取數據的數組下標不要越界！GDAL并不會自動幫你處理下標越界的問題，它只會報錯。因此特別當你想用部分讀取的方式處理一個很大的文件時，對邊界的處理需要你特別的注意，必須正好讀完不能越界也不能少讀。

逐塊處理大數據

如果數據太大，內存放不下，可以每次讀取部分數據集。流程如下：

用ReadAsArray逐塊讀取數據舉例
- 處理11行13列的柵格數據
- 塊大小為5行5列
- 在右邊界自動轉換為3列
- 在下邊界自動轉換為1行

# 逐塊處理大數據案例# 將數字高程模型的單位從米轉換為英尺import osimport numpy as npfrom osgeo import gdal# 先切換路徑data_dir = 'D:\BaiduNetdiskDownload\dem'os.chdir(data_dir)# Open the input raster and get its dimensions.# 打開tif文件，讀取文件的尺寸，塊大小，nodata值等屬性in_ds = gdal.Open('gt30w140n90.tif')in_band = in_ds.GetRasterBand(1)xsize = in_band.XSizeysize = in_band.YSizeprint(xsize,ysize)# Get the block size and NoData value.block_xsize, block_ysize = in_band.GetBlockSize()nodata = in_band.GetNoDataValue()print(block_xsize,block_ysize,nodata)plt.imshow(in_band.ReadAsArray(),cmap='gray')4800 6000
4800 1 -9999.0

# 新建輸出數據源# Create an output file with the same dimensions and data type.out_ds = in_ds.GetDriver().Create( 'dem_feet.tif', xsize, ysize, 1, in_band.DataType)out_ds.SetProjection(in_ds.GetProjection())out_ds.SetGeoTransform(in_ds.GetGeoTransform())out_band = out_ds.GetRasterBand(1)# 二重循環，逐塊讀取數據# Loop through the blocks in the x direction.for x in range(0, xsize, block_xsize): # Get the number of columns to read. if x + block_xsize < xsize: cols = block_xsize else: cols = xsize - x # Loop through the blocks in the y direction. for y in range(0, ysize, block_ysize): # Get the number of rows to read. if y + block_ysize < ysize: rows = block_ysize else: rows = ysize - y # 對其中的每一小塊，將其單位從米轉換為英尺(乘以常數)，寫入輸出波段 # Read in one block's worth of data, convert it to feet, and then # write the results out to the same block location in the output. data = in_band.ReadAsArray(x, y, cols, rows) data = np.where(data == nodata, nodata, data * 3.28084) out_band.WriteArray(data, x, y)# 計算統計量，建立概視圖，寫入數據源等收尾工作# Compute statistics after flushing the cache and setting the NoData value.out_band.FlushCache()out_band.SetNoDataValue(nodata)out_band.ComputeStatistics(False)out_ds.BuildOverviews('average', [2, 4, 8, 16, 32])del out_ds# 打開QGIS，或者ArcGIS，看看輸出文件

坐標變換

到目前為止，我們都在像處理數組一樣處理柵格數據，只考慮了像素偏移，沒有考慮真實世界的坐標
坐標的轉換并不困難，需要用到：
- 柵格數據的SRS(空間參考)信息
- geotransform也就是柵格數據的地理變換信息
需要使用GDAL提供的函數
- ApplyGeoTransform()
- GetGeoTransform()
- InvGeoTransform()
讀取geotransform信息，這是gt變換對象
- gt = ds.GetGeoTransform()
生成一個逆變換：
- GDAL 2.X以上版本：inv_gt = gdal.InvGeoTransform(gt)
- GDAL 1.X版本：success, inv_gt = gdal.InvGeoTransform(gt)
使用逆變換將坐標轉換為數組偏移量
- offsets = gdal.ApplyGeoTransform(inv_gt, 465200, 5296000)
- xoff, yoff = map(int, offsets)
- value = band.ReadAsArray(xoff, yoff, 1, 1)[0,0]

# 讀取geotransform信息# Get the geotransform from one of the Landsat bands.os.chdir(image_data_dir)ds = gdal.Open('p047r027_7t20000730_z10_nn10.tif')band = ds.GetRasterBand(1)gt = ds.GetGeoTransform()print(gt)(343724.25, 28.5, 0.0, 5369585.25, 0.0, -28.5)
# 生成一個逆變換# Now get the inverse geotransform. The original can be used to convert# offsets to real-world coordinates, and the inverse can be used to convert# real-world coordinates to offsets.# GDAL 1.x: You get a success flag and the geotransform.# success, inv_gt = gdal.InvGeoTransform(gt)# print(success, inv_gt)# GDAL 2.x: You get the geotransform or Noneinv_gt = gdal.InvGeoTransform(gt)print(inv_gt)(-12060.5, 0.03508771929824561, 0.0, 188406.5, 0.0, -0.03508771929824561)
# 生成數組偏移量# Use the inverset geotransform to get some pixel offsets from real-world# UTM coordinates (since that's what the Landsat image uses). The offsets# are returned as floating point.offsets = gdal.ApplyGeoTransform(inv_gt, 465200, 5296000)print(offsets)[4262.307017543859, 2581.9385964912362]
# 將偏移量轉換為整數# Convert the offsets to integers.xoff, yoff = map(int, offsets)print(xoff, yoff)4262 2581
# 按照偏移量讀取一個像元# And use them to read a pixel value.value = band.ReadAsArray(xoff, yoff, 1, 1)[0,0]print(value)62
# 調用ReadAsArray函數比較耗資源，# 所以最好不要調用次數特別多，特別是不要每個柵格點都調用# 可以先把大量數據讀入內存，再按照偏移量取出對應位置的像元# Reading in one pixel at a time is really inefficient if you need to read# a lot of pixels, though, so here's how you could do it by reading in all# of the pixel values first and then pulling out the one you need.data = band.ReadAsArray()x, y = map(int, gdal.ApplyGeoTransform(inv_gt, 465200, 5296000))value = data[y, x] # 注意numpy需要的偏移量為[行, 列]，與GDAL的恰恰相反，GDAL為[列，行]！print(value)62
# 坐標變換案例：從整幅的landsat影像中截取華盛頓州Vashon島(給定Vashon島圖幅左上角和右下角的坐標)import osfrom osgeo import gdal# Vashon島圖幅左上角和右下角的坐標# Coordinates for the bounding box to extract.vashon_ulx, vashon_uly = 532000, 5262600vashon_lrx, vashon_lry = 548500, 5241500# 載入數據os.chdir(os.path.join(data_dir, 'Landsat', 'Washington'))in_ds = gdal.Open('nat_color.tif')# 生成逆變換# Create an inverse geotransform for the raster. This converts real-world# coordinates to pixel offsets.in_gt = in_ds.GetGeoTransform()inv_gt = gdal.InvGeoTransform(in_gt)# 自動判斷GDAL的版本if gdal.VersionInfo()[0] == '1': if inv_gt[0] == 1: inv_gt = inv_gt[1] else: raise RuntimeError('Inverse geotransform failed')elif inv_gt is None: raise RuntimeError('Inverse geotransform failed')# 計算偏移量# Get the offsets that correspond to the bounding box corner coordinates.offsets_ul = gdal.ApplyGeoTransform( inv_gt, vashon_ulx, vashon_uly)offsets_lr = gdal.ApplyGeoTransform( inv_gt, vashon_lrx, vashon_lry)# 轉換為整數# The offsets are returned as floating point, but we need integers.off_ulx, off_uly = map(int, offsets_ul)off_lrx, off_lry = map(int, offsets_lr)# 從偏移量計算出Vashon島圖幅的行數和列數# Compute the numbers of rows and columns to extract, based on the offsets.rows = off_lry - off_ulycolumns = off_lrx - off_ulx# 新建輸出數據源# Create an output raster with the correct number of rows and columns.gtiff_driver = gdal.GetDriverByName('GTiff')out_ds = gtiff_driver.Create('vashon.tif', columns, rows, 3)out_ds.SetProjection(in_ds.GetProjection())0
# 設定輸出數據源的坐標變換，注意要修改坐標起點# Convert the offsets to real-world coordinates for the georeferencing info.# We can't use the coordinates above because they don't correspond to the# pixel edges.subset_ulx, subset_uly = gdal.ApplyGeoTransform(in_gt, off_ulx, off_uly)out_gt = list(in_gt)out_gt[0] = subset_ulxout_gt[3] = subset_ulyout_ds.SetGeoTransform(out_gt)# 本案例的很多內容是之前內容的重復# 最重要的是計算Vashon島左上角和右下角的坐標對應的偏移值# 打印出來比較一下print(in_gt)print(out_gt)(343724.25, 28.5, 0.0, 5369585.25, 0.0, -28.5)
[531995.25, 28.5, 0.0, 5262624.75, 0.0, -28.5]
# 按偏移量讀取數據，存入新的文件# Loop through the 3 bands.for i in range(1, 4): in_band = in_ds.GetRasterBand(i) out_band = out_ds.GetRasterBand(i) # Read the data from the input raster starting at the computed offsets. data = in_band.ReadAsArray(off_ulx, off_uly, columns, rows) # Write the data to the output, but no offsets are needed because we're # filling the entire image. out_band.WriteArray(data)del out_ds# 打開QGIS，或者ArcGIS，看看輸出文件

重采樣

在數據讀取的時候就可以一并進行重采樣
band.ReadAsArray([xoff], [yoff], [win_xsize], [win_ysize], [buf_xsize], [buf_ysize], [buf_obj])
通過制定buf_xsize和buf_ysize的大小來實現
如果它們比win_xsize和win_ysize大，那么會重采樣為更高的分辨率，更小的像素
如果它們比win_xsize和win_ysize小，那么會重采樣為更低的分辨率，更大的像素，使用最鄰近插值來實現!
重采樣為更高分辨率，更小的像素
重采樣為更低分辨率，更大的像素

#??重采樣舉例# Get the first band from the raster created with listing 8.1.os.chdir(os.path.join(data_dir, 'Landsat', 'Washington'))ds = gdal.Open('nat_color.tif')band = ds.GetRasterBand(1)# 讀入2行3列# Read in 2 rows and 3 columns.original_data = band.ReadAsArray(1400, 6000, 3, 2)print(original_data)[[28 29 29]
[28 30 29]]
# 重采樣為6行4列# Now resample those same 2 rows and 3 columns to a smaller pixel size by# doubling the number of rows and columns to read (now 4 rows and 6 columns).resampled_data = band.ReadAsArray(1400, 6000, 3, 2, 6, 4)print(resampled_data)[[28 28 29 29 29 29]
[28 28 29 29 29 29]
[28 28 30 30 29 29]
[28 28 30 30 29 29]]
# 讀入6行4列# Read in 4 rows and 6 columns.original_data2 = band.ReadAsArray(1400, 6000, 6, 4)print(original_data2)[[28 29 29 27 25 25]
[28 30 29 25 32 28]
[27 27 28 30 25 29]
[26 26 27 30 25 30]]
# 重采樣為2行3列# Now resample those same 4 rows and 6 columns to a larger pixel size by# halving the number of rows and columns to read (now 2 rows and 3 columns).resampled_data2 = band.ReadAsArray(1400, 6000, 6, 4, 3, 2)print(resampled_data2)[[28 28 28]
[28 28 28]]

下一期我們會講到利用ogr讀取矢量數據。點擊閱讀原文獲取代碼中示例數據，提取碼為svqw。

歡迎掃碼關注。

總結

以上是生活随笔為你收集整理的gdal 压缩tif_Python | GDAL处理影像的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：走在网页游戏开发的路上（十）
下一篇： PureMVC（AS3）剖析：吐槽