Storage Systems

Published: 2024/1/18

Reference: Computer Architecture (6th Edition)

Bus
Disk Storage
Use Arrays of Small Disks?
RAID
- RAID 0: Striping
- RAID 1: Disk Mirroring/Shadowing
- RAID 2: Bit-interleaved Hamming-code Array
- RAID 3: Bit-interleaved Parity Disk
- RAID 4: Block-interleaved Parity Disk
- RAID 5: Block-interleaved Distributed Parity
- RAID 6: Two-dimensional Parity Independent-access Array
- RAID Implementation
Storage Environment
- Direct Attached Storage (DAS)
- Network Attached Storage (NAS)
- Storage Area Network (SAN)

Memory: main (internal) memory
Storage Systems: external storage (persistent, non-volatile)

Bus

I/O buses tap into the processor-memory bus via bus adaptors. An adaptor matches the speeds of the two sides (by buffering) and provides the interface.

Main components of Intel Chipset: Pentium 4

Northbridge (the adaptor for high-speed devices): Handles memory, Graphics
Southbridge (the adaptor for low-speed devices): I/O, PCI bus, Disk controllers, USB controllers, Audio, Serial I/O, Interrupt controller, Timers

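IMC (Integrated Memory Controller)

CPUs are becoming more and more highly integrated: the memory controller has moved inside the CPU, and the northbridge has disappeared. At the same time, the L1 and L2 caches are integrated into each core, and the L3 cache, shared by the four cores, is also integrated into the CPU.
QPI (QuickPath Interconnect): supports multiple system-bus connections and replaces the front-side bus (FSB)

The next step is to integrate memory into the CPU as well…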

The move from Parallel to Serial I/O

Parallel I/O (ISA bus, PCI, SCSI, IDE)
- Parallel bus clock rate limited by clock skew across long bus (~100MHz)
- High power to drive large number of loaded bus lines
- Central bus arbiter adds latency to each transaction, sharing limits throughput
- Expensive parallel connectors and backplanes/cables (all devices pay costs)
Dedicated Point-to-point Serial Links (Ethernet, Infiniband, PCI Express, SATA, USB, Firewire)
- Point-to-point links run at multi-gigabit speed using advanced clock/signal encoding (requires lots of circuitry at each end)
- Lower power since only one well-behaved load
- Multiple simultaneous transfers
- Cheap cables and connectors (trade greater endpoint transistor cost for lower physical wiring cost), customize bandwidth per device using multiple links in parallel
Example: hard disk interfaces moved from IDE (parallel) $\rightarrow$ SATA (serial)

Disk Storage

Storage emphasizes reliability and scalability as well as cost-performance
What is the "software king" that determines which hardware features are actually used?
- Compiler for processor
- Operating System for storage

Flash: The future of disks? (solid-state drives)

Flash drive advantages: Lower power (no moving parts), Much faster seek time, 100X I/Os per second (no moving parts), Greater reliability (no moving parts), Lower noise (no moving parts). Flash performs well because accessing data involves no mechanical movement.
Flash disadvantages: Cost (20-100x disk cost/GB), Slow writes with current design (competitive with disks), limited write endurance (a location wears out after it has been written too many times). Endurance is not an issue for most applications, since wear leveling spreads writes across the blocks on the chip (the problem is handled in software).

Disk Figure of Merit: Areal Density

Bits recorded along a track; Metric is Bits Per Inch (BPI)
Number of tracks per surface; Metric is Tracks Per Inch (TPI)
Bit density per unit area; Metric is Bits Per Square Inch: Areal Density $= \textrm{BPI} \times \textrm{TPI}$

Disk Drive Performance

Disk Service Time: the time a disk takes to complete an I/O request is the sum of
- Seek Time, Rotational Latency, and Data Transfer Time (request size divided by the Data Transfer Rate in MB/s)
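
As a rough illustration of the sum above, the following sketch adds the three components for one request. The function name and the sample figures are hypothetical, and the average rotational latency is assumed to be half a revolution, which is a common approximation rather than something stated in these notes.

```python
def disk_service_time_ms(seek_ms: float, rpm: float,
                         request_kb: float, transfer_mb_per_s: float) -> float:
    """Estimate the service time of one disk request in milliseconds."""
    # Assumption: on average the head waits half a revolution.
    rotational_ms = 0.5 * 60_000 / rpm
    # Transfer time = request size / transfer rate (1 MB taken as 1024 KB).
    transfer_ms = (request_kb / 1024) / transfer_mb_per_s * 1000
    return seek_ms + rotational_ms + transfer_ms


# Hypothetical drive: 4 ms seek, 15000 RPM, 100 MB/s, 1 MB request.
print(disk_service_time_ms(4.0, 15_000, 1024, 100))
```

For small requests the seek and rotational terms dominate, which is why the transfer rate alone says little about random-access performance.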

Utilization vs. Response time

The higher the utilization (I/O request rate), the longer the response time.
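
One way to see the shape of this trade-off is a simple single-server queueing model. The sketch below assumes an M/M/1 queue (random arrivals, exponential service times), where the mean response time is the service time divided by one minus the utilization. The model choice and the function name are assumptions made for illustration; the notes themselves only state the qualitative trend.

```python
def response_time_ms(service_time_ms: float, utilization: float) -> float:
    """Mean response time under an assumed M/M/1 queueing model."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1 - utilization)


# Hypothetical 10 ms service time at increasing load.
for u in (0.1, 0.5, 0.8, 0.9, 0.95):
    print(f"utilization {u:.2f}: {response_time_ms(10.0, u):6.1f} ms")
```

Under this model the response time grows slowly at low load and very steeply as utilization approaches 1.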

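Parameters That Reflect the Reliability of Storage Peripherals

Reliability: the ability of the system to deliver service continuously from an initial state
- Measured by the Mean Time To Failure (MTTF)
Availability: the fraction of the interval between two consecutive restorations of normal service during which the system is working normally
- Measured by $\frac{\textrm{MTTF}}{\textrm{MTTF} +\textrm{MTTR}}$, where MTTR is the Mean Time To Repair (for storage, repair means data recovery)
- MTTF + MTTR = MTBF (Mean Time Between Failures)
Dependability: the degree to which the service can justifiably be regarded as reliable
- Dependability cannot be measured directly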
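
The availability formula above can be checked with a few lines of code. This is a minimal sketch; the function names and the sample MTTF and MTTR values are made up for illustration.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures: MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is delivering service."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


# Hypothetical disk: fails every 1,000,000 hours on average, 24 hours to repair.
print(f"{availability(1_000_000, 24):.6f}")
```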

Use Arrays of Small Disks?

Replace Small Number of Large Disks with Large Number of Small Disks!

Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?

Array Reliability

Reliability of $N$ disks = Reliability of 1 Disk $\div\ N$
Arrays (without redundancy) too unreliable to be useful!
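
To put a number on this, the sketch below divides a single-disk MTTF by the number of disks. It assumes independent failures at a constant rate and an array that is lost as soon as any one disk fails; the function name and the figures are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def array_mttf_hours(disk_mttf_hours: float, n_disks: int) -> float:
    """MTTF of a non-redundant array, assuming independent disk failures."""
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return disk_mttf_hours / n_disks


# Hypothetical: one disk lasts 1,000,000 hours; the array has 100 disks.
hours = array_mttf_hours(1_000_000, 100)
print(f"{hours:.0f} hours, about {hours / HOURS_PER_YEAR:.2f} years")
```

A single disk with an MTTF of over a century becomes an array that is expected to fail in little more than a year, which motivates the redundancy introduced next.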

RAID

Redundant Arrays of (Inexpensive) Disks

Files are "striped" across multiple disks
Redundancy yields high data availability (Disks will still fail)
- Availability: service still provided to user, even if some components failed
Contents reconstructed from data redundantly stored in the array
- Capacity penalty to store redundant info
- Bandwidth penalty to update redundant info

RAID 0: Striping

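Data striping

RAID 0: a non-redundant disk array that stores no redundant information.
Data is divided into stripes, which are distributed across the disks in an interleaved fashion, forming one larger-capacity disk whose members can work in parallel (Stripe0, Stripe1, … denote the stripes in order; the size of a stripe is called the stripe width)

All disks can be read in parallel, so performance is high. However, there is no data redundancy: if any one disk fails, the whole system stops working.
- Suitable where high-bandwidth disk access is required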
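
The interleaved placement described above can be sketched as a mapping from a logical stripe number to a disk and a position on that disk. Round-robin placement is assumed here, and the function name is hypothetical.

```python
def locate_stripe(stripe_index: int, n_disks: int) -> tuple[int, int]:
    """Map a logical stripe to (disk number, stripe offset on that disk)."""
    if stripe_index < 0 or n_disks < 1:
        raise ValueError("stripe_index must be >= 0 and n_disks >= 1")
    offset, disk = divmod(stripe_index, n_disks)
    return disk, offset


# Stripes 0..7 spread over a hypothetical 4-disk array.
for s in range(8):
    disk, offset = locate_stripe(s, 4)
    print(f"Stripe{s} -> disk {disk}, offset {offset}")
```

Consecutive stripes land on different disks, so a large sequential transfer keeps every disk busy at once.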

RAID 1: Disk Mirroring/Shadowing

Each disk is fully duplicated onto its “mirror”: Very high availability can be achieved

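Bandwidth sacrifice on write: Logical write = two physical writes (the disk and its mirror are written in parallel and no parity has to be computed, so writes are faster than at the higher RAID levels)
Reads may be optimized: when reading from RAID 1, the disk and its mirror can work independently and simultaneously, and whichever returns the data first supplies it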
Most expensive solution: 100% capacity overhead

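RAID 2: Bit-interleaved Hamming-code Array

Each data disk stores one bit of every data word, interleaved by bit: Disk0 holds bit 0 of all data words, Disk1 holds bit 1, and so on. A Hamming code is computed over the corresponding bits of the data disks, and the code bits are stored at the corresponding positions on several check (ECC) disks.
When data is read from the data disks, the Hamming code is read as well and used to detect and correct errors (a Hamming code can correct 1-bit errors and detect 2-bit errors)

Several disks are needed to hold the Hamming check information; the number of redundant disks is proportional to the logarithm of the number of data disks ($\log_2 m$, where $m$ is the number of data disks)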
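
For a concrete count, the sketch below uses the standard bound for a single-error-correcting Hamming code: the smallest $r$ with $2^r \ge m + r + 1$. This exact bound is not given in the notes, which only state the logarithmic growth, and detecting double errors as well needs one additional overall parity bit. The function name is hypothetical.

```python
def hamming_check_disks(data_disks: int) -> int:
    """Smallest r with 2**r >= data_disks + r + 1 (single-error correction)."""
    if data_disks < 1:
        raise ValueError("data_disks must be at least 1")
    r = 1
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r


for m in (4, 10, 32):
    print(f"{m} data disks -> {hamming_check_disks(m)} check disks")
```

The relative overhead shrinks as the array widens, but it stays well above the single parity disk used by RAID 3.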

RAID 3: Bit-interleaved Parity Disk

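When a disk fails, the disk controller itself can tell which disk is at fault, so a complex Hamming code is unnecessary and simple parity is enough.

Logically, a single high capacity, high transfer rate disk: good for large transfers. It tolerates a single disk failure and transfers in parallel. This is a fine-grained array, meaning the stripe width is small (1 byte or 1 bit), so almost every I/O request is serviced by all the disks in the array, which yields a very high data transfer rate.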
$1 / N$ capacity cost for parity if $N$ data disks and $1$ parity disk
- Wider arrays reduce capacity costs, but decrease reliability/availability
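
Because the controller knows which disk failed, parity alone is enough to rebuild it: the parity block is the XOR of the data blocks, and XOR-ing the surviving blocks with the parity returns the missing one. The sketch below shows this on small byte strings; the helper name and the sample contents are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of 4 data disks and the parity computed over them.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0a\x0b"]
parity = xor_blocks(data)

# Disk 2 fails: rebuild it from the surviving data disks plus parity.
survivors = [d for i, d in enumerate(data) if i != 2]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data[2])
```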

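RAID 3 read/write characteristics

Assume 4 data disks and one redundant disk
- Reading data takes 5 disk reads in total (the 4 data disks and the redundant disk are read at the same time)
- Writing data to one disk takes 3 disk reads and 2 disk writes (read the other 3 data disks to recompute the parity, then write the new data and the new parity)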

RAID 4: Block-interleaved Parity Disk

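Inspiration for RAID 4

In RAID 3, a single disk access operates on every disk in the array. RAID 4 aims to involve fewer disks in each operation, so that the array can carry out several data accesses in parallel.

In RAID 4, data is stored across the disks interleaved by block, and the parity information is kept on one dedicated disk (the parity disk). The redundancy cost is the same as for RAID 3. This is a coarse-grained array: interleaving and parity computation use a fairly large stripe (a block) as the unit. The way data is accessed differs from RAID 3.
- Small read: every block has an error detection field, so each disk can perform reads independently; this allows independent reads to different disks simultaneously (the parity disk is read only when a disk has failed and data must be rebuilt)
  - To catch errors on read, rely on error detection field vs. the parity disk
- Large write: because the parity must be recomputed, a write has to access almost all the disks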

RAID 5: Block-interleaved Distributed Parity

Inspiration for RAID 5

Small writes (write to one disk): since P has old sum, compare old data to new data, add the difference to P

Small Write Algorithm

1 Logical Write = 2 Physical Reads + 2 Physical Writes
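
The small-write rule can be written as new parity = old parity XOR old data XOR new data, which is what "add the difference to P" means for XOR parity. The sketch below checks that this shortcut gives the same parity as recomputing it over the whole stripe; the function names and block contents are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity after overwriting one data block (2 reads + 2 writes)."""
    difference = xor_bytes(old_data, new_data)   # uses the read of the old data
    return xor_bytes(old_parity, difference)     # uses the read of the old parity


# Hypothetical 3-block stripe and its parity.
d0, d1, d2 = b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"
parity = xor_bytes(xor_bytes(d0, d1), d2)

new_d1 = b"\xc3\x3c"
new_parity = small_write_parity(d1, new_d1, parity)
print(new_parity == xor_bytes(xor_bytes(d0, new_d1), d2))
```

Only the target data block and the parity block are touched, regardless of how many data disks the stripe spans.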

Problems of Disk Arrays: Small Writes

Small writes are limited by Parity Disk:
- Writes to $D_0$ and $D_5$ both also write to the P disk (so $D_0$ and $D_5$ still cannot be written at the same time)

RAID 5: High I/O Rate Interleaved Parity

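To solve the problem above, the parity information is distributed over all the disks in the array and there is no dedicated redundant disk. The parity block of each row of data blocks is staggered and placed cyclically on a different disk, so that parity is spread evenly across all disks.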
- Independent writes possible because of interleaved parity
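
The rotation can be sketched as follows. The particular rule used here (parity on the last disk for row 0, then moving one disk to the left per row) is only one possible placement and is an assumption; the notes say only that parity blocks are staggered cyclically. The function names are hypothetical.

```python
def parity_disk(row: int, n_disks: int) -> int:
    """Disk holding the parity block of a row (assumed rotation rule)."""
    return (n_disks - 1 - row) % n_disks


def row_layout(row: int, n_disks: int) -> list[str]:
    """Labels of the blocks stored on each disk for one row."""
    p = parity_disk(row, n_disks)
    block = row * (n_disks - 1)          # first data block number in this row
    layout = []
    for disk in range(n_disks):
        if disk == p:
            layout.append(f"P{row}")
        else:
            layout.append(f"D{block}")
            block += 1
    return layout


for row in range(5):
    print(row_layout(row, 5))
```

With this layout on 5 disks, D0 sits on disk 0 with its parity on disk 4, while D5 sits on disk 1 with its parity on disk 3, so the two small writes use disjoint disks and can proceed at the same time.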

RAID 6: Two-dimensional Parity Independent-access Array

Inspiration:

Recovering from 2 failures

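RAID 6 characteristics

Two-dimensional parity, independent-access disk array: RAID 6 adds a second, independent piece of check information to RAID 5, stored on another check disk. Writing data accesses 1 data disk and 2 redundant disks, and the array tolerates two simultaneous disk failures.
Data is stored across the disks interleaved by block, and the error detection and correction information is spread evenly over all disks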

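RAID Implementation

Software approach: the array management software runs on the host
- Advantage: low cost
- Disadvantage: consumes too much host time, and bandwidth stays low
Array-card approach: the RAID management software is built into the firmware of an I/O controller card, so it takes no host time; generally used in workstations and PCs
Subsystem approach: an open platform based on a general-purpose interface bus, usable with a variety of host platforms and network systems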

Storage Environment

Direct Attached Storage (DAS)

Directly attached

Servers connect directly to the disk array typically via a SCSI interface.

Network Attached Storage (NAS)

Network-attached storage: a file system on the network

Servers provide services, while a separate, dedicated system is responsible for storage
NAS Devices access the disks in an array via direct connection or through external connectivity

Storage Area Network (SAN)

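Storage area network: disks on the network

Servers access the disk array through a dedicated network designated as SAN (consists of Fibre Channel switches). In other words, a separate network is built specifically for traffic between the storage media and the servers.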
