Multipath Intro
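Multipath simply means using multiple paths, and it is a generic concept. What I cover here is the open-source storage multipath technology, DM multipath (device-mapper multipath). There are plenty of introductions to multipath already, so this post mainly records the first few questions I had about it, and their answers: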
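- How can you experiment with multipath without enterprise storage?
- How are multipath devices named?
- How should a device address made of four colon-separated numbers, such as 2:0:0:1, be read?
- What is a path group?
- What are the path grouping policy and the IO scheduling policy?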
How can you experiment with multipath without enterprise storage?
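Use a virtual machine and iSCSI. Create a VM, give it an extra block device and two network cards, and export that block device as an iSCSI target. Then, on the machine where you want to try multipath, use an iSCSI client (initiator) to log in to the target through both of its IP addresses. After that, lsblk shows two device nodes for the one underlying block device.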
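To confirm that the two new device nodes really are the same disk, you can compare their WWIDs. The sketch below assumes the kernel exposes a SCSI device's WWID at /sys/block/<dev>/device/wwid (true on many, but not all, kernels and transports); the helper names are hypothetical. The grouping logic is kept separate from the sysfs reading so it can be tried on made-up data.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def read_wwids(sys_block: str = "/sys/block") -> Dict[str, str]:
    """Map block device names (e.g. 'sda') to their WWID string.

    Assumes the WWID is readable at <sys_block>/<dev>/device/wwid;
    devices without that attribute (loop devices, etc.) are skipped.
    """
    wwids: Dict[str, str] = {}
    for dev in sorted(Path(sys_block).iterdir()):
        wwid_file = dev / "device" / "wwid"
        try:
            wwids[dev.name] = wwid_file.read_text().strip()
        except OSError:
            continue  # no WWID attribute for this device
    return wwids


def group_by_wwid(wwids: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert device->WWID so each WWID lists every device node that reaches it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for dev, wwid in sorted(wwids.items()):
        groups[wwid].append(dev)
    return dict(groups)


def multipath_candidates(wwids: Dict[str, str]) -> Dict[str, List[str]]:
    """Keep only WWIDs seen through more than one device node."""
    return {w: devs for w, devs in group_by_wwid(wwids).items() if len(devs) > 1}
```

In the two-NIC lab, `multipath_candidates(read_wwids())` would be expected to list one WWID with two device nodes (for example sda and sdc).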
How are multipath devices named?
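Sometimes you see a string of hexadecimal digits (the WWID), sometimes a name prefixed with mpath (a user-friendly name), and sometimes an arbitrary string (an alias). By default multipath names devices by WWID. Why not use easy-to-remember names? One case where friendly names break is when the root filesystem itself sits on a multipath device. The mapping between friendly names and WWIDs is stored in /etc/multipath/bindings. Reading that file requires the root filesystem to be mounted already, but multipath has to start working inside the initrd, before the root filesystem is available. Defaulting to the WWID is therefore the safe choice.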
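The bindings file is plain text. The sketch below assumes each non-comment line holds an alias followed by a WWID, with # starting a comment. It parses that text and resolves a WWID to the name a map would be shown under in a simplified model: the bound friendly name when `user_friendly_names` is enabled and a binding exists, otherwise the WWID. The function names are hypothetical, and real multipath also honors explicit aliases from multipath.conf, which this ignores.

```python
from typing import Dict, Optional


def parse_bindings(text: str) -> Dict[str, str]:
    """Parse bindings-file text into a WWID -> alias mapping.

    Assumed format: one 'alias wwid' pair per line; '#' starts a comment.
    """
    wwid_to_alias: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: ignore rather than guess
        alias, wwid = fields[0], fields[1]
        wwid_to_alias[wwid] = alias
    return wwid_to_alias


def display_name(wwid: str, bindings: Optional[Dict[str, str]],
                 user_friendly_names: bool) -> str:
    """Friendly name if enabled and bound, else the WWID.

    bindings=None models early boot, when /etc/multipath/bindings on the
    root filesystem cannot be read yet.
    """
    if user_friendly_names and bindings is not None and wwid in bindings:
        return bindings[wwid]
    return wwid
```

Calling `display_name` with `bindings=None` and then with the parsed file shows the problem described above: in this model the same device gets a different name before and after the root filesystem is mounted, while the WWID never changes.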
How should a device address made of four colon-separated numbers, such as 2:0:0:1, be read?
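2:0:0:1 is a SCSI device address; the four numbers correspond to Host:Bus:Target:LUN (the bus is also called the channel). If, for example, the iSCSI target is reached through two IP addresses, the two paths to the same device differ only in the host field, e.g. 2:0:0:1 and 3:0:0:1, because each iSCSI session shows up as its own SCSI host.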
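A small parser makes the four fields explicit. `ScsiAddress`, `parse_hctl` and `differs_only_in_host` are hypothetical names. The host comparison only mirrors the two-portal iSCSI lab; multipath itself matches paths by WWID, not by comparing addresses.

```python
from typing import NamedTuple


class ScsiAddress(NamedTuple):
    """A Linux SCSI address: host, bus (channel), target, LUN."""
    host: int
    bus: int
    target: int
    lun: int


def parse_hctl(text: str) -> ScsiAddress:
    """Parse 'H:B:T:L' (e.g. '2:0:0:1'); raise ValueError on malformed input."""
    parts = text.strip().split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {text!r}")
    host, bus, target, lun = (int(p) for p in parts)
    return ScsiAddress(host, bus, target, lun)


def differs_only_in_host(a: ScsiAddress, b: ScsiAddress) -> bool:
    """True if a and b share bus, target and LUN but sit on different hosts."""
    return a.host != b.host and a[1:] == b[1:]
```

A malformed address such as "3:0:01" (three fields) is rejected with a ValueError instead of being silently misread.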
What is a path group?
At first I was confused about this concept. I assumed that all the paths to one real device form a single path group, i.e. that the following is one path group:
multipath-demo:~ # multipath -l
14945540000000000ccb70d0ceeee4280f8450284d6298b59 dm-0 IET,VIRTUAL-DISK
size=10G features='1 retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 2:0:0:0 sda 8:0 active undef unknown
`-+- policy='service-time 0' prio=0 status=enabled
  `- 3:0:0:0 sdc 8:32 active undef unknown
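In fact, dm-0 has two path groups (PGs), each with only one path (a real setup would have several). The PG with status active is the one currently in use; the PG with status enabled is on standby and receives no IO. To understand this better, I asked my colleague Martin, who works on multipath: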
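To make the "two path groups, one path each" reading mechanical, here is a small parser for topology text like the multipath -l listing in this section. It assumes exactly the layout shown there: a line containing policy= opens a new path group and carries its status=, and each path line holds an H:C:T:L address followed by the device name and its major:minor number. It is a sketch against that one sample format, not a parser for every multipath-tools version.

```python
import re
from typing import List, Tuple

# Path-group header, e.g. "|-+- policy='service-time 0' prio=0 status=active"
PG_RE = re.compile(r"policy='([^']*)'\s+prio=(-?\d+)\s+status=(\w+)")
# Path line, e.g. "| `- 2:0:0:0 sda 8:0 active undef unknown"
PATH_RE = re.compile(r"(\d+:\d+:\d+:\d+)\s+(\S+)\s+(\d+:\d+)")


def parse_path_groups(topology: str) -> List[Tuple[str, str, List[str]]]:
    """Return (selector, status, [device names]) for each path group, in order."""
    groups: List[Tuple[str, str, List[str]]] = []
    for line in topology.splitlines():
        pg = PG_RE.search(line)
        if pg:
            groups.append((pg.group(1), pg.group(3), []))
            continue
        path = PATH_RE.search(line)
        if path and groups:
            groups[-1][2].append(path.group(2))  # attach to the latest PG
    return groups
```

On the listing in this section it yields two groups: an active one containing sda and an enabled one containing sdc.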
Please have a look at http://christophe.varoqui.free.fr/refbook.html
Path groups are mainly used for active/passive setups, and for cases where some paths have a higher latency/lower bandwidth than others (imagine a mirrored storage with mirror legs in different physical locations, disaster avoidance: the local mirror will be much faster than remote mirrors).
Only one path group is "active" at any given time. The others are serving as standby, for the case that all paths in the currently active group fail. Depending on the storage array, the host may need to take explicit action to switch from one path group to another (e.g. send a certain SCSI command that forces the storage to activate the stand-by ports).
If the active path group contains multiple paths, switching between these paths (more precisely: between those paths in the path group which are not in failed state) is controlled by the "path_selector" algorithm in the kernel. There are 3 algorithms: "round-robin", "queue-length", and "service-time". See multipath.conf(5). Switching of paths inside a path group, unlike switching between path groups, is assumed to be instantaneous, and to require no explicit action. Regardless of which path selector is in use, every healthy path will receive IO sooner or later, unless the multipath device is completely idle.
How the paths are grouped into path groups at discovery time is determined by the "path_grouping_policy". It's "failover" by default, meaning that there's a dedicated path group for every path. But multipath's builtin hardware table sets different defaults for many real-world storage arrays. For modern setups, "group_by_prio" is often the best, combined with "detect_prio yes" or a "prio" setting that assigns different priority to paths with different quality (e.g. "alua", "rdac", or "path_latency").
Path groups are assigned a priority which is calculated as the average of all non-failed paths in the path group. At startup, the path group with the highest prio is set as active PG. When all paths in this PG fail, the kernel will switch to the next-best PG. When paths in the best PG return to good state, the "failback" configuration option determines if, and when, to switch back to the best PG.
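Martin's description of grouping and PG selection can be modeled in a few lines. This is a toy model, not libmultipath code: `Path`, the grouping functions and `pick_active_pg` are hypothetical names, the prio values are made-up numbers, and only three policies are modeled (failover, group_by_prio, and multibus, which puts all paths into one group and is also documented in multipath.conf(5)). Following Martin, a PG's priority is the average prio of its non-failed paths, and the active PG is the highest-priority group that still has a working path.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Path:
    dev: str              # device node, e.g. "sda"
    prio: int             # priority from the prio checker (hypothetical values)
    failed: bool = False


def group_failover(paths: List[Path]) -> List[List[Path]]:
    """'failover': one dedicated path group per path."""
    return [[p] for p in paths]


def group_multibus(paths: List[Path]) -> List[List[Path]]:
    """'multibus': all paths in a single path group."""
    return [list(paths)] if paths else []


def group_by_prio(paths: List[Path]) -> List[List[Path]]:
    """'group_by_prio': one path group per distinct priority value."""
    buckets: Dict[int, List[Path]] = {}
    for p in paths:
        buckets.setdefault(p.prio, []).append(p)
    return list(buckets.values())


def pg_prio(group: List[Path]) -> Optional[float]:
    """Average prio of the non-failed paths; None if every path has failed."""
    alive = [p.prio for p in group if not p.failed]
    return sum(alive) / len(alive) if alive else None


def pick_active_pg(groups: List[List[Path]]) -> Optional[int]:
    """Index of the usable group with the highest priority (first wins ties)."""
    best: Optional[int] = None
    best_prio = float("-inf")
    for i, g in enumerate(groups):
        prio = pg_prio(g)
        if prio is not None and prio > best_prio:
            best, best_prio = i, prio
    return best
```

With two fast paths at prio 50 and two slow ones at prio 10, group_by_prio yields two groups and the first is active; marking both fast paths failed moves the active PG to the second group. When to switch back is the failback setting, which this model leaves out.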
What are the path grouping policy and the IO scheduling policy?
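The path grouping policy defaults to failover. As Martin said, vendors ship different defaults for their arrays, and most mainstream setups use group_by_prio; its job is to sort the paths into path groups. The IO scheduling policy (the path_selector) defaults to service-time and decides how IO is distributed across the paths within one PG. Martin's explanation covers both in detail.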
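The three path selectors differ only in how they pick the next path inside the active PG. The sketch below is a simplified user-space model of the three ideas, not the kernel's dm-round-robin, dm-queue-length or dm-service-time code: round-robin rotates through usable paths, queue-length picks the path with the fewest in-flight IOs, and service-time picks the path with the smallest estimated service time, taken here as (in-flight bytes + size of the new IO) / relative throughput. The `PathState` fields and the throughput weights are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathState:
    dev: str
    failed: bool = False
    inflight_ios: int = 0     # outstanding requests (queue-length input)
    inflight_bytes: int = 0   # outstanding bytes (service-time input)
    throughput: int = 1       # relative throughput weight, assumed > 0


def usable(paths: List[PathState]) -> List[PathState]:
    """Non-failed paths of the active PG; error if none is left."""
    alive = [p for p in paths if not p.failed]
    if not alive:
        raise RuntimeError("no usable path in the active path group")
    return alive


class RoundRobin:
    """Rotate through the non-failed paths in order."""

    def __init__(self) -> None:
        self._next = 0

    def select(self, paths: List[PathState]) -> PathState:
        alive = usable(paths)
        choice = alive[self._next % len(alive)]
        self._next += 1
        return choice


def select_queue_length(paths: List[PathState]) -> PathState:
    """Path with the fewest in-flight IOs (first wins ties)."""
    return min(usable(paths), key=lambda p: p.inflight_ios)


def select_service_time(paths: List[PathState], io_bytes: int) -> PathState:
    """Path with the smallest (inflight_bytes + io_bytes) / throughput."""
    return min(usable(paths),
               key=lambda p: (p.inflight_bytes + io_bytes) / p.throughput)
```

In the service-time model, a path with more outstanding bytes can still be chosen when its relative throughput is higher, something a pure queue-length count would not capture.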