Clever cluster
From PlcWiki
Clever Cluster is an HA (high availability) cluster solution that ensures that a critical Clever server keeps running when a hardware failure occurs. The Clever server is a virtual machine that can be hosted on any of several cluster nodes. To achieve that, the virtual drive of the Clever server has to be stored on a shared resource.
We utilize a technology called DRBD (Distributed Replicated Block Device) for that purpose. DRBD can be seen as RAID 1 over the network. Only one node can be the DRBD primary at a time - the other ones are secondary DRBD nodes. The DRBD block device can be mounted only in the primary role. In the secondary role, it just synchronizes its state with the primary DRBD node. To make the synchronization reliable, it is necessary to interconnect the nodes with a direct network link and keep the connection working at all costs. Therefore we recommend bonding two or more dedicated NICs for this single purpose.
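For illustration, a minimal sketch of what the DRBD resource definition could look like. The resource name a3 follows the status output further below; the device, backing disk and the replication addresses on the dedicated (bonded) link are assumptions and will differ per installation:

 resource a3 {
     protocol C;
     on sasmbohost1.mbo.sas-automotive.com {
         device    /dev/drbd1;
         disk      /dev/sdb1;         # backing partition (example)
         address   10.0.0.1:7789;     # address on the dedicated replication link (example)
         meta-disk internal;
     }
     on sasmbohost2.mbo.sas-automotive.com {
         device    /dev/drbd1;
         disk      /dev/sdb1;
         address   10.0.0.2:7789;
         meta-disk internal;
     }
 }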
When a hardware failure occurs on the active (primary) node, any currently inactive node can be promoted to the primary role, the shared storage can be mounted and the virtual machine can be started on that node without much hassle. The new primary node should contain all the changes written to the DRBD storage before the failure, as any modifications are transferred to the secondary nodes almost immediately.
Such a fail-over procedure is simple, but it would require human intervention. We also need to make sure that one and only one node acts as primary at a time. To automate the whole process, we need an enterprise-level cluster stack.
The cluster stack of choice in our case is Pacemaker + Corosync (OpenAIS implementation), a solution preferred by both Red Hat and Novell.
More about Pacemaker here: http://www.clusterlabs.org and here: http://www.slideshare.net/DanFrincu/pacemaker-drbd .
Pacemaker's job is to keep the replicated DRBD storage mounted on one and only one active node and, as a dependency, to move the virtualized Clever server to that node.
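The core of such a configuration might look roughly as follows in crm shell syntax (entered e.g. via crm configure edit or through LCMC). The resource names res_drbd_1, res_Filesystem_1 and res_clevervm_1 match the status output further below; the DRBD resource name a3, the master/slave set name ms_drbd_1 and the monitor intervals are assumptions:

 primitive res_drbd_1 ocf:linbit:drbd params drbd_resource="a3" \
     op monitor interval="31s" role="Slave" op monitor interval="29s" role="Master"
 ms ms_drbd_1 res_drbd_1 meta master-max="1" clone-max="2" notify="true"
 primitive res_Filesystem_1 ocf:heartbeat:Filesystem \
     params device="/dev/drbd1" directory="/opt" fstype="ext4"
 primitive res_clevervm_1 lsb:clevervm
 # mount the DRBD device only on the DRBD master, and run the VM only where it is mounted
 colocation col_fs_on_drbd_master inf: res_Filesystem_1 ms_drbd_1:Master
 colocation col_vm_on_fs inf: res_clevervm_1 res_Filesystem_1
 order ord_drbd_before_fs inf: ms_drbd_1:promote res_Filesystem_1:start
 order ord_fs_before_vm inf: res_Filesystem_1:start res_clevervm_1:start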
At the same time, it tests external connectivity on all nodes using pingd (it sends ICMP packets to the gateway) and automatically promotes to the primary role a node that has not lost connectivity.
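A sketch of the corresponding connectivity check in crm shell syntax; res_ping_gateway matches the status output further below, while the gateway address, the clone name and the constraint name are assumptions:

 primitive res_ping_gateway ocf:pacemaker:ping \
     params host_list="192.168.1.1" multiplier="1000" dampen="5s" \
     op monitor interval="10s"
 clone cl_ping_gateway res_ping_gateway
 # never promote DRBD on a node that cannot reach the gateway
 location loc_master_on_connected_node ms_drbd_1 \
     rule $role="Master" -inf: not_defined pingd or pingd lte 0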
As a result of the defined dependencies, the virtualized server gets moved along with the DRBD storage. In a controlled migration, the virtual machine is first hibernated on the original node and then resumed on the target node. The whole operation takes about 2 minutes or less; even an open SSH session survives it. Migration can be initiated manually, either with a command or with a special application.
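One way to trigger a controlled migration from the command line is via the crm shell (ms_drbd_1 is the assumed name of the DRBD master/slave set; the target host name is taken from the status output further below). The clevervm service described below offers a more convenient wrapper for the same thing:

 crm resource migrate ms_drbd_1 sasmbohost1.mbo.sas-automotive.com
 # after the move has finished, remove the temporary location constraint:
 crm resource unmigrate ms_drbd_1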
Another resource that is automatically moved together with the primary DRBD storage is a special IP address denoting the active node.
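A possible definition of that address in crm shell syntax; res_IPaddr2_CIP matches the status output further below, the IP address itself is an example:

 primitive res_IPaddr2_CIP ocf:heartbeat:IPaddr2 \
     params ip="192.168.1.10" cidr_netmask="24" \
     op monitor interval="10s"
 # keep the cluster IP on the node where the DRBD filesystem is mounted
 colocation col_cip_with_fs inf: res_IPaddr2_CIP res_Filesystem_1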
It is recommended to install the Linux Cluster Management Console (LCMC) for convenient cluster management: http://sourceforge.net/projects/lcmc/files/
[[File:lcmc.png]]
The cluster can be controlled from the console with the clevervm LSB service, which I created for this purpose (with a few exceptions it does not matter on which node the commands are entered):
service clevervm status
Displays the detailed state of the cluster, for example:
 ============
 Last updated: Thu May 31 12:42:05 2012
 Last change: Thu May 31 12:33:14 2012 via cibadmin on sasmbohost1.mbo.sas-automotive.com
 Stack: openais
 Current DC: sasmbohost1.mbo.sas-automotive.com - partition with quorum
 Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
 2 Nodes configured, 2 expected votes
 8 Resources configured.
 ============
 
 Node sasmbohost2.mbo.sas-automotive.com: online
         res_drbd_1:1            (ocf::linbit:drbd) Master
         res_Filesystem_1        (ocf::heartbeat:Filesystem) Started
         res_clevervm_1          (lsb:clevervm) Started
         res_libvirtd_libvirtd   (lsb:libvirtd) Started
         res_ping_gateway:1      (ocf::pacemaker:ping) Started
         res_IPaddr2_CIP         (ocf::heartbeat:IPaddr2) Started
 Node sasmbohost1.mbo.sas-automotive.com: online
         res_ping_gateway:0      (ocf::pacemaker:ping) Started
         res_drbd_1:0            (ocf::linbit:drbd) Slave
 
 Inactive resources:
 
 Migration summary:
 * Node sasmbohost2.mbo.sas-automotive.com:
 * Node sasmbohost1.mbo.sas-automotive.com:
 
 ====== DRBD storage =======
  1:a3/0  Connected  Primary/Secondary  UpToDate/UpToDate  A  r-----
     /opt  ext4  253G  107G  134G  45%
 
 ==== Virtual machines =====
 ---------------------------
 Name:           plcmba7krnl
 State:          running
 Managed save:   no
 ---------------------------
service clevervm start
Starts the virtual machine(s). However, if the service is marked as stopped in Pacemaker, an inconsistency arises and the state will have to be dealt with somehow. I recommend not using this command, just like starting the virtual machine with virsh, of which this command is an equivalent.
service clevervm forcestart
Starts the virtual machine(s) via Pacemaker. This is useful if the service was previously marked as stopped in Pacemaker.
service clevervm stop
Stops (hibernates) the virtual machine(s). However, if the service is marked as started in Pacemaker, Pacemaker will start it again. I recommend not using this command, just like hibernating the virtual machine with virsh, of which this command is an equivalent.
service clevervm forcestop
Stops the virtual machine(s) via Pacemaker and marks the service as stopped.
service clevervm cleanup
Cleans up the state of the service - it deletes any failure records, which makes it possible to start the service again if the allowed number of failures on the given node had been reached.
service clevervm migrate
Forces Pacemaker to migrate the primary DRBD storage (and with it all dependent services) away from this node (i.e. to the opposite node). This node will be marked as ineligible to run the services. To remove this restriction, the unmigrate command has to be called after the migration.
service clevervm unmigrate
Removes the "ineligible to run services" flag from this node. This does not mean the services are automatically moved back, because a certain stickiness is in effect that prevents the services from bouncing back and forth.
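For orientation only, a hypothetical sketch of how such an LSB wrapper could map the Pacemaker-backed actions onto standard crm, crm_mon and virsh calls. The actual clevervm script is custom to this cluster and may differ in details; the resource names are the assumptions used in the examples above:

 #!/bin/sh
 # /etc/init.d/clevervm - hypothetical sketch, not the actual script
 MS=ms_drbd_1        # assumed name of the DRBD master/slave set
 VM=res_clevervm_1   # VM resource name as shown in "service clevervm status"
 # the plain start/stop actions (thin virsh wrappers, discouraged above) are omitted here
 case "$1" in
   status)     crm_mon -1 -n -r; drbd-overview; virsh list --all ;;
   forcestart) crm resource start "$VM" ;;
   forcestop)  crm resource stop "$VM" ;;
   cleanup)    crm resource cleanup "$VM" ;;
   migrate)    crm resource migrate "$MS" ;;
   unmigrate)  crm resource unmigrate "$MS" ;;
   *) echo "Usage: $0 {status|forcestart|forcestop|cleanup|migrate|unmigrate}"; exit 2 ;;
 esac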
Split-brain
DRBD split-brain recovery: http://www.drbd.org/users-guide/s-resolve-split-brain.html
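In short, the manual recovery described there boils down to the following commands (the resource name a3 follows the status output above; with DRBD 8.4 and newer the second command is written as drbdadm connect --discard-my-data a3 instead):

 # on the node whose changes are to be discarded (the split-brain "victim"):
 drbdadm secondary a3
 drbdadm -- --discard-my-data connect a3
 # on the surviving node (only if it is in StandAlone state):
 drbdadm connect a3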