Björn Rodén (roden@ae.ibm.com) http://www.ibm.com/systems/services/labservices/ http://www.ibm.com/systems/power/support/powercare/
Architecting HA and DR solutions, including PowerHA 7.1.3 (SE) Migration
© Copyright IBM Corporation 2014 Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. 9.0
Session Objectives
• This session discusses considerations for designing and deploying highly and continuously available IT systems.
– The session starts by identifying business impact, risks and threats, and discusses KPIs and metrics such as RTO/RPO/MTTR, but focuses on designing and planning the deployment of an IT continuity design with verification and validation, and on maintaining the solution until decommission.
– It also covers how to leverage PowerHA SystemMirror 7.1 Standard Edition.
Objective: You will learn how to approach high availability solution design, planning and implementation.
Business challenges & needs
• Information management for business processes needs to:
– Ensure appropriate levels of service
– Manage risks (mitigate, ignore, transfer)
– Reduce cost (CAPEX/OPEX)
40% of companies that suffer a massive data loss will never reopen.¹
93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster.²
References: (1) "Disaster Recovery Plans and Systems Are Essential", Gartner Group, 2001; (2) US National Archives and Records Administration
Information management – availability challenge
The widening gap between demand and the ability to meet expectations through restore and restart:
– Demand side: more business on-line, higher end-user expectations, greater dependency on applications, increasing impact of information unavailability.
– Supply side: fewer independent application systems, smaller windows to restore or recover, less tolerance for downtime.
Business Continuity in IT perspective
Business Continuity – the ability to adapt and respond to risks as well as opportunities in order to maintain continuous business operations.
High Availability – the attribute of a system to provide service during defined periods, at acceptable or agreed-upon levels, and to mask unplanned outages.
Disaster Recovery – the capability to recover a data center at a different site if the primary site becomes inoperable.
Continuous Operations – the attribute of a system to operate continuously and mask planned outages.
IT Availability Life cycle
Architecture, solution design, deployment, governance, system maintenance and change management, skill building, migration and decommissioning …
A lot to analyze, plan, do and check…
DESIGN > BUILD > OPERATE > REPLACE
What protection is the solution expected to provide? (Protect against data loss or corruption at every tier.)
– Single system failure → High Availability: human error, software error, component failures, single system failures.
– Local disaster → Metro distance recovery: human error, electric grid failure, HVAC or power failures, burst water pipe, building fire, architectural failures, gas explosion, terrorist attack.
– Regional disaster → Global distance recovery and compliance: electric grid failure, floods, hurricanes, earthquakes, tornados, tsunamis, war.
Balance business impact vs. solution costs
– Consider the whole solution lifecycle.
– Weigh downtime costs (business impact) against solution costs (CAPEX/OPEX) over the business recovery time; needs, requirements and acceptable risk determine where the total cost balance¹ lies.
(1) Quick Total Cost Balance (TCB) = TCO (or TCA) + business downtime costs.
Expectations vs Requirements vs Interpretations
– Proprietary: RTO, RPO, MTTR, Degree of Availability, …
– Open Source (+ a Chef…): 6-7 (500 gram) beetroots, 2-3 onions, 1-2 carrots, 250 grams of cabbage, 2-3 tbsp butter, 1 1/2 liters of bouillon, 1 pinch black pepper, 1-2 laurel leaves, 1-2 tbsp vinegar, some salt, sour cream or equivalent for serving. How to: chop, cut, crush, boil, stir, serve.
IT service implementation process example
Failure to verify non-functional AVAILABILITY requirements can create exposures – you will only find out whether it works when you need it.
Brief systematic approach to IT services continuity with an Availability governance focus:
1. Identify critical business processes (from BIA/BCP)
2. Identify risks & threats (from BIA/BCP)
3. Identify business impacts & costs (from BIA/BCP)
4. Identify/decide acceptable levels of service, risk and cost (from BIA/BCP)
---------------------------------------------------------------------------------------------
5. Define availability categories and classify business applications according to the business impact of unavailability
6. Architect the Availability infrastructure
7. Design the solution from the Availability architecture
8. Plan the Availability solution implementation
9. Build the Availability solution
10. Verify the Availability solution
11. Operate and maintain the deployed Availability solution
---------------------------------------------------------------------------------------------
12. Validate the Availability solution SLO, implementation, design and architecture
13. Decommission/Migrate/Replace
BIA – Business Impact Analysis; BCP – Business Continuity Plan; SLO – Service Level Objectives
Get the ducks in a row
• Know why
– Business and regulatory requirements
– Services, risks, costs
– Key Performance Indicators (KPIs)
• Understand how
– Architect, design, plan
• Can implement
– Build, verify, deploy, skill-up
• Will govern
– Service and availability management
– Change, incident and problem management
– Security and performance management
– Capacity planning
– Migrate, replace and decommission
Key IT Availability Metrics
What are your key Availability Requirements?
Recovery Time Objective (RTO) – How long can you afford to be without your systems?
Recovery Point Objective (RPO) – How much data can you afford to recreate or lose?
Maximum Time To Restart/Recover (MTTR) – How long until services are restored for the users?
Degree of Availability (coverage requirement) – For what percentage of a given time period should the business service be available?
Notes on Degree of Availability
• IT service availability can be measured as the percentage of a given time period during which the business service is available for its intended purpose.
– Usually expressed as a number of nines (9) over a year (rounded):
• 99% => 88 hours/year
• 99.9% => 9 hours/year
• 99.95% => 4 1/2 hours/year
• 99.99% => 52 min/year
• 99.999% => 5 min/year
• 99.9999% => 1/2 min/year
• IT system vs. IT service (ripple effect)
– e.g. an IT service dependent on five IT systems, where every individual target level is met but not at the same time:
• (99.9 * 99.9 * 99.5 * 99.5 * 99.0) / 100^5 => 97.82%, i.e. 191-192 h/period of total allowed downtime
• MIN(99.9, 99.9, 99.5, 99.5, 99.0) => 99.0%, or 88 h/period, is the best the composite service can achieve
• Determine the time period for the degree of availability
– Is time for planned maintenance excluded during the year?
• Such as planned service windows and/or a fixed number of days per month/quarter
– How many hours are used per year?
• Calendar year hours: 8760 h for 365-day non-leap years, 8784 h for 366-day leap years
• Decided amount of time per year (for global coverage across 24 time zones, add one day):
– 365 days (non-leap) + 24 h => 366 days or 8784 h
– 366 days (leap) + 24 h => 367 days or 8808 h
Common Availability and Disaster requirements
High Availability
• RPO – zero (or near zero) data loss
• RTO – measured in minutes at the most
• NRO – zero
• PRO – zero (from UPS & generator)
• Coverage requirement (e.g. 24x7 / 24x365)
• Degree of availability (e.g. 99.9% or ~9 h/year)
• No single point of failure (SPOF) – system level
• Geographic affinity (metro distance)
• Automatic failover/continuance/recovery to redundant components, including application components – up to in-flight transaction integrity
Disaster Tolerance
• RPO – near zero data loss (may require manual recovery of orphaned data)
• RTO/NRO – measured in hours, days or weeks
• PRO – depends on generator fuel storage
• Maximum Tolerable Period of Degraded Operations
• Maximum Time To Restart/Recover (MTTR)
• Business Process Recovery Objective (BPRO)
• No single point of failure (SPOF) – data center level
• Geographic dispersion (global distance)
• Declaring disaster is a management decision
• Rotating site swap or periodic site swap
• Full or partial swap
Recovery objectives timeline (example): checkpoint in time → RPO → outage → minimum service delivery → system repair → service delivery at 100% → new business (RTO).
PRO – Power Recovery Objective; NRO – Network Recovery Objective; DOT – Degraded Operations Tolerance
Identify Points of Failure
Review your Availability Architecture
• Is the Availability Architecture still in place?
– Or might it have been altered when performing changes to: servers, storage, networks, data centres, software upgrades, IT service management, staffing, external suppliers and vendors?
– Assumption: the longer an IT environment is exposed to opportunities for human error, the greater the risk of deviation between reality (facts on the ground) and the Availability Architecture (the map).
– Key areas: redundancy and single points of failure (SPOF); communication flow and server service dependencies; local area network and storage area network cabling; application, system software and firmware currency; staff attrition, mobility and cross-skill focus.
Identify critical IT resources – information flow perspective
Business process information flow: information-providing systems (which the core systems depend on) → core systems → information-receiving systems (which need the core systems).
DON'T FORGET: the buffer time and degree of availability of each providing and receiving system, as well as the degree of availability of the core systems themselves.
Identify critical IT resources – deployment connectivity perspective
• Protocols (colour-coded in the original diagram): RMI/IIOP, HTTP/HTTPS, CIFS, NFS, LPD/IPP, MQ, DB2, JDBC, Java serializing
Redundancy and Single Points of Failure (SPOF) – find the SPOF at every layer:
– Enterprise environment: MAN/WAN interconnects between sites
– Site environment: UPS and generators, local area network, storage area network
– Data centre environment: servers, storage, SAN
– Server: application, middleware, operating system & system software, logical/virtual machine, kernel stack, hypervisor, physical machine, hardware (cores, cache, nest), network and storage connectivity
Redundancy and Single Points of Failure (SPOF) – find the SPOF along the end-to-end network path:
ISP (external) → firewall/IPS → routers → network switches → servers → SAN switches → storage.
Redundancy and Single Points of Failure (SPOF) – find the SPOF.
Your major goal throughout the planning process is to eliminate single points of failure and verify redundancy.
A single point of failure exists when a critical service function is provided by a single component. If that component fails, the service has no other way of providing that function, and the application or service dependent on that component becomes unavailable.
http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.powerha.plangd/ha_plan_over_ppg.htm
Application and data resiliency examples
• Application
– Application restart after node failure (physical/virtual): active/standby (automatic/manual)
– Application concurrency (cluster horizontal scaling): active/active (separate or shared transaction tracking)
• Data
– Single site, single or dual storage: storage based, controlled by the host (HyperSwap); host based (LVM mirroring/GPFS); database based (transaction replication)
– Dual site, dual storage: storage based (Metro/Global Mirror); host based (GLVM/GPFS); database based (transaction replication)
Calculation examples for redundancy
– Vital business function/process depending on three systems (Availability = uptime / (uptime + downtime)):
• Uptime/year: 24*365 = 8760 h
• System #1: downtime 12+(4*4) = 28 h → 8760/(8760+28) = 0.9968 (incident + planned service window, 1/quarter)
• System #2: downtime 12*2 = 24 h → 8760/(8760+24) = 0.9973 (planned service window, 1/month)
• System #3: downtime 2*3 = 6 h → 8760/(8760+6) = 0.9993 (planned service window, 2/year)
• End-to-end availability: 0.9968 * 0.9973 * 0.9993 = 0.9934, i.e. 99.34%
• Can be used as a baseline for improvement
– Estimated failure rate for continuously used disks (AFR = hours per year * number of disks / MTBF):
• Uptime/year: 24*365 = 8760 h
• Storage system #1: 14*7 = 98 disks; storage system #2: 14*7*4 = 392 disks; 490 disks in total
• MTBF: 300,000 h
• Estimated disk failures per year: 8760/300,000 * 490 ≈ 14 disks might fail during one (1) year
• Can be used to increase awareness and motivate a RAID configuration
– Reliability indicator example (Reliability = MTBF / (MTBF + MTTR + ALDT)):
• MTBF 300,000 h, MTTR 20 h (here including total service downtime with Administrative or Logistic Downtime, ALDT): 300,000/(300,000+20) * 100 = 99.99%
• MTBF 100,000 h, MTTR 1 h: 100,000/(100,000+1) * 100 = 99.999%
• Can be used to illustrate how difficult it is to obtain 100%
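The same arithmetic can be checked quickly with bc on any AIX node; the figures below are the slide's example values (490 disks, 300,000 h MTBF, 20 h MTTR):
echo "scale=1; (8760 * 490) / 300000" | bc          # ~14.3 expected disk failures/year
echo "scale=2; 300000 * 100 / (300000 + 20)" | bc   # 99.99 % reliability indicator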
PowerHA SystemMirror
PowerHA SystemMirror Edition basics
• PowerHA SystemMirror for AIX Standard Edition – cluster management for the data center
– Monitors, detects and reacts to events
– Multiple heartbeat channels between the systems: network, SAN, central repository
– Enables automatic switch-over
– SAN shared-storage clustering
– Smart Assists (HA agent support – discover, configure and manage)
– Resource group management with advanced relationships
– Support for custom resource management
– Out-of-the-box support for DB2, WebSphere, Oracle, SAP, TSM, LDAP, IBM HTTP Server, etc.
• PowerHA SystemMirror for AIX Enterprise Edition – cluster management for the enterprise (disaster tolerance)
– Multi-site cluster management
– Automated or manual confirmation of swap-over
– Third-site tie-breaker support
– Separate storage synchronization: Metro Mirror, Global Mirror, GLVM, HyperSwap with DS8800 (<100 km)
PowerHA SystemMirror support
• PowerHA 6.1 End of Support (EOS): 30-Apr-2015 (extended from 30-Sep-2014)
– End of Support (EOS) is the last date on which IBM will deliver standard support services for a given version/release of a product.
– Any further service support extensions will be published at: http://www-01.ibm.com/software/support/aix/lifecycle/index.html
Eliminating SPOFs by using redundant components (cluster component – how to eliminate it as a single point of failure – what PowerHA SystemMirror supports):
• Nodes – use multiple nodes – up to 16.
• Power sources – use multiple circuits or uninterruptible power supplies – as many as needed.
• Networks – use multiple networks to connect nodes – up to 48.
• Network interfaces, devices, and labels – use redundant network adapters – up to 256.
• TCP/IP subsystems – use networks to connect adjoining nodes and clients – as many as needed.
• Disk adapters – use redundant disk adapters – as many as needed.
• Controllers – use redundant disk controllers – as many as needed.
• Disks – use redundant hardware and disk mirroring, striping, or both – as many as needed.
• Applications – assign a node for application takeover, configure an application monitor, and configure clusters with nodes at more than one site – flexible configuration policies for high availability within a site and between sites.
• Sites – use more than one site for disaster recovery – up to two sites.
• Resource groups – use resource groups to specify how a set of entities should perform – up to 64 per cluster.
• Cluster resources – use multiple cluster resources – up to 128 for the clinfo daemon (more can exist).
• Virtual I/O Server (VIOS) – use redundant VIOS – as many as needed.
• HMC – use redundant HMCs – up to 2.
• Managed system hosting a cluster node – use separate managed systems for each cluster node – up to 16.
• Cluster repository disk – use RAID protection – one active repository disk per site, with the ability to replace the disk after a failure; you must have a spare disk available to replace a failed repository disk in the live cluster.
http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.powerha.plangd/ha_plan_eliminate_spf.htm
Some key changes in PowerHA 7.1 vs. 6.1
• Architectural changes from PowerHA 6.1 (CAA/RSCT, heartbeating, resource groups)
– PowerHA 7.1 is built on Cluster Aware AIX (CAA) functionality, which provides fundamental clustering capabilities in the base operating system; PowerHA 6.1 uses Reliable Scalable Clustering Technology (RSCT) as its clustering framework.
• PowerHA 7.1.3 requires AIX 6.1 TL9 SP1 or AIX 7.1 TL3 SP1
– http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101347
• Cluster Aware AIX (CAA) manages the heartbeats, not RSCT
– CAA uses a repository disk to store configuration information persistently; the disk must be shared by all cluster nodes.
• Event management is handled through the AIX pseudo-file-system architecture Autonomic Health Advisor File System (AHAFS), not the cluster manager and RSCT.
• IP multicast with a gossip protocol in 7.1 replaced unicast UDP IP in 6.1
– With 7.1.3, unicast (TCP) is the default option in addition to multicast.
• Non-IP networks – diskhb, mndhb, rs232 etc. – removed.
• No IPAT via replacement (hardware address takeover, HWAT).
• Restrictions on changing the hostname
– The Communication Path to a node can be set from 7.1.2 (IP address mapping to the hostname).
– Eased further in 7.1.3 (capability to dynamically modify the hostname of a clustered node).
• Smart Assist technology improved and extended.
• Graphical Cluster Simulator with 7.1.3
– Based on the PowerHA ISD plug-in; a saved XML configuration can be deployed.
• WebSMIT and the 2-node cluster assistant removed; ISD plug-in introduced (not PowerVC).
• Priority override location (POL) is not used, and persistence after reboot is not retained.
http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.powerha.insgd/ha_install_priority_override.htm
Basic implementation flow, PowerHA 7.1 vs. 6.1
PowerHA 6.1
1. Plan for network, storage, and application – eliminate single points of failure.
2. Define and configure the infrastructure – application planning and start/stop scripts; networks (IP interfaces, /etc/hosts, non-IP devices); storage (adapters, LVM volume group, filesystem).
3. Install the PowerHA filesets.
4. Configure the PowerHA environment:
– Topology: cluster, node names, PowerHA IP and non-IP networks.
– Resources, resource group, attributes: resources (application server, service label, volume group); resource group (identify name, nodes, policies); add attributes (application server, service label, VG, filesystem).
5. Synchronize, save configuration (snapshot).
6. Start/stop cluster services.
7. Verify, test configuration.
PowerHA 7.1
1. Plan for network, storage, and application – eliminate single points of failure.
2. Define and configure the infrastructure – application planning and start/stop scripts; networks (IP interfaces, /etc/hosts, non-IP devices); storage (adapters, LVM volume group, filesystem).
3. Install the PowerHA filesets.
4. Configure the PowerHA environment:
– Topology: cluster, node names, PowerHA IP networks, repository disk and SFWcomm; multicast or unicast network for heartbeat; Cluster Aware AIX (CAA) cluster.
– Resources, resource group, attributes: resources (application server, service label, volume group); resource group (identify name, nodes, policies); add attributes (application server, service label, VG, filesystem).
5. Synchronize, save configuration (snapshot).
6. Start/stop cluster services.
7. Verify, test configuration.
Configure PowerHA 7.1 vs. 6.1
PowerHA SE 6.1
1. Clear previous cluster configuration
2. Configure cluster netmon.cf and (optional) rhosts files
3. (optional) Customize cluster communication
4. Create cluster definition
5. Create node definitions
6. Create LAN network definitions
7. Create SAN disk heartbeat network definitions
8. Add boot IP address definitions
9. Add heartbeat disk definitions
10. Add cluster service IP address definitions
11. Create cluster resource group definitions
12. Create cluster application server definitions
13. Customize cluster application server monitoring definitions
14. Customize cluster resource group definitions
15. Verify and synchronize cluster configuration
PowerHA SE 7.1
1. Clear previous cluster configuration
2. Verify the shared repository disk between cluster nodes
3. Configure cluster netmon.cf and rhosts files (AIX 6.1.6)
4. (optional) Customize cluster communication
5. Create cluster definition
6. Create node definitions (will create node and LAN definitions automatically)
7. Configure repository disk and IP multicast ID (unicast)
8. (optional) Configure FC adapters for target mode and zone FC adapter WWPNs for SFWcomm, and virtual Ethernet if VIOS
9. Verify and synchronize cluster configuration
10. Add cluster service IP address definitions
11. Create cluster resource group definitions
12. Create cluster application server definitions
13. Customize cluster application server monitoring definitions
14. Customize cluster resource group definitions
15. Verify and synchronize cluster configuration
PowerHA 7.1.3 has additional Smart Assists and a new Smart Assist framework. A clmgr-based sketch of the 7.1 flow follows below.
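For reference, most of the 7.1 steps above can also be scripted with the clmgr command line instead of SMIT. The sketch below is hypothetical: node, disk, script, network and resource-group names are examples, and exact clmgr object classes and attributes vary by release (check clmgr's built-in help before use).
clmgr add cluster demo_cl NODES=ha1,ha2 REPOSITORY=hdisk2                        # cluster + CAA repository
clmgr add service_ip app_svc NETWORK=net_ether_01                                # service IP label
clmgr add application_controller app1 STARTSCRIPT=/ha/start.sh STOPSCRIPT=/ha/stop.sh
clmgr add resource_group rg1 NODES=ha1,ha2 SERVICE_LABEL=app_svc VOLUME_GROUP=datavg APPLICATIONS=app1
clmgr verify cluster && clmgr sync cluster                                       # verify and synchronize
clmgr online cluster WHEN=now                                                    # start cluster services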
PowerHA 7.1 with dual node, single/dual site – baseline
– Ordinary run-of-the-mill dual node cluster (two LPARs, HA1 and HA2, LVM mirroring across single or dual enterprise storage)
– Use mirror pools for the LVM mirroring
– Single virtual Ethernet adapter per node, backed by the same VIOS SEA LAGG
– Set "Communication Path to Node" to the network interface carrying the cluster node's hostname
• If the cluster node (partition) has multiple virtual Ethernet adapters, set the "Communication Path to Node" to the IP address and virtual Ethernet network interface device that maps to the hostname
– netmon.cf configured to ping outside the box from the partition (cluster file): /usr/es/sbin/cluster/netmon.cf
– rhosts configured with the cluster nodes (cluster file): /etc/cluster/rhosts
– netsvc.conf configured with DNS (system file): /etc/netsvc.conf
– Single or dual SAN fabric
– Single LAN with ISL; if dual sites, use VLAN spanning
– If dual sites, keep them within a few km distance for minimal latency and throughput degradation
See the sketch of the three configuration files after this list.
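A hypothetical sketch of those three files for a two-node cluster (hostnames and addresses are examples only):
# /etc/cluster/rhosts – one cluster node hostname or IP address per line
ha1.example.com
ha2.example.com
# /usr/es/sbin/cluster/netmon.cf – ping targets outside the box, e.g. the default gateway
10.1.1.1
# /etc/netsvc.conf – resolve locally first, then DNS
hosts = local, bind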
PowerHA 7.1 with dual node, single/dual site – multicast between nodes
• Multicast is optional from 7.1.3; the default with 7.1.3 is TCP unicast.
• The multicast IP can be set manually, or CAA will assign one based on the node's lower 24-bit IP address behind an upper 8-bit multicast prefix of 228, e.g. 192.1.2.3 => 228.1.2.3.
• If desired, verify that multicast is working between the nodes before creating the 7.1 cluster:
– Check the assigned multicast IP: lscluster -i | grep -i multi
– Test with the mping command: start the receiver first (mping -r -c 100), then the sender (mping -s -c 100); use the -a <multicastip> flag to set the multicast address used by mping.
http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.powerha.trgd/ha_trgd_test_multicast.htm
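A minimal sketch of that verification on the two example nodes (the multicast address 228.1.2.3 is only an illustration; use the one shown by lscluster):
lscluster -i | grep -i multi        # show the multicast address CAA assigned
mping -r -c 100 -a 228.1.2.3        # on node HA1: receive 100 test packets
mping -s -c 100 -a 228.1.2.3        # on node HA2: send 100 test packets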
PowerHA 7.1 with dual node, single/dual site – repository disk
• Must be accessible from all nodes and all paths
• Raw disk device driver I/O; direct access by CAA only
• Minimum 512 MB and no larger than 460 GB; around 10 GB is a reasonable size
• Define a spare for the repository disk
• Do not manually write to the repository disk!
• Check repository disk status:
– clras lsrepos
– clras dumprepos
– clras dumprepos -r <reposdisk>
– clras dpcomm_status
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.clusteraware/claware_repository.htm
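A hedged sketch of checking the repository disk from one node; the /usr/lib/cluster path for clras is an assumption that may differ by AIX level:
lspv | grep caavg_private              # the repository disk belongs to the CAA private VG
lscluster -d                           # CAA view of cluster disks, including the repository
/usr/lib/cluster/clras lsrepos         # list repository disk(s) known to CAA
/usr/lib/cluster/clras dumprepos       # read-only dump of the repository contents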
PowerHA 7.1 with dual node, single/dual site – storage framework (SFWcomm)
• Fibre Channel adapters with target mode support only
– Attribute on fcsX: tme=yes
– Attributes on fscsiX: dyntrk=yes and fc_err_recov=fast_fail
– Enable the new settings, for example through a reboot
• All physical FC adapter WWPNs zoned (TM zone)
– One fabric is supported with SFWcomm; a dual fabric is supposed to work, and if it does not work with your implementation and system software levels, please open a PMR with IBM Support
• LPM does not migrate the SFWcomm configuration
– It is recommended that SAN communication be reconfigured after LPM is performed
• Datalink-layer communication over a VLAN between the AIX cluster node and the VIOS holding the physical FC adapters
• Check SFWcomm status:
– lscluster -i
– sfwinfo -a
– clras sancomm_status
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.clusteraware/claware_comm_setup.htm
http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.powerha.concepts/ha_concepts_ex_san.htm
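A minimal sketch of enabling target mode on one node, assuming the adapter is fcs0/fscsi0 (names are examples; -P defers the attribute change until the devices are reconfigured or the node is rebooted):
chdev -l fcs0   -a tme=yes -P                                # enable target mode on the FC adapter
chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P   # recommended fscsi attributes
shutdown -Fr                                                 # activate the deferred changes
lscluster -i | grep -p sfwcom                                # afterwards: check the sfwcom interface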
PowerHA IP heartbeating over VIOS SEA
• Network heartbeating is used as a reliable means of monitoring an adapter's state over a long period of time.
• When heartbeating is broken, a decision has to be made as to whether the local adapter has gone bad or the neighbor (or something between them) has a problem.
– The local node only needs to take action if the local adapter is the problem; if its own adapter is good, it is assumed to still be reachable by other clients regardless of the neighbor's state (the neighbor is responsible for acting on its own local adapter failures).
– This decision (local vs. remote bad) is made based on whether any network traffic can be seen on the local adapter, using the inbound byte count of the interface.
– Where virtual Ethernet is involved, this test becomes unreliable, since there is no way to distinguish whether inbound traffic came in from the VIO server's connection to the outside world or just from a neighbouring VIO client (it is a design point of VIO that its virtual adapters be indistinguishable from a real adapter to the LPAR).
• For PowerHA 6.1, use the netmon.cf facility that is part of the cluster.* filesets (RSCT Topology Services).
• For PowerHA 7.1, use netmon.cf with RSCT Group Services; for releases before 7.1.3 it is part of rsct.basic.* PTF U852423 or U851850.
– Install the PTF; refer to APAR: http://www-01.ibm.com/support/docview.wss?uid=isg1IV14422
– Configure netmon.cf according to APAR: http://www-01.ibm.com/support/docview.wss?uid=isg1IZ01331
Cluster Topology Configuration – the netmon.cf facility
Without this facility, network link and network switch failures will not be properly detected by the cluster node.
• For single-adapter PowerHA network adapters, use the netmon.cf configuration file: /usr/es/sbin/cluster/netmon.cf
• When netmon needs to stimulate the network to ensure adapter function, it sends ICMP ECHO requests to each IP address. After sending the request to every address, netmon checks the inbound packet count before determining whether an adapter has failed or not.
• Specify remote hosts that are not in the cluster configuration, that can be accessed from PowerHA interfaces, and that reply consistently to ICMP ECHO without delay, such as default gateways and equivalent.
• Up to 32 different targets can be provided for each interface; if *any* given target is pingable, the adapter will be considered up (ICMP ECHO).
Format: !REQD <owner> <target>
!REQD : an explicit string; it *must* be at the beginning of the line (no leading spaces).
<owner> : the interface this line is intended to be used by; that is, the code monitoring the adapter specified here will determine its own up/down status by whether it can ping any of the targets specified in these lines. The owner can be specified as a hostname, IP address, or interface name. In the case of a hostname or IP address, it *must* refer to the boot name/IP (no service aliases). In the case of a hostname, it must be resolvable to an IP address or the line will be ignored. The string "!ALL" specifies all adapters.
<target> : the IP address or hostname you want the owner to try to ping. As with normal netmon.cf entries, a hostname target must be resolvable to an IP address in order to be usable.
http://www-01.ibm.com/support/docview.wss?uid=isg1IZ01331
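A hypothetical netmon.cf for a node whose only PowerHA interface is en0; the gateway addresses are examples, chosen as stable targets outside the cluster that answer ICMP ECHO promptly:
# /usr/es/sbin/cluster/netmon.cf
!REQD en0 10.1.1.1
!REQD en0 10.1.1.2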
Cluster partitioning, aka node isolation or "split brain" 1/2
• Cluster partitioning, also known as node isolation or "split brain", is a failure situation where more than one server acts as a primary.
– Partitioning occurs when a cluster node stops receiving all interconnecting heartbeat traffic from its peer node and assumes that the peer node has failed.
– Due to the lack of synchronization, a split-brain situation is problematic and can cause undesirable behaviour, such as data corruption.
– Once the peer node is determined to be down due to lack of heartbeats, the nodes on each side of the cluster attempt to take over resources (if so configured) from a node that is actually still active and running.
– When the interconnection is restored and heartbeats resume, the cluster will merge; at this point the cluster manager identifies that a partitioning has occurred, and the cluster node with the highest node number stops itself immediately.
– During partitioning, if both nodes have acquired their respective peer node's resource groups and have had applications running, with users connected and updating data for the same application on both nodes separately, data integrity is lost.
Cluster partitioning, aka node isolation or "split brain" 2/2
Common approaches regarding cluster partitioning:
– Maximize independent interconnects between sites
• Use multiple IP and non-IP interconnects for cluster node heartbeats, with all physical links provided separately and well isolated from failing at the same time, such as:
– Dual IP networks (LAN), each over separate physical adapters and network switches, and interconnection between the cluster node sites.
– Dual non-IP networks (SAN), each over separate physical adapters and network switches, and interconnection between the cluster node sites.
– Consider a third network interconnect for heartbeat only between the nodes; for example, if the primary interconnections between nodes/sites use DWDM, use a non-land-based link or a VPN over an ISP connection.
– Use a third site as tie breaker
• Use the "tie-breaker" disk/node/service concept, where a third-site disk/node/service is used to choose the surviving partition. For PowerHA, please refer to:
– http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.powerha.admngd%2Fha_admin_mergesplit_policy_713.htm
• Optimally, also use a separate physical interconnect from each cluster node site to the third site.
– Classify a node failure as a site-down event and/or have the secondary site started by an operator
• The active site declares itself down and expects the secondary site to take over the failed services; the secondary site takes over services if communication to the active site is lost.
• The active site declares itself down, and the secondary site is started by an operator.
– Accept as-is
• Decide that the risk of partitioning occurring is unlikely and the cost of redundancy too high, and accept longer downtime, relying on backup restore in case of data inconsistency.
NOTE: External access to nodes can still be available at the primary site even if the site interconnects fail.
PowerHA/EE Merge/Split Policy Options
Policy settings (apply to both split and merge processing):
• Majority Rule – the side with more than N/2 nodes wins (N = total nodes in the cluster); in case of a tie, the side with the smallest node ID wins.
• Tie Breaker – the side holding the tie breaker wins.
• Manual – operator intervention is enabled for split/merge processing.
• Tie breaker policy – a means of determining the winner when a split-site condition occurs; the losing side is quiesced.
• Tie breaker policy options: majority rules (the site with the largest number of nodes wins), SCSI-2 or SCSI-3 tie-breaker reservation disk (the first one to reserve it wins), or operator intervention (the operator decides).
• Default merge policy: majority rule.
Manual (operator-controlled failover)
• Split/merge policies
– Administrator prompts: the cluster will wait for administrator input
– Optional policy: after N prompts, allow auto-recovery
– Custom action scripts can be invoked at the time of a split or merge as well
• Defaults
– Number of prompts (N) = infinite
– Interval between notifications: once every 30 seconds, then increasing in frequency
– Auto-recovery after N prompts
(Diagram: site down vs. cluster split.)
© Copyright IBM Corporation 2014
44
Migrating to PowerHA 7.1.3
Migration process to PowerHA 7.1.3 from 6.1
1. Verify the current PowerHA 6.1 availability functionality – run cluster verification and make sure no errors are reported.
2. Verify the PowerHA 7.1 preconditions, heartbeat networks and SPOFs.
3. AIX upgrade – upgrade all nodes in the cluster to AIX 6.1 TL9 SP1 or AIX 7.1 TL3 SP1 or higher.
• Leverage alt-disk install and rotate one node at a time: http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.install/doc/insgdrf/alt_disk_migration.htm
4. Migrate the PowerHA 6.1 cluster using one of:
– Rolling migration: you can upgrade a PowerHA cluster while keeping your applications running and available; during the upgrade process, the new version of the software is installed on each cluster node while the remaining nodes continue to run the earlier version.
– Snapshot upgrade: this type of migration involves bringing down the entire PowerHA cluster, reconfiguring the snapshot configuration, installing the new PowerHA and restarting cluster services one node at a time.
– Offline upgrade: this type of migration involves bringing down the entire PowerHA cluster, reconfiguring the active cluster to fit, installing the new PowerHA and restarting cluster services one node at a time.
– New install and configure: design and install the PowerHA cluster from scratch.
5. Verify cluster and high availability functionality – cluster system functionality tests, component failure tests, failure scenario tests.
Todo before migration
• Software levels for currency
– Upgrade AIX and RSCT to supported levels, and ensure that the same level of cluster software (including PTFs) is on all nodes before beginning a migration. For PowerHA 7.1.3 that means AIX 6.1 TL9 SP1 or AIX 7.1 TL3 SP1 with RSCT 3.1.2 or later; the minimum levels per release are:
• PowerHA 7.1.0: AIX 6.1 TL6 (or later) or AIX 7.1, RSCT 3.1.0.0 or higher
• PowerHA 7.1.1: AIX 6.1 TL7 SP2 or AIX 7.1 TL1 SP2, RSCT 3.1.2.0 or higher
• PowerHA 7.1.2: AIX 6.1 TL8 SP1 or AIX 7.1 TL2 SP1, RSCT 3.1.2.0 or higher
• PowerHA 7.1.3: AIX 6.1 TL9 SP1 or AIX 7.1 TL3 SP1, RSCT 3.1.2.0 or higher
– Ensure that the PowerHA cluster software is committed (not just applied)
– When performing a rolling migration, all nodes in the cluster must be upgraded to the new base release before applying any updates for that release
• Run cluster verification and make sure no errors are reported
• Take a snapshot of the cluster configuration
• Backup and mksysb
• Use the /usr/sbin/clmigcheck tool
• The "Communication Path to Node" on the PowerHA cluster nodes must be set to an IP address mapping to the hostname. All cluster node hostnames must be resolved locally using the /etc/hosts file (IP address and label); use netsvc.conf, irs.conf or NSORDER in /etc/environment to set the order. Pre-7.1.3: after you have synchronized the initial cluster configuration, it is not supported to change the hostname or the IP resolution of the hostname.
http://www-01.ibm.com/support/knowledgecenter/SSPHQG_7.1.0/com.ibm.powerha.insgd/ha_install_required_aix.htm
Todo before migration (continued)
• Verify cluster conditions and settings
– Use clstat to review the cluster state and make certain that the cluster is in a stable state
– Review the /etc/hosts file on each node to make certain it is correct
– Review the /etc/netsvc.conf (or equivalent) file on each node to make certain it is correct
– After AIX 6.1.6 or later is installed, enter the fully qualified host name of every node in the cluster in the /etc/cluster/rhosts file
• Take a snapshot of the cluster configuration and save off customized scripts, such as start, stop, monitor and event script files
• Remove configurations which cannot be migrated
– Configurations with IPAT via replacement or hardware address takeover (MAC address)
– Configurations with heartbeat via IP aliasing
– Configurations with non-IP networking, such as RS232, TMSCSI/SSA, DISKHB or MNDHB
– Configurations which use anything other than Ethernet for network communication, such as FDDI, ATM, X.25 or Token Ring
– Note that clmigcheck does not flag an error if a DISKHB network is found; the PowerHA migration utility automatically takes care of removing that network
• SAN storage for the repository disk and target mode
– The repository is stored on a disk that must be SAN attached and zoned to be shared by every node in the cluster, and only the nodes in the cluster – and it must not be part of a volume group
– SAN zoning of FC adapter WWPNs for target mode communication
• Multicast IP address for the monitoring technology (optional)
– You can explicitly specify multicast addresses, or one will be assigned by CAA
– Ensure that multicast communication is functional in your network topology before migration
– Note that from PowerHA 7.1.3, unicast is the default
clmigcheck tool (1/2)
The clmigcheck tool is part of base AIX from 6.1 TL6 or 7.1 (/usr/sbin/clmigcheck).
• An interactive tool that verifies the current cluster configuration, checks for unsupported elements, and collects additional information required for migration
• Saves the migration check output to the file /tmp/clmigcheck/clmigcheck.log
• You must run this command on all cluster nodes, one node at a time, before installing PowerHA 7.1.3
• When the clmigcheck command is run on the last node of the cluster before installing PowerHA 7.1.3, the CAA infrastructure will be started (check with the lscluster -m command)
----------[PowerHA System Mirror Migration Check]-------------
Please select one of the following options:
1 = Check ODM configuration.
2 = Check snapshot configuration.
3 = Enter repository disk and multicast IP addresses.
Select one of the above, "x" to exit or "h" for help:
clmigcheck tool (2/2)
• Option 1
– Checks the configuration data (/etc/es/objrepos) and provides errors and warnings if there are any elements in the configuration that must be removed manually.
– In that case, the flagged elements must be removed, the cluster configuration verified and synchronized, and clmigcheck rerun until the configuration data check completes without errors.
• Option 2
– Checks a snapshot (present in /usr/es/sbin/cluster/snapshots) and provides error information if there are any elements in the configuration that will not migrate.
– Errors when checking the snapshot indicate that the snapshot cannot be used as-is for migration, and PowerHA does not provide tools to edit a snapshot.
• Option 3
– Queries for the additional configuration needed and saves it in a file in /var on every node in the cluster.
– When option 3 is selected from the main screen, you are prompted for the repository disk and the multicast dotted-decimal IP address.
– Newer versions of AIX have an updated /usr/sbin/clmigcheck command that asks you to select "Unicast" or "Multicast".
Use either option 1 or option 2 successfully before running option 3, which collects and stores configuration data in the node file /var/clmigcheck/clmigcheck.txt; this file is used when PowerHA 7.1.3 is installed.
Rolling Migration Overview Steps
1. Stop cluster services on one node (move resource groups as needed)
2. Upgrade AIX (if needed) and reboot
• Also install the additional CAA filesets, bos.cluster and bos.ahafs
3. Verify /etc/hosts and /etc/netsvc.conf (and /usr/es/sbin/cluster/netmon.cf)
4. Update /etc/cluster/rhosts
• Enter the cluster node hostname IP addresses, one IP address per line
5. refresh -s clcomd
6. Execute clmigcheck (option 1, then option 3)
7. Upgrade PowerHA
• Install the base-level install images and complete the upgrade procedures
• Then come back and apply the latest SPs on top of it; this can be done non-disruptively
8. Review the /tmp/clconvert.log file
9. Restart cluster services (move resource groups back if needed)
10. Repeat the steps above for each node (minus the additional options on clmigcheck)
A sketch of steps 4-7 on one node follows below.
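A hedged sketch of steps 4-7 on a single node; the IP addresses, mount point and log check are examples only, and the installp fileset selection should be adapted to your install images:
cat >> /etc/cluster/rhosts <<EOF
10.1.1.11
10.1.1.12
EOF
refresh -s clcomd                            # make clcomd pick up the new rhosts entries
/usr/sbin/clmigcheck                         # option 1, then option 3 (first node only)
installp -agXYd /mnt/powerha713 cluster.*    # install the PowerHA 7.1.3 base images
grep -i error /tmp/clconvert.log             # review the conversion log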
Basic PowerHA cluster functionality verification
Verify PowerHA cluster functionality:
– After system functionality verification (file systems, users, network, backup, etc.)
– Before or after cluster application server verification (start/stop/monitor integration hardening)
– Before end-to-end application resiliency verification (environment/enterprise-wide failure scenarios)
Example test procedures (record the expected and actual outcome for each):
• Reboot both NODE1 and NODE2 and restart PowerHA on both
• RG stop and start on NODE1, and on NODE2, with the RG on that node (clRGmove -d / clRGmove -u)
• RG move from NODE2 to NODE1 and from NODE1 to NODE2 (clRGmove -m)
• IP failure test on NODE1, on NODE2, and on both, followed by reintegration (ifconfig en# down / ifconfig en# up)
• Stop PowerHA on NODE1 with migration to NODE2, then restart PowerHA on NODE1 to reintegrate; repeat for NODE2 (cl_clstop)
• SAN availability test on NODE1 and NODE2, followed by SAN reintegration (SAN admin actions)
• HMC power-off of NODE1 with the RG on NODE1, then HMC activate and restart of PowerHA; repeat for NODE2 (chsysstate)
• Reboot both NODE1 and NODE2 and restart PowerHA on both
An example of the clRGmove invocations is sketched below.
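A hedged sketch of the resource-group operations referenced above; the resource group name RG1 and the node names are examples:
clRGmove -g RG1 -n NODE1 -d      # take RG1 offline on NODE1
clRGmove -g RG1 -n NODE1 -u      # bring RG1 back online on NODE1
clRGmove -g RG1 -n NODE2 -m      # move RG1 from NODE1 to NODE2
clRGmove -g RG1 -n NODE1 -m      # move RG1 back to NODE1
/usr/es/sbin/cluster/clstat -o   # check cluster and resource-group state between tests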
Further reading
• PowerHA for AIX – http://www-03.ibm.com/systems/power/software/availability/aix/index.html
• PowerHA for AIX Version Compatibility Matrix – http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101347
• PowerHA 7.1 Infocenter – http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.powerha.navigation/powerha_main.htm
• What's new in PowerHA 7.1 – http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.powerha.navigation/powerha_whatsnew.htm
• PowerHA 7.1.3 Release Notes – http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/PRS5241
• PowerHA 7.1.3 Announcement letter – http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=AN&subtype=CA&htmlfid=897/ENUS213-416
• IBM PowerHA SystemMirror for AIX 7.1.3 Enhancements – http://www.redbooks.ibm.com/abstracts/tips1097.html
• IBM PowerHA cluster migration – http://www.ibm.com/developerworks/aix/library/au-aix-powerha-cluster-migration/
• Offline migration from PowerHA 6.1 to 7.1 – http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.powerha.insgd/ha_install_offline_61to710.htm
• Snapshot migration from PowerHA 6.1 to 7.1.x – http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.powerha.insgd/ha_install_upgrade_snapshot_61to71x.htm
• Rolling migration from PowerHA 6.1 to 7.1 – http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.powerha.insgd/ha_install_rolling_migration_61to710.htm
Thank you – Tack !
Björn Rodén – roden@ae.ibm.com – http://www.linkedin.com/in/roden
A few extra slides
Architecting for Business Continuity ¹
Can use the BCI Good Practice guidelines or similar, or just start with:
1. Develop a contingency planning policy
2. Perform a Business Impact Analysis
3. Identify preventive controls
4. Develop recovery strategies
5. Develop an IT contingency plan
Focus on the business purpose.
⌦ Note that Business Continuity Management (BCM) encompasses much more than IT continuity.
⌦ Some national and international standards and organizational recommendations:
(1) BCI, Good Practice, http://www.thebci.org/
(2) DRII, Professional Practices, http://www.drii.org/
(3) ITIL IT Service Continuity: "Continuity management is the process by which plans are put in place and managed to ensure that IT services can recover and continue should a serious incident occur."
(4) ISO Information Security and Continuity, ISO 17799/27001
(5) US NIST Contingency Planning Guide for Information Technology Systems, NIST 800-34
(6) British Standard for Business Continuity Management: BS 25999-1:2006
(7) British Standard for Information and Communications Technology Continuity Management: BS 25777:2008 (Paperback)
(8) BITS – Basnivå för informationssäkerhet (baseline level for information security), https://www.msb.se/RibData/Filer/pdf/24855.pdf
Note:
– ITIL: "Availability Management – To optimize the capability of the IT infrastructure, services and supporting organization to deliver a cost-effective and sustained level of availability enabling the business to meet their objectives."
– COBIT: "DS4 Ensure Continuous Service objectives are control over the IT process to ensure continuous service that satisfies the business requirement for IT of ensuring minimal business impact in the event of an IT service interruption."
Architecting for IT Service Continuity ¹
The TOGAF ADM can be used to bring clarity and understanding, from an enterprise perspective, of the availability/continuity requirements for different IT services.
Focus on IT design & governance.
(1) The Open Group Architecture Framework (TOGAF) Architecture Development Method (ADM) is a step-by-step approach to developing an enterprise architecture. The term "enterprise" in the context of "enterprise architecture" can denote either an entire enterprise or just a specific domain within the enterprise. http://www.opengroup.org/
Controlling IT Service Continuity ¹
COBIT DS4 can be used to bring clarity and understanding, from an enterprise perspective, of the availability/continuity requirements for different IT services.
Focus on control of IT processes (IT governance and resource management). http://www.itgi.org/
(1) The IT Governance Institute (ITGI) Control Objectives for Information and related Technology (COBIT) is an international unifying framework that integrates the main global IT standards, including ITIL, CMMI and ISO 17799. It provides good practices, representing the consensus of experts, across a domain and process framework, and presents activities in a manageable and logical structure, focused on control.
IBM Systems Lab Services and Training
Björn Rodén