Best practices with 32 TB virtual machines
| Abbas Ali Mir | Ralf Klahr | Momin Qureshi | Chris Nolan |
Episode #210
Introduction
In episode 210 of our SAP on Azure video podcast we talk about 32 TB virtual machines on Azure!
Recently we talked about the release of virtual machines with 32 TB of memory on Azure, which allow you to run the most demanding SAP workloads. Some innovations take a while to be adopted, but that was apparently not the case with the 32 TB systems.
I am really happy to have a team with us today that has already worked on customer projects using these 32 TB VMs: Abbas, Ralf, Momin and Chris.
Find all the links mentioned here: https://www.saponazurepodcast.de/episode210
Reach out to us for any feedback / questions:
- Robert Boban: https://www.linkedin.com/in/rboban/
- Goran Condric: https://www.linkedin.com/in/gorancondric/
- Holger Bruchelt: https://www.linkedin.com/in/holger-bruchelt/
#Microsoft #SAP #Azure #SAPonAzure #SAPHANA #VirtualMachines
Summary created by AI
- Overview of 32 Terabyte VMs:
- Holger and Abbas Ali discussed the release and benefits of 32 terabyte virtual machines on Azure, highlighting their ability to handle high-demand SAP workloads. Abbas Ali introduced the team and the agenda for the meeting.
- Release Benefits: Holger highlighted the release of 32 terabyte virtual machines on Azure, emphasizing their capability to handle high-demand SAP workloads. Abbas Ali noted that these machines make it possible to run the most demanding SAP workloads, and that they were adopted unusually quickly.
- Team Introduction: Abbas Ali introduced the team, including himself, Ralf, Momin, and Christopher, who have been working on customer projects using the 32 terabyte machines. Each team member provided a brief introduction of their roles and expertise.
- Agenda: Abbas Ali outlined the agenda, which included discussions on S/4HANA architectures, high availability and disaster recovery, migration from 24 terabyte to 32 terabyte VMs, storage layout, deployment best practices, VM specifications, and platform improvements.
- Case Study Presentation:
- Abbas Ali presented a case study of a Fortune 50 retail consumer goods company that recently went live with S/4HANA on 32 terabyte VMs, detailing the architecture and migration process from 24 terabyte VMs.
- Customer Background: Abbas Ali presented a case study of a Fortune 50 retail consumer goods company that recently went live with S/4HANA on 32 terabyte VMs. The system handles over $40 billion in transactions annually.
- Migration Process: Abbas detailed the migration process from 24 terabyte VMs to 32 terabyte VMs, explaining the use of HANA system replication for a seamless transition. The customer initially used 6 terabyte VMs, then moved to 12 terabyte, 24 terabyte, and finally 32 terabyte VMs due to increasing database size and performance needs.
- High Availability: Abbas explained the high availability architecture, including the use of HANA system replication, pacemaker clusters, and Azure load balancers to achieve a 99.9% ("three nines") uptime target, i.e., less than nine hours of downtime annually (see the arithmetic sketch below).
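As a quick sanity check on what such an availability target means in practice, here is a minimal arithmetic sketch (plain Python, no Azure dependencies) translating "nines" into the maximum annual downtime they allow:

```python
# Translate an availability target into the maximum annual downtime it allows.
HOURS_PER_YEAR = 365 * 24  # 8760

def annual_downtime_hours(availability: float) -> float:
    """Maximum hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability)

for label, target in [("two nines", 0.99), ("three nines", 0.999), ("four nines", 0.9999)]:
    print(f"{label} ({target:.2%}): {annual_downtime_hours(target):.2f} h/year")
# three nines works out to 8.76 h/year
```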
- High Availability Architecture:
- Abbas Ali explained the high availability architecture for the S/4HANA system, including the use of HANA system replication, pacemaker clusters, and Azure load balancers to achieve high availability and minimize downtime.
- Architecture Overview: Abbas Ali described the high availability architecture for the S/4HANA system, which includes a cross-zone setup in the South Central US region with three availability zones (AZ1, AZ2, AZ3). The architecture targets 99.9% uptime, with an RPO of 20 minutes and an RTO of 30 minutes.
- Components: The architecture includes two HANA database VMs with HANA system replication and pacemaker clusters, app servers deployed in availability sets, and SAP shared storage using ANF with cross-zone replication. Central services VMs and web dispatchers are also deployed across two zones with Azure load balancers (the health-probe sketch below illustrates how the load balancer finds the active node).
- Customer Variations: Abbas mentioned that other customers might use different variations, such as Azure fencing agent or Azure shared disk architectures for quorum, and VM scale sets for app servers instead of availability sets.
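In setups like this, Azure Load Balancer typically learns which node is primary through a TCP health probe that the pacemaker cluster answers only on the active node (in real deployments this is handled by a resource agent such as azure-lb). The sketch below is a simplified stand-in for that agent, with a placeholder probe port, just to show the mechanism:

```python
# Simplified illustration of a load balancer health-probe responder: the
# cluster starts a tiny TCP listener on the probe port only on the node that
# currently holds the HANA primary role, so Azure Load Balancer directs the
# virtual IP there. Port 62503 is a placeholder; real deployments derive the
# probe port from the SAP instance number.
import socket

PROBE_PORT = 62503  # hypothetical probe port for illustration

def serve_probe(port: int = PROBE_PORT) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("0.0.0.0", port))
        srv.listen()
        while True:
            # A successful TCP handshake is all the probe needs to mark the
            # node healthy; no payload is exchanged.
            conn, _ = srv.accept()
            conn.close()

if __name__ == "__main__":
    serve_probe()
```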
- Disaster Recovery Architecture:
- Abbas Ali described the disaster recovery architecture, which includes HANA system replication to a secondary region and the use of Azure Site Recovery for app servers and central services.
- DR Setup: Abbas Ali explained the disaster recovery (DR) setup, which involves extending the primary database to a secondary region using asynchronous HANA system replication. The DR setup also includes Azure Site Recovery for app servers, central services, and web dispatchers.
- RPO and RTO: The target RPO for DR is 30 minutes, and the target RTO is 4 hours (a simple lag check against the RPO target is sketched below). The customer expects to handle most scenarios within the same region, with cross-region failover as a less likely but possible scenario.
- Future Enhancements: Abbas mentioned future enhancements, such as ANF multi-target replication and the use of Rubrik backup solutions to improve DR capabilities. These enhancements aim to provide cross-region replication and better data availability.
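A hedged sketch of how one might monitor asynchronous replication lag against the 30-minute RPO target: in practice the last-replicated timestamp would come from HANA monitoring views (such as M_SERVICE_REPLICATION) or backup catalogs; here it is a plain parameter so the check stays self-contained.

```python
# Check asynchronous HANA system replication lag against the agreed RPO.
# `last_replicated_at` is supplied by the caller; fetching it from HANA
# monitoring views is left out to keep the sketch dependency-free.
from datetime import datetime, timedelta, timezone

RPO_TARGET = timedelta(minutes=30)  # DR target from the case study

def rpo_breached(last_replicated_at: datetime, now: datetime | None = None) -> bool:
    """True if the replication lag exceeds the agreed RPO."""
    now = now or datetime.now(timezone.utc)
    return (now - last_replicated_at) > RPO_TARGET

# Example: data shipped 12 minutes ago is comfortably within the 30-minute RPO.
print(rpo_breached(datetime.now(timezone.utc) - timedelta(minutes=12)))  # False
```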
- Storage Design and Performance:
- Ralf discussed the storage design for the project, highlighting the use of premium managed disks for HANA logs and Azure NetApp Files for data volumes to optimize performance and reliability.
- Storage Components: Ralf explained the storage design, which uses premium managed disks with Write Accelerator for the HANA logs and Azure NetApp Files (ANF) for the data volumes. This combination optimizes performance and reliability for the SAP workloads.
- Performance Benefits: Ralf highlighted the performance benefits of using ANF for data volumes, including lower latency and higher IOPS. The use of multiple NFS volumes helps distribute the load and improve performance during high I/O operations like savepoints (see the throughput arithmetic below).
- Backup Strategy: Ralf discussed the backup strategy, which involves using cross-region replication for data volumes and leveraging AzAcSnap (the Azure Application Consistent Snapshot tool) for consistent snapshots. Future plans include using ANF backup with immutable and indelible storage to protect against ransomware.
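The rough arithmetic behind striping the HANA data volume across several ANF NFS volumes: aggregate throughput scales with the volume count, shrinking the time a savepoint needs to flush a given amount of changed data. The per-volume throughput and data sizes below are assumed figures for illustration, not ANF service-level guarantees.

```python
# Estimate savepoint flush time when data is striped across N NFS volumes.
def savepoint_flush_seconds(changed_gib: float, volumes: int, mibps_per_volume: float) -> float:
    """Seconds to write `changed_gib` GiB across `volumes` striped NFS volumes."""
    aggregate_mibps = volumes * mibps_per_volume
    return changed_gib * 1024 / aggregate_mibps

# Illustrative: 256 GiB of changed pages, 1200 MiB/s per volume (assumed).
for n in (1, 2, 4):
    print(f"{n} volume(s): {savepoint_flush_seconds(256, n, 1200):.0f} s to flush 256 GiB")
# 1 volume(s): 218 s / 2 volume(s): 109 s / 4 volume(s): 55 s
```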
- Deployment Best Practices:
- Momin Qureshi shared best practices for deploying 32 terabyte VMs, including capacity planning, infrastructure setup, business requirements, and performance testing.
- Capacity Planning: Momin emphasized the importance of capacity planning, including setting up quotas and planning deployment timelines for different environments (QA, pre-prod, production, DR) to ensure resource availability (a quota check is sketched after this list).
- Infrastructure Setup: Momin recommended using automation tools like Terraform and PowerShell scripts for infrastructure setup to reduce human error and increase deployment speed. He also suggested running disk quality checks to ensure proper configuration (see the FIO sketch after this list).
- Business Requirements: Momin highlighted the need to define business requirements, such as RPO, RTO, SLAs, and SLOs, to ensure the deployment meets business expectations. He also stressed the importance of practicing backups and restores to build confidence in recovery processes.
- Performance Testing: Momin advised conducting thorough performance testing, including load testing and using tools like niping, ABAPmeter, and FIO for storage testing. He recommended building dashboards to monitor key metrics like ANF throughput, CPU, and memory usage.
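On the quota side, a hedged sketch of the verification step: list compute usage in the target region and flag quotas that are nearly exhausted. It assumes the azure-identity and azure-mgmt-compute Python packages; the subscription ID is a placeholder, and the exact quota names for the M-series family should be confirmed per region.

```python
# List Azure compute quotas in a region and flag those near their limit.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
LOCATION = "southcentralus"            # region from the case study

def nearly_exhausted_quotas(threshold: float = 0.8):
    """Yield (quota name, used, limit) for quotas above the given usage ratio."""
    client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
    for usage in client.usage.list(LOCATION):
        if usage.limit and usage.current_value / usage.limit >= threshold:
            yield usage.name.value, usage.current_value, usage.limit

if __name__ == "__main__":
    for name, used, limit in nearly_exhausted_quotas():
        print(f"{name}: {used}/{limit}")
```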
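And in the spirit of the FIO-based disk quality checks recommended above, a minimal sketch that runs a short fio job against a test file on the HANA log disk and reports write IOPS and mean completion latency. The path and job parameters are placeholders, fio must be installed on the VM, and the JSON field names follow recent fio releases.

```python
# Run a short fio write test and summarize IOPS and latency from its JSON output.
import json
import subprocess

TEST_FILE = "/hana/log/fio-test.bin"  # hypothetical test file on the log volume

def run_fio_write_check(filename: str = TEST_FILE) -> dict:
    """Run a 30-second sequential write test and return IOPS and mean latency."""
    cmd = [
        "fio", "--name=log-write-check", f"--filename={filename}",
        "--rw=write", "--bs=64k", "--size=1G", "--direct=1",
        "--ioengine=libaio", "--runtime=30", "--time_based",
        "--output-format=json",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    job = json.loads(result.stdout)["jobs"][0]["write"]
    return {"iops": job["iops"], "mean_clat_ms": job["clat_ns"]["mean"] / 1e6}

if __name__ == "__main__":
    print(run_fio_write_check())
```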
- VM Specifications and Certifications:
- Christopher Nolan provided details on the specifications and certifications of the 32 terabyte VMs, including their CPU, memory, disk, and network capabilities, as well as the use of Azure Boost for hardware acceleration.
- VM Specifications: Christopher Nolan detailed the specifications of the 32 terabyte VMs, which come in two SKUs: one with hyper-threading enabled (1792 vCPUs) and one without (896 vCPUs). Both SKUs offer over 30 terabytes of memory, 64 disk attachments, and 185 Gbps network throughput (the memory-per-vCPU arithmetic is sketched below).
- Certifications: The 896 vCPU SKU is certified for OLTP and OLAP deployments as single nodes, while the 1792 vCPU SKU supports OLAP. Certification for scale-out deployments is ongoing.
- Azure Boost: Christopher explained that Azure Boost, a dedicated hardware acceleration system, enables high throughput and low latency by offloading network and storage tasks to a system on a chip, improving VM performance and resource utilization.
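A small arithmetic sketch comparing the two SKUs by memory per vCPU; the exact memory figure varies by SKU generation, so ~30,400 GiB is used here only as an approximation of the "over 30 terabytes" mentioned above.

```python
# Memory-per-vCPU comparison for the two 32 TB SKUs described above.
MEMORY_GIB = 30_400  # approximate, standing in for "over 30 terabytes"

for name, vcpus in [("hyper-threaded SKU", 1792), ("non-hyper-threaded SKU", 896)]:
    print(f"{name}: {vcpus} vCPUs -> {MEMORY_GIB / vcpus:.1f} GiB memory per vCPU")
# hyper-threaded SKU: 1792 vCPUs -> 17.0 GiB memory per vCPU
# non-hyper-threaded SKU: 896 vCPUs -> 33.9 GiB memory per vCPU
```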
- 0:00 Intro
- 1:30 Introducing the team: Abbas, Ralf, Momin and Chris
- 5:10 32 TB VMs - SAP S/4HANA Scale-up Architectures
- 10:45 Secondary Region for DR
- 14:15 Migrate to 32 TB VMs
- 17:50 SAP HANA file system storage layout
- 20:45 Storage configuration in second region (DR)
- 25:55 SAP Deployment Best practices and Go Live Readiness
- 36:00 VM Specs, Platform Improvements, Observability
- 44:55 Ralf's recipe - Cod fish with potato chips