
Premium SSD vs Ultra SSD: Azure Storage Performance for Distributed Databases
March 4, 2025 · Last Updated on March 4, 2025 by Editorial Team

Author(s): Richie Bachala

Originally published on Towards AI.

When building distributed systems in the cloud, storage performance can make or break your application's success. In this post, we'll explore how different Azure disk types perform under distributed database workloads, using YugabyteDB as our distributed database. We'll dive deep into benchmarking methodologies and reveal practical insights about Azure storage performance characteristics.

The Azure Storage Landscape

Azure offers several managed disk types, each designed for different workloads and performance requirements. We'll focus on three key offerings:

- Premium SSD: the traditional performance-tier offering, providing consistent performance with burstable IOPS
- Premium SSD v2: a newer generation offering higher performance and more flexible scaling
- Ultra SSD: Azure's highest-performance offering with configurable IOPS and throughput

Each of these options presents different performance characteristics and price points, making the choice non-trivial for database workloads.

Understanding Distributed Database Workloads

Before diving into performance numbers, it's essential to understand what makes distributed database workloads unique. Unlike traditional single-node databases, distributed databases like YugabyteDB handle data differently.

Write operations:

- Require consensus across multiple nodes
- Need to maintain consistency across replicas
- Often involve both WAL (Write-Ahead Log) and data file writes
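The consensus requirement above has a direct storage implication: a write is acknowledged only once a majority of replicas have persisted it to their WAL, so effective write latency tracks the median replica's disk latency rather than the fastest. A minimal sketch of this effect (hypothetical latency numbers, not YugabyteDB's actual replication code):

```python
# Sketch: with replication factor 3, a Raft-style write commits once a
# majority (2 of 3) of replicas have persisted it to their WAL. Commit
# latency is therefore the 2nd-fastest replica's WAL write latency.
def quorum_commit_latency_ms(replica_wal_latencies_ms):
    """Latency until a majority of replicas have acknowledged the write."""
    acks_needed = len(replica_wal_latencies_ms) // 2 + 1
    return sorted(replica_wal_latencies_ms)[acks_needed - 1]

# Hypothetical per-replica WAL fsync latencies (ms):
print(quorum_commit_latency_ms([0.8, 1.2, 5.0]))  # 1.2 — one slow disk is masked
print(quorum_commit_latency_ms([0.8, 4.9, 5.0]))  # 4.9 — two slow disks dominate
```

This is one reason raw single-disk metrics don't translate directly into distributed database performance: a single slow disk can be masked by the quorum, but tail latency across the cluster still matters.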
Read operations:

- May contact multiple nodes depending on consistency requirements
- Utilize caching at various levels
- Can be affected by data locality

These characteristics mean that storage performance impacts database operations in complex ways, often not directly proportional to raw disk performance metrics.

Benchmarking Methodology

To thoroughly evaluate storage performance, we need a comprehensive testing approach. We employed two industry-standard benchmarking tools.

TPC-C Benchmark

TPC-C is a database benchmark that simulates a complete order-processing environment. It's valuable because it:

- Models real-world business operations
- Generates mixed read-write workloads
- Tests multiple transaction types with varying complexity
- Provides insights into real-world performance expectations

Our implementation focuses on the following transactions:

- New Order: complex write-heavy transaction
- Payment: mixed read-write transaction
- Order Status: read-only transaction
- Delivery: write-heavy batch transaction
- Stock Level: read-heavy transaction

Each of these transactions is a set of queries fired to carry out the business use case. For example, the following queries are fired for a New Order transaction:

- Get records describing a warehouse, customer, and district
- Update the district
- Increment the next available order number
- Insert records into the Order and New-Order tables
- For 5 to 15 items, get the Item record, then get/update the Stock record
- Insert an Order-Line record

For TPC-C, we focus primarily on NewOrder latencies, since the number of NewOrder transactions defines the benchmark's efficiency. So if the NewOrder latency is 50 ms, it took 50 ms to carry out all the queries listed above.

Sysbench

Sysbench is a micro-benchmarking workload. It creates a set of similar tables, and the workloads are uniformly distributed across all keys of all the tables. The following are the two workloads we use most:

oltp_read_only: there are 10 selects in one transaction, to random tables and random keys.
So if the latency of the transaction is, say, 10 ms, each select is taking 1 ms. And if the throughput is 100 ops/second, it is doing 1,000 selects per second.

oltp_multi_insert: there are 10 inserts in one transaction, to random tables and random keys. So if the latency of the transaction is, say, 50 ms, each insert is taking 5 ms. And if the throughput is 100 ops/second, it is doing 1,000 inserts per second.

While TPC-C provides a high-level view, Sysbench allows us to examine specific performance characteristics:

- Enables focused testing of individual operation types
- Provides precise control over workload parameters
- Helps isolate storage performance impacts
- Allows scaling tests with different table counts and sizes

We configured Sysbench tests to examine:

- Point selects (read performance)
- Insert operations (write performance)
- Different data set sizes (20 and 30 tables)

Sysbench git repo: https://github.com/yugabyte/sysbench/

Azure Disk Performance Comparison Tables

[Tables presented as images in the original post: Test Environment Configuration, Cluster Configuration, Benchmark Results, Benchmark Configuration Details.]

Key Findings and Recommendations

Based on our comprehensive testing, we can make several recommendations.

For read-heavy workloads: Premium SSD v2 provides the best balance of performance and cost. The performance gap between Premium SSD v2 and Ultra SSD is minimal for read operations, making Ultra SSD harder to justify purely for read performance.

For write-heavy workloads: Ultra SSD shows its value in write-intensive scenarios, particularly with larger datasets. The consistent performance and lower latencies can justify the higher cost for write-critical applications.

For mixed workloads: Premium SSD v2 emerges as the most cost-effective option for most mixed workloads.
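The per-operation arithmetic used in the sysbench discussion above can be made explicit. A small sketch (the 10-operations-per-transaction figure comes from the workload definitions; the example numbers mirror the ones in the text):

```python
OPS_PER_TXN = 10  # both oltp_read_only and oltp_multi_insert issue 10 ops per transaction

def per_op_latency_ms(txn_latency_ms, ops_per_txn=OPS_PER_TXN):
    """Average latency of a single select/insert within a transaction."""
    return txn_latency_ms / ops_per_txn

def ops_per_second(txn_throughput, ops_per_txn=OPS_PER_TXN):
    """Individual selects/inserts per second, given transaction throughput."""
    return txn_throughput * ops_per_txn

print(per_op_latency_ms(10))  # 1.0 ms per select (oltp_read_only example)
print(ops_per_second(100))    # 1000 selects per second
print(per_op_latency_ms(50))  # 5.0 ms per insert (oltp_multi_insert example)
```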
The performance improvements over Premium SSD are significant, while the cost remains lower than Ultra SSD.

Conclusion

Our testing reveals that Azure disk performance isn't simply about raw IOPS and throughput numbers. The interaction between storage and distributed database workloads is complex, with CPU often becoming the limiting factor before storage performance is fully utilized.

If the workload requires low latency or high throughput, Ultra SSD is the best choice: it has the lowest latency and highest throughput of the three disk types, but it is also the most expensive. Premium SSD v2 is a good choice if you need high throughput on a budget. Premium SSD is a good choice if you have no specific latency or throughput requirements.

For most distributed database deployments, Premium SSD v2 provides the sweet spot of performance and cost. Ultra SSD becomes compelling primarily for:

- Write-heavy workloads with strict latency requirements
- Large datasets with unpredictable access patterns
- Mission-critical applications requiring consistent performance

When selecting Azure disk types for your distributed database, consider:

- Your workload characteristics (read/write ratio)
- Dataset size and growth expectations
- Performance requirements and budgetary constraints
- The actual bottlenecks in your current system

Remember that storage performance is just one piece of the puzzle. A well-designed distributed database system needs to consider network topology, CPU resources, and memory configuration alongside storage performance for optimal results.

Thanks for reading.
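The selection guidance in the conclusion can be condensed into a small decision helper. This is only a sketch of the article's rules of thumb (the workload labels are illustrative categories, not Azure terminology):

```python
def recommend_disk(workload: str, strict_latency: bool = False) -> str:
    """Maps the article's findings to an Azure disk recommendation.

    workload: one of "read-heavy", "write-heavy", "mixed", "undemanding"
    """
    if workload == "write-heavy" and strict_latency:
        return "Ultra SSD"      # consistent low write latency justifies the cost
    if workload == "undemanding":
        return "Premium SSD"    # no specific latency/throughput requirements
    return "Premium SSD v2"     # best performance/cost for read-heavy and mixed

print(recommend_disk("read-heavy"))                        # Premium SSD v2
print(recommend_disk("write-heavy", strict_latency=True))  # Ultra SSD
print(recommend_disk("mixed"))                             # Premium SSD v2
```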