Are You Sure That's a Benchmark?

What’s the Point of a Benchmark?

Wikipedia alleges that a benchmark is meant to make it easier to compare different computer systems just by looking at a number. For a benchmark to be useful, it has to be something anyone can run independently. How can I compare my graphics card to your graphics card if we can’t run the same test? A benchmark lets you and me run the same set of code and reach comparable results. Database benchmarks like TPC-C let software vendors run the same set of code and get in a pissing match about whose software should be the most expensive (the answer is still Oracle). The upside of benchmark code like TPC-C, or even open source benchmarks like YCSB or the Big Data Benchmark, is that anyone can run it. Once you’ve run a benchmark, you can compare your system to a published test rig and see how your performance matches up.
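To make that concrete, here’s a minimal sketch of what “anyone can run it” looks like in practice. Everything in it is illustrative: the schema, the query, and the row and iteration counts are stand-ins I made up, not part of any published benchmark. The point is just that the whole thing is code you can hand to someone else.

```python
# A minimal, shareable benchmark sketch: fixed schema, fixed data, fixed
# workload, one reported number. The specifics (row counts, the query)
# are invented for illustration -- what matters is that anyone can run
# the exact same code and compare results.
import sqlite3
import time

ROWS = 100_000       # hypothetical data size
ITERATIONS = 1_000   # hypothetical workload size

def setup(conn):
    """Load identical test data on every system under test."""
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        ((i, i % 97) for i in range(ROWS)),
    )
    conn.commit()

def run(conn):
    """Run the fixed read-only workload and report queries per second."""
    start = time.perf_counter()
    for i in range(ITERATIONS):
        conn.execute("SELECT val FROM t WHERE id = ?", (i % ROWS,)).fetchone()
    elapsed = time.perf_counter() - start
    return ITERATIONS / elapsed

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    setup(conn)
    print(f"{run(conn):,.0f} queries/sec")
```

Swap the connection and the workload for your database of choice and you have the core of every benchmark mentioned above: same code, same data, a number you can argue about.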

Share and Share Alike

We should share benchmarks, right? Absolutely. We should share all of our benchmark code and share the results, as allowed by the benchmark’s license. That last part matters: we can’t, for example, run TPC-C on our own hardware and call the unaudited numbers official TPC results.

Trust But Verify

That means we should be able to verify benchmarks, right? You got it! For a benchmark to really be representative, it needs to be available for peer review. The more eyes that are on a benchmark, the more accurate it becomes and the better our understanding of it gets. The Azure SQL Database Benchmark only supplies a workload description like “SELECT; in-memory; read-only” or “UPDATE; mostly not in-memory; read-write”. It’s impossible to verify this benchmark or even compare it to a local SQL Server. Don’t worry, though: Microsoft knows that you’re not going to test your code. And, even if you did, you could never figure out how your performance maps to their nebulous Database Throughput Unit (DTU).
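Just to show how little there is to go on, here’s what a reconstruction attempt looks like. Every entry in the “still unknown” list below is my own guess at what a reproducible spec would need; none of it appears in the published description, which is exactly the problem.

```python
# What you'd need to reproduce one row of the Azure SQL Database
# Benchmark's workload table, versus what the description actually
# gives you. The "needed" list is my assumption about a minimal
# reproducible spec, not anything Microsoft has published.
published = {"operation": "SELECT", "data": "in-memory", "access": "read-only"}

needed_to_reproduce = [
    "table schemas and indexes",
    "the actual SELECT statements",
    "data volume and distribution",
    "number of concurrent clients",
    "think time between requests",
    "run duration and warm-up rules",
    "how the results roll up into a DTU",
]

print("Published:", published)
print("Still unknown:")
for item in needed_to_reproduce:
    print(" -", item)
```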

“Customers familiar with traditional databases and database systems will immediately understand the value and caveats associated with benchmark numbers. We have found newer startups and developers to be less familiar with the benchmarking industry. Instead, this group is more motivated to just build, test, and tune.”
(Azure Database Performance: Service Tiers & Performance Q&A)

That’s right - if you’re deploying to the cloud, you don’t want to know in advance how your database will perform. That’s the official stance from Microsoft. You’d just be happier sitting around rewriting your code over and over until you get something that works. Refactoring is good fun, after all.

But Sharing is Caring?

Why wouldn’t you want to share the benchmark? It couldn’t be that Azure is slow. And it certainly couldn’t be that Azure is expensive. And it absolutely couldn’t be that customers might leave if you compare something slow and expensive to something fast and less expensive… What other reason could there be for not releasing any way to run the Azure SQL Database Benchmark? Sure, we could all cobble something together that works and acts remarkably unlike the Azure SQL DB Benchmark. But it wouldn’t be the same benchmark.

If I sound bitter, it’s because I had high hopes that Microsoft would continue increasing transparency around their core business. There’s even a compelling argument that metrics like TPC-C are outdated. But for that argument to hold water, the metrics that replace them must be reliable, understandable, and reproducible. Secret metrics help no one.