Q&A with Andrew J. Kelly
Wait Stats Can't Wait
SQL Server’s wait statistics are like a laser-guided pointer to performance bottlenecks. So why aren’t wait stats a key weapon in every database pro’s performance tuning and troubleshooting arsenal?
Unfortunately, many DBAs don’t understand how to interpret and use the under-documented wait stats, says SQL Server MVP and performance expert Andrew J. Kelly. It’s a situation he’s intent on correcting. In his Nov. 6 PASS Summit Unite 2009 post-conference seminar, “Practical Performance Monitoring and Troubleshooting,” Andrew takes on wait stats, file stats, and more as he demonstrates how to capture and analyze the most useful statistics for database and system performance. Here’s an inside look at what to expect.
Q: What’s the reward to performance monitoring and troubleshooting done right?
A: For performance monitoring, it’s being able to head off potential issues before they become real problems. I often see many sites spending all their time putting out fires because they don’t have processes in place to do regular monitoring. Most sites have just a few bottlenecks or poorly tuned queries that account for most of their problems. One of the areas I cover in my seminar is how to find in short order the top x queries that are causing the majority of the problems.
In addition, organizations can often forego or at least delay the purchase of new hardware just by ensuring that they stay on top of performance issues caused by poorly written code, improperly tuned indexes, and so on.
As for troubleshooting, for me it’s the satisfaction when you find and fix the issue and speed up the system.
Q: DBAs can easily get overwhelmed with the amount of performance information available and knowing where to start, what statistics to look at, and how to interpret the information they provide. Your seminar promises to help them through the forest to see the “problem” trees. What’s your favorite part of the seminar—any “a-ha” moments?
A: There are usually a lot of a-ha moments because everyone has a little different view of SQL Server and varying degrees of experience in troubleshooting and performance monitoring. But I like the file and wait stats sections. They provide perfect opportunities for attendees to explore the kinds of real-life cases that clearly show how this information relates to their own environment. Such scenarios are hard to share in a book or white paper, and that’s one of the reasons why attending a conference in person, especially a deep-dive pre- or post-con workshop, can really pay off.
Q: What was the best performance-related tip, trick, or advice you ever received?
A: Hmmm, I’ve been doing this for so long, I’m not sure I can pick one out anymore. But I can tell you that taking the time to understand how SQL Server works under the covers for a given area that you’re concerned with can go a long way toward preventing performance problems. For instance, understanding how the engine caches and reuses query plans can help ensure that your database calls don’t get a new plan each time they’re called. This can make a significant dent in CPU usage and boost performance in general.
Q: When you go into a client’s environment to troubleshoot a performance problem, what are the first things you do?
A: I always start by getting a good overview of the environment and issues directly from the people responsible for each area of concern. I then start capturing metrics and metadata to get a handle on what the system is doing and how it’s configured. But wait stats are always high on the list to look at sooner rather than later. They can tell you right away where your bottlenecks are and help eliminate areas that aren’t of concern.
Q: Do you approach performance monitoring differently when you’re doing a regular health check versus when you know there’s a problem?
A: With a health check, I tend to capture a wide range of metrics and metadata to ensure I get a good overall view of the system. But with a known issue, I can usually narrow down the counters and such that I need to look at to find the source of the problem. For example, if there are disk bottlenecks, I can use the virtual file stats DMV to determine which file or files I need to concentrate on first.
Q: Where do a lot of DBAs go wrong when doing performance monitoring and troubleshooting?
A: Many people capture too much, too little, or simply the wrong type of information, all of which can lead to problems in determining what the real issues are. Capturing too much data can make the DBA feel overwhelmed and unsure of where to start. Too little data or the wrong data, on the other hand, can lead to wrong conclusions and misinformed decisions.
Q: What do you think is the most overlooked or underutilized statistic for performance tuning and investigation?
A: Without a doubt, it’s wait statistics. Properly analyzed, wait stats can tell you more in a short amount of time than any other statistic or counter. When it comes to performance issues, too many DBAs these days tend to shy away from using wait stats because they’re not the best documented or understood of topics. But every DBA should make a conscious effort to learn more about them and to use the information they provide on a regular basis.
Q: A core part of your workshop is the code used to capture and analyze the metrics data. Is this code something you developed? What’s notable about it?
A: Most of the code is something I either developed myself or, in some cases, modified from bits I’ve come across over the years. What’s notable is that none of it is really complex. Most people don’t realize how easy this can really be.
Hear more from Andrew and other SQL Server experts at PASS Summit Unite 2009 in Seattle, WA. See the lineup of pre/post-conference seminars, review the full conference agenda, and register today.
Andrew J. Kelly, a SQL Server MVP since 2001, is the practice manager for performance and scalability at Solid Quality Mentors. With 20 years’ experience in relational databases and application development, Andrew specializes in performance, scalability, and maintainability of large-scale SQL Servers. He also writes for leading SQL Server publications, is an active blogger, and regularly speaks at conferences and user groups.