Don’t Expect Miracles from your Database Administrator

My previous post focused on the contribution of the Database Administrator (DBA) to application performance. But application performance depends on many factors, some of which are beyond the control of even the most dedicated DBA. So if you were thinking of relying on your DBA to fix everything, this week’s performance principle is intended as a wake-up call:

Expecting a DBA to guarantee the performance of any application that uses the database is like asking a piano tuner to guarantee a flawless performance, regardless of the pianist.

What is Database Design?
In our industry, most terminology is imprecisely defined. There are no universally accepted boundaries between “architecture,” “design,” “development,” and “tuning”. In the case of databases, and the applications that use them, ensuring acceptable performance involves many design-related activities:

  • Performance planning and analysis: Identifying how the users’ most critical processing requirements relate to application and database design decisions.
  • Application design: Designing applications that will use the database(s) efficiently, including the best way to code data manipulation statements like SQL.
  • Application partitioning: Distributing the processing components of applications for optimal performance given an existing geographical distribution of data.
  • Application load testing: Testing and verifying the performance of complex SQL code and processing logic.
  • Application profiling: Monitoring and analyzing application behavior to determine how a particular program uses database resources, or how a particular data manipulation request interacts with a particular database or set of databases.
  • Database design: Selecting among the logical and physical database structures offered by a particular database system, to meet application performance goals.
  • Data distribution: Determining the optimal scheme for distributing data among database systems situated at various locations in a distributed enterprise.
  • Database tuning: Adjusting the many software options provided by a database vendor to customize the behavior of a particular database system to best suit a particular mix of databases and application programs.
  • Environment tuning: Adjusting the hardware or software environment within which the applications and database systems operate.

Who Does Database Design?
People’s titles, roles, and responsibilities vary widely, but very few people in our industry are actually given the title of designer. Where databases are concerned, a designer may be called an architect, an application developer, a performance analyst, a database specialist, or a DBA.

Sometimes a single individual carries out many of the activities listed above. But most enterprises implementing information systems have people in roles that correspond to analysts, developers, and DBAs:

  • Application analysts and software developers certainly have a lot to say about how information flows among the various enterprise databases. They may even design some new databases.
  • For small databases, it is reasonable to free database users from central MIS schedule and resource limitations by giving them reporting tools that can directly manipulate departmental application data where it resides. They may even design new “private” databases for themselves or their co-workers.

But once a collection of information is identified as a corporate asset, to be shared by several applications or queried by users from more than one department, its ongoing care and management must become a community (or “system”) responsibility. And to ensure that very large databases can be used without running into performance problems, someone with specialized database design and tuning skills must be involved. These types of design and tuning decisions are usually made by DBAs, not application developers.

Performance Takes a Team Effort
This does not mean that management can expect DBAs to make all the right decisions concerning database design, tuning, and performance while working in a vacuum. Decisions affecting the performance of applications that use database resources simply can’t be made without knowing how those applications need to use the data. And that’s an aspect of application design where the key variables are always going to be controlled by application designers and developers, not DBAs.

So ultimately, application performance must become the joint responsibility of all concerned. Expecting a DBA alone to guarantee the performance of any application that uses the database is like expecting a piano tuner to guarantee a flawless performance, regardless of the pianist.

Be Nice to a Database Administrator Today

The annual Computer History Museum Fellow Awards program publicly recognizes individuals of outstanding merit who have significantly contributed to advances in computing technology or applications, and to the evolution of the information age. Fellows may have worked in such diverse fields as hardware, software, networking, computer science, business, education, public service, or journalism, but they have one thing in common: their contributions have had a direct influence on computer history, and ultimately, they have changed our lives.

Each year around this time, a “who’s who” of the technology world assembles at the museum in Mountain View, CA, for a banquet and ceremony to induct a new group of Fellows. This evening, among the six new Fellows chosen in 2009, Don Chamberlin will be honored. Don was a co-inventor of SQL, the world’s most widely used database language, and one of the managers of IBM’s System R project, which produced the first SQL implementation and seeded the development of much of IBM’s relational database technology.

Relational database technology lets programmers manipulate data without having to know anything about its internal storage structure. This seems natural today, but when Ted Codd first invented the relational model for data management, it was a revolutionary concept. The System R project, and the SQL language in particular, made that concept a reality. Even so, just having the ability to write applications that manipulate databases using a logical language like SQL does not make those applications run fast enough to meet the needs of their users. It’s still true that the performance of a database application depends on:

  • How the application uses the data
  • How the database management software is configured
  • The database structures (tables, indexes, and other options)
  • The amount and nature of the data itself
  • The hardware environment

Even though SQL has largely insulated application programmers from these messy details, somebody still has to worry about them if applications are ever going to meet business performance goals and satisfy customers’ needs for responsive interactions. Today that responsibility falls primarily on the shoulders of someone whose job did not even exist back in the days when Don Chamberlin was inventing SQL — the Database Administrator, or DBA.

Once considered a cushy job, it’s now a chaotic mix of resource managers, bitmap indexes, heterogeneous locking protocols, optimizer anomalies, and irate users who can’t access their own personnel information and get angry every time they hear the word ‘User’. And to let application programmers think logically, DBAs still have to worry about buffer sizes, index fragmentation, disk striping, database backup and recovery, and a million other things besides.

So, thank Don Chamberlin today for making our lives more logical. And be nice to a DBA today, for making it all work in practice.

Monitor Standard Application Scenarios

In recent years, much has been written about the value of use cases and scenarios for capturing functional requirements; by comparison, their usefulness for performance management has received scant attention. An application scenario defined for performance management purposes:

  • Involves a known fixed workload
  • Runs in the normal production environment
  • Runs against the production databases
  • Is instrumented to record response time

Because standard application scenarios are application/program instances with defined behaviors, their use of computing resources is also (relatively) predictable. In a sense, they are “benchmark” programs, since they perform a similar function. Normally, however, performance benchmarks are designed to mimic a particular type of workload on a component or system, and are used to measure system capacity and throughput when processing a typically broad mix of applications. Standard application scenarios, in contrast, can be designed to measure a system’s responsiveness for a single precisely defined set of processing needs.

In today’s enterprise computing or Web environments, many components combine to determine the performance of applications in the production environment. These include LANs, WANs, workgroup servers and databases, enterprise servers and databases, internet servers, and the networks that make up the Internet. If an application slows down, which component is responsible? The value of standard application scenarios is that they can be systematically chosen to test the responsiveness of these various components individually.

Do You Need Special Purpose Code?
The applications used for scenarios are usually not purpose-built; in fact, it’s best if they are not. The ideal approach employs existing production applications, used in carefully controlled ways. By exploiting the characteristic behaviors of existing applications under different usage conditions, we can create a suite of standard tests, each of which comprises a distinct and known level of activity. For example:

  • A minimal workload might involve a request to a remote server that returns without doing any database activity, allowing us to monitor the communication delay.
  • A small workload might retrieve a single database record using an index.
  • A typical workload would be one that matched the expected performance profile for a certain application class.
  • A large workload might do a table scan of a large table, retrieve a large amount of data, or perform extensive computations, depending on the type of application.

Service Level Agreements
When service level agreements (SLAs) exist, the definitions of terms like minimal, small, typical, and large can be tailored to match workload levels defined in the SLAs. Also, the ideal SLA specifies how service levels will be verified in practice; one method is to use standard use cases or scenarios that generate known workloads. I’ve been noting the potential for this kind of synergy between functional and performance specifications for almost 20 years now, so I’m pleased to see Steven Haines also recommending this approach in his 2006 book, Pro Java EE 5 Performance Management and Optimization.

Developing and Using Standard Applications
Before applications are deployed, it is common for developers and testers to create test cases to validate aspects of their code, or of overall system behavior. With a little more planning, standard application scenarios can be identified at this time and used during testing and deployment to verify that an application meets its performance goals. Once standard execution scenarios have been defined for an application, test scripts can be written to incorporate them into the suite of tools available for performance monitoring.

Using performance testing tools, we can run a suite of standard application scenarios, measure their response times, and summarize the results. Given a sufficiently comprehensive suite of standard application scenarios, such a report would serve as a system-wide summary of response time performance, to be viewed alongside other performance reports or a performance management dashboard. Individual scenarios can also be executed and measured when investigating specific response time concerns, to isolate the location of performance problems.
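A minimal Python sketch of such a suite runner follows. It assumes each standard scenario can be wrapped as a callable. The four scenario functions are hypothetical stand-ins for the minimal, small, typical, and large workloads described earlier. They only sleep instead of driving a real production application, so only the timing and summarizing logic is meant to carry over.

```python
import statistics
import time
from typing import Callable, Dict, List

# Hypothetical stand-ins for the four workload levels. A real suite would
# drive existing production applications in controlled ways; these only
# sleep so that the timing logic has something to measure.
def minimal_request() -> None:
    time.sleep(0.001)  # round trip only, no database activity

def small_lookup() -> None:
    time.sleep(0.005)  # e.g. one record retrieved through an index

def typical_transaction() -> None:
    time.sleep(0.02)   # matches the expected profile for the app class

def large_scan() -> None:
    time.sleep(0.05)   # e.g. a table scan or a large retrieval

SCENARIOS: Dict[str, Callable[[], None]] = {
    "minimal": minimal_request,
    "small": small_lookup,
    "typical": typical_transaction,
    "large": large_scan,
}

def time_scenario(run: Callable[[], None], repeats: int) -> List[float]:
    """Run one scenario `repeats` times; return response times in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return samples

def run_suite(scenarios: Dict[str, Callable[[], None]],
              repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Summarize each scenario as its median and maximum response time."""
    report = {}
    for name, run in scenarios.items():
        samples = time_scenario(run, repeats)
        report[name] = {"median": statistics.median(samples), "max": max(samples)}
    return report

if __name__ == "__main__":
    for name, summary in run_suite(SCENARIOS).items():
        print(f"{name:8s} median={summary['median'] * 1000:6.1f} ms "
              f"max={summary['max'] * 1000:6.1f} ms")
```

Reporting the median alongside the maximum keeps a single slow run from masking an otherwise healthy scenario, while still surfacing that run.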

This is the second in a series of posts on the subject of “Performance Principles.”  I encourage members of the Apdex community to contribute comments about practical applications of these principles from their own experience.

Newton’s First Law of Performance Monitoring

If Sir Isaac Newton were stating the laws of computer systems performance, his first law would surely have been: The graph of performance continues in a straight line unless the force of some external event causes it to change.

Not knowing what changed is a serious impediment to problem diagnosis.

How does performance suddenly become “abnormal”? Of course, the answer is that it doesn’t, at least not on its own. There is always an external cause. So to fix a performance problem, we must find the cause, which is usually more of something, such as:

  • Increased processing volumes
  • More data in a database
  • More customers
  • A new application competing for resources
  • Increased competition from existing applications on the same servers
  • Increased interference from other traffic on the network

Tracking Growth Patterns
Growth is not always sudden or dramatic. Sudden workload growth can produce sudden changes in performance; gradual growth tends to produce a corresponding gradual decline in performance levels.

Any computer system may gradually reach a point where memory size, processor speed, I/O processing and communications overhead are interacting in a way which was not predicted when an earlier performance model was evaluated. Regular monitoring should reveal a corresponding gradual decline in performance levels.
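The following minimal Python sketch shows one way to spot a gradual decline through regular monitoring. It assumes you record one median response time per day for a standard scenario. The code fits an ordinary least-squares line to the series and flags a decline when response time rises faster than a chosen rate. The 5 ms/day threshold and the sample series are arbitrary illustrations, not recommended values.

```python
from typing import Sequence

def trend_slope(values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against their position 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def is_degrading(daily_median_ms: Sequence[float],
                 max_increase_ms_per_day: float = 5.0) -> bool:
    """Flag a gradual decline: the fitted trend rises faster than the allowed rate."""
    return trend_slope(daily_median_ms) > max_increase_ms_per_day

if __name__ == "__main__":
    creeping = [400, 410, 415, 430, 440, 455, 460]  # hypothetical daily medians (ms)
    steady = [400, 402, 399, 401, 400]
    print(round(trend_slope(creeping), 1), is_degrading(creeping))  # 10.5 True
    print(round(trend_slope(steady), 1), is_degrading(steady))      # -0.1 False
```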

Environmental Changes
If nothing in the workload grew noticeably, then the explanation must be that some component of the application or its environment became less efficient. Typical examples of environmental changes that create performance problems are:

  • A new version of the application, or a fix to critical application routines, is installed, changing the way hardware or systems software resources are used.
  • A database or index becomes sufficiently disorganized that normally efficient processing is seriously degraded.
  • Previously matched software components become mismatched when system software is upgraded. For example, a new version of database software may handle certain types of request differently, and previous default settings used by the client need to be changed accordingly.
  • Changes to system software parameters (like cache sizes, scheduling priorities, or available threads) can affect the performance of some applications directly. Changes in the environment can invalidate the previous settings of these system parameters. For example, after a new application has been added to a system, settings that were suitable for a low volume of activity may now be causing excessive queuing of service requests.
  • Changes in the hardware or software environment can consume resources previously available. For example, new applications, users, or devices may be suddenly dumping large volumes of data onto a shared network.
  • A network device may develop a fault that reduces the effective bandwidth of the network.

Monitor External Changes
In the Apdex community, we tend to focus on the numerical aspects of performance. But successful performance management demands more than merely collecting, reporting, and tracking measurement data. The ideal performance management process is one that also systematically tracks all external changes, because they are the source of most performance problems.
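One simple way to apply change tracking is to keep a timestamped change log. When monitoring detects a performance shift, you then list the changes logged just before it. The Python sketch below assumes such a log exists. The `Change` record, the two-day lookback window, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass
class Change:
    """One entry in a hypothetical change log."""
    when: datetime
    description: str

def candidate_causes(changes: List[Change], shift_at: datetime,
                     lookback: timedelta = timedelta(days=2)) -> List[Change]:
    """Changes logged within `lookback` before a performance shift, newest first."""
    window_start = shift_at - lookback
    hits = [c for c in changes if window_start <= c.when <= shift_at]
    return sorted(hits, key=lambda c: c.when, reverse=True)

if __name__ == "__main__":
    log = [
        Change(datetime(2009, 10, 1, 9, 0), "Application release 2.3 installed"),
        Change(datetime(2009, 10, 5, 22, 0), "Database software upgraded"),
        Change(datetime(2009, 10, 6, 8, 0), "Cache size reduced on app server"),
    ]
    shift = datetime(2009, 10, 6, 12, 0)  # when monitoring showed the change
    for change in candidate_causes(log, shift):
        print(change.when.isoformat(), change.description)
```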

This is the first of a series of posts I plan to write on the subject of “Performance Principles.” I encourage members of the Apdex community to contribute comments about practical applications of these principles from their own experience.

Welcoming Chris Loosley

I would like to introduce Chris to the community. A specialist in performance management, Chris is joining us as a regular contributor to the Apdex Exchange. Through his work at IBM, Bachman, Database Associates, and Keynote Systems, Chris has helped to guide many organizations through the dangerous waters of software performance management. He compiled his insights in High-Performance Client/Server, which is a very informative book. More important to this blog, however, Chris has had a hand in the development of Apdex from the ground up. Chris was one of the co-editors of the original specification and has helped deliver the Apdex Symposium over the past three years. I look forward to his contributions to this blog.

Using Apdex to Improve Online Customer Satisfaction

New Relic hosted a fast-paced webinar where Peter Sevcik, founder and executive director of the Apdex Alliance, provided an overview of Apdex. New Relic consultant Steve Hudson followed with real-world examples of how to measure Apdex scores in production Rails or Java web applications using RPM.

The August 26 webinar is available as a recording in MP4 format.

Many Uses of Apdex

Apdex is a simple formula that converts many performance values into an easy-to-understand performance index ranging from 0 to 1. Defining the target application response time T and accurately interpreting the result (the score) requires some methodology, and that methodology should reflect how you plan to use Apdex. There are three ways to use Apdex: to do tactical diagnostics, to support process, and to link performance to your business. These three uses govern how you make Apdex parameter choices.
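For reference, the Apdex formula classifies each response time t against the target T as satisfied, tolerating, or frustrated. Satisfied samples count fully, tolerating samples count half, and frustrated samples count zero:

```latex
\mathrm{Apdex}_T = \frac{N_{\text{satisfied}} + \tfrac{1}{2}\,N_{\text{tolerating}}}{N_{\text{total}}},
\qquad
\begin{cases}
\text{satisfied:}  & t \le T \\
\text{tolerating:} & T < t \le 4T \\
\text{frustrated:} & t > 4T
\end{cases}
```

As a worked example, suppose T = 2 seconds and there are 100 samples: 60 satisfied, 30 tolerating, and 10 frustrated. The score is (60 + 30/2)/100 = 0.75.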

Tactical Diagnosis
This is the simplest use of Apdex. The Apdex parameters are completely under your control, without requiring agreement from any other group. It allows you to experiment with T values to see the sensitivity of your applications to the Apdex formula. Once you have set a T value, you can use the Apdex scores to sort many measurements and determine which applications need attention. This is an alternative to alarm triggers, which are often too sensitive and raise false alarms.

When used for diagnostics, the Apdex T usually floats as needed to get the job done within a specific investigation. The Apdex scores provide first-level performance problem diagnostics. Of course, the tool that supplies the real-time data must have drill-down capabilities to continue on the diagnostic path.
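The Python sketch below illustrates this tactical workflow. It scores each application's response times against a chosen T and sorts the worst first. You can rerun it with different T values to see how sensitive the scores are. The application names and sample values are made up for illustration.

```python
from typing import Dict, List, Tuple

def apdex(samples: List[float], t: float) -> float:
    """Apdex score: satisfied (<= t) count fully, tolerating (t, 4t] count half."""
    if not samples:
        raise ValueError("no samples")
    satisfied = sum(1 for s in samples if s <= t)
    tolerating = sum(1 for s in samples if t < s <= 4 * t)
    return (satisfied + tolerating / 2) / len(samples)

def rank_by_apdex(apps: Dict[str, List[float]], t: float) -> List[Tuple[str, float]]:
    """Score every application against the same T, worst first."""
    scores = [(name, apdex(samples, t)) for name, samples in apps.items()]
    return sorted(scores, key=lambda item: item[1])

# Hypothetical response-time samples in seconds.
APPS = {
    "login": [0.4, 0.6, 0.9, 1.1],
    "search": [1.5, 2.5, 3.0, 9.0],
    "checkout": [0.8, 1.2, 3.5, 6.0],
}

if __name__ == "__main__":
    # Rerun the ranking with different T values to see how sensitive it is.
    for t in (1.0, 2.0):
        ranked = ", ".join(f"{name}={score:.2f}" for name, score in rank_by_apdex(APPS, t))
        print(f"T={t}s: {ranked}")
```

With this sample data, the ranking order stays the same at T = 1 s and T = 2 s, but every score rises as T is relaxed. This is the kind of sensitivity worth checking before settling on a T.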

Process Support
Sophisticated performance management requires process. The four fundamental ITIL-based processes that apply to application performance are incident management, availability management, capacity management, and finally service assurance. When Apdex is applied to these processes, it takes on management properties that require you to involve other groups in your organization, so that Apdex scores will have meaning across groups.

When supporting incident management, the Apdex T and the “trigger” score should be meaningful (show actual incidents), yet not too aggressive (which creates false positives).  This said, we find that organizations tend to run Apdex parameters “hot” so Apdex scores vary a lot over a day.

By the time you progress to performance assurance, Apdex is incorporated in executive reporting and/or possibly is the foundation for SLAs.  For this use the Apdex T and “acceptable” score should be more relaxed.  Apdex scores should not swing wildly over the course of a day if in fact performance was consistent in the view of the users.

Think of it as similar to tracking human health.  Doctors in a hospital emergency room want monitoring instruments to be sensitive to the slightest patient changes.  In contrast, doctors at the U.S. government’s Centers for Disease Control (CDC) need to assess the health of the entire US population, so their parameters must be set to discover trends.  If parameters are too “hot” the results become very “noisy” and it is very hard to see long term trends.

Business/Performance Linkage
The most advanced use of Apdex is to report performance as it relates to the business.  Here Apdex is an element of dialog among IT staff, business managers, user representatives, and executive staff.  Apdex parameters are now carefully analyzed, presented with supporting evidence that shows business linkage, agreed upon by appropriate members of the organization, and documented.  The Apdex methodology brings structure, context, and an open standard to the dialog among key participants.  Apdex helps achieve consensus across groups about how to link application performance to the needs of the business.

Once company executives rely on Apdex reports, longer-term performance management issues arise. For example, how should Apdex parameters evolve over time? This is like Dow Jones deciding to replace one of the 30 companies in the Dow Jones Industrial Average, or showing the price of a stock before and after a share split. The basis has changed. How should you make the change? Such a change shifts Apdex scores on the day the change is implemented. How do you communicate the change in charts going forward? Eric Goldsmith, Operations Architect at AOL, has dealt with these issues and described them in a fascinating presentation at the 2007 Apdex Symposium.

In a nutshell, you can use Apdex for simple tactical purposes within the network group as well as complex strategic executive level purposes.  The good news is that you can walk before you run, starting with simple uses and working your way up to sophisticated uses, leveraging your Apdex experience along the way.

Apdex Vendor Challenge: Show Me the Data

Apdex is dead simple. It is a standard way to convert many response time measurements into a single numerical value that always stays within the range of 0 to 1, where 0 is a disaster and 1 is perfect performance delivery. Dozens of vendors measure response time, so you would think you could readily get an Apdex report. Alas, it is not so.

Last December, the Apdex Alliance asked vendors to submit the names of their products that generate Apdex reports to the Apdex Tools Directory. The directory has entries from the following six vendors: Compuware, Gomez, ip-label.newtest (formerly Auditec), WildPackets, and Xa Systems. However, when you search for information about Apdex at their websites, you get responses like, “No Pages were found containing Apdex”, or “Forbidden: Your client does not have permission to get URL.” WildPackets is the only vendor that has any information about their Apdex features. The WildPackets search on Apdex yields 51 documents with descriptions, FAQs, and tips.

So if you are interested in Apdex, you had better be prepared to roll your own reports. The good news is that you can easily create Apdex reports in a spreadsheet. We also find it useful to use a simple database like Access to organize your measurements. An Access query can pull out the measurements associated with the report you are creating, like all California measurements for yesterday. You can program the Apdex formula directly into queries and show them in a variety of reports.
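As a sketch of programming the Apdex formula directly into a query, the example below swaps Access for Python's built-in `sqlite3` module so it runs without extra software. The `measurements` table, its columns, and the sample rows are hypothetical. The query pulls one day's measurements, groups them by region, and computes each region's Apdex score in SQL.

```python
import sqlite3

# The Apdex formula written directly into SQL: satisfied samples count 1,
# tolerating samples (T < t <= 4T) count 0.5, frustrated samples count 0.
APDEX_BY_REGION = """
    SELECT region,
           (SUM(CASE WHEN response_s <= :t THEN 1.0 ELSE 0.0 END)
            + SUM(CASE WHEN response_s > :t AND response_s <= 4 * :t
                       THEN 0.5 ELSE 0.0 END)) / COUNT(*) AS apdex,
           COUNT(*) AS samples
    FROM measurements
    WHERE day = :day
    GROUP BY region
    ORDER BY region
"""

def apdex_by_region(conn: sqlite3.Connection, day: str, t: float):
    """Return (region, apdex score, sample count) rows for one day."""
    return conn.execute(APDEX_BY_REGION, {"t": t, "day": day}).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (day TEXT, region TEXT, response_s REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        [  # hypothetical samples, response times in seconds
            ("2009-10-15", "California", 1.2),
            ("2009-10-15", "California", 3.0),
            ("2009-10-15", "California", 9.5),
            ("2009-10-15", "New York", 0.7),
            ("2009-10-15", "New York", 2.4),
            ("2009-10-14", "California", 0.5),
        ],
    )
    print(apdex_by_region(conn, "2009-10-15", t=2.0))
```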

Regardless of your report generation method, you will need input data. This turns out to be the tricky part. Many measurement tools do not make it easy for you to export measurement samples in a useful format. Most vendors want you to stay within the confines of their product. In fact, they help you import data from other sources but once there, they really don’t want you to export.

The same Apdex Tools Directory lists a dozen vendors that supply data which can be used to generate Apdex reports. The most valuable part of the directory is the explanation of how to export the data and what format the data will be in when exported. This gives you a guide to matching the measurement tool with your report generation tool.

Measurement vendors should make all of this much simpler. For example, they could add a generic report design capability where users define what data they want to see, how they want it processed (such as applying the Apdex formula), and how the report should look. If vendors continue to try to force their users to see only reports the vendors designed, then they must provide a wide range of reports, including Apdex.

We are seeing an interesting pattern among enterprises that have created the following solution to the problem. They generate production (ongoing) Apdex reports with their own software, using input data from various vendors. They use their own data format and conventions as a way to “normalize” the measurements from the various vendors. This then becomes the common data set that a variety of reports can use.

Enterprises that are serious about Apdex describe three benefits to this approach. First, they do not have to sort out differences in how the vendors generated their unique reports. They simply don’t use the vendor reports. Second, they can integrate data feeds from a variety of vendors into a single report. This would be nearly impossible with any of the vendors that have designed their product thinking they are the only measurement and reporting tool at an enterprise. Their reaction is, “You mean you use my tool and another company’s tool?” Finally, they can now add and remove measurement vendors as needed without changing their reporting system. They call this the “vendor isolation” feature.
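Here is a minimal Python sketch of the normalization layer these enterprises describe. Each vendor feed gets a small converter into a common in-house record, and reports read only the merged data set. Both vendor formats, the field names, and the `Measurement` record are hypothetical. Real export formats differ by product.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Measurement:
    """Common in-house record: every vendor feed is converted into this shape."""
    source: str
    location: str
    timestamp: str      # kept as supplied (ISO 8601 in these examples)
    response_s: float   # always seconds, whatever unit the vendor used

def from_vendor_a_csv(text: str) -> List[Measurement]:
    # Hypothetical feed: CSV with columns time,city,resp_ms (milliseconds).
    reader = csv.DictReader(io.StringIO(text))
    return [Measurement("vendor_a", row["city"], row["time"], float(row["resp_ms"]) / 1000.0)
            for row in reader]

def from_vendor_b_json(text: str) -> List[Measurement]:
    # Hypothetical feed: JSON list of {"ts": ..., "site": ..., "seconds": ...}.
    return [Measurement("vendor_b", rec["site"], rec["ts"], float(rec["seconds"]))
            for rec in json.loads(text)]

# Adding or dropping a vendor touches only this table, not the reports.
CONVERTERS: Dict[str, Callable[[str], List[Measurement]]] = {
    "vendor_a": from_vendor_a_csv,
    "vendor_b": from_vendor_b_json,
}

def normalize(feeds: Dict[str, str]) -> List[Measurement]:
    """Merge raw feeds, keyed by vendor name, into one common data set."""
    merged: List[Measurement] = []
    for vendor, raw in feeds.items():
        merged.extend(CONVERTERS[vendor](raw))
    return merged
```

In this design, swapping measurement vendors means writing or deleting one converter function. Everything downstream of `normalize` stays unchanged, which is the "vendor isolation" property described above.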

If you are a measurement or management vendor you should be worried. This approach reduces your value to the enterprise to commodity status. Vendor isolation also means easy vendor replacement.

We think vendors have two choices: increase value to your customers with better and more flexible reporting capabilities so that customers can build their own Apdex reports, or slide down the slippery slope to a commodity, low-price data feed role. Oh, yes, there is a third choice: add Apdex reports directly into your product.

If you are a vendor that generates Apdex reports or can supply measurement data into a customer’s Apdex reporting engine, we want to know about it. Please post a comment to this blog or send an email to peter@apdex.org.

How Fast is Fast Enough?

Modern life is about speed: rushing to keep up. Your business application had better run fast, faster, fastest. Conventional wisdom dictates that application response time must always be faster. Whatever response time your users experience today, it had better be half that next year. Stop! Think about the consequences.

I once met with a client with a serious business problem: an application used by partners to support the company was too slow. So slow, in fact, that frustrated partners were defecting to competitors. Everyone agreed that the response time was too slow. But when I asked, “How fast should it go?” chaos ensued, with 20 people arguing over the target time.

In the end they agreed that it should go faster as soon as possible and keep getting faster indefinitely. I replied, “This is good news, since you just hired our firm for an indefinite contract that will have an infinite price!” The room went quiet; then people asked how they should determine how fast is fast enough.

There is no shortage of experts on this topic. James Gleick wrote a fascinating book entitled “Faster: The Acceleration of Just About Everything.” His book is replete with examples of how the world has sped up as technology has enabled faster transportation, production, and communication. His thesis is that processes will go faster until we live in a uniformly nanosecond world. When he wrote the book in 1999, he thought that day was nigh. But note that one of the great speed icons of that day, the Concorde, was retired soon thereafter, in 2003. There are no supersonic commercial flights available today. How can that be? Faster hit a limit.

No doubt some things will continue to happen faster, especially if the speedup has a direct tangible benefit. The tangible benefit to business is improved productivity (more for the same money) or higher revenue (more money for the same effort). Did you notice the word money appeared twice? The first business rule of speedup is that it must improve the bottom line. The second rule is that the money it contributes (income or savings) must be greater than the cost.

Many projects that improve application response times are not cost effective. All technology improvements reach a point of diminishing returns, and business application speed is no exception.

I have had many animated conversations over the years with my colleague Peter Christy at the Internet Research Group about the push and pull between “it’s never fast enough” and “speed has practical human-computer interface limits.” The last time we had this conversation, Peter Christy cited research that the human nervous system transmits cell-to-cell signals in about 1 millisecond. [Note that the brain and nervous system do that many times in parallel, so the bandwidth must be big.] Peter Christy postulated that we won’t reach the limit of application response time speed until the computer interface matches human capabilities.

Many speed targets have come and gone over time. There was the “Web must respond in 8 seconds” rule promulgated by Zona Research at the turn of the millennium. Jupiter Research recently replaced that with 4 seconds as the new threshold of acceptability for retail web page response times. And there is the old IBM 1-second rule from the 1970s. All of these rules were proposed by organizations with a vested interest in speed. By the way, they all told us that the world as we knew it would end if these speed rules were not achieved immediately. Has Microsoft gone bust because nothing on Windows happens in less than a second, or is Amazon just a mirage since most of their web pages load in more than 4 seconds?

The upshot is that if you believe that faster is always better, you are likely to spend too much, and however alluring one inflexible “fast enough” speed may be, it is a bad idea. The “right” speed depends on context. That context is a complex fusion of what the user needs to accomplish and how the application is designed. Taking this context into account is key to the Apdex methodology. Enterprises must figure out how fast is fast enough for application users to have a satisfactory experience.

I will be explaining Apdex approaches to determining the proper target time “T” in future posts.

But for now I have run out of time.

Gomez Showcases Apdex

At a Gomez users’ meeting held after the Web Experience Forum on October 16 in Boston, users provided great commentary about how they use Gomez services to ensure good performance, including how they are using Apdex. Imad Mouline, Gomez’ CTO, described where all of the Gomez application performance measurement services fit into an application management maturity scale of availability, response time, and consistency (good performance for all users all the time). The Gomez Business Pulse XF portal integrates the views and leverages web standards (e.g., XML and SOAP) built into its measurement services to make it easy to tailor your own web service performance view.

As fans of Apdex, a Business Pulse XF feature that caught our attention was that it produces Apdex scores right from the Gomez measurement platforms.  The Apdex reporting tool is a simple-to-configure widget in the Business Pulse XF dashboard.

Norm Morrison of GSI Commerce, an e-commerce service provider, gave an excellent presentation on how GSI uses Apdex to track the performance delivered to all its customers. GSI Commerce is a billion-dollar company that provides design, hosting and operations support to about 100 major businesses like Toys“R”Us, Radio Shack and CBS Sports. The company has been on a multi-year quest to improve how they define and deliver service quality to their customers.

GSI started defining and tracking quality using several metrics. Rigorous measurement and reporting of these metrics uncovered many small, previously unnoticed issues with capacity, product configurations, and vendor bugs.  This new understanding gave them confidence to press on to write service quality into their contracts.

GSI then struggled to expand the service metrics, searching for a simple response time metric they could link to customer business needs.  They tried and adopted Apdex, and their customers now get Apdex reports that uncover issues that remained unseen in simple response time averages or availability reports.

GSI is a great example of a company that is constantly improving quality.  The fact that they adopted Apdex as a better, more aggressive way to measure and report quality and share the results with their customers is a clear commitment to continuous improvement.

What is your company doing to truly foster continuous service quality improvement?  We are looking for similar stories.  If you have an Apdex story pass it on to peter@apdex.org so we can share it with the Apdex community.