
Load, Stress, Performance Test Terms, Deliverables, Profiles and Reports

This page presents the formatting and presentation of a sample performance profile. It is a companion to the pages on this site about the Performance Testing Plan and about each aspect of the performance of an application, based on statistics and graphs created by load testing tools such as LoadRunner executing scripts.



Introduction: Our Flow of Deliverables

[Figure: Flow of Information in a load profiling project. Labeled elements: Concerns, Questions, Project Technical Objectives, Types of Testing, Business Processes, Metrics, Requirements, Budgets, Recommendations, Conclusions, Analysis, Run Results]

Definition of Terms

    Results from a test run (such as the statistical reports and graphs generated by LoadRunner) are the values obtained from measuring the impact of a specific set of run conditions.

    Conclusions (such as these) are subjective decisions (a proposition or claim) reached after (hopefully) thoughtful consideration of the facts drawn from the evidence provided.

    In formal statistics, a conclusion evaluates a prior hypothesis, which is either accepted (confirmed) or rejected based on the outcome of experiments.

    Conclusions are presented organized by the questions and the forms/types of performance tests.

    A finding is a determination about the scope, validity, and reliability of observed facts (data). Example:

      "The GUI Response timing of an average 1.3 seconds to obtain a response to a valid login request reflects what might be typically experienced by normal production users."

      This statement limits its scope to:

    • valid requests from "positive" test cases, not invalid requests from "negative" test cases intended to end in errors.
    • login requests only, not any other type of request.
    • typical loads, not heavy loads with a lot of users.
    • normal users, not users processing an extreme amount of data.
    • users in the open production environment, not in a closed testing/development environment.

    Findings provide the premises (the "truths" or evidence) providing the basis for making conclusions.

    All this is the path to a well-reasoned approach to the management of Performance, Scalability, Reliability (PSR).


Concerns, Questions, Metrics, and Goals

Requirements

    An example of a complete non-functional performance requirement is:

    500 of 1,000 logged-in users, who pause an average of 20 seconds between a mix of requests (designated in table below) containing between 20 and 30 line items, obtain a completion response with no errors on IE6, IE7, and Firefox browsers in under 6 seconds 95 percent of the time. This measurement accesses the 1GB corporate local area network (LAN) during working hours (7 A.M. to 7 P.M. EST) with a normal load of background processes running throughout the period measured.

    This statement answers basic questions:

    1. How many users can the system handle?
    2. What is the system's maximum throughput?
    3. What are the sensitive hardware components?
    4. How many servers are required at each tier?
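
    One way to check run results against a requirement phrased like the example above is to compute the share of responses that finished under the time limit. The sketch below is only an illustration: the function names and sample timings are hypothetical, while the 6-second limit and the 95 percent target are taken from the example requirement.

```python
def fraction_within(response_secs, limit_secs):
    """Return the share of responses that completed in under limit_secs."""
    if not response_secs:
        raise ValueError("no response times supplied")
    fast = sum(1 for t in response_secs if t < limit_secs)
    return fast / len(response_secs)


def meets_requirement(response_secs, limit_secs=6.0, required_fraction=0.95):
    """True when at least required_fraction of responses beat the limit."""
    return fraction_within(response_secs, limit_secs) >= required_fraction


# Hypothetical timings in seconds: 96 fast responses and 4 slow ones.
sample = [2.5] * 96 + [7.0] * 4
print(fraction_within(sample, 6.0))
print(meets_requirement(sample))
```

    The other parts of the requirement (user mix, think time, browsers, network, time of day) describe run conditions and would be controlled by the test scenario rather than by this check.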

Project Technical Objectives to Address Concerns

    The project objectives addressing concerns which prompted this project are:

    As a practical matter, bugs in Configuration, Installation, Security, and Failover/Recovery (Robustness) often need to be fixed before conducting Performance, Load, and Stress testing:
    # Concern Questions Project Technical Objective: Type of Testing
    I. User productivity a-e Conduct speed tests to estimate the responsiveness of each user action. This identifies opportunities for application and configuration tuning.
    II. Operational efficiency:
    Stability of the configuration

    Conduct longevity tests by running a low or normal level of work over a long period of time. This identifies the extent of variation and spikes.

    Conduct failover tests by stopping various processes while running various levels of load. Such actions should result in redundant components taking over for the primary nodes. This also includes failback, to make sure that work returns to normal after components come back online.

    III. Stress on the common database machine m-n Measure the number of bytes between client and server.

    Execute the most resource-intensive business processes to obtain database machine CPU utilization metrics at various levels of application load.

    IV. Capacity of the configuration o-q Conduct stress tests to measure user response time (and errors) at various levels of application load. This determines the number of servers needed, which impacts product pricing. From this, extrapolate the lead time and trigger point for upgrades.

    Continue running overload tests after the server recycles or shuts down to see if the application can automatically recover after being flooded.

    V. Resource Utilization r-s Repeat Stress and Longevity Tests to determine the impact of various tuning options (such as application software versions, utilities, OS settings, JVM settings, etc.).
    VI. Capacity for growth t-w Conduct Scalability tests by repeating tests for each configuration.

    Conduct volume tests by running a large amount of data to ensure that the system can accommodate it at acceptable speed.

Project Budget Performance

Issues (Discoveries and Recommendations)

    This table presents the concerns which initiated the project together with the subsequent example observations and discoveries found during load profiling and analysis.

    Concern Possible
    Description / Analysis Recommendation
    VII. Extent and Efficiency of Testing Effort (Testability) In HTML, no differentiation of rows for counting among different tables. In order to obtain a count of items in different tables on the same page, a unique identifier is needed for each type of row. Add a unique CSS class= attribute to each type of row. This is usually a design requirement.
    I. User Productivity File returned to client is more than 500,000 bytes. This guarantees long response times and potential timeouts. Pre-cache files in smaller pages hidden in sign-up pages or download in background.
    Use of UTF-8 ContentType for English-only pages. Additional time is required to process vs. ISO-8859-1. Specify UTF-8 only for pages which need it.
    No indication that system is working during long processes. Users are likely to abandon the session, click refresh, or other actions which cause even more load. Show a "searching... please wait" screen for responses known to be over 5 seconds.
    When the server is overloaded, users see either no screen or technical default text. A cryptic HTTP "500" error is shown when servers are too busy to respond. Show a "Busy ... Please Try Again Later" screen to users who are not allowed to log in due to server overload.
    The first user of the day experiences long response times. Servers wait until users request specific transactions before loading them into memory, a task which may take several minutes. When server services start, automatically load programs into memory by configuration settings or invoking fake users.
    Users must make the same filtering selections repeatedly. Values to filter data specified by each user are not presented again. Retrieving data that users discard consumes CPU, memory, network, and other resources. Filter out data that users usually don't want.
    III. Stress on the common database machine Server error after 5 minutes. JVM diagnostics graphs showed that memory peaks at 250 MB. This is the default value. Specify -Xmx2000m among JVM startup parameters.
    Server error after 15 minutes. Parallel graphs of diagnostics showed that the number of Weblogic sessions flattened out at 250. This is the default number. Since the timeout is 20 minutes, runs require 35 sessions per user per minute. Specify the maximum number of sessions in the config.xml file.
    High disk utilization. 10GB of disk space is consumed per hour of peak load. In productive system simulations, use "Error" level logging.
    Maximum app loads did not overload the DB server. The major concern of this project was the impact on the Oracle machine. Runs at the largest application volume increased CPU utilization by no more than 25% with AP transactions, which had the most impact on the server. Identify and test for the total possible load on the DB running all apps at possible peak loads.
    II. Operational Efficiency: Stability of the configuration (Readiness of the app. for production) An image file was not found on page "Xyz". Microsoft browsers automatically request the favicon.ico file, which generates an error if it's not on the website's root folder. Workaround: Script loadtest to ignore the "404" error. Root cause: Provide the file with the name expected by the app code or change the code.
    II. Efficient
    Spikes in performance. Longevity tests confirm that spikes in response time were eliminated after changing the JVM run-time settings in the server start-up to specify a) more memory to permgen, b) availability of multiple processors, c) incremental garbage collection.
    Server shutdown during overnight runs. The server shutdown near the end of Longevity tests because it ran out of file handles. When the OS was setup with the maximum rather than the default number of file handles, the app completed longevity tests. An additional temporary workaround is to recycle each process once a day. Workaround: Configure the OS with more file handles/descriptors.
    Root cause: Change app code to explicitly close files.

    Idea: To better manage follow-up, action items may be entered into a "defect" tracking system or task/project management system.

Business Processes

    Here are examples of business processes for a general financial application.

    #  | Business Process                       | # Steps | Iteration Time | TPM /User | Peak# Users | Max# Users
    0. | LL Login / Logout                      | 2       | 12 s           | 0.060     | 300         | 500
    1. | AP Accounts Payable [6 lines]          | 23      | 8.5 m          | 0.160     | 10          | 15
    2. | JE GL Journal Entry [14 lines]         | 6       | 10 m           | 0.001     | 10          | 20
    3. | RC GL Report Creation [1 acct]         | 5       | 24 s           | 0.160     | 25          | 50
    4. | RR Report Retrieval                    | 3       | 12 s           | 0.260     | 50          | 100
    5. | EV Employee Expense Creation [4 lines] | 2       | 10 m           | 0.360     | 100         | 400
       | Combined                               | 41      | 42 m           | 1.001     | 300         | 500

    # Steps provides for each business process a count of its user dialogs: the number of "round trips" to the server after the user clicks a submit button or a link. The link provided with the number leads to a list of the dialogs and the names of transaction measurements.

    Iteration Time1 is the total amount of time needed to complete all steps of the business process. (This can be obtained from VuGen during load script development).

    TPM /User (Transactions Per Minute per User) is the TPS (Transactions Per Second) multiplied by 60.

    Peak# Users is the peak (largest) number of users that may perform that process all at once, such as (in the case of login) each work-day morning, and (in the case of business processes) around each accounting period-end.

    Max# Users is the maximum number of users that can possibly use the specific process all at one time.


    Script Lines/Function Points

  1. How many features/lines (or function points) should be/are in scripts created to conduct load test runs?

    Some development managers use the number of lines of code as a rough metric to measure the complexity of systems and the productivity (efficiency) of developers. This is controversial because better developers make use of reusable libraries and coding techniques to create more robust systems, which take more time to create.

    Test Run Length

  2. How long should/do test runs take?

    The turnaround time of a run is determined by the ramp-up and run-length strategy, which are different for each type of test.

    At the beginning of testing efforts, time to develop load testing scripts can lengthen test turnaround time.

    This time is tracked by recording timestamps when the notification of a configuration change is received and when test results are published.

    Idea: The amount of time needed for each test run should include the amount of time needed to manually prepare run conditions (such as running a program to reset data values) before each run as well as the amount of time to manually collect run logs and analyze run results.

    Automation of run log collection can speed this up.

    Number of Test Runs

  3. How many runs should it / did it take?

    Idea: Organizations which rate themselves high on CMMI should have this figure as an outcome of planning efforts, with expected numbers coming from an analysis of metrics gathered from previous similar projects.

Observations

    These are good candidates for "Six Sigma" improvement projects.

    Several metrics that affect the performance and capacity of an application can be obtained even before load testing runs are completed.

    Reminder: These metrics need to be measured manually, with a stopwatch, or maybe a calendar.

    Environment Complexity

  1. How quickly can configuration changes be made?

    Idea: We use a test log spiral notebook to record:

    • the timestamp of the email requesting a configuration change to be identified.
    • the timestamp of the email requesting a test run for a particular configuration change.
    • the time (work hours or days) calculated between these two times.
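
    The same notebook arithmetic can be done with a small script. This is a minimal sketch, assuming timestamps are written as "YYYY-MM-DD HH:MM"; the format, the function name, and the sample values are hypothetical, and the result is elapsed clock hours rather than work hours.

```python
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M"  # assumed notebook format


def elapsed_hours(start_stamp, end_stamp):
    """Return the clock hours between two timestamp strings."""
    start = datetime.strptime(start_stamp, STAMP_FORMAT)
    end = datetime.strptime(end_stamp, STAMP_FORMAT)
    if end < start:
        raise ValueError("end timestamp precedes start timestamp")
    return (end - start).total_seconds() / 3600


# Hypothetical notebook entries.
change_requested = "2005-03-01 09:15"
run_requested = "2005-03-02 14:45"
print(elapsed_hours(change_requested, run_requested))
```

    Converting clock hours to work hours would need a calendar of working days, which is left out here.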

    Backup Imaging Time

  2. How long does it take to back up disk images off servers?

    This affects the amount of time for testing.

    Image Restore Time

  3. How long does it take to restore disk images on servers?

    Recovery/Reboot Time

  4. How long do servers take to restart?

    Idea: We use a test log spiral notebook to record:

    • the timestamp of when the command is issued on the first server.
    • the timestamp of when "ready" is issued by the last server to start.
    • the minutes calculated between these two times.

    Failover Time

  5. How quickly can each component fail over to redundant servers?

    This is determined during Failover testing.

    Failback Time

  6. How quickly can the system fail back from redundant resources?

    This is determined during Failover testing.

Graphs and Dashboards

    Graphs presented here were generated using Mercury Corporation's LoadRunner package or using Microsoft Excel. While adequate, LoadRunner does not provide the graph customization features needed to create dashboards, which are useful for several reasons:

    • Placing several charts on the same ("dense") page allows interrelationships to be more easily identified and analyzed.

    • Performance and capacity measurement projects are not one-time projects, but an on-going endeavor.

    Using Microsoft Excel is a two-edged sword. I prefer it because it is the most common package, so I don't have to beg my employer to buy it to get my work done. And since some companies may not want to pay for a specialized dashboard package, I may be stuck using Excel anyway.

    However, because of its power, Excel can be difficult to master. But I can show you how it can be done. After an initial investment of a few hours, you would develop an impressive skill that you'll take with you.

    Analytics

    While adequate, LoadRunner does not provide the dynamic presentation features found in data visualization software packages.

    "Visual analytics" apps work by reading data from ordinary Microsoft Excel spreadsheet files into files that PowerPoint, PDF, and Flash-enabled web pages use to enable interactive exploration of graphic data, automatically switching graphic presentations in real-time response to variables specified by moving slider bars, accordion menus, and other "spiffy" user interfaces.

    Packages from several vendors enable wider sources of data, such as XML from web services and direct connection to Oracle or SQL databases.

    For consistency:

      Idea: horizontal sliders are used to specify various time frames being displayed.
      Slide to the left for more historical data.
      Slide to the right for more recent data.

      Idea: vertical sliders or circular speedometers (like the iPod wheel) are used to specify various levels of load on the servers.
      Slide to the lowest point for results associated with the minimum configuration.
      Slide to the highest point for results pertaining to the largest configuration tested.

      Idea: pull-down selectors are provided (instead of sliders) to specify non-continuous items such as departments.

    Feature: This rose icon marks graphs which may benefit from this technology.


Results from Each Measurement Run

    The performance results displayed in the "Raw Speed" table below were collected from the start of script development efforts:

    • Imp. (Importance) provides the Importance of the dialog. The "MoSCoW" approach uses these designators:

      • Must have (a mandatory requirement required for basic operations).
      • Should have.
      • Could have, if time permits.
      • Won't have (restrictions on features which might pose security risks, etc.)

    • Manual Step describes the title of the page that should be returned after users take action.

        view provides a link to a captured screen image of the app.
        "for" precedes the title of the page the script expects.

    • Mix is the percentage of iterations in which the action is expected to participate.

    • Think Time is the amount of time experienced users typically need to perform the action. The standard times are:

        2 seconds to find and click a link on a page with 5 or fewer items.
        5 seconds to select an item in a pull down menu.
        8 seconds to click on the User ID, type in a password, and press Enter to login/sign-in.
        4 seconds per field on a submitted form.

    Our scripts are coded so that statistics are captured for each action run with a single user.

    • Bytes 1 is the number of bytes received from each page.
    • Speed 1 is the number of milliseconds needed to respond to each page (1000 milliseconds equals 1 second). This number answers the "Raw Speed" question.

    Idea: These numbers can potentially be used by load scripts to detect anomalies in responses during runs, such as issuing a message if fewer bytes are downloaded than expected for a particular page.
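
    A check of this kind could look like the following sketch. It is not LoadRunner script code: the function name, the 10 percent tolerance, and the byte counts observed during the run are hypothetical, while the transaction names and the 43212-byte baseline are borrowed from the "Raw Speed" table.

```python
def find_short_pages(expected_bytes, observed_bytes, tolerance=0.10):
    """Return pages whose download fell short of the baseline by more than tolerance."""
    short = []
    for page, expected in expected_bytes.items():
        observed = observed_bytes.get(page, 0)  # a missing page counts as 0 bytes
        if observed < expected * (1 - tolerance):
            short.append((page, expected, observed))
    return short


baseline = {"1_InvokeURL": 43212, "TS01": 43212}   # single-user byte counts
during_run = {"1_InvokeURL": 43190, "TS01": 1250}  # hypothetical run values

for page, expected, observed in find_short_pages(baseline, during_run):
    print(f"{page}: expected about {expected} bytes, received {observed}")
```

    A tolerance is used because dynamic pages rarely return exactly the same number of bytes on every request.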


    Raw Speed

  1. What is the fastest system response time that users can expect from each dialog/screen/function of the application?

    The contents of this table are described in the section above.

    Imp. | BP | Manual step (Use Case) | Mix | Think | (From LoadRunner Analysis Summary Report: Min, Avg, Max, SD, CV)
    Must | LL | 1. Invoke homepage URL | 90% | -- | 1_InvokeURL 43212 3212
    High | LL | 1.2 Home on main menu, for "Employee facing registry page" | 40% | -- | 2_ 43212 3212
    High | LL | 1.3 Logout | 20% | 2 | 9_ 43212 3212
    High | TS | 3.1 Time sheet Menu link | 40% | 2 | TS01 43212 3212
    High | TS | 3.2 Lookup Dept | 22% | 6 | TS02 43212 3212
    High | TS | 3.3 Time sheet Entry Submit | 38% | 6 | TS03 43212 3212

    To better visualize the statistics, this barchart ranks transactions. For each item:

    • The maximum (longest/slowest) time observed during the run is illustrated with a red bar.

    • The minimum (shortest/fastest) time observed during the run is illustrated with a blue bar.

    • The average time during the run is illustrated with a light-green bar.

    • The median time during the run is illustrated with a dark-green bar.

    Reminder: This graph should be generated for a run at a single pace (the same number of virtual users) throughout the run.

    Impact of User Errors

  1. How fast does the application detect, report, and recover from various user errors?


    • Application invocation with missing resources
    • For each type of user (user role)
      • Registration
        • username already used
        • email address not supplied
        • inadequate password

    Idea: To augment the Summary Report generated by LoadRunner's Analysis program, I copy and paste it onto an Excel spreadsheet, then:

    1. Highlight the screen/step with the highest variation by adding a "Coefficient of Variation" column, calculated by dividing the Standard Deviation by the average. The larger the ratio, the greater the variation relative to the average.
    2. Highlight screens/steps which have response times higher than a threshold of 2 seconds by adding a flag
        =if( B2 > 2000 , " <<< ", "" )
    3. Merge numbers from the sheet with the table above for a consolidated presentation.
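
    For readers who prefer a script to a spreadsheet, the first two steps can be sketched as below. The function names and the report values are hypothetical; the 2000 millisecond threshold and the " <<< " flag mirror the Excel formula above.

```python
def coefficient_of_variation(average_ms, std_dev_ms):
    """Standard deviation divided by the average (a unitless ratio)."""
    if average_ms <= 0:
        raise ValueError("average must be positive")
    return std_dev_ms / average_ms


def annotate(rows, threshold_ms=2000):
    """Add CV and a slow-response flag to (step, average_ms, std_dev_ms) rows.

    Rows are returned with the highest variation first.
    """
    annotated = []
    for step, average_ms, std_dev_ms in rows:
        annotated.append({
            "step": step,
            "average_ms": average_ms,
            "cv": coefficient_of_variation(average_ms, std_dev_ms),
            "flag": " <<< " if average_ms > threshold_ms else "",
        })
    return sorted(annotated, key=lambda row: row["cv"], reverse=True)


# Hypothetical Summary Report values in milliseconds.
report = [("TS01", 1000, 250), ("TS02", 2500, 500), ("TS03", 400, 200)]
for row in annotate(report):
    print(f'{row["step"]:5} avg={row["average_ms"]:5} cv={row["cv"]:.2f}{row["flag"]}')
```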

    Consistency of Response Time Speed

  1. How frequently do spikes in response time occur?

    [Figure: Spike on Response Time]

    This line chart presents the results of a run at the same conditions over several hours.

    Caution! Data values for these types of charts need to be presented at the lowest granularity (such as once per second, as shown here). Otherwise, individual spikes would be averaged in and thus not appear.

    The mean time between failure (MTBF) statistic is calculated by dividing the number of spikes observed into the length of the observation period (such as 8 hours).
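
    The MTBF arithmetic can be sketched as follows. The 2-second spike threshold, the sample data, and the choice to count a run of consecutive slow samples as one spike are all assumptions made for illustration.

```python
def count_spikes(samples_secs, threshold_secs):
    """Count spikes, treating consecutive samples above the threshold as one spike."""
    spikes = 0
    in_spike = False
    for value in samples_secs:
        if value > threshold_secs:
            if not in_spike:
                spikes += 1
            in_spike = True
        else:
            in_spike = False
    return spikes


def mean_time_between_spikes(observation_hours, spike_count):
    """Observation period divided by the number of spikes (None if no spikes)."""
    if spike_count == 0:
        return None
    return observation_hours / spike_count


# Hypothetical fine-grained response times (seconds) from an 8-hour run.
samples = [0.4, 0.5, 3.2, 3.5, 0.4, 0.4, 4.1, 0.5]
spikes = count_spikes(samples, threshold_secs=2.0)
print(spikes, mean_time_between_spikes(8, spikes))
```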

    Feature: To analyze why, we drilled down to the small time frame specific to when the "blip" occurred on various servers.

    Contention testing is often necessary to identify occasional spikes in response time.

    Speeds at Various Data Loads

  1. How much degradation in response time can be expected as the application processes larger amounts of data?

    This question is answered with Data Volume Testing, when the maximum amount of data expected is loaded on the system so that its impact can be measured.

    Volume testing is especially important to measure database performance because different size datasets require different indexing and caching strategies for maximum efficiency. Adding indexes to large datasets is the most common approach to improving performance from databases. On the other hand, indexing a small and frequently referenced dataset can actually slow processing speed.

    Speeds at Various User Loads

  1. How much degradation in response time can be expected as the application gets busier (processes more users/transactions simultaneously)?

    Two approaches to running Stress Tests were used to answer this question:

    1. Gradually increase the load by increasing the number of simultaneous users until the server chokes. The results of this approach are shown on the first graph to the right.
    2. "Stair-step" constant loads for a certain amount of time. The results of this approach are shown on the second graph.

    Idea: Use of a third-party tool such as Excel to present information provides the freedom to use the Median rather than the Average presented in standard LoadRunner Analysis reports.

    A more sophisticated (some may say overly complex) visualization is the "High-Avg-Low" chart (formatted using MS-Excel), which provides averages, medians, and variation statistics at each level of load (rather than combined together as with the first type of run).

    Statistics from the first type of run are less useful because by default run averages include the spikes at the end. Data values can be filtered to the specific time period of interest, but "ramp-up" effects are included at every point.

    Results from stair-step type runs are more realistic to actual patterns of usage. More importantly, the stair-step approach provides information about the variability of response time at various steps.

    Drop-down selections (or Forward and backward buttons) are provided to see the impact from varying run conditions (such as different configurations, different versions of software, different installations of hardware, etc.).
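
    The per-step statistics behind a "High-Avg-Low" chart can also be computed outside Excel. The sketch below groups response times by load level from a stair-step run; the function name and the sample data are hypothetical.

```python
from statistics import mean, median, stdev


def summarize_by_load(samples):
    """Group (users, response_secs) pairs by load level and summarize each step."""
    groups = {}
    for users, secs in samples:
        groups.setdefault(users, []).append(secs)
    summary = {}
    for users in sorted(groups):
        times = groups[users]
        summary[users] = {
            "low": min(times),
            "median": median(times),
            "avg": mean(times),
            "high": max(times),
            # Sample standard deviation needs at least two data points.
            "stdev": stdev(times) if len(times) > 1 else 0.0,
        }
    return summary


# Hypothetical stair-step run: three samples at 10 users, three at 20 users.
run = [(10, 0.25), (10, 0.5), (10, 0.75), (20, 0.5), (20, 1.0), (20, 4.5)]
for users, stats in summarize_by_load(run).items():
    print(users, stats)
```

    In the 20-user step the single 4.5-second sample pulls the average up to 2.0 seconds while the median stays at 1.0, which is the reason for preferring the Median mentioned above.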

    Run Longevity

  1. How long can the application run before needing manual intervention?

    [Figure: average response times during a longevity run]

    Conducting a longevity run over 22 hours identified the average response times in this graph.

    Caution! If there is time for only one run, this statistic should be obtained from a run at a high (but still sustainable) load level.

    The Variability statistic is measured using the standard deviation calculation.

    The curiosity here is whether there is a statistically valid trend of responsiveness improving or degrading over time.
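
    A first look at such a trend is the least-squares slope of response time against elapsed hours. The sketch below computes only the slope; the function name and sample data are hypothetical, and judging whether the slope is statistically valid would still require a significance test from a statistics package.

```python
def least_squares_slope(hours, response_secs):
    """Return the slope (seconds per hour) of the best-fit straight line."""
    n = len(hours)
    if n < 2 or n != len(response_secs):
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(hours) / n
    mean_y = sum(response_secs) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    if sxx == 0:
        raise ValueError("all observations share the same time value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, response_secs))
    return sxy / sxx


# Hypothetical hourly averages: response time degrades half a second per hour.
elapsed = [0, 1, 2, 3]
averages = [1.0, 1.5, 2.0, 2.5]
print(least_squares_slope(elapsed, averages))
```

    A positive slope suggests degradation over time, a negative slope suggests improvement, and a slope near zero suggests a stable run.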

    Feature: To analyze trends, we can use accordion menus to view consolidated and detailed views of specific time frames.

    Data Transfer

    How many bytes are sent back and forth between client and server?

    Resource Consumption Dashboard

  1. How many resources (servers, memory, disk space, connections, file handles, etc.) does the app consume?

    Mnemonic | Resource      | Tier 1: Load Balancer | Tier 2: Web server | Tier 3: App server | Tier 4: DB server
    Brain    | Memory: 58%   | 1023 MB               | 209 MB             | 809 MB             | 1209 MB
    Arms     | Disk          |                       |                    |                    |
    Legs     | CPU Util: 58% |                       |                    |                    |
    Feet     | Network       | 55 mbps               | 20 mbps            | 40 mbps            | 58 mbps

    The table here presents measurements horizontally, one column for each system tier/machine.

    Vertically, the different metrics are arranged according to portions of a running person, shown here as a mnemonic for the metrics.

    Values are recorded during runs at peak capacity.

    Typically, the highest utilization of a particular resource in one tier/machine is the bottleneck that limits the capacity of the entire system.
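
    Picking out the likely bottleneck from such a dashboard amounts to finding the highest utilization reading across all tiers. This is a minimal sketch with a hypothetical function name and hypothetical percentages; it assumes every reading is expressed as a percent of that resource's capacity.

```python
def find_bottleneck(utilization):
    """Return (tier, resource, percent) for the highest utilization reading."""
    readings = [
        (percent, tier, resource)
        for tier, resources in utilization.items()
        for resource, percent in resources.items()
    ]
    if not readings:
        raise ValueError("no utilization readings supplied")
    percent, tier, resource = max(readings)
    return tier, resource, percent


# Hypothetical percent utilization recorded at peak capacity.
peak_utilization = {
    "Load Balancer": {"Memory": 35, "CPU": 20},
    "Web server": {"Memory": 40, "CPU": 55},
    "App server": {"Memory": 58, "CPU": 91},
    "DB server": {"Memory": 62, "CPU": 45},
}
print(find_bottleneck(peak_utilization))
```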

    Feature: With an interactive chart, one can drill down to specific components of processes that are consuming resources by clicking on that bar.

    Resource Consumption Alerts

  1. At what level of resource consumption should operations be alerted for manual intervention?


Copyright 2004-2005 Wilson Mar. All rights reserved.

Capacity Metrics Analysis

    Sample action conclusion/recommendation:
    Additional capacity is needed by December 1st to meet new peak usage anticipated.
    To meet this growth, we will need to begin work to add an additional __ servers on November 1st this year, a one month lead time.

    Sample analysis:
    This conclusion was calculated based on these findings:

    1. At present, peak usage is 38 simultaneous users (on each of 4 app servers).
    2. 78 simultaneous users with no "think time" (on each of 4 app servers) is the point where performance degradation becomes noticeable at 4 seconds response time.
    3. Our user base is growing at the rate of 30 "simultaneous users" per day.
    4. So we will reach our peak number of users in 40 (78 - 38) days, which is November 1st.
    5. It takes a maximum of 10 days to order, install, configure, test, and integrate a new server.
    6. So we need to begin the ordering process on (10 days before December 1st).
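
    The chain of reasoning in this kind of analysis (headroom, growth rate, days remaining, lead time) can be sketched as below. The 38 and 78 user figures and the 10-day lead time come from the sample findings; the growth rate of 1 simultaneous user per day (the rate at which a 40-user headroom lasts 40 days), the starting date, and the function names are assumptions made for illustration.

```python
import math
from datetime import date, timedelta


def days_until_limit(current_peak, degradation_point, growth_per_day):
    """Days before peak usage grows from current_peak to degradation_point."""
    if growth_per_day <= 0:
        raise ValueError("growth rate must be positive")
    headroom = degradation_point - current_peak
    return max(0, math.ceil(headroom / growth_per_day))


def order_by_date(today, current_peak, degradation_point, growth_per_day, lead_time_days):
    """Return (date capacity runs out, latest date to start ordering)."""
    days_left = days_until_limit(current_peak, degradation_point, growth_per_day)
    limit_date = today + timedelta(days=days_left)
    return limit_date, limit_date - timedelta(days=lead_time_days)


# Assumed starting date and growth rate; other values are from the sample findings.
limit_date, start_ordering = order_by_date(date(2005, 10, 22), 38, 78, 1, 10)
print(days_until_limit(38, 78, 1), limit_date, start_ordering)
```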

    Throughput (Capacity) Per Second vs. Response Time (Speed)

  1. At what load does the application process at unacceptable response times (e.g., over x seconds)?

    This chart illustrates (rather typical) behavior:

    [Figure: throughput and response time at increasing levels of load]

    The analysis:

      At a low load of a single user processing 0.5 transactions per second, response time is fast (0.4 seconds).

      At a moderate load of 35 simultaneous users processing 8.8 transactions per second, response time doubles to 0.8 seconds, but that is still acceptable performance.

      At a high load (such as 100 simultaneous users in the above example, taking response time to 8 seconds), the server completes transactions only about as fast as it did with fewer users.

      This indicates that the system has reached its point of "job flow balance", the tps rate at which requests are processed as quickly as they are received. Requests arriving faster than this rate would be queued.
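
    One rough way to locate this point in run data is to look for the first load level where throughput stops growing. In the sketch below the first two data points come from the analysis above; the 9.0 tps figure at 100 users, the 10 percent minimum gain, and the function name are assumptions.

```python
def find_saturation(points, min_gain=0.10):
    """Return the first (users, tps, response_secs) point whose throughput grew
    by less than min_gain relative to the previous load level, or None."""
    for previous, current in zip(points, points[1:]):
        gain = (current[1] - previous[1]) / previous[1]
        if gain < min_gain:
            return current
    return None


# (users, transactions per second, response time in seconds), sorted by users.
load_levels = [
    (1, 0.5, 0.4),
    (35, 8.8, 0.8),
    (100, 9.0, 8.0),  # tps at 100 users is a hypothetical value
]
print(find_saturation(load_levels))
```

    Because the gain is measured between adjacent load levels, the result depends on how finely the load steps are spaced.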

    The Mercury Capacity Planning (MCP) Visualizer module displays similar data using this format, providing a pull-down menu to quickly access charts by individual business function:

      [Figure: Mercury Capacity Planning Visualizer chart]

  2. At what load does the system reject transactions?

    If the rate of requests is relentlessly beyond the job flow balance rate (such as beyond the persistent rate of 200 simultaneous users in the above example), eventually the server runs out of queue space. If it can't allocate less time to each user, it then returns errors or even shuts down.

    Point of Overload Failure

  1. At what overwhelming load (or other conditions) do servers shut down index

    This can include denial of service type attacks.

    Set screen The Bottlenecks

  2. How quickly do the servers/services restart on their own after being overloaded by a sudden overwhelming load index

    In production, servers that restart automatically on their own save much time compared with assuming that system administrators have the diligence and the time to always watch the systems.

    A server that requires another server (such as a proxy or database server) to be up before it starts should be initialized by a process that checks the availability of those other servers and waits until they are available.
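    One way such a start-up check might look is sketched below. The function names, retry counts, and host names are all hypothetical; the only real API used is the standard library's socket.create_connection. The probe is passed in as a parameter so that the waiting logic can be exercised without a network.

```python
import socket
import time


def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_for_dependencies(dependencies, probe=port_is_open,
                          retry_delay_s=5.0, max_attempts=60):
    """Block until every (host, port) in dependencies answers the probe.

    Returns True once all are reachable, or False after max_attempts rounds.
    """
    pending = list(dependencies)
    for attempt in range(max_attempts):
        # Keep only the dependencies that still do not answer.
        pending = [(h, p) for (h, p) in pending if not probe(h, p)]
        if not pending:
            return True
        if attempt < max_attempts - 1:
            time.sleep(retry_delay_s)
    return False


# Example with hypothetical hosts (not executed here):
# if wait_for_dependencies([("db.example.internal", 5432),
#                           ("proxy.example.internal", 3128)]):
#     start_application_server()
```

    A TCP connection only shows that a port is listening; a stricter check would issue a real request (such as a trivial database query) before declaring the dependency ready.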


    Comparison Regression Test Results

  1. What is the impact of changes / tuning options (such as application software versions, utilities, OS settings, JVM settings, etc.)?

    Reminder This type of testing requires use of statistically valid methods to measure the likelihood that results occurred due to chance.

    Reminder This approach requires the export of all "data points" (the statistical "response variable") from LoadRunner to a statistics application which can generate a statistical presentation.

    Each setting that can be changed (such as an individual server configuration setting) is statistically a single input "factor", also called a categorical variable or "treatment".

    Runs (trials) at the low, medium, and high "levels" of a factor (such as server setting) are considered three statistical "groups" of data.

    The calculation technique depends on the number of factors and groups:

    • A t test is used to determine whether there is a statistical difference between just two levels of a single treatment. This is also called "Student's t".

    • "One-way" ANOVA (Analysis of Variance) is used to statistically calculate the significance of differences in performance numbers after changing a single setting (such as an individual server configuration setting), statistically a single "treatment" or input "factor", run at 3 or more levels (such as low, medium, and high value).

      The test for a statistical difference among groups is called an "F Test" (named after Ronald A. Fisher, who during the 1920s and 1930s pioneered the analysis of variance as an extension of the t-test for comparing just two population means). The F test is based on the F ratio of the "variation" between groups over the variation within each group (considered as statistical "random error"). The larger the F ratio, the greater the difference.

      A specific F value becomes statistically significant when it is larger than the critical value defined in an "F Distribution" presented in a static (paper) "F Table" or dynamically calculated by a program such as the Statistical Distribution Calculator client. Critical F values are adjusted for the number of groups and number of observations for a given significance level (usually an "alpha" error of 0.05 or the stricter 0.01).

      Source of Variation       SS    df   MS     F      p
      between 3 groups          64    2    32
      within 69 observations    68    21   3.24   9.88   <0.01
      total                     132   23
      Formula for variance: s² = Σ(x − x̄)² / (N − 1)
      This table reflects the application of this formula: squaring the deviations normalizes positive and negative differences together, and dividing by N − 1 rather than N adjusts for the statistical impact of a small number of observations.

      Each "Mean Squared" (MS) value used in the F ratio calculation is a variance. It is calculated by dividing the sum of squares (SS) (squared deviations about the mean, called the variation) by its degrees of freedom (df), which is N-1 for a single group of N observations. "p" is the probability of obtaining an F ratio at least this large by chance alone; the difference is called significant when p is smaller than the chosen "alpha".

      For more information on ANOVA:

    • "Two-way" ANOVA is used to examine the effects of two factors, both individually (the "Main Effects") and together (the "Interaction"). [SAS code for 3-way]

      Excel supports this with Data Analysis "Anova: Two-Factor With Replication" and "Anova: Two-Factor Without Replication"

    • Repeated Measures (paired t tests) for longitudinal studies.

      Excel supports this with Data Analysis "t-Test: Paired Two Sample for Means".

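    • Measurement of several changes between runs (such as changing both software version and hardware configuration) would require multi-factor ANOVA techniques to test several independent variables (factors). MANOVA (Multivariate Analysis of Variance) extends this to several dependent variables (such as response time and throughput) analyzed together.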
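    The one-way ANOVA arithmetic described above can be rechecked with a few lines of Python. This is a minimal sketch: the first part recomputes the F ratio from the SS and df values in the table, and compares it against critical values (3.47 and 5.78 for 2 and 21 degrees of freedom) as read from a standard printed F table. The second part runs the same calculation on made-up response times, which are purely hypothetical.

```python
def one_way_anova(groups):
    """Return (ss_between, df_between, ss_within, df_within, f_ratio)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        # Variation of this group's mean about the grand mean.
        ss_between += len(g) * (mean - grand_mean) ** 2
        # Variation of observations about their own group mean.
        ss_within += sum((x - mean) ** 2 for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_ratio


# Recheck the F ratio in the table above from its SS and df columns.
table_f = (64 / 2) / (68 / 21)
# 3.47 and 5.78 are critical F values for (2, 21) degrees of freedom at
# alpha 0.05 and 0.01, as read from a standard printed F table.
print(f"F = {table_f:.2f}, significant at 0.01: {table_f > 5.78}")

# Hypothetical response times (seconds) at low, medium, and high settings.
low, medium, high = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]
print(one_way_anova([low, medium, high]))
```

    A statistics application would also report the exact p value, which needs the F distribution itself rather than a table lookup.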

      [Figures: horizontal Box and Whisker plot; vertical Box and Whisker plot]
    Different statistical applications display results from ANOVA as a "BoxPlot" or Box and Whisker plot arranged horizontally or vertically on the dependent variable (such as response time in seconds).

      The mark (very small box) in the middle points to the median (or average) of each population.
      The larger box for each population illustrates the lower and upper quartile of values in that population.
      The "whiskers" above and below each box illustrate the overall range of the data (typically the minimum and maximum, though some applications plot a multiple of the standard deviation or interquartile range instead).
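    The values a box plot draws can be computed directly, as in this short sketch using Python's standard statistics module. The sample response times are hypothetical, and the "inclusive" quartile method is one of several conventions, so other applications may report slightly different quartiles for the same data.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical response times in seconds from one test run.
response_times = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(response_times))
```

    The box spans the two quartiles, the mark sits at the median, and the whiskers reach to the minimum and maximum.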

    Microsoft Excel users can use a "Volume-Open-High-Low-Close" chart format to approximate a BoxPlot/Box and Whisker Chart.

    Reminder Effect size is the difference between two groups stated as a percentage of a standard deviation (e.g., “14%”). It is the appropriate statistic for gauging the importance of comparisons. The guidelines are:

      greater than 50% = “large,”
      50-30% = “moderate,”
      30-10% = “small,” and
      less than 10% = “insubstantial, trivial”
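    A minimal sketch of this calculation follows. It assumes the "standard deviation" in question is the pooled standard deviation of the two groups (the usual Cohen's d convention, which the text above does not specify), and it treats a value of exactly 50% as "moderate". The sample response times are hypothetical.

```python
import math
import statistics


def effect_size_percent(group_a, group_b):
    """Difference between group means as a percentage of the pooled
    standard deviation (Cohen's d multiplied by 100)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    difference = abs(statistics.mean(group_a) - statistics.mean(group_b))
    return 100.0 * difference / math.sqrt(pooled_variance)


def describe_effect(percent):
    """Apply the guideline bands listed above."""
    if percent > 50:
        return "large"
    if percent >= 30:
        return "moderate"
    if percent >= 10:
        return "small"
    return "insubstantial, trivial"


# Hypothetical response times (seconds) before and after a tuning change.
before = [1.0, 2.0, 3.0]
after = [2.0, 3.0, 4.0]
size = effect_size_percent(before, after)
print(f"{size:.0f}% of a standard deviation: {describe_effect(size)}")
```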


    Load Balancing Test Results

  1. How well do provisions for balancing load allocate transactions evenly among machines?

    Does one J2EE application server instance in a cluster perform more work than the others?

    On which database instance are the most transactions executed?
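    Questions like these can be answered from per-server transaction counts. The sketch below is one simple way to summarize them; the server names and counts are hypothetical, and the "imbalance ratio" is an illustrative measure rather than a standard metric from any particular tool.

```python
def load_shares(transactions_by_server):
    """Return each server's fraction of the total transactions."""
    total = sum(transactions_by_server.values())
    return {name: count / total
            for name, count in transactions_by_server.items()}


def imbalance_ratio(transactions_by_server):
    """Busiest server's count divided by the mean count (1.0 = even)."""
    counts = list(transactions_by_server.values())
    return max(counts) / (sum(counts) / len(counts))


# Hypothetical transaction counts per application server instance.
counts = {"app1": 5000, "app2": 3000, "app3": 2000}
for name, share in load_shares(counts).items():
    print(f"{name}: {share:.0%} of transactions")
print(f"Imbalance ratio: {imbalance_ratio(counts):.2f}")
```

    In this made-up case app1 handles half of the work and carries 1.5 times the average load, which would suggest the balancing provisions are not allocating evenly.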

    Sample Conclusion: Multiple machines are needed to realistically create a large enough load to stress utility servers and services (such as to handle LDAP authentication, email, and database requests). Such servers are configured to serve many applications.


    Upgrade Regression Test Results

  1. Can the system scale "vertically" when a server is upgraded with higher capacity components?

    Reminder This is a question of the cost effectiveness and stability of adding RAM or replacing components with faster ones (such as a faster I/O device or a motherboard with a larger number of processors/CPUs).

    The conclusion is whether the application and utility software are programmed or configured to take advantage of the additional hardware.


    Scalability Test Results

  1. How well does the system "scale horizontally" when more machines are added?

      Would doubling (or tripling, etc.) the amount of hardware (CPUs or machines) enable a corresponding doubling (or tripling, etc.) of capacity to perform work in a stable way?

    A 3 axis (3 dimensional) chart is necessary to illustrate the interrelationship of both 1) load and 2) number of machines on 3) response time.

    The format of this 3D surface chart (created using Excel) is based on the one in Neil J. Gunther's book, presenting the performance response resulting from various numbers of "m" machines/processors running at various levels of load (load factors such as "0.80" for 80% CPU utilization).

    This 3D surface chart was created by inputting the results of 25 separate runs into an MS-Excel spreadsheet.

    1. When the machine operates with 2 m's (along the back line), performance is 4 seconds for a single user and 6 seconds for 300 users.

    2. When the machine operates with 10 m's (along the front line), performance is 1.5 seconds for a single user and 2.5 seconds for 300 users.

    3. In between these two extremes above, such as when the machine operates with 6 CPU's, performance is under 3 seconds when servicing 100 users.

    The conclusion from all this is that to maintain response times under 3 seconds, the choices are:

    • Use machines with 8 m's and keep the load on any individual server under 280 users.
    • Use machines with 6 m's and keep the load on any individual server under 80 users.

    Furthermore, if 3 seconds is indeed the threshold:

    • Machines with only 2 or 4 m's should not be used.
    • A 10 m machine may be "overkill" if real loads do not reach a peak of 300 simultaneous users.

    Idea Within MS-Excel, 3D chart's Elevation, Rotation, and Perspective can be adjusted using this dialog under Chart, 3D View...

    [Figure: scalability 3D response chart]

    A requirement related to this can be stated as "multiprocessor effect (ME) of no more than 75%", which means that there should be at least a 75% improvement after a 100% increase (doubling) of hardware.
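    Checking a requirement of this kind against measured capacity is simple arithmetic, sketched below. The capacity figures are hypothetical, and the function names are illustrative; the 75% default comes from the requirement wording above.

```python
def scaling_improvement_percent(capacity_before, capacity_after):
    """Percentage gain in capacity after the hardware was doubled."""
    return 100.0 * (capacity_after - capacity_before) / capacity_before


def meets_scaling_requirement(capacity_before, capacity_after,
                              minimum_percent=75.0):
    """True if doubling the hardware bought at least minimum_percent more."""
    gain = scaling_improvement_percent(capacity_before, capacity_after)
    return gain >= minimum_percent


# Hypothetical measured capacity (transactions per second).
tps_with_2_machines = 200.0
tps_with_4_machines = 350.0
gain = scaling_improvement_percent(tps_with_2_machines, tps_with_4_machines)
print(f"Doubling hardware improved capacity by {gain:.0f}%")
print("Requirement met:",
      meets_scaling_requirement(tps_with_2_machines, tps_with_4_machines))
```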


    Reserve Capacity

  1. How much longer can business volume grow before reaching a point of degradation?

    Reminder This type of testing is needed to provide assurance that scalability tests can be performed.

    See Capacity Metrics Analysis section above.


    Production Availability

  1. What is the trend in response time over various ranges of time?

  2. How does our response time compare against our competitors over various ranges of time?

    Signal Processing of "noise".


Production Monitoring

    Test scenarios often need to be modified based on results from the monitoring of actual production use (such as the distribution of time visitors spend viewing pages).

    Comparing results over time reveals how an application is performing versus how the application was expected to perform. Deviation from expectations should trigger an alarm message. The actual rate of user errors could impact the type of load on the system.
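    A deviation check of this kind could be as simple as the following sketch. The transaction names, expected times, and the 20% tolerance are all hypothetical; a production monitoring product would apply its own thresholds and alerting channels.

```python
def deviation_alarms(expected_s, observed_s, tolerance=0.20):
    """Return alarm messages for transactions whose observed response time
    deviates from the expected time by more than the tolerance fraction."""
    alarms = []
    for name, expected in expected_s.items():
        observed = observed_s.get(name)
        if observed is None:
            continue  # nothing was measured for this transaction
        deviation = (observed - expected) / expected
        if abs(deviation) > tolerance:
            alarms.append(
                f"ALARM {name}: {observed:.1f}s observed vs "
                f"{expected:.1f}s expected ({deviation:+.0%})"
            )
    return alarms


# Hypothetical expectations and production measurements (seconds).
expected = {"login": 1.3, "search": 2.0}
observed = {"login": 1.4, "search": 3.0}
for message in deviation_alarms(expected, observed):
    print(message)
```

    Deviations in either direction are flagged here, since an unexpectedly fast response can also point to a problem such as requests failing early.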

    Commercial Quality of Service Performance Management products for Network Operations Centers (NOC):

    Performance Management

    Based on the eBook ISBN:  0585309337 Foundations of Service Level Management (Indianapolis, Ind. Sams Publishing, 2000) by Rick Sturm, Wayne Morris, and Mary Jander

    Daily alerts focus on up-time operational status. Contents kept online for the most recent two weeks are:

    • Outage report by application by location
    • Response time report by application by location summarized at 15-minute intervals for the prime shift, and at 30-minute intervals for the off-shift
    • Problem reports by priority, including a brief description of the problem for critical and severe problems
    • Average problem response time by priority
    • Problems closed and outstanding by priority
    • Security violations and attempted intrusions

    Weekly volume reports focus on operational volumes. Contents kept online for the most recent eight weeks are:

    • Workload volumes by application summarized by shift by day
    • Outage summary by application by shift by day
    • Recovery analysis for all outages of significant duration
    • Cumulative outage duration for the month by application
    • Response time percentiles by application
    • Security violations and attempted intrusions

    Monthly project reports focus on progress toward completing projects. Contents kept online for six months include:

    • Report card summary
    • Workload volumes by application
    • Service level achievement summary by application service
    • Highlighted problem areas and analysis

    Quarterly trend reports focus on overall satisfaction, structural trends, and major initiatives. These include:

    • Workload trend report by application and user community
    • Customer satisfaction survey results
    • Service level achievement trends
    • Cost allocation summary
    • New IT initiatives



Visitor comments are owned by the poster.
All trademarks and copyrights on this page are owned by their respective owners.
The rest ©Copyright 1996-2011 Wilson Mar. All rights reserved.