Quality: The Process, Part III [Something different]

[continued from Standard treatment]

Something different

There are a number of other testing techniques that are used during development, and I want to touch briefly on a few.

One essential technique is known as “compatibility testing”. As the name implies, this is testing the software for compatibility on a variety of different system configurations. There are companies that will perform extensive compatibility testing, but this is not inexpensive. Alpha and beta testing should cover a range of systems, but it will be far from comprehensive.

For a Windows product, one must test on some flavors of Win9x and NT, at an absolute minimum, and preferably on every supported operating system. Game and multimedia products need to be tested with different video cards and sound cards. Products with printing features need to be checked on different types of printers, including at least a color inkjet and a laser printer, from different manufacturers. In short, you must cover as much of your target audience as possible.

Another external testing technique, related to compatibility testing, is product certification. This involves submitting your software for certification according to the rules of some program. Instead of checking different system configurations, product certification programs check other criteria, depending on the goals of the particular certification. These range in cost from free to very expensive.

For a slightly less formal review of the usability and general quality of the software, one can conduct “focus group” testing. Focus groups are essentially a collection of people in the target audience who are brought together in one location specifically to give their opinions and feedback. Professional firms can conduct such groups with quasi-scientific questionnaires, hidden cameras, and written analysis, for a tidy sum.

The easier and, in my experience, no less effective method to perform focus group testing is to find a location, such as the computer lab in a local school, and advertise free pizza and drinks for computer users who will show up and try your new product. I cannot comment on how this would work for business products, but it works well for games.

Finally, throughout the entire testing process, you need to conduct “regression testing”. Regression testing is a method of making sure that bugs that were fixed are not reintroduced into the program. This concept is really as simple as trying to reproduce each of the fixed bugs and making certain that they have not reappeared.

My first exposure to regression testing was a spiral notebook into which every bug was written as it was reported and checked off as it was solved. Before we would send a game build to the publisher, we simply tested each item in the notebook as part of the test plan. It hardly needs to be more complicated than that.
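For those who prefer code to notebooks, the same checklist can be expressed as a small test program. The sketch below is purely illustrative (the bug numbers, routine names, and checks are hypothetical stand-ins, not from any real project): one tiny check per fixed bug, run before every build goes out the door.

    #include <cstdio>

    // One check per fixed bug; the bug numbers and routines here are
    // hypothetical stand-ins for real reports from the notebook.
    static bool Bug17_EmptyPlayerName(void)
    {
        // Call the real routine with an empty name; stubbed here.
        return true;
    }

    static bool Bug23_ScoreOverflow(void)
    {
        // Drive the score past its old 16-bit limit; stubbed here.
        return true;
    }

    int main(void)
    {
        struct Check { const char *name; bool (*test)(void); };
        static const Check checks[] =
        {
            { "Bug 17: empty player name crash", Bug17_EmptyPlayerName },
            { "Bug 23: score overflow",          Bug23_ScoreOverflow   },
        };
        const int count = sizeof(checks) / sizeof(checks[0]);

        int failures = 0;
        for (int i = 0; i < count; ++i)
        {
            bool passed = checks[i].test();
            std::printf("%-35s %s\n", checks[i].name,
                        passed ? "OK" : "REGRESSED");
            if (!passed)
                ++failures;
        }
        return failures;
    }

Each new fix gets one more entry in the table, so the program grows right along with the notebook it replaces.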

[continued in Gamma testing?]

Quality: The Process, Part III [Standard treatment]

[continued from Beta move on]

Standard treatment

In most cases, companies use closed beta testing, limiting and controlling the distribution of beta versions of the software. Finding and managing beta testers becomes an issue, and finding good testers is a difficult challenge, so we need to discuss the closed beta process in more detail.

The unfortunate fact is that few users know how to properly test software, so if you are lucky enough to find a good tester, make certain that you keep that person happy. Useful feedback should be rewarded with a free copy of the program, at a minimum, and the tester should always be invited to participate in future beta tests. A good tester will outperform a dozen mediocre testers and, therefore, is very valuable.

A related problem is that many prospective testers will not provide any feedback at all, so it is necessary to invite more beta testers than you expect to need. You can anticipate that roughly half of the beta testers in a closed beta will not report anything at all, and some of the others will not be useful. In most cases, it is difficult to find enough beta testers, so it is unlikely that a product will get too many volunteers.

When looking for beta testers, cast a wide net. It is important to have as large a range of experience levels, methods of use, and system configurations as possible. It is a good idea to ask potential beta testers not only for contact information, but also about system configurations and software experience.

Remember, some of your potential customers are likely to be struggling with computer illiteracy, so it makes sense to have some less experienced testers as well. Knowledgeable users will often figure out how to do something, or find a workaround, on their own without indicating that there may be a problem. Neophytes, on the other hand, will ask questions that customers would ask. Do not rely solely on other developers for testing unless your product can only be used by programmers.

The best means of communication for a closed beta process is a beta forum of some kind, in which beta testers can interact with each other. This helps establish a sense of community that works to support tester involvement and breeds loyalty to the product. From a practical standpoint, this also allows problems to be independently verified by other testers, and they will often work together to help you replicate a bug. There should also be an email address for bug reports, but forum participation should be encouraged.

It is important to remember that beta testing is not an adversarial process. Let me say that again. Beta testing is not an adversarial process. It can sometimes be very difficult to take criticism, but you must be certain not to get defensive. Always wear a (virtual) smile. Beta testers are there to help you, and it is far better to hear about problems now rather than after release.

All feedback is beneficial, so you should listen to everything that is reported. Try to respond to every report so that testers know you are listening and involved, which gives them a psychological incentive to do a better job. Avoid being dismissive, as that discourages participation. Also, make it clear that you appreciate the reports, even the negative ones, since some testers are reluctant to report bugs or bad impressions if they feel that you will be insulted. Many reports are preceded by apologies.

One technique for keeping testers involved is to provide a means of communication that does not necessarily involve bug reports. Informal surveys about aspects of the program, or system hardware questionnaires, give testers a chance to participate even if they cannot find any bugs (which is the goal, after all). In my last beta test, I decided to try a little contest. I found three unreported bugs in different areas of the game and challenged the testers to find them. The number of valid bug reports increased measurably.

[continued in Something different]

Quality: The Process, Part III [Beta move on]

[continued from Greek to me]

Beta move on

When the program is feature complete, or approaching that stage, it is time to consider taking the next step. One step from alpha is beta, so we should now look at “beta testing”.

Beta testing is the most recognized form of black box testing, in which the software is submitted to users outside the company for additional testing and feedback. Generally, these testers are not professionals, but rather should represent a typical cross-section of potential customers and users.

Since beta testing is often the first external exposure of your product, it is important that the alpha testing and glass box techniques have produced a reasonably solid program. It may be a cliché, but there is not a second chance to make a first impression. When a tester’s first experience with a product is lousy, he or she will be less likely to get comfortable with it. If you know that there are lots of bugs, then your software is probably not ready for beta testing.

A practical reason for making sure the software already shows a standard of quality when beta testing begins is that obvious bugs will be reported multiple times, and less severe bugs will be overlooked. When a tester finds a number of problems, he or she may relax the reporting or assume that one bug is caused by another. Also, some bugs do cause a multiplicity of symptoms, and tracking becomes more convoluted.

There are two primary forms of beta testing, “open” and “closed”. In open beta testing, the developer announces the availability of a “public beta” version of the software, and any interested party can download and test the software. For closed beta testing, the developer provides a “private beta” version of the software to a limited number of known testers.

Companies may use either or both forms of beta testing. The main advantage of open beta testing is that the software can be tested by lots of people to cover a wide array of systems and uses, at the expense of control and a possible impact on the marketing plan. On the other hand, closed beta testing provides the developer with better control of the process, but the disadvantage is that it is hard to find testers.

Some companies use both forms of beta testing, starting with a closed beta and then expanding to an open beta program once the program is closer to release. Microsoft, for example, runs an extensive closed beta testing program for DirectX, including the SDK and the runtimes, which lasts for several months each version, but near the end of this process, the beta runtimes are made available for public download. [Note: Microsoft has since ceased proper testing of DirectX SDK releases and is now a counter example, not to be followed.]

For either form of beta testing, you should insert a “drop dead” date in the code, so the program will not run after a certain fixed date. This prevents the beta from entering general circulation and reduces testing of outdated versions. Note that this technique should never be used for release versions, so you must remember to remove it before the final version. You must also remember to update the date with each new testing version, lest a valid beta time out prematurely.
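To show how simple this can be, here is a minimal sketch of a drop dead check (the date constants and function names are mine, invented for illustration). Called at startup, it refuses to run the program once the system clock passes the hardcoded expiration date:

    #include <cstdio>
    #include <cstdlib>
    #include <ctime>

    // Hypothetical expiration date for this beta build; it must be
    // advanced with each new test version and removed entirely from
    // the release build.
    static const int kExpireYear  = 2003;
    static const int kExpireMonth = 3;    // March
    static const int kExpireDay   = 1;

    static void CheckBetaExpiration(void)
    {
        std::time_t now = std::time(0);
        std::tm *local  = std::localtime(&now);
        int y = local->tm_year + 1900;
        int m = local->tm_mon  + 1;
        int d = local->tm_mday;

        if (y >  kExpireYear ||
           (y == kExpireYear && m >  kExpireMonth) ||
           (y == kExpireYear && m == kExpireMonth && d >= kExpireDay))
        {
            std::printf("This beta version has expired.\n");
            std::exit(EXIT_FAILURE);
        }
    }

    int main(void)
    {
        CheckBetaExpiration();
        std::printf("Beta running normally.\n");
        return 0;
    }

The comparison uses the local clock, so a determined tester can defeat it by changing the system date, but that is fine; the goal is to keep honest testers off stale builds, not to provide copy protection.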

Just as a feature complete product signals the approaching end of the alpha testing phase, the impending completion of the beta testing phase is signaled by a “release candidate”. A release candidate is a version of the product that is potentially the release version of the software. At this point, testers should be instructed to report every bug they find, even if they have reported it previously, since all bugs should have been eliminated. If bugs are corrected, another release candidate should be created and tested.

For the first release of a product, the traditional beta version numbers start at 0.90 and approach 1.0, the release version. I know of one game product, on which I did not work, that had so many beta versions that the producer gave the team shirts that read “Version 0.99999999…” with the nines running all of the way down one of the sleeves.

[continued in Standard treatment]

Quality: The Process, Part III [Greek to me]

[continued from Quality: The Process, Part III]

Greek to me

Every program with more than seven lines of source code has bugs. It is important that software developers do whatever is feasible to eliminate bugs. With mass market software, one can be confident that even rare bugs, when multiplied by thousands of users, will be discovered. Bugs in some specialized and vertical market software could actually cause damage or injury. In any case, when distributing shareware, bugs will cost you sales, so quality will directly help your bottom line.

The most innovative approach to the elimination of bugs, which I must credit to Barry James Folsom, involved a simple corporate proclamation. As the new President, he called a meeting at which all of the several dozen developers in the company were gathered. After an introduction, he declared that none of our software would have “bugs”. From that point forward, it could only have “defects”.

It may not be terribly practical to simply redefine terms and create quality, but this dubious proclamation did have a point. When a customer or, in the case of shareware, a potential customer is using the software and it fails to work properly, that is a problem. “All software has bugs” is not comforting, so we need to look at the software from the perspective of a user.

Let’s start at the very beginning, with alpha, or more specifically, “alpha testing”.

Alpha testing is a form of black box testing that is performed in-house. In practical terms, alpha testing is simply the developer using the software in the same way that a customer would, prior to making the software available to others.

After each version of the software is ready, I close all my development tools, clear the registry and data files, and pretend to be a user seeing the program for the very first time. I start by running the program installer, and then launching the game (in our case) using the installed shortcut, as opposed to the debugger. I will then just play the game for a while, recording any problems that arise.

Once I am comfortable that the program is working as intended on my development system, I then copy the installer to at least one other test system. Rather than install the software myself, though, I enlist somebody else to do it. This can be a colleague, friend, spouse, child, parent, pet, or benevolent stranger. I provide no other instruction, and note where any questions are asked. Any problems witnessed here will also be experienced by users on a larger scale.

In a formal testing environment, alpha testing involves testers systematically checking the software according to the specified test plan, combined with actual use of the software. In a corporate environment, the test plan is executed by the QA department. In small businesses, it generally falls on the programmers to follow the test plan. In either case, anybody willing should try using the software. In a larger company, I would throw an “open house” to show the software to other employees. As an independent, simply having the game available for play is sufficient.

Alpha testing should begin as soon as the software is usable, and this will necessarily overlap with program development. At some point during the alpha phase, the software should become “feature complete”. This means that all intended features for this version are in the program and functional. It does not mean that the performance is optimized, nor does it mean that the interface is finalized, but it should do everything that it was intended to do.

[continued in Beta move on]

Quality: The Process, Part III

[This article was originally published in the January 2003 issue of ASPects.]

Good things come in threes. Literature is rife with examples. Jack (of Beanstalk fame) received exactly three magic beans for a reason. However, with deference to Sigmund Freud, sometimes an article is just an article.

In the first installment of this trilogy, I introduced some foundational concepts for testing, including planning, some quality assurance terminology, and classification and tracking of bugs. The second part, the story bridge, covered general tools and techniques that can be utilized during product development. In this, the conclusion, I will discuss testing methods used as the software reaches a functional stage.

[continued in Greek to me]

Quality: The Process, Part II [Getting some help]

[continued from Automatic or manual]

Getting some help

Up to this point, I have discussed a variety of methods for improving the quality of software that can be implemented solely by the programmer during the development. However, as the program gets closer to completion, it becomes important to enlist the help of others for black box testing and feedback. That will be the topic for my next installment.

In the meantime, there is an opportunity to incorporate some of the above tools and practices into your development process.

Gregg Seelhoff is an independent game developer and charter member of the Association for Professional Standards [now defunct].

Quality: The Process, Part II [Automatic or manual]

[continued from Beyond the build]

Automatic or manual

At each development stage, an application has some new or updated features that will need to be tested thoroughly, beyond a quick execution of the program. Certainly, the code should be pretty solid after having passed through some of these tools, but there is still no guarantee that the results produced are actually correct, except to the extent that they are manually checked.

It is very important that you test your application to make sure that it withstands unusual input and produces correct results, or fails gracefully, especially if your software can be used for mission-critical operations. This will often involve checking more input and output than a team of testers can conveniently generate, so this is where automated testing tools can help you with quality assurance.

One type of automated testing tool interacts directly with your source code and automatically generates special code, known as a “test harness”, which deliberately throws unusual parameter values at routines and monitors the results to make certain that the routines handle unexpected values reasonably. These tools have a number of different configuration options, but their general nature prevents them from having specific knowledge about a particular program.

Another type of automated tool interacts with the interface of a program, essentially providing a somewhat more sophisticated approach to what we used to call “keyboard testing,” which was just banging randomly and rapidly on the keyboard in an (often successful) attempt to crash or confuse the program. This type of testing is more appropriate for some types of applications than others. We have never investigated using this approach for testing our games, though a young child is a good substitute.

Developers can, and should, provide this type of glass box testing for their own products. You can write test harnesses that explicitly call routines with certain parameters and check for valid results. One excellent method for doing this, especially during optimization, is to have two separate routines that use different techniques for generating the desired results, and then run both routines, comparing results. This also allows you to profile both routines under the same conditions and ultimately use the better one.
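A minimal hand-written harness along these lines might look like the following sketch (the routines and test values are invented for illustration). It pushes deliberately unusual inputs, including an empty range and negative values, through two independent implementations and flags any disagreement:

    #include <cstdio>

    // Reference implementation: straightforward loop.
    static long SumRange_Loop(int first, int last)
    {
        long total = 0;
        for (int i = first; i <= last; ++i)
            total += i;
        return total;
    }

    // Optimized implementation: closed-form formula.
    static long SumRange_Formula(int first, int last)
    {
        long count = (long)last - first + 1;
        if (count < 0)
            count = 0;                      // empty range
        return count * ((long)first + last) / 2;
    }

    int main(void)
    {
        // Deliberately unusual inputs: single value, empty range,
        // negative values, zero.
        static const int tests[][2] =
            { {1, 100}, {5, 5}, {10, 9}, {-50, 50}, {0, 0} };
        const int count = sizeof(tests) / sizeof(tests[0]);

        for (int i = 0; i < count; ++i)
        {
            long a = SumRange_Loop(tests[i][0], tests[i][1]);
            long b = SumRange_Formula(tests[i][0], tests[i][1]);
            if (a != b)
                std::printf("MISMATCH for [%d,%d]: %ld vs %ld\n",
                            tests[i][0], tests[i][1], a, b);
        }
        std::printf("Harness complete.\n");
        return 0;
    }

Once both routines agree across the full set of inputs, the faster one can be kept with confidence, and the harness remains available for regression testing after any further optimization.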

For interface testing, you can use a standard macro recorder, software that records and can replay keyboard and mouse input into a program. Although this does not allow for random actions, it does allow a test sequence to be developed and verified on a regular basis. Also, testing an application with a macro recorder makes it possible to reproduce bugs simply by using the macro.

[continued in Getting some help]

Quality: The Process, Part II [Beyond the build]

[continued from Expanding our repertoire]

Beyond the build

The most powerful programs for glass box testing include source code analysis, runtime checking, and automated testing tools. These are not generally included in compiler packages, so they need to be obtained separately, and can often be somewhat expensive.

Source code analysis tools, better known as “lint” tools in C and C++ development, are utilities that examine your source code and produce warnings for potential problems. The output is similar to that from a compiler, except that the tool performs deeper checks, even emulated code walkthroughs, and has a larger and more specific set of issues to check.

A decent source code analysis tool is likely the best investment you can make in glass box testing. Unlike a compiler, which merely needs to produce object code for a specific platform, a lint tool can check for a whole range of problems, from portability to standards compliance, and even adherence to coding guidelines. The explanations of potential problems can also help a programmer better understand nuances of the language.

Lint tools produce many more warnings and errors than a compiler, but they also provide great flexibility to disable individual warnings, even for specific lines of code. It is unlikely that a non-trivial program could pass through such a tool at the highest level without warnings (sometimes thousands of them), but each issue or type of warning identifies a pitfall that can be considered and resolved.

When developing, I run source code analysis on a regular basis to catch potential errors that the compiler missed. In this way, I can remain confident that my code is relatively free of silly errors, so I can instead concentrate on the logic of the overall code, not individual mistakes. Also, anywhere that my code does something unusual, there is, by necessity, a comment indicating a suppressed lint warning.

Another way of performing some rudimentary source code analysis, especially for a cross-platform project, is to compile the source code under two different development environments. It is somewhat inconvenient, particularly during the initial setup, but if code can build and work correctly from two different compilers, chances are pretty good that the code is solid.

Runtime checking tools include a variety of programs that automatically monitor the behavior of the program as it executes. Often, these tools check memory or resource usage, but they can also watch for invalid pointers and range errors, verify API parameters and return values, and report on code coverage. The most common benefit of these tools is to identify memory and resource leaks.
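To give a flavor of what such tools do under the hood, here is a crude sketch (my own simplification, not how any particular commercial product works) that intercepts global allocations, counts them, and reports any imbalance at exit:

    #include <cstdio>
    #include <cstdlib>
    #include <new>

    static long g_allocations = 0;    // outstanding allocation count

    void *operator new(std::size_t size)
    {
        void *p = std::malloc(size);
        if (p == 0)
            throw std::bad_alloc();
        ++g_allocations;
        return p;
    }

    void operator delete(void *p) noexcept
    {
        if (p != 0)
        {
            --g_allocations;
            std::free(p);
        }
    }

    static void ReportLeaks(void)
    {
        if (g_allocations != 0)
            std::printf("LEAK: %ld allocation(s) never freed\n",
                        g_allocations);
    }

    int main(void)
    {
        std::atexit(ReportLeaks);
        int *leaked = new int(42);    // deliberately never deleted
        (void)leaked;
        return 0;
    }

A real tool also records sizes and call sites for each allocation, which is what makes its reports actionable rather than merely alarming.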

A comprehensive runtime checking tool serves as an ideal supplement to a source code analysis tool. While the latter catches potential problems with the code itself, the runtime checker highlights problems with the logic of the application during execution. Some tools can insert extra code and information during the build, in a process known as “instrumentation”, and this improves the runtime testing even more.

One issue with runtime checking is that it tends to slow program execution significantly, so it is definitely not intended for a release version, nor for every debugging build. Nevertheless, like other testing techniques, it is best to use the available tools early and often. The earlier a bug is detected and identified, the easier and less costly it will be to fix.

In my development process, I use my source code analysis tools after writing or modifying no more than a couple of routines. I use my runtime checking tools, at the highest detection level, after every major feature update, or before every delivery to a client. This glass box testing takes place in the background while I do black box testing of the application and, especially, new or updated features. If any problems appear, I address those problems right away before considering the feature to be done.

[continued in Automatic or manual]

Quality: The Process, Part II [Expanding our repertoire]

[continued from Development environment]

Expanding our repertoire

Most development environments include a debugger, which is an essential tool for producing quality software. However, the function of a debugger goes well beyond merely helping to find bugs. Some programmers do not regularly use a debugger, or only use one to help locate “that tough bug.” If you fall into this category, I strongly urge you to familiarize yourself with a debugger and integrate it into your standard development process.

Using a debugger for code assurance is another form of glass box testing. It is most powerful when used to perform live walkthroughs of program code. You can manually step through your code, examining variables and making sure that it is performing as expected. There is no better way to assure yourself that the program is performing correctly than to actually watch it. It also helps identify situations where an errant value could cause problems.

To put this capability to work for you, set a breakpoint at the beginning of each new routine. When the breakpoint triggers, step through the code line by line, confirming that the variables are correct and that the process produces the desired results. Some authorities recommend setting a breakpoint at every single code path, only removing a breakpoint when the path has been thoroughly tested. I must admit that I find this to be overkill in some situations, such as where the function is simply returning an error code, but I do this for all significant branches.

Another glass box testing tool that is often provided with common development environments is a profiler. A profiler is a tool that takes time measurements of a running application and then provides performance statistics for modules or specific functions. This is useful for identifying performance bottlenecks and functional irregularities in a program.

There are two important metrics provided by most profilers: function time and execution count. The function time shows how much overall time was spent in a function (or module), which gives an indication of where any performance delays may be. The execution count shows how many times a function was called, and occasionally this highlights an unexpected problem if a routine is being called too often.
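Both metrics can be approximated by hand, which may help clarify exactly what a profiler reports. This sketch (names invented) instruments a single routine with standard C timing; a real profiler gathers the same data automatically, for every function, and with far better resolution:

    #include <cstdio>
    #include <ctime>

    static unsigned long g_callCount = 0;     // execution count
    static double        g_totalTime = 0.0;   // function time (seconds)

    static void RoutineUnderTest(void)
    {
        std::clock_t start = std::clock();
        ++g_callCount;

        // The routine's actual work would go here; a busy loop
        // stands in for it.
        for (volatile long i = 0; i < 100000; ++i)
            ;

        g_totalTime += (double)(std::clock() - start) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        for (int i = 0; i < 250; ++i)
            RoutineUnderTest();
        std::printf("calls: %lu  total time: %.3f seconds\n",
                    g_callCount, g_totalTime);
        return 0;
    }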

Together, the time and count metrics help show where a program can benefit from optimization, and it is useful to have this information. However, unless there is a serious problem, it is best to wait until all program functionality is complete before attempting to optimize. There is a term in the industry for unnecessarily modifying code for performance before having functionality: “premature optimization”.

There are more powerful profilers and debuggers available from third-party suppliers, but I recommend getting comfortable with the capabilities and features, as well as drawbacks, of the tools provided by your compiler vendor before evaluating expensive alternatives. The quality improvement to be gained by using any debugger far outweighs the incremental benefit of switching to a more powerful tool.

[continued in Beyond the build]

Quality: The Process, Part II [Development environment]

[continued from Quality: The Process, Part II]

Development environment

The best place to start a discussion of practical testing is with the development environment and the tools that you are already using. Rather than a comprehensive discussion of programming practices, which would be a book by itself, I will concentrate on using these tools to facilitate the testing work.

Generally, a development environment for a compiled language consists of a text editor, a compiler, and a linker, plus other tools useful during development, and often these are all part of a single IDE (Integrated Development Environment). This compiled development environment is assumed for this discussion, though there are analogous approaches in other environments.

The first step in producing quality code is to understand your development environment. Although there seems to be a growing trend towards hiding or automating functionality, it is nevertheless important to know what the make or project files are doing, and what options are available and being used. You need to know how things work for the case when “It Just Works” fails.

Assuming you understand how the development environment works, you can begin actually programming. Once a certain amount of code is written, you try to build (i.e., compile and link) the executable. This is, in fact, the most basic form of glass box testing. If there are problems in the source code, or the build environment is not correct, then warnings or errors will be generated.

To make the best use of this functionality, modify the settings in the make or project files to set the highest warning level available. This may produce lots of extra warnings on some projects, but compiler warnings exist for good reasons. A warning almost always indicates that there is an immediate problem or, at least, a lack of clarity within the code that reduces the ease of maintenance and could cause future problems.

Many compilers include an option to treat any warnings as errors, and I recommend enabling this option. Warnings should never be ignored, and this prevents that from happening. Instead, source code should be corrected to eliminate warnings. This may seem like obvious advice to some readers, but my experience working with code from other programmers shows that many programmers routinely ignore warning messages during compilation, a dangerous practice that is contrary to quality development.
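The exact options vary by vendor, but as one example of these two settings, assuming the GNU and Microsoft command-line compilers, the invocations would look something like this:

    gcc -Wall -Werror -c source.c      (GNU compiler: common warnings,
                                        treated as errors)
    cl /W4 /WX /c source.c             (Microsoft compiler: warning
                                        level 4, warnings as errors)

Consult your own compiler’s documentation for the equivalents, and set the flags in the make or project files so that every build uses them.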

Taking this checking one step further, build the program frequently. This allows you to catch the warnings as they are introduced, rather than having a collection of problems at the end of a large coding session. Some warnings indicate problems that may need to be addressed by technical design changes, and it is good to find these problems early. Personally, I rarely write more than a short function between builds.

Black box testing should also be used in the early stages of development, even when features are incomplete. Running the executable regularly helps make sure that errors are not introduced or, when they are, catches them at an early stage. For incomplete features, you can hardcode values for testing, or just confirm that the program behaves as expected, considering the missing code.
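As one small illustration of the hardcoding approach (the function and value here are hypothetical), a stub can return a fixed, predictable result so that testing of the surrounding program can proceed before the feature exists:

    // Stub for a feature that is not yet implemented; the hardcoded
    // value lets the rest of the program be exercised meaningfully.
    static int GetHighScore(void)
    {
        // TODO: read the real score table once the save system exists.
        return 12345;    // hardcoded for testing
    }

When the real implementation arrives, the behavior of the surrounding code has already been verified against the known value.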

[continued in Expanding our repertoire]