Quality: The Process, Part III [Beta move on]

[continued from Greek to me]

Beta move on

When the program is feature complete, or approaching that stage, it is time to consider taking the next step. One step from alpha is beta, so we should now look at “beta testing”.

Beta testing is the most recognized form of black box testing, in which the software is submitted to users outside the company for additional testing and feedback. Generally, these testers are not professionals, but rather should represent a typical cross-section of potential customers and users.

Since beta testing is often the first external exposure of your product, it is important that the alpha testing and glass box techniques have produced a reasonably solid program. It may be a cliché, but there is not a second chance to make a first impression. When a tester’s first experience with a product is lousy, he or she will be less likely to get comfortable with it. If you know that there are lots of bugs, then your software is probably not ready for beta testing.

A practical reason for making sure the software already shows a standard of quality when beta testing begins is that obvious bugs will be reported multiple times, and less severe bugs will be overlooked. When a tester finds a number of problems, he or she may relax the reporting or assume that one bug is caused by another. Also, some bugs do cause a multiplicity of symptoms, and tracking becomes more convoluted.

There are two primary forms of beta testing, “open” and “closed”. In open beta testing, the developer announces the availability of a “public beta” version of the software, and any interested party can download and test the software. For closed beta testing, the developer provides a “private beta” version of the software to a limited number of known testers.

Companies may use either or both forms of beta testing. The main advantage of open beta testing is that the software can be tested by lots of people to cover a wide array of systems and uses, at the expense of control and a possible impact on the marketing plan. On the other hand, closed beta testing provides the developer with better control of the process, but the disadvantage is that it is hard to find testers.

Some companies use both forms of beta testing, starting with a closed beta and then expanding to an open beta program once the program is closer to release. Microsoft, for example, runs an extensive closed beta testing program for DirectX, including the SDK and the runtimes, which lasts for several months each version, but near the end of this process, the beta runtimes are made available for public download. [Note: Microsoft has since ceased proper testing of DirectX SDK releases and is now a counter example, not to be followed.]

For either form of beta testing, you should insert a “drop dead” date in the code, so the program will not run after a certain fixed date. This prevents the beta from entering general circulation and reduces testing of outdated versions. Note that this technique should never be used for release versions, so you must remember to remove it before the final version. You must also remember to update the date with each new testing version lest you have a valid beta timeout prematurely.

Just as a feature complete product signals the approaching end of the alpha testing phase, the impending completion of the beta testing phase is signaled by a “release candidate”. A release candidate is a version of the product that is potentially the release version of the software. At this point, testers should be instructed to report every bug they find, even if they have reported it previously, since all bugs should have been eliminated. If bugs are corrected, another release candidate should be created and tested.

For the first release of a product, the traditional beta version numbers start at 0.90 and approach 1.0, the release version. I know of one game product, on which I did not work, that had so many beta versions that the producer gave the team shirts that read “Version 0.99999999…” with the nines running all of the way down one of the sleeves.

[continued in Standard treatment]

Quality: The Process, Part III [Greek to me]

[continued from Quality: The Process, Part III]

Greek to me

Every program with more than seven lines of source code has bugs. It is important that software developers do whatever is feasible to eliminate bugs. With mass market software, one can be confident that even rare bugs, when multiplied by thousands of users, will be discovered. Bugs in some specialized and vertical market software could actually cause damage or injury. In any case, when distributing shareware, bugs will cost you sales, so quality will directly help your bottom line.

The most innovative approach to elimination of bugs, which I must credit to Barry James Folsom, involved a simple corporate proclamation. As the new President, he called for a meeting and all of the several dozen developers in the company were gathered. After an introduction, he declared that none of our software would have “bugs”. From that point forward, it could only have “defects”.

It may not be terribly practical to simply redefine terms and create quality, but this dubious proclamation did have a point. When a customer or, in the case of shareware, a potential customer is using the software and it fails to work properly, that is a problem. “All software has bugs,” is not comforting, so we need to look at the software the perspective of a user.

Let’s start at the very beginning, with alpha, or more specifically, “alpha testing”.

Alpha testing is a form of black box testing that is performed in-house. In practical terms, alpha testing is simply the developer using the software in the same way that a customer would, prior to making the software available to others.

After each version of the software is ready, I close all my development tools, clear the registry and data files, and pretend to be a user seeing the program for the very first time. I start by running the program installer, and then launching the game (in our case) using the installed shortcut, as opposed to the debugger. I will then just play the game for a while, recording any problems that arise.

Once I am comfortable that the program is working as intended on my development system, I then copy the installer to at least one other test system. Rather than install the software myself, though, I enlist somebody else to do it. This can be a colleague, friend, spouse, child, parent, pet, or benevolent stranger. I provide no other instruction, and note where any questions are asked. Any problems witnessed here will also be experienced by users on a larger scale.

In a formal testing environment, alpha testing involves testers systematically checking the software according to the specified test plan, combined with actual use of the software. In a corporate environment, the test plan is executed by the QA department. In small businesses, it generally falls on the programmers to follow the test plan. In either case, anybody willing should try using the software. In a larger company, I would throw an “open house” to show the software to other employees. As an independent, simply having the game available for play is sufficient.

Alpha testing should begin as soon as the software is usable, and this will necessarily overlap with program development. At some point during the alpha phase, the software should become “feature complete”. This means that all intended features for this version are in the program and functional. It does not mean that the performance is optimized, nor does it mean that the interface is finalized, but it should do everything that it was intended to do.

[continued in Beta move on]

Quality: The Process, Part III

[This article was originally published in the January 2003 issue of ASPects.]

Good things come in threes. Literature is rife with examples. Jack (of Beanstalk fame) received exactly three magic beans for a reason. However, with deference to Sigmund Freud, sometimes an article is just an article.

In the first installment of this trilogy, I introduced some foundational concepts for testing, including planning, some quality assurance terminology, and classification and tracking of bugs. The second part, the story bridge, covered general tools and techniques that can be utilized during product development. In this, the conclusion, I will discuss testing methods used as the software reaches a functional stage.

[continued in Greek to me]

Quality: The Process, Part II [Getting some help]

[continued from Automatic or manual]

Getting some help

Up to this point, I have discussed a variety of methods for improving the quality of software that can be implemented solely by the programmer during the development. However, as the program gets closer to completion, it becomes important to enlist the help of others for black box testing and feedback. That will be the topic for my next installment.

In the meantime, there is an opportunity to implement some of the above tools and practices into your development process.

Gregg Seelhoff is an independent game developer and charter member of the Association for Professional Standards [now defunct].

Quality: The Process, Part II [Automatic or manual]

[continued from Beyond the build]

Automatic or manual

At each development stage, an application has some new or updated features that will need to be tested thoroughly, beyond a quick execution of the program. Certainly, the code should be pretty solid after having passed through some of these tools, but there is still no guarantee that the results produced are actually correct, except to the extent that they are manually checked.

It is very important that you test your application to make sure that it withstands unusual input and produces correct results, or fails gracefully, especially if your software can be used for mission critical operation. This will often involve checking more input and output than a team of testers can conveniently generate, so this is where automated testing tools can help you with quality assurance.

One type of automated testing tool interacts directly with your source code and automatically generates special code, known as a “test harness”, which deliberately throws unusual parameter values at routines and monitors the results to make certain that the routines handle unexpected values reasonably. These tools have a number of different configuration options, but their general nature prevents them from having specific knowledge about a particular program.

Another type of automated tool interacts with the interface of a program, essentially providing a somewhat more sophisticated approach to what we use to call “keyboard testing,” which was just banging randomly and rapidly on the keyboard in an (often successful) attempt to crash or confuse the program. This type of testing is more appropriate for some types of applications than others. We have never investigated using this approach for testing our games, though a young child is a good substitute.

Developers can, and should, provide this type of glass box testing for their own products. You can write test harnesses that explicitly call routines with certain parameters and check for valid results. One excellent method for doing this, especially during optimization, is to have two separate routines that use different techniques for generating the desired results, and then run both routines, comparing results. This also allows you to profile both routines under the same conditions and ultimately use the better one.

For interface testing, you can use a standard macro recorder, software that records and can replay keyboard and mouse input into a program. Although this does not allow for random actions, it does allow a test sequence to be developed and verified on a regular basis. Also, testing an application with a macro recorder makes it possible to reproduce bugs simply by using the macro.

[continued in Getting some help]

Quality: The Process, Part II [Beyond the build]

[continued from Expanding our repertoire]

Beyond the build

The most powerful programs for glass box testing include source code analysis, runtime checking, and automated testing tools. These are not generally included in compiler packages, so they need to be obtained separately, and can often be somewhat expensive.

Source code analysis tools, better known as “lint” tools in C and C++ development, are utilities that examine your source code and produce warnings for potential problems. The output is similar to that from a compiler, except that the tool performs deeper checks, even emulated code walkthroughs, and has a larger and more specific set of issues to check.

A decent source code analysis tool would likely be your best investment of any glass box testing tool. Unlike a compiler, which merely needs to produce object code for a specific platform, a lint tool can check for a whole range of problems, from portability to standards compliance, and some coding guidelines. The details of potential problems can even help a programmer to better understand nuances of the language.

Lint tools produce many more warnings and errors than a compiler, but they also provide great flexibility to disable individual warnings, even for specific lines of code. It is unlikely that a non-trivial program could pass through such a tool at the highest level without warnings (and sometimes thousands of them), but each issue or type of warning identifies a pitfall that can be considered and resolved.

When developing, I run source code analysis on a regular basis to catch potential errors that the compiler missed. In this way, I can remain confident that my code is relatively free of silly errors, so I can instead concentrate on the logic of the overall code, not individual mistakes. Also, anywhere that my code does something unusual, there is, by necessity, a comment indicating a suppressed lint warning.

Another way of performing some rudimentary source code analysis, especially for a cross-platform project, is to compile the source code under two different development environments. It is somewhat inconvenient, particularly during the initial setup, but if code can build and work correctly from two different compilers, chances are pretty good that the code is solid.

Runtime checking tools include a variety of programs that automatically monitor the behavior of the program as it executes. Often, these tools check memory or resource usage, but they can also watch for invalid pointers and range errors, verify API parameters and return values, and report on code coverage. The most common benefit of these tools is to identify memory and resource leaks.

A comprehensive runtime checking tool serves as an ideal supplement to a source code analysis tool. While the latter catches potential problems with the code itself, the runtime checker highlights problems with the logic of the application during execution. Some tools can insert extra code and information during the build, in a process known as “instrumentation”, and this improves the runtime testing even more.

One issue with runtime checking is that it tends to slow program execution significantly, so it is definitely not intended for a release version, nor for every debugging build. Nevertheless, like other testing techniques, it is best to use the available tools early and often. The earlier a bug is detected and identified, the easier and less costly it will be to fix.

In my development process, I use my source code analysis tools after writing or modifying no more than a couple of routines. I use my runtime checking tools, at the highest detection level, after every major feature update, or before every delivery to a client. This glass box testing takes place in the background while I do black box testing of the application and, especially, new or updated features. If any problems appear, I address those problems right away before considering the feature to be done.

[continued in Automatic or manual]

Quality: The Process, Part II [Expanding our repertoire]

[continued from Development environment]

Expanding our repertoire

Most development environments include a debugger, which is an essential tool for producing quality software. However, the function of a debugger goes well beyond merely helping to find bugs. Some programmers do not regularly use a debugger, or only use one to help locate “that tough bug.” If you fall into this category, I strongly urge you to familiarize yourself with a debugger and integrate it into your standard development process.

Using a debugger for code assurance is another form of glass box testing. It is most powerful when used to perform live walkthroughs of program code. You can manually step through your code, examining variables, and make sure that it is performing as expected. There is no better way to assure yourself that the program is performing correctly than to actually watch it. It also helps identify situations where an errant value could cause problems.

To put this capability to work for you, set a breakpoint at the beginning of each new routine. When the breakpoint triggers, step through the code line by line, confirming that the variables are correct and that the process produces the desired results. Some authorities recommend setting a breakpoint at every single code path, only removing a breakpoint when the path has been thoroughly tested. I must admit that I find this to be overkill in some situations, such as where the function is simply returning an error code, but I do this for all significant branches.

Another glass box testing tool that is often provided with common development environments is a profiler. A profiler is a tool that takes time measurements of a running application and then provides performance statistics for modules or specific functions. This is useful for identifying performance bottlenecks and functional irregularities in a program.

There are two important metrics provided by most profilers, function time and execution count. The function time shows how much overall time was spent in a function (or module), which gives an indication of where any performance delays may be. The execution count shows how many times a function was called, and occasionally this highlights an unexpected problem if a routine is being called too often.

Together, the time and count metrics help show where a program can benefit from optimization, and it is useful to have this information. However, unless there is a serious problem, it is best to wait until all program functionality is complete before attempting to optimize. There is a term in the industry for unnecessarily modifying code for performance before having functionality: “premature optimization”.

There are more powerful profilers and debuggers available from third-party suppliers, but I recommend getting comfortable with the capabilities and features, as well as drawbacks, of the tools provided by your compiler vendor before evaluating expensive alternatives. The quality improvement to be gained by using any debugger far outweighs the incremental benefit of switching to a more powerful tool.

[continued in Beyond the build]

Quality: The Process, Part II [Development environment]

[continued from Quality: The Process, Part II]

Development environment

The best place to start a discussion of practical testing is with the development environment and the tools that you are already using. Rather than a comprehensive discussion of programming practices, which would be a book by itself, I will concentrate on using these tools to facilitate the testing work.

Generally, a development environment for a compiled language consists of a text editor, a compiler, and a linker, plus other tools useful during development, and often these are all part of a single IDE (Integrated Development Environment). This compiled development environment is assumed for this discussion, though there are analogous approaches in other environments.

The first step in producing quality code is to understand your development environment. Although there seems to be a growing trend towards hiding or automating functionality, it is nevertheless important to know what the make or project files are doing, and what options are available and being used. You need to how things work in the case that, “It Just Works,” fails.

Assuming you understand how the development environment works, you can begin actually programming. Once a certain amount of code is written, you try to build (i.e., compile and link) the executable. This is, in fact, the most basic form of glass box testing. If there are problems in the source code, or the build environment is not correct, then warnings or errors will be generated.

To make the best use of this functionality, modify the settings in the make or project files to set the highest warning level available. This may produce lots of extras warnings on some projects, but compiler warnings exist for good reasons. A warning almost always indicates that there is an immediate problem or, at least, a lack of clarity within the code that reduces the ease of maintenance and could cause future problems.

Many compilers include an option to treat any warnings as errors, and I recommend enabling this option. Warnings should never be ignored, and this prevents that from happening. Instead, source code should be corrected to eliminate warnings. This may seem like obvious advice to some readers, but my experience working with code from other programmers shows that many programmers routinely ignore warning messages during compilation, a dangerous practice that is contrary to quality development.

Taking this checking one step further, build the program frequently. This allows you to catch the warnings as they are introduced, rather than having a collection of problems at the end of a large coding session. Some warnings indicate problems that may need to be addressed by technical design changes, and it is good to find these problems early. Personally, I rarely write more than a short function between builds.

Black box testing should also be used in the early stages of development, even when features are incomplete. Running the executable regularly helps make sure that errors are not introduced or, when they are, catches them at an early stage. For incomplete features, you can hardcode values for testing, or just assure that the program behaves as expected, considering the missing code.

[continued in Expanding our repertoire]

Quality: The Process, Part II

[This article was originally published in the December 2002 issue of ASPects.]

Come, give us a taste of your quality.

When Shakespeare penned these words for Hamlet, he was referring to the calling of those to whom the comment was directed. In this context, quality is literally ones profession. As software developers who, for the most part, can either succeed or fail based on the quality of our work, this interpretation still holds true.

In the first part of this article, I discussed planning for quality software development, classification and tracking of bugs and unimplemented features, and some basic quality assurance concepts and terms. In particular, I explained black box and glass box testing, some methods for which we will explore in this part. As a simple reminder, black box testing is, essentially, user testing of the program itself, while glass box testing is developer testing of both source and object code.

You have been waiting long enough, so it is time to put these ideas into practice.

[continued in Development environment]

Quality: The Process, Part I [More to come]

[continued from Bug and feature tracking]

More to come

To this point, we have chosen a project, determined what we want to accomplish with the end product, and proved in the design document that we can meet our goals with it. We have recommitted to developing a quality product and we have devised our own system for tracking the features and the bug reports that will inevitably be received, and some of us have even populated the software with features from the design docs.
In short, we have established a strong foundation for quality software development.

Now, all of the real programmers among us are jumping up and down yelling, “Let’s get to the coding already!” With this foundation, we are in a great position to get started with the actual programming. Unfortunately, the discussion of specific glass box and black box techniques, including beta testing methods, will have to wait for the next installment.

I have some coding to do.

Gregg Seelhoff is an independent game developer who serve[d] on the ASP Board of Directors. He can be reached at seelhoff@sophsoft.com.