UI Lessons from a Terrible Interface

Or, 40 reasons why DirecTV sucks.

Let me preface this by pointing out that I have many, many years of experience with DVR recordings, originally (and still) using various TiVo boxes with Comcast/Xfinity cable, and more recently, almost three years with DirecTV (AT&T) and their Genie DVR.

Despite the claims in their commercials, DirecTV is objectively worse than cable television with a TiVo box.  The quality of recordings is noticeably lower in any viewing circumstance, and where there is motion involved, DirecTV just falls over.  Anybody who thinks otherwise must either not be discerning, have poor eyesight, or have never seen decent cable.

A few years back, TiVo redesigned its user interface to focus more directly on providing an interface for both cable recording and streaming services; I was not thrilled because it added some complication for me, who (at the time) did not use any streaming services.  However, the interface had a clear goal and it achieved that goal with few major glitches.

Earlier this year, DirecTV, presumably as an initiative of AT&T which acquired them just before we signed up (coincidentally), launched a massive interface redesign that completely changed the way customers used the DVR, and not for the better.  Unlike the TiVo redesign, this one was very poorly designed, and also poorly executed, resulting in the worst user interface in recent memory.

In this article, I critique many of the failures, large and small, with commentary about the UI principles violated and how we, as developers, can avoid making the same mistakes.

Quality Issues

The fundamental concern about any software is that it performs its function correctly and consistently.  It should be properly tested and prove robust before being presented to (or inflicted upon) the general public.  Generally, all issues are issues with quality, but there are some that are specific problems with quality assurance:

  1. The new interface was rolled out to customers before proper testing had been completed.  The sheer number of issues listed here is evidence of a massive failure of quality assurance.  Lesson: Always test your software completely before release.
  2. The interface locks up regularly, simply failing to respond in any way, requiring one to turn the DVR off and back on to regain control.  This is what is known as a “showstopper” bug, one where a user cannot continue.  Lesson: Never, ever, ship a piece of software with a known showstopper.

Functionality Issues

The whole purpose of any software is to perform a function.  If software fails to perform that function correctly or completely (or at all), then the design and user experience are irrelevant.  Here are some of the issues with the DVR simply doing its basic job:

  1. Recordings arbitrarily start late or stop early.  The most fundamental purpose of a DVR is to record a program as requested, not only a portion thereof.  This happened with recordings both done in my absence and with live recordings I was actively watching (terminating the recording after only a minute or two).  Lesson: Be certain that your software actually performs its basic function before anything else.
  2. The DVR would get confused and schedule (and perform) simultaneous recordings of the same show, on the same channel, at exactly the same time.  Lesson: Do not release software that behaves illogically; it will annoy and confuse customers.
  3. Before the redesign, one could either record a series on any channel, or on a specific channel; this important functionality was removed, even though the interface still assumes that this is possible.  Now, every new series is scheduled for ‘All Channels’, which is especially problematic for heavily repeated or syndicated shows.  Lesson: Do not remove useful functionality in software upgrades.
  4. Arranging a list into priority order is just completely broken.  Changing the order of scheduled series causes the addition of duplicate entries into the list.  This is fundamentally terrible programming, the inability to handle a list properly.  Lesson: Do not hire programmers who cannot sort a list without making it grow.
  5. Using the back button to return to a previous page does not always work; instead, it often gets “stuck”, but one can do something like opening the list view, then go back multiple times to go back even further.  The full undo stack is there, just not working correctly and ending prematurely.  Lesson: Be sure to completely test new features.
  6. The ‘on demand’ functionality provides different (lesser) access on the DVR than from the mobile app.  The DVR will report that a show is not available, but then it can be immediately watched on an iPad without difficulty.  Lesson: Be consistent with content on multiple platforms; secondary platforms should not have better access.
  7. The “upgrade” did not fix the playback and compression issues; the recordings still get very blocky and almost unwatchable when there is motion on screen.  Lesson: Fix major problems with software before adding new features or other changes.
  8. The “upgrade” also did not fix the myriad audio issues, where sometimes a recording will start playing without any audio, or live recordings have an audio stutter, or a paused recording will make random pops and clicks.  Lesson: Test all aspects of a software interface, not only visuals.
  9. Now, however, the video also blanks out entirely for a second or two when playback speed is changed (for example, fast forward is started).  Lesson: Design a test plan that incorporates current and past bugs to prevent regressions and buggy releases.
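
The duplicate entries described in item 4 above are the kind of bug that an in-place move avoids by construction.  Here is a minimal sketch, assuming the priority list is just a vector of series names; the function name MoveEntry and the data layout are hypothetical, not anything taken from the DirecTV software:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: move the entry at index 'from' to index 'to'.
// std::rotate only permutes elements in place, so the list can neither
// grow nor gain duplicates, whatever indices are passed in.
bool MoveEntry ( std::vector<std::string>& list, std::size_t from, std::size_t to )
{
    if ( from >= list.size ( ) || to >= list.size ( ) )
        return false;   // reject out-of-range requests; list is untouched

    auto first = list.begin ( );
    if ( from < to )        // moving an entry down the list
        std::rotate ( first + from, first + from + 1, first + to + 1 );
    else if ( to < from )   // moving an entry up the list
        std::rotate ( first + to, first + from, first + from + 1 );
    return true;
}
```

Because the only operation is a rotation of existing elements, the size of the list after a move is necessarily the same as before it.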

Design Issues

Good software begins with good design, which is responsible for the entire user experience.  The user experience (UX) incorporates the aesthetics, flow, and interface, and generally works towards making the software easy to understand and use.  These are some design issues where the expressed intent works against the user:

  1. First and foremost, a design should be purposeful, providing a benefit to the end user.  This design change adds no value, yet makes the user learn a new way of doing things (at least, the things that can still be done).  Lesson: Never change a software interface just for the sake of being different; change must have a purpose.
  2. In 2018, the DVR still does not have any way to recover accidentally deleted recordings; if you accidentally delete a recording, it is gone forever.  (TiVo has had this functionality for more than a decade; Apple has used the concept for 35 years!)  Lesson: Always incorporate expected features first.
  3. In the play list, when multiple recordings of a program are put into a folder, the description chosen is that of the latest recording, which description is most likely to contain spoilers.  If one is getting ready to binge a season, the last thing one wants to see is something like, “In the aftermath of the death of <major character>…”  Lesson: Consider how the user is going to actually use your software in practice.
  4. Recordings of marked episodes show the latest (highest episode number) on top, but unmarked episodes (say, with only an episode name) are sorted to show the earliest recording on top, essentially the reverse order.  Lesson: Be consistent in presentation, even where the data may not be complete or consistent in format.
  5. Entries in the ‘to do’ list no longer show the number of upcoming recordings, so to get this useful information that used to be available at a glance, the user now needs to enter (then exit) the information screen for every entry.  Lesson: Provide important information immediately (at a glance) rather than requiring additional actions.
  6. Changing priorities in the ‘Series Manager’ now no longer has a move (drag) selector where the up or down arrows move the entry up or down in the list.  There are now separate up and down buttons which need to be pressed with the ‘select’ button.  This requires more effort and is far less intuitive.  Lesson: Never change an intuitive interface to require extra actions to perform identical functions.
  7. Because of the change in the move interface, one can no longer move an entry up or down by a page using the ‘page up’ and ‘page down’ buttons.  If you add a 100th entry, and you want the priority to be near the middle, you could have to press the select button 50 or more times to get it where you want it to be.  Lesson: Always test with a large data set simulating the real world, especially on interfaces that must scale.  Bonus lesson: Always eat your own dog food.
  8. The ‘manage’ option is no longer on the menu; now it appears in the sidebar of the play list.  This is an illogical arrangement.  Every other selection on the sidebar has a (usually filtered) play list, so ‘manage’ does not belong.  Lesson: Be consistent when providing functionality at the same level (e.g., menu).
  9. The ‘to do’ list (and ‘series manager’) is buried under ‘manage’, rather than somewhere easier to access.  It is at the same level as ‘recording history’ and ‘purchases’, which are so rarely used as to border on pointless.  Lesson: Frequently accessed features should be more easily accessed than rarely used features.
  10. The DVR has a completely different behavior than the mobile app, which functions much better.  The platform dictates some differences between DVR and mobile app, but there is no design consistency between the two.  Lesson: All supported platforms for a software product should have consistent design and functionality.
  11. If a recording of a show appears directly in the play list, not in a folder, the episode name and number are not shown.  This is annoying for a scripted program with a description, but ridiculous when the episode name is the only relevant information.  Lesson: Provide the most important information, that which a user will most want to see, at a glance, and only require additional actions to access less important data.
  12. The ‘to do’ list, likewise, does not show the name or episode number for a scheduled recording, so you have to go to the information page just to see whether this is the desired episode.  Lesson: Consider the purpose for which a customer would be using a particular view to determine which information is important in that context.
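
The inconsistent ordering in item 4 above can be avoided by sorting on data that every entry is guaranteed to have.  The following is only a sketch under that assumption; the Recording structure, its field names, and the kNoEpisode marker are all hypothetical:

```cpp
#include <algorithm>
#include <ctime>
#include <string>
#include <tuple>
#include <vector>

const int kNoEpisode = -1;   // hypothetical marker for "unmarked" episodes

// Hypothetical record; the field names are illustrative only.
struct Recording
{
    std::string title;
    int         episode;     // episode number, or kNoEpisode if unknown
    std::time_t recorded;    // recording time, always present
};

// Newest first for every entry, whether or not it has an episode number.
// The primary key is the recording time (which always exists); the
// episode number is only a tie-breaker, so one rule covers all entries.
bool NewestFirst ( const Recording& a, const Recording& b )
{
    return std::tie ( b.recorded, b.episode ) < std::tie ( a.recorded, a.episode );
}

void SortPlayList ( std::vector<Recording>& list )
{
    std::stable_sort ( list.begin ( ), list.end ( ), NewestFirst );
}
```

The design choice here is to never switch sort keys based on which fields happen to be filled in; mixing keys that way is also an easy way to end up with a comparison that is not a valid ordering at all.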

Usability Issues

Even with a good user interface design, there can be implementation issues that adversely impact the usability of the software and ruin the user experience.  Here are some issues where the implementation detracts from the impression of the product:

  1. The interface has poor performance in general, always feeling sluggish and slow.  Lesson: The first rule of user experience is to make the software responsive.
  2. There is a delay when playing a recording before switching to display the video full screen, so if a program begins immediately, one needs to back it up.  Lesson: Where performance is poor or a wait is required, handle it gracefully; either indicate or mask the situation with animation, sound, and/or other feedback.
  3. The selected recording on a play list shows an oversized listing (which looks like an ad banner) but removes the normal listing.  Lesson: Do not remove the fundamental item view when expanding to provide additional details.
  4. Pressing the ‘select’ button performs different functions depending on which list the user is viewing at the time, or which level of that list.  Lesson: Be consistent with the functionality of a button; do not use it for different purposes depending on context.
  5. The ‘play’ button on the remote does not play the selected recording at all levels.  Lesson: Do not disable logical (consistent) functionality on certain views.
  6. The ‘record’ button has different behaviors on different lists and different levels.  Lesson: Provide consistent behaviors for global buttons, regardless of view.
  7. Pressing the ‘record’ button in the information view for a recording in the ‘to do’ list removes the entire series, rather than only the episode.  Lesson 1: Do not have a button do more than the logical intent.  Lesson 2: Never allow a destructive behavior to be performed, without confirmation, on the press of a single button.
  8. Selecting the ‘move to top’ function in the series manager causes the selected program to move to the top of the list, but the selection then changes to the next item in the list.  Lesson: Do not change focus from a selected item unless necessary (i.e., the item was deleted); users expect the same item to remain highlighted.
  9. When duplicates exist in the series manager (a bug unto itself), moving a program up causes the selection to change.  Lesson: Again, do not change the item focus.
  10. The play list undo stack is not balanced; a user must press the ‘back’ (or left arrow) button twice after pressing the ‘list’ button to return to the original view.  Lesson: One action (forward) should require only one ‘back’ press to reverse.
  11. Pressing ‘back’ to return to a view where a (now) deleted recording was highlighted causes a completely incorrect recording to be highlighted, often within a folder (i.e., one level down), which is very confusing for the user.  Lesson: When a highlighted item is no longer available on reversing, select a logical default (e.g., first item).
  12. The current viewing position of a recording is lost, seemingly randomly.  Lesson: Do not lose, misplace, or corrupt user information; it frustrates customers.
  13. The choice of font size is too small for some users (as reported frequently online).  Lesson: Test font readability extensively and provide options for users with impaired eyesight, including colorblind users, if necessary.
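
The focus problems in items 8, 9, and 11 above generally come from remembering where the highlight is (an index) instead of which item is highlighted.  A minimal sketch of the alternative, assuming each entry carries a stable identifier; the Entry type and ResolveSelection function are hypothetical names for illustration:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model: each list entry has a stable identifier, and the
// view remembers *which* entry is highlighted rather than *where* it is.
struct Entry
{
    int id;
};

// Return the index to highlight after the list has changed.
// If the previously selected entry still exists, it stays selected at
// its new position; otherwise fall back to a predictable default (the
// first item).  Returns -1 only when the list is empty.
int ResolveSelection ( const std::vector<Entry>& list, int selectedId )
{
    auto it = std::find_if ( list.begin ( ), list.end ( ),
        [selectedId] ( const Entry& e ) { return e.id == selectedId; } );
    if ( it != list.end ( ) )
        return static_cast<int> ( it - list.begin ( ) );
    return list.empty ( ) ? -1 : 0;
}
```

With this approach, a ‘move to top’ leaves the same program highlighted at its new position, and returning to a view whose highlighted recording was deleted lands on a logical default rather than an arbitrary entry.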

Support Issues

Any significant software product is likely to have some bugs or usability issues.  Some issues are problems with the customer support provided by the company, such as these:

  1. This major “upgrade” was rolled out to the entire customer base very quickly.  Lesson: When making major user interface changes to a product, test properly with a smaller subset of users to avoid providing a substandard product to everybody.
  2. The many issues with the new version of the software were seemingly ignored.  Lesson: Always listen to customers and address issues as soon as possible.
  3. There is no way to refuse the upgrade, nor to revert to the older software.  Lesson: For a substantial product change, allow users control over when to upgrade and provide a method of reverting (as a failsafe).
  4. The customer support area is loaded with hundreds, probably thousands, of complaints about many of the above issues, but DirecTV/AT&T rarely ever address any of them, and never with anything positive, personalized, nor useful.  Lesson: Always, always, let your customers know that you have heard their complaints, that you appreciate their feedback, and how the issue can or will be addressed.
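
For the staged rollout suggested in item 1 above, one common approach is to gate the new version on a stable hash of a device identifier.  This is a sketch of that general technique only; the identifier format and the InRollout function are hypothetical, and nothing here reflects how DirecTV actually distributes updates:

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a hash: small, deterministic, and identical on every
// platform (unlike std::hash, whose results are implementation-defined).
std::uint32_t Fnv1a ( const std::string& text )
{
    std::uint32_t hash = 2166136261u;    // FNV offset basis
    for ( unsigned char c : text )
    {
        hash ^= c;
        hash *= 16777619u;               // FNV prime
    }
    return hash;
}

// Hypothetical gate: a device receives the new interface only if its
// identifier hashes into the first 'percent' of 100 buckets.  Raising
// 'percent' widens the rollout; setting it to 0 stops it entirely.
bool InRollout ( const std::string& deviceId, unsigned percent )
{
    return ( Fnv1a ( deviceId ) % 100u ) < percent;
}
```

Because the bucket for a given device never changes, any device included at a small percentage remains included as the percentage grows, so early users are not bounced between versions.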

Conclusion

Frankly, we only signed up with DirecTV because it was the only choice at our location in Los Angeles, and after our experiences, we intend to never use them again.  (We have now cancelled the service, with some difficulty and annoyance.)

My message for AT&T is: If you want to actually improve your service, stop spending so much money on advertisements lying about DirecTV being better than cable and spend some on actually making it competitive.  Fire the DVR team and use your mobile app team instead.  Quit trying to make people bundle your mobile service with your television service, and give them a reasonable monthly price, like the one you offered us only after I was leaving the service.  (Offering a 50% discount to stay just pissed me off a lot more.)

And for the sake of everyone, dump that crappy contractor, Consolidated Smart Systems, who is making your company look even worse.  They do not have a one star rating on Google, Yelp, and with the Better Business Bureau for nothing. 🙁

Windows 8.1 Repaired

Microsoft fixes its touch interface bug reported previously.

About a month ago, I posted about a bug deep in the Windows 8.1 touch interface code, a problem which triggered an exception in our products.  I detailed the debugging process and verification that it was an error in Microsoft’s latest operating system.

Last week, Microsoft indirectly confirmed the bug by issuing an update, KB2919355, which fixes the reported problem.  With that update installed via Windows Update, our test system now works correctly with all of our unrevised games that had previously responded to a hardware (floating point) exception.  Of course, KB2919355 is a “cumulative update” comprised of more than 100 different fixes so, without unwarranted experimentation, I cannot be certain which one addresses our issue, though KB2927066: Ichitaro crashes when you use a touch screen to enter text in Windows 8.1 seems to be the most likely candidate.  (The description is very similar.)

Unfortunately, we still need to update all of our Windows products, since we cannot rely on customers applying the update (and, ironically, it actually fails to install on one of our development systems).  The fix comes too late, after Windows 8.1 was distributed to the public at large and, also, after I spent many hours debugging the problem.  Still, it is better that it was acknowledged and fixed (than denied and ignored).

A series of updates for Goodsol products will commence shortly.

Debugging Windows 8.1

There is a bug deep in the Windows 8.1 touch interface code.

Over the past couple of weeks, I spent quite a lot of time trying to debug an issue that was causing crashes in our programs under Windows 8.1 and which, ultimately, turned out to be a bug in the touch interface code of the operating system.  The problem was not with our code; in fact, our tight programming practices actually outed Microsoft’s failure.

To see how we got to that point, read on…

Symptoms

Late last year, we got a single report of a crash in Pretty Good MahJongg under Windows, occurring only when the touch screen was being used; playing the game with mouse and keyboard worked fine.  Crashes in Pretty Good MahJongg are extremely rare, and this had all the earmarks of a device driver problem, so it was given fairly low priority.  Additionally, we did not have the necessary hardware (at that time) to reproduce the error.  Then we got a second report, and we knew something was amiss.

The first two bug reports were nearly identical, and the obvious commonality (and difference from our systems) was that both machines were running Windows 8.1 and, obviously, had touchscreens.  We had tested our products under Windows 8 when it was released, but did not check Windows 8.1 nor using a touch interface (which, in an ideal world, should not crash programs in any event).  Something appeared to be happening with either Windows 8.1 or the touchscreen interface, or both.

Specifically, the error being reported, in all cases, was “Floating point underflow”, though the actual text for the error comes directly from our exception handler, not the system.

Diagnosis

Our products have a very good crash logging system, so when I got the crash dumps, I discovered that the crashes did not appear to happen in program code at all and, in fact, some of them showed only system routines in the stack dump, while others were essentially the same, but with our message loop (but no other program code) in the stack.

My immediate thought was that the problem could be a message collision, knowing that the touch interface added some new Windows messages.  This could mean that either a touch message was triggering program (e.g., animation) code at an unexpected time, perhaps prior to initialization, or vice versa, with a program message causing driver code to be executed improperly.  This could potentially explain both stack conditions, although I would expect to see our program code elsewhere, but that was never the case.

The bigger problem, at first, was that all of the PGMJ message processing code was identical to that in the Goodsol Solitaire Engine, which drives Goodsol Solitaire 101, Most Popular Solitaire, and FreeCell Plus, as well as the code in Action Solitaire, and that most of the message loop is actually contained in a common library shared by all of these products.  After initial confirmation that all reports were for PGMJ, this concern was finally resolved when crash reports began to escalate, and they expanded to include the whole range of products.  At least the new reports fulfilled my expectation.

Of course, to get to the bottom of the problem, I needed to be able to reproduce the error myself, so I ordered a Windows 8.1 tablet (Dell Venue 8 Pro) for testing.  Fortunately, this tablet displayed the error, and I was able to determine a little bit more about the issue.  The crash happened immediately upon the very first touch within the program, whether clicking a button or simply selecting an edit box, though navigating the program with the virtual keyboard worked…  that is, right up until the first (virtual mouse) touch. 🙁

I built a version of PGMJ that moved custom messages elsewhere in the numbering space, but that made no difference at all.  I tried a couple of other brute force experiments, but nothing altered the crash behavior one bit, so I set up remote debugging on the device and began to debug the program properly.  Unfortunately, the debugger saw the stack in exactly the same way as our exception handler, so every crash was deep in system code and if any program routine was in the stack, it was only the message loop.  Still, our program was definitely and consistently crashing, which meant something was different.  The one major advantage of proper debugging, though, is that I got full symbols, so I was able to determine that the actual crash was happening in ‘ninput.dll’.  But why?

Here you may imagine days of various attempts at debugging the root cause of the crashes, including “handling” certain messages rather than calling DefWindowProc(), doing the opposite and not processing any messages, and setting breakpoints all over the place and, mostly, being disappointed at how few triggered.  I finally narrowed down the issue to happening from a DialogProc() function within the common library when the (new) WM_GESTURENOTIFY message was posted.  That message is the result of the default processing of a WM_GESTURE message, so presumably handling that in some way would prevent the crash.  No dice.  There is a strange documentation conflict when WM_GESTURENOTIFY is sent to a dialog box, since, “This message should always be bubbled up using the DefWindowProc function.”  However, regarding DialogProc, “Although the dialog box procedure is similar to a window procedure, it must not call the DefWindowProc function to process unwanted messages.”  This gave me a bit of a combinatorial problem, too, but nothing seemed to have any effect on the crash.

Finally, frustrated, I regressed to pure shotgunning of the problem.  I knew that not all programs crashed when touched under Windows 8.1, but (all of) ours consistently did, albeit not in our code.  I added an early message box to demonstrate crashing before any of the dialog boxes or other interface features were shown, and then I began removing pieces of the initialization code.  Voila!  The issue revealed itself!

After removing some of the very first initialization code, executed prior to almost anything else being done, and seemingly entirely unrelated to interface code, the crashes disappeared (though, of course, the program no longer worked).  Methodically reducing the amount of code removed, I was able to determine that the crashes were triggered (but not caused) by three simple lines of code in the exception handler initialization.

Problem

Ultimately, the crash problem was a result of the following C++ code:

    unsigned flags = _controlfp ( 0, 0 );
    flags &= ~( _EM_INVALID | _EM_DENORMAL | _EM_ZERODIVIDE | _EM_OVERFLOW | _EM_UNDERFLOW );
    (void)_controlfp ( flags, MCW_EM );

This, very simply, enables floating point exceptions within the program, including the (now problematic) _EM_UNDERFLOW exception.  The purpose was to provide maximum checking for errors in our code, which is usually so clean that it squeaks.  We never imagined that it would catch errors in a released operating system.  For reference, the above code has been shipping for more than 9 years, to many thousands of customers (and potential customers), and never had any problem before Windows 8.1 arrived.
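
To illustrate the condition involved, here is a small sketch using the portable <cfenv> facilities rather than the Microsoft-specific _controlfp call above.  It only inspects the status flag and does not trap, and the function name UnderflowIsFlagged is hypothetical; this is not the code from our exception handler:

```cpp
#include <cfenv>
#include <limits>

// Returns true if multiplying two very small values raises the floating
// point underflow status flag.  With exceptions masked (the default
// state), the flag is simply set and execution continues; with the
// underflow exception unmasked, the same arithmetic would trap instead.
bool UnderflowIsFlagged ( )
{
    // 'volatile' keeps the compiler from folding the arithmetic at build time.
    volatile double tiny = std::numeric_limits<double>::min ( );
    volatile double result;

    std::feclearexcept ( FE_ALL_EXCEPT );
    result = tiny * tiny;    // far too small to represent as a double
    (void)result;
    return std::fetestexcept ( FE_UNDERFLOW ) != 0;
}
```

In other words, an underflow is harmless to a program running in the default state, which is why code that quietly underflows can go unnoticed until it runs inside a process that has enabled the exception.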

To be perfectly clear, the actual bug is in Windows 8.1, specifically within ‘ninput.dll’.  There is an error in that module, creating a floating point underflow exception, compounded by reliance on a particular floating point state, namely that the hardware underflow exception is (and remains) disabled.  This is a flaw in the operating system, even though most programs, which run with the default floating point state, do not display symptoms.

Solution

The actual solution, of course, is to remove the above code; this is a workaround to avoid triggering the crashes rather than a fix for the underlying bug.  The tradeoff is that our programs will no longer be quite as robust in detecting floating point errors, but as stated above, this checking has been in place for almost a decade without finding any problems in our code, so it should be fairly safe to remove at this point.

Note that removing these three lines of code is actually more than needs to be done to resolve the immediate problem (i.e., the underflow error exception), but enabling the other exceptions still provides additional places for the operating system to fail, perhaps even further along in the same processing path.  The fundamental problem is that Microsoft counted on the default floating point state (and never tested otherwise) for its latest touch interface code, so it is safest for us to simply revert to using the default state as well.
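
For comparison, one alternative (not the approach taken here) would be to keep the exceptions enabled for program code and temporarily restore a non-trapping state around calls that may enter system code.  The following is a sketch using the standard <cfenv> functions; the class name FloatEnvGuard is hypothetical, and whether such a guard would have fully avoided the ninput.dll failure is an assumption that was never tested:

```cpp
#include <cfenv>

// Hypothetical RAII guard.  std::feholdexcept saves the current floating
// point environment, clears the status flags, and switches to non-stop
// (non-trapping) mode where the platform supports it; the destructor
// puts the saved environment back.
class FloatEnvGuard
{
public:
    FloatEnvGuard ( )  { m_valid = ( std::feholdexcept ( &m_saved ) == 0 ); }
    ~FloatEnvGuard ( ) { if ( m_valid ) std::fesetenv ( &m_saved ); }

    FloatEnvGuard ( const FloatEnvGuard& ) = delete;
    FloatEnvGuard& operator= ( const FloatEnvGuard& ) = delete;

private:
    std::fenv_t m_saved;
    bool        m_valid;
};
```

A guard object of this kind would be declared in a scope surrounding the call in question, so the original state is restored automatically on exit.  The cost is that every such call site has to be identified and wrapped, which is a large part of why simply reverting to the default state is the safer choice.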

Verification

It is not enough to simply come up with a solution; that solution must be verified.  We approached this issue in two different ways.

First, I built a new beta version of Pretty Good MahJongg with the above solution applied, and that version was provided to as many of the PGMJ customers who reported a problem as feasible.  Every single one (who reported back) confirmed that the crashes were gone.

Second, we bought a brand new Ultrabook laptop with a touchscreen for testing on a different device.  The laptop shipped with Windows 8 (not 8.1), so it was perfect for conducting our verification tests.

I installed the shipping version of PGMJ (Pretty Good MahJongg 2.41) using nothing but the touch interface, and everything worked fine.  We tested several games and had no problems at all (n.b., under Windows 8).  Then, I upgraded the laptop to Windows 8.1 and confirmed that the crash detailed above happened in exactly the same manner and place when using the touchscreen, but the game was perfectly playable with the mouse and keyboard (until one forgot and touched the screen 🙂 ).  Finally, I installed the beta version of PGMJ with the workaround, and everything worked again; in fact, this is a great way to play the game, especially for a title designed without touchscreens in mind.

Given that we verified the solution using two different and separate processes, we are confident that the issue is resolved.  Indeed, Pretty Good MahJongg 2.5 will be released on March 25, so look for it, still the very best tile matching games available for Windows.

For those who know some of my background, the score now stands as follows:
Gregg Seelhoff 3 – Microsoft 0

Seeking a few great Beta Testers

We need people to playtest our arcade/puzzle game.

Today, Digital Gamecraft is making an open call for iOS beta testers to help us test Demolish! Pairs in preparation for its upcoming release on the Apple App Store.

Anybody with an iPad, iPhone, or iPod touch is eligible to join our team and get early access to this fun game, while helping us make it as good and solid as possible.  All you have to do is play the game (and then tell us about it 🙂 ).

For more information, and to sign up, see our call for iOS beta testers on the Demolish! Pairs site.

ISVCon 2012: Success!

This conference reboot was the best in years.

You shoulda been there!

We have returned safely from ISVCon 2012, which was presented last week in Reno, Nevada [USA], with a mixture of physical exhaustion and mental exhilaration, as is often the case with great conferences.  ISVCon was a relaunch of the old Software Industry Conference, and the consensus was that this was the most beneficial event in several years.  The content was geared towards microISVs (Independent Software Vendors), software companies with just a few people (often, only one person), and the networking/socializing was with others who are facing the same challenges (as well as those who provide services to help).

The main question: Why were you not there?

Before our departure for Reno, I added the Twitter box [edit: formerly] on the right of this blog, and I was “live tweeting” as much as possible throughout the conference, as well as during our journey (and quasi-vacation).  If you follow my personal account at @GreggSeelhoff, you can still see the updates, as well as more going forward.

In the coming days, I will review the highlights of the conference, and I have it on good authority that the Association of Software Professionals (new conference owners) will be making some or all of the session videos publicly available for viewing.

Prior to all that, however, I must give a HUGE shout out to Susan Pichotta of Alta Web Works, who deserves most of the credit for bringing this fantastic 3.0 version of the long-running conference together, and without whom ISVCon would never have happened.  Plans are already in the works for next year, and I really look forward to being there in 2013.

URGENT: ISVCon 2012 is almost here!

Register NOW and save with our discount code.

ISVCon 2012 takes place July 13-15, which is only a couple weeks (!) away.  ISVCon is the spiritual successor to (or, in entertainment terms, reboot of) SIC, the Software Industry Conference, which I have attended numerous times, and which has always been a great investment.  This conference brings together scores of independent software publishers (or “vendors”, hence ISV) to discuss and learn about the industry.  It is a unique opportunity to meet face-to-face with many other people who share similar business challenges; I now call lots of them “friends”.

ISVCon will be taking place in Reno, Nevada (USA) at the Atlantis Casino Resort.

Here is the catch: Time is running out!

Step 1: Register (at a discount)

First, register for ISVCon before the prices go up.  As an incentive, we at Digital Gamecraft can offer you this 10% discount code: “Gamecraft2012”.  Limited time only; prices increase July 1st.

Step 2: Get your hotel room (at a discount)

Next, make your hotel reservations now (using that link) to receive discount pricing and no resort fee.  Offer ends in only a couple of days!

Step 3: Attend ISVCon 2012

Join us in Reno for the conference.  We will be arriving before the Welcome Reception on Thursday evening, during which we will be able to have a drink or two, socialize with friends and colleagues (both long lost and brand new), and switch from travel mode into conference mode.

The conference sessions take place Friday, July 13, through Sunday, July 15, and specifics can be found on this complete conference schedule.  Note that the Friday sessions are Power Sessions, while the Saturday and Sunday sessions provide a couple of options for each timeslot.  There is so much content at ISVCon that we are sending most of the staff (okay, just two of us) to make sure that we can have full coverage of the relevant topics.  Additionally, the networking value and information exchange between (and sometimes during) sessions is possibly even more valuable than the speakers.

That said, let me draw your attention particularly to Paradise Room A on Saturday from 1:45pm to 2:45pm, for my presentation, Quality Assurance for Small Software Publishers, and on Sunday from 9:00am to 10:00am, where I will serve on a panel of game developers for the session, How Games are Different.  The answer to your question is: I will be there and awake at 9am because, with the time difference, that will be noon back home.  (Also, I never work the B room.)

We will be there at the conference through the After Hours MeetUp on Sunday evening, before beginning our (more) lengthy journey back to the office.  From experience, this will involve an odd mixture of being physically spent, but mentally energized, full of plans and ideas.  Honestly, attending ISVCon 2012 is probably one of the best ways to spend a few days improving your business; I strongly recommend it for any ISV.

Follow me on Twitter @GreggSeelhoff for live conference updates.  See you there!

API Design Dilemma

I need to decide how to define certain parameter types.

The general situation is this:  I am refactoring a piece of C++ code to be part of a separate library, so I am in the process of defining and documenting an API for using the included classes and methods.  A fundamental design consideration is that the library may be called by third parties, without access to the source code, so I need to make the code as close to bulletproof as possible.  (For internal development, at least I know the methods used, how the API will be utilized, and that it will not be abused terribly.)

During this process, I encountered a theoretical dilemma about how to handle certain parameters.  Specifically, I was working on method definitions that included sizes and counts that should never be negative (but, of course, I need to prepare for abuse).  As an example, say I am reviewing a method that is declared like this:

    bool FillBuffer ( byte* pBuffer, int nSize );

Here, pBuffer is a pointer to the buffer to be filled, and nSize is the size of that buffer.  Of course, it is not possible for a buffer to be a negative size, so my initial reaction was to redefine it as:

    bool FillBuffer ( byte* pBuffer, unsigned uSize );

This makes perfect sense from a theoretical standpoint, but then a practical consideration occurred to me.  I always verify parameters with an assertion and, for a public method (as here), abort the routine if verification fails, with an exception or error return as appropriate.  In this case, my original method would assert nSize to be greater than zero (and return false), which would catch any negatives.  The new method would only catch the case where uSize was zero, but if a careless programmer cast a signed integer (or, worse, let the compiler do it), the current validation check would not identify a problem.
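To make the pitfall concrete, here is a minimal sketch of the two validation checks side by side.  The names ValidateSigned, ValidateUnsigned, and CarelessCast (and the cast_demo namespace) are hypothetical, invented purely for illustration; they are not part of the library.

```cpp
namespace cast_demo {

// Check used with the original signed declaration:
// zero and negative sizes are both rejected.
bool ValidateSigned(int nSize)
{
    return nSize > 0;
}

// Check used with the unsigned declaration: only zero can be rejected,
// because every other bit pattern is a legal unsigned value.
bool ValidateUnsigned(unsigned uSize)
{
    return uSize > 0;
}

// What a careless caller effectively does when a signed value is
// converted (explicitly or by the compiler) to the unsigned parameter.
// The conversion is well defined: -1 becomes the largest unsigned value.
unsigned CarelessCast(int nSize)
{
    return static_cast<unsigned>(nSize);
}

}  // namespace cast_demo
```

With these, ValidateSigned(-1) fails as expected, while ValidateUnsigned(CarelessCast(-1)) passes, because the negative size has silently become an enormous positive one.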

So, there were a few obvious solutions that I considered:

  • I could leave the original definition alone, which would catch obvious parameter errors, but would be theoretically incorrect, and if a programmer wanted to pass the size of a static buffer, using sizeof(), there would be a signed/unsigned mismatch.
  • I could use the new method definition, which would be correct in theory, and just trust programmers not to abuse the method with invalid parameters (and let them suffer if they do).
  • I could use the new method definition and add a sanity check so an extremely large buffer size (i.e., likely a negative value cast improperly) would be rejected, but the drawback there is that any such check would be somewhat arbitrary, and it would limit the functionality for any programmer who truly wanted to use an enormous buffer.

Each of these solutions has advantages and drawbacks.  I dislike having a parameter take a type that is not accurate (though not so much as to not have written this code in the first place), but I dislike arbitrary limits even more.  However, I know that defensive design is important here, since a careless programmer is, in my experience, the most likely to complain that the library or API is at fault.  (I was once threatened with physical violence when I produced a critical review of code written by a nominal “programmer”.)

At this point, I am leaning toward a hybrid solution by overloading the method with both (or multiple) definitions, the original checking for negative sizes as usual before doing an explicit cast of the size value and passing processing to the new/correct method.  The advantage is that passing an actual negative number (or signed type) will result in that extra checking, and a programmer could pass a buffer size up to the limit of the unsigned type.  The disadvantages are the additional work needed to create the extra stub(s), loss of type checking during static analysis, and the fact that our careless friend could still cast a value to create problems (but then it should be quite obvious, at least).
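A minimal sketch of this hybrid approach, under a few stated assumptions, could look like this.  The overload_demo namespace, the byte alias, and the zero fill value are placeholders for illustration, and the debug assertions mentioned earlier are omitted so that only the error returns are shown.

```cpp
#include <cstring>

namespace overload_demo {

using byte = unsigned char;  // assumption: stand-in for the library's byte type

// New/correct method: the size is unsigned, so only zero can be rejected.
bool FillBuffer(byte* pBuffer, unsigned uSize)
{
    if (pBuffer == nullptr || uSize == 0)
        return false;

    std::memset(pBuffer, 0, uSize);  // placeholder fill value
    return true;
}

// Stub for signed callers: catch zero and negative sizes first, then
// cast explicitly and pass processing to the unsigned method.
bool FillBuffer(byte* pBuffer, int nSize)
{
    if (nSize <= 0)
        return false;

    return FillBuffer(pBuffer, static_cast<unsigned>(nSize));
}

}  // namespace overload_demo
```

Passing a plain int (or a literal such as -1) selects the signed stub and gets the extra check, while an unsigned argument goes straight to the new method.  One caveat with only these two overloads: on platforms where std::size_t is wider than unsigned, an argument taken directly from sizeof() is ambiguous between them, which is one reason further overloads may be needed.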

This post is an exercise in the process of working through a problem by simply writing down the issues, which often results in a solution (or decision) by the time one is finished with the description.  (It did here.)  I would, however, welcome any comments on my proposed solution, or other suggestions.

Finally, yes, I know that int and unsigned are not ideal parameter types for this in the first place, but I used them for the purpose of illustration.  (The principle also applies to object counts and other similar parameter types.)

Where to Start

A few highlights of the original Gamecraft incarnation.

After importing all of the (250) posts from the original incarnation of this Gamecraft blog, during the editing process that was necessitated by the technology change, I had the opportunity to review many of the older posts.  I found lots of fascinating information (compared to just a few irrelevant bits), so I decided to provide a few pointers for those (i.e., most readers) who have not read the entire blog.

The most recent posting to receive attention was Making Mac Disk Images Pretty, which describes how we improved the appearance of Pretty Good Solitaire Mac Edition 2.0 [linked from C-Command Software/DropDMG].

The best series began with Quality: An Introduction (running through Quality: The Index), discussing our guiding principle of Quality as applied to (game) software development, posted in May and June 2006.

The most controversial post was my critical review of Microsoft Visual Studio 2005, which incited a debate and continued to collect comments well after VS2008 was released.

The best quote was from Voltaire: “Le mieux est l’ennemi du bien”; this translates to ‘The best is the enemy of the good.’ [from MVP Backgammon Professional]

The worst month for posting was definitely August, during which month I only posted once over 5 years [in August 2008], a full score less than the expected (average) number of posts in that time.

This is just a small sampling from the first phase of this blog, but there is plenty more, all still available (twice over).  Now, we move forward and begin the second phase in earnest…