Demolish! Pairs 1.10 for iOS

An update to our premier arcade/puzzle game is available.

Digital Gamecraft has published Demolish! Pairs 1.10 on the App Store, where you can buy it for only $3.99 (US).  This is, of course, a free upgrade for all existing customers, downloadable from the ‘Updates’ tab on the App Store.

Download and play Demolish! Pairs now!

Demolish! Pairs 1.10 updates the program interface for iOS 7.x and adds 64-bit support.  For more information on the game, please visit DemolishPairs.com.

While supplies last, readers of this Gamecraft blog may receive a code for a free copy of the game simply by sending an email request to marketing@digitalgamecraft.com.

Please… Download and Enjoy!

Debugging Windows 8.1

There is a bug deep in the Windows 8.1 touch interface code.

Over the past couple of weeks, I spent quite a lot of time trying to debug an issue that was causing crashes in our programs under Windows 8.1 and which, ultimately, turned out to be a bug in the touch interface code of the operating system.  The problem was not with our code; in fact, our tight programming practices actually outed Microsoft’s failure.

To see how we got to that point, read on…

Symptoms

Late last year, we got a single report of a crash in Pretty Good MahJongg under Windows, occurring only when the touch screen was being used; playing the game with mouse and keyboard worked fine.  Crashes in Pretty Good MahJongg are extremely rare, and this had all the earmarks of a device driver problem, so it was given fairly low priority.  Additionally, we did not have the necessary hardware (at that time) to reproduce the error.  Then we got a second report, and we knew something was amiss.

The first two bug reports were nearly identical, and the obvious commonality (and difference from our systems) was that both machines were running Windows 8.1 and, of course, had touchscreens.  We had tested our products under Windows 8 when it was released, but had not tested under Windows 8.1 or with a touch interface (which, in an ideal world, should not crash programs in any event).  Something appeared to be happening with either Windows 8.1 or the touchscreen interface, or both.

Specifically, the error being reported, in all cases, was “Floating point underflow”, though the actual text for the error comes directly from our exception handler, not the system.
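The underlying event is a hardware floating point exception, which the handler translates into readable text. A minimal sketch of that translation step, assuming a handler that receives the raw Windows exception code (on Windows, `EXCEPTION_RECORD::ExceptionCode`); the function name `describe_exception` and the message strings are hypothetical, not our actual handler. The numeric codes are the `EXCEPTION_FLT_*` values, redefined locally so the sketch compiles anywhere.

```cpp
#include <cstdint>
#include <string>

// Windows floating point exception codes (the EXCEPTION_FLT_* values from the
// SDK headers), redefined here so the sketch compiles on any platform.
constexpr std::uint32_t kFltDenormalOperand  = 0xC000008D;
constexpr std::uint32_t kFltDivideByZero     = 0xC000008E;
constexpr std::uint32_t kFltInexactResult    = 0xC000008F;
constexpr std::uint32_t kFltInvalidOperation = 0xC0000090;
constexpr std::uint32_t kFltOverflow         = 0xC0000091;
constexpr std::uint32_t kFltStackCheck       = 0xC0000092;
constexpr std::uint32_t kFltUnderflow        = 0xC0000093;

// Hypothetical helper: turns the raw code from an exception record into the
// text a crash logger writes, so the message originates in the program itself.
std::string describe_exception(std::uint32_t code) {
    switch (code) {
        case kFltDenormalOperand:  return "Floating point denormal operand";
        case kFltDivideByZero:     return "Floating point division by zero";
        case kFltInexactResult:    return "Floating point inexact result";
        case kFltInvalidOperation: return "Floating point invalid operation";
        case kFltOverflow:         return "Floating point overflow";
        case kFltStackCheck:       return "Floating point stack check";
        case kFltUnderflow:        return "Floating point underflow";
        default:                   return "Unknown exception";
    }
}
```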

Diagnosis

Our products have a very good crash logging system, so I was able to examine the crash dumps directly.  The crashes did not appear to happen in program code at all: some of the dumps showed only system routines on the stack, while others were essentially the same but also included our message loop (and no other program code).

My immediate thought was that the problem could be a message collision, knowing that the touch interface added some new Windows messages.  This could mean either that a touch message was triggering program (e.g., animation) code at an unexpected time, perhaps prior to initialization, or vice versa, with a program message causing driver code to be executed improperly.  This could potentially explain both stack conditions, although in that case I would have expected to see our program code elsewhere in the stack, and that was never the case.

The bigger puzzle, at first, was that all of the PGMJ message processing code was identical to that in the Goodsol Solitaire Engine, which drives Goodsol Solitaire 101, Most Popular Solitaire, and FreeCell Plus, as well as the code in Action Solitaire, and that most of the message loop is actually contained in a common library shared by all of these products, yet the initial reports were all confirmed to involve PGMJ alone.  This concern was finally resolved when crash reports began to escalate and expanded to include the whole range of products.  At least the new reports fulfilled my expectation.

Of course, to get to the bottom of the problem, I needed to be able to reproduce the error myself, so I ordered a Windows 8.1 tablet (Dell Venue 8 Pro) for testing.  Fortunately, this tablet reproduced the error, and I was able to determine a little bit more about the issue.  The crash happened immediately upon the very first touch within the program, whether tapping a button or simply selecting an edit box, though navigating the program with the virtual keyboard worked…  that is, right up until the first (virtual mouse) touch. 🙁

I built a version of PGMJ that moved custom messages elsewhere in the numbering space, but that made no difference at all.  I tried a couple of other brute force experiments, but nothing altered the crash behavior one bit, so I set up remote debugging on the device and began to debug the program properly.  Unfortunately, the debugger saw the stack in exactly the same way as our exception handler: every crash was deep in system code, and if any program routine was on the stack, it was only the message loop.  Still, our program was definitely and consistently crashing, which meant that something about our programs was different.  The one major advantage of proper debugging, though, is that I got full symbols, so I was able to determine that the actual crash was happening in ‘ninput.dll’.  But why?
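For readers unfamiliar with the Windows message numbering space: values below WM_USER (0x0400) are reserved for the system (touch messages such as WM_GESTURENOTIFY live there), and WM_APP through 0xBFFF is reserved for application-private messages. A minimal sketch of numbering custom messages from WM_APP, with compile-time range checks; the message names are hypothetical, and the SDK constants are redefined locally so the sketch compiles on any platform.

```cpp
#include <cstdint>

// Values from the Windows SDK (winuser.h), redefined so the sketch is portable.
constexpr std::uint32_t kWmGestureNotify = 0x011A;  // WM_GESTURENOTIFY (system range)
constexpr std::uint32_t kWmUser          = 0x0400;  // WM_USER: first class-private value
constexpr std::uint32_t kWmApp           = 0x8000;  // WM_APP: first app-private value
constexpr std::uint32_t kWmAppLast       = 0xBFFF;  // last value in the WM_APP range

// Hypothetical custom messages, numbered from WM_APP rather than WM_USER.
enum : std::uint32_t {
    kMsgStartAnimation = kWmApp + 1,
    kMsgGameOver       = kWmApp + 2,
};

// Fail the build if a custom message ever drifts out of the private range.
static_assert(kMsgStartAnimation >= kWmApp, "custom messages must start at WM_APP");
static_assert(kMsgGameOver <= kWmAppLast, "custom messages must stay below 0xC000");

// True when a message number belongs to the application-private range.
constexpr bool is_app_private_message(std::uint32_t msg) {
    return msg >= kWmApp && msg <= kWmAppLast;
}
```

As the crash showed, moving messages like this rules out a collision but cannot help when the fault lies elsewhere.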

Here you may imagine days of various attempts at debugging the root cause of the crashes, including “handling” certain messages rather than calling DefWindowProc(), doing the opposite and not processing any messages, and setting breakpoints all over the place and, mostly, being disappointed at how few triggered.  I finally narrowed the issue down to a DialogProc() function within the common library, at the point when the (new) WM_GESTURENOTIFY message was posted.  That message is the result of the default processing of a WM_GESTURE message, so presumably handling that in some way would prevent the crash.  No dice.  There is a strange documentation conflict when WM_GESTURENOTIFY is sent to a dialog box: the documentation for that message states, “This message should always be bubbled up using the DefWindowProc function.”  However, the DialogProc documentation states, “Although the dialog box procedure is similar to a window procedure, it must not call the DefWindowProc function to process unwanted messages.”  This gave me a bit of a combinatorial problem, too, but nothing seemed to have any effect on the crash.

Finally, frustrated, I regressed to pure shotgunning of the problem.  I knew that not all programs crashed when touched under Windows 8.1, but (all of) ours consistently did, albeit not in our code.  I added an early message box to demonstrate that the crash occurred before any of the dialog boxes or other interface features were shown, and then I began removing pieces of the initialization code.  Voila!  The issue revealed itself!

After removing some of the very first initialization code, executed prior to almost anything else being done, and seemingly entirely unrelated to interface code, the crashes disappeared (though, of course, the program no longer worked).  Methodically reducing the amount of code removed, I was able to determine that the crashes were triggered (but not caused) by three simple lines of code in the exception handler initialization.
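One methodical way to do that narrowing is bisection over the ordered initialization steps. The text does not say exactly how the reduction was performed, so this is only a sketch of one approach, assuming the crash reproduces deterministically and that keeping more steps never makes it disappear; `crashes(k)` is a hypothetical callback meaning "build and run with only the first k steps enabled, then touch the screen".

```cpp
#include <cstddef>
#include <functional>

// Returns the smallest k for which running only the first k initialization
// steps still triggers the crash; step number k (1-based) is the trigger.
// Preconditions (assumptions): crashes(total) is true, and crashes(k) is
// monotonic, i.e. false for small k and true from some point onward.
std::size_t first_triggering_step_count(
        std::size_t total, const std::function<bool(std::size_t)>& crashes) {
    std::size_t lo = 0;
    std::size_t hi = total;  // invariant: crashes(hi) is known to be true
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (crashes(mid)) {
            hi = mid;        // crash still reproduces: trigger is at or before mid
        } else {
            lo = mid + 1;    // no crash: trigger lies after mid
        }
    }
    return lo;
}
```

With 20 steps this needs at most five test runs instead of twenty, which matters when each run means a rebuild and a trip to the tablet.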

Problem

Ultimately, the crash problem was a result of the following C++ code:

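    #include <float.h>  // _controlfp() and the _EM_* / MCW_EM constants (MSVC CRT)
    // Read the current control word, clear the mask bits of five exceptions
    // (a cleared bit means that exception traps), then write back only the mask bits.
    unsigned flags = _controlfp ( 0, 0 );
    flags &= ~( _EM_INVALID | _EM_DENORMAL | _EM_ZERODIVIDE | _EM_OVERFLOW | _EM_UNDERFLOW );
    (void)_controlfp ( flags, MCW_EM );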

This, very simply, enables floating point exceptions within the program, including the (now problematic) _EM_UNDERFLOW exception.  The purpose was to provide maximum checking for errors in our code, which is usually so clean that it squeaks.  We never imagined that it would catch errors in a released operating system.  For reference, the above code has been shipping for more than 9 years, to many thousands of customers (and potential customers), and never had any problem before Windows 8.1 arrived.
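To make the bit manipulation concrete, here is a portable sketch of the same arithmetic on a plain integer, using the mask values that MSVC's float.h defines (worth double-checking against your SDK headers); the helper names are hypothetical. In this control word a set bit masks an exception, so clearing the five bits is what turns them into hardware traps. Starting from the default state, in which all six exceptions are masked, only inexact results stay masked afterward.

```cpp
#include <cstdint>

// Exception-mask bits of the MSVC floating point control word (float.h).
// A SET bit masks (disables) the exception; a CLEAR bit makes it trap.
constexpr std::uint32_t kEmInexact    = 0x00000001;  // _EM_INEXACT
constexpr std::uint32_t kEmUnderflow  = 0x00000002;  // _EM_UNDERFLOW
constexpr std::uint32_t kEmOverflow   = 0x00000004;  // _EM_OVERFLOW
constexpr std::uint32_t kEmZeroDivide = 0x00000008;  // _EM_ZERODIVIDE
constexpr std::uint32_t kEmInvalid    = 0x00000010;  // _EM_INVALID
constexpr std::uint32_t kEmDenormal   = 0x00080000;  // _EM_DENORMAL
constexpr std::uint32_t kMcwEm        = 0x0008001F;  // _MCW_EM: all six mask bits

// Same operation as the second line above: clear five of the mask bits.
constexpr std::uint32_t unmask_checked_exceptions(std::uint32_t control_word) {
    return control_word &
           ~(kEmInvalid | kEmDenormal | kEmZeroDivide | kEmOverflow | kEmUnderflow);
}

// The part of the word that _controlfp(flags, MCW_EM) actually applies.
constexpr std::uint32_t applied_mask_bits(std::uint32_t control_word) {
    return control_word & kMcwEm;
}
```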

To be perfectly clear, the actual bug is in Windows 8.1, specifically within ‘ninput.dll’.  There is an error in that module that produces a floating point underflow, compounded by its reliance on a particular floating point state, namely that the hardware underflow exception is (and remains) disabled (masked).  This is a flaw in the operating system, even though it produces no symptoms under the default floating point state and, therefore, in most programs.
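Why the default state hides the bug can be shown portably with the standard `<cfenv>` facilities. A minimal sketch, assuming IEEE 754 doubles and the usual startup environment in which underflow is masked: an underflowing operation merely raises a sticky status flag and yields a subnormal result, and execution continues. With the underflow exception unmasked, as our initialization code arranged, the same operation would trap instead. The function name is hypothetical, and this is not the code in ninput.dll.

```cpp
#include <cfenv>
#include <cfloat>

// Divides the smallest normal double by 3: the exact result is below DBL_MIN
// and not representable, so it is "tiny and inexact", which is an underflow.
// Returns whether the underflow status flag was raised; the quotient goes to result.
bool underflow_only_sets_flag(double& result) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double tiny = DBL_MIN;      // volatile keeps the compiler from
    volatile double divisor = 3.0;       // folding the division at compile time
    volatile double quotient = tiny / divisor;
    const bool raised = std::fetestexcept(FE_UNDERFLOW) != 0;
    result = quotient;                   // subnormal, but execution continued
    return raised;
}
```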

Solution

Our practical solution, of course, is to remove the above code, which is a workaround that avoids triggering the crashes rather than a fix for the underlying bug.  The tradeoff is that our programs will no longer be quite as robust in detecting floating point errors, but as stated above, this checking has been in place for almost a decade without finding any problems in our code, so it should be fairly safe to remove at this point.

Note that removing these three lines of code is actually more than needs to be done to resolve the immediate problem (i.e., the underflow error exception), but enabling the other exceptions still provides additional places for the operating system to fail, perhaps even further along in the same processing path.  The fundamental problem is that Microsoft counted on the default floating point state (and never tested otherwise) for its latest touch interface code, so it is safest for us to simply revert to using the default state as well.
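Removing the code is sufficient because the program then simply keeps the environment it started with. If some other component might have altered that state, a program could also reinstate the startup environment explicitly. A minimal sketch using standard C++ `<cfenv>`; this is an alternative for illustration, not what we shipped, and the function name is hypothetical.

```cpp
#include <cfenv>

// Reinstalls the floating point environment that was in effect at program
// startup (FE_DFL_ENV): status flags cleared, default rounding, and, on
// mainstream platforms, all floating point exceptions masked.
// Returns true if the environment was successfully installed.
bool restore_startup_fp_environment() {
    return std::fesetenv(FE_DFL_ENV) == 0;
}
```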

Verification

It is not enough to simply come up with a solution; that solution must be verified.  We approached this issue in two different ways.

First, I built a new beta version of Pretty Good MahJongg with the above solution applied, and that version was provided to as many of the PGMJ customers who had reported a problem as was feasible.  Every single one (who reported back) confirmed that the crashes were gone.

Second, we bought a brand new Ultrabook laptop with a touchscreen for testing on a different device.  The laptop shipped with Windows 8 (not 8.1), so it was perfect for conducting our verification tests.

I installed the shipping version of PGMJ (Pretty Good MahJongg 2.41) using nothing but the touch interface, and everything worked fine.  We tested several games and had no problems at all (n.b., under Windows 8).  Then, I upgraded the laptop to Windows 8.1 and confirmed that the crash detailed above happened in exactly the same manner and place when using the touchscreen, but the game was perfectly playable with the mouse and keyboard (until one forgot and touched the screen 🙂 ).  Finally, I installed the beta version of PGMJ with the workaround, and everything worked again; in fact, this is a great way to play the game, especially for a title designed without touchscreens in mind.

Given that we verified the solution using two different and separate processes, we are confident that the issue is resolved.  Indeed, Pretty Good MahJongg 2.5 will be released on March 25, so look for it, still the very best tile matching games available for Windows.

For those who know some of my background, the score now stands as follows:
Gregg Seelhoff 3 – Microsoft 0

DGOlympics Postmortem

Our social media service provided some interesting data.

XXII Winter Olympic Games in Sochi, Russia

The XXII Winter Olympic Games (a.k.a., Sochi 2014) took place in Sochi, Russia on February 6-23, 2014.  Digital Gamecraft started covering the event via a special @DGOlympics Twitter account, and a new DGOlympics Facebook page, on January 24.

Prior to the actual competition, we reported all manner of information about the upcoming events, schedules, venues, and athletes, and once events got started, we reported news and results for all 98 athletic events, in 15 disciplines within 7 primary sports.  We provided a totally free real-time service, on two platforms, with no advertising.

Twitter Service

On Twitter, we posted (necessarily) short factoids and results, including podium finishers for every single event, as well as qualifiers and/or standings (as appropriate) from earlier segments of the competition.  At the start of each competitive day, we posted a list of the medal events that day and highlighted other interesting events.  At the conclusion of competition each day, we posted medal rankings and counts for the top countries.
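The text does not say which ranking convention the daily medal posts used; a common one, assumed here, orders countries by golds, then silvers, then bronzes. A minimal sketch of producing such a ranking, where `n` would be 5 for tweets and 10 for the Facebook updates described below; the type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

struct MedalCount {
    std::string country;
    int gold;
    int silver;
    int bronze;
    int total() const { return gold + silver + bronze; }  // the posted "count"
};

// Ranks gold-first (ties broken by silver, then bronze, then country name
// for a deterministic order) and keeps only the top n entries.
std::vector<MedalCount> top_countries(std::vector<MedalCount> table, std::size_t n) {
    std::sort(table.begin(), table.end(),
              [](const MedalCount& a, const MedalCount& b) {
                  // Negate counts so that larger medal counts sort first.
                  return std::make_tuple(-a.gold, -a.silver, -a.bronze, a.country) <
                         std::make_tuple(-b.gold, -b.silver, -b.bronze, b.country);
              });
    if (table.size() > n) {
        table.resize(n);
    }
    return table;
}
```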

The format of the result posts was a sport/discipline tag (e.g., #bobsleigh) followed by the event within that discipline, including ‘Men’ or ‘Women’ as appropriate, and then the actual results or interesting facts.  We initially used #men and #women hashtags, but clicking on either brought up a whole lot of irrelevant and inappropriate garbage, so we quickly dropped that practice.  We made a point of always mentioning the results for American athletes, usually tagged with #TeamUSA.
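The post format above is simple enough to generate mechanically. A minimal sketch, assuming plain ASCII text and the 140-character limit Twitter enforced in 2014; the function names are hypothetical, and nothing here reflects how the tweets were actually composed.

```cpp
#include <cstddef>
#include <string>

// Twitter's per-tweet limit at the time of Sochi 2014.
constexpr std::size_t kTweetLimit = 140;

// Builds "#tag Event Gender: results" in the order described above.
// Pass an empty gender for events that are neither men's nor women's.
std::string format_result_tweet(const std::string& tag, const std::string& event,
                                const std::string& gender, const std::string& results) {
    std::string tweet = "#" + tag + " " + event;
    if (!gender.empty()) {
        tweet += " " + gender;
    }
    tweet += ": " + results;
    return tweet;
}

// Byte length stands in for character count, which holds only for ASCII text.
bool fits_in_tweet(const std::string& tweet) {
    return tweet.size() <= kTweetLimit;
}
```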

Shortly after we started posting results, we also began sending congratulatory tweets to medal-winning athletes who had Twitter accounts.

By the end of Sochi 2014, we had posted almost 1500 tweets in total since London 2012.

Facebook Service

On Facebook, we posted essentially the same information as on Twitter, but, without the size constraints, we often combined information from several related tweets into a single Facebook update.  For example, all of the upcoming events and highlights for a day were included in one update.  Also, some updates (including medal ranks/counts) provided a little more information (e.g., top 10 instead of top 5) than the similar tweets.

The format of the result posts was similar to Twitter as well, except that sport/discipline tags were placed at the end of each post rather than within a sentence, and the sport or discipline name was spelled out normally in the text.  Results for a single event (or segment thereof) were often combined into a single update, but results from different events were always separate.

We posted hundreds of updates by the end of Sochi 2014.

Results

Beginning with just a few (<10) Twitter followers left over from London 2012, we simply worked on providing a quality service, without external marketing.  Throughout the course of the Winter Games, our following grew gradually and organically to nearly 40 (paltry).  The Facebook page was brand new, and with a single request to all of my friends, the number of ‘likes’ jumped to just short of 30 overnight, but it took the entire event for that number to grow to almost exactly the same as our Twitter followers.

As Sochi 2014 got started, our Facebook page passed 30 ‘likes’, which is a significant milestone, because at that level one gains access to Insights, which provides information about how many people see each post, the number of people “engaged”, and the total “reach” of your page.  This is where things started getting interesting.

Despite the measly ‘like’ (and ‘follower’) counts, posts were clearly being seen and read far more widely.  Our total engagement numbers were higher than the total number of ‘likes’ on the page, and our reach was in the thousands each week.  Individual posts varied widely, but some got hundreds of views each (without apparently being shared), far beyond expectations with fewer than 40 ‘likes’.  On the Twitter side, without similar analytics, we could still see similar behavior: results were often retweeted within minutes of posting, almost always by somebody who did not follow us.

Though we had hoped they would gain us more Twitter followers, our tweets congratulating athletes did get some responses, in the form of favorites and retweets, as well as at least one non-athlete Twit who wanted to argue the validity of an official result.  Here is our shout out to the athletes who took the time to acknowledge our tweets:

The sheer number of hours (more than 200) spent compiling information, monitoring the events, and reporting results was astounding, and completely exhausting.  When the Closing Ceremony began, we were more than ready to post the final tallies and be done with the Olympics for 2 more years (at least 🙂 ).

Conclusions

First, the number of ‘likes’ on Facebook and the number of followers on Twitter do not tell the entire story.  We were clearly reaching many times that number of people.

Second, providing a purely informational resource, free of charge, is not enough to fully engage an audience.  We probably needed more cats and misspelled “meme” images.

Third, a comprehensive information resource like the one we provided seems to lose interest over time.  Whether it was Olympic weariness or something more general, our “reach” numbers peaked after about a week and a half, and then slowly declined (although they remained in the low thousands).

Fourth, a concentrated social media resource requires a major commitment of time which, in this instance, was in no way justified by the results.  I, personally, am not sure that I am willing to commit to this again for Rio 2016.

Fifth, even after we provided loads of information for weeks, people on social media are apparently still jaded toward marketing messages.  Our penultimate Facebook update, which ended with “Please note that DGOlympics has been brought to you by Digital Gamecraft, developer of Demolish! Pairs, http://demolishpairs.com/”, got the fewest views of any post during the entirety of this experiment, by a factor of 2.

Comments

Please share your tricks (and failures) about dealing with social media in the comments.