West Corporation

Posted on November 12, 2013 by West Corporation 



The Overlooked Costs of Insufficient Software Design and Testing

Everywhere one goes in the computer world, there are hallway conversations about how testing software is a necessary evil. Others think that software can be made perfect through careful design and careful coding, eliminating all testing except a customer’s acceptance testing or a readiness check of the production system.

Despite that, articles are routinely published about how finding and fixing problems in the lab is cheaper than fixing them after the software has been released. For example, IBM quoted research by the National Institute of Standards and Technology (NIST) showing that, based on the time and money involved for a person to work on the issue, the cost of fixing a problem grows as the software moves forward in the lifecycle.

Some articles go so far as to mention the harm that such problems cause to reputations and business relationships.

However, rarely do these articles and the numbers quoted in them show the danger posed by improper or insufficient test design or lack of testing. Here are some real-world examples to keep in mind when designing, coding and testing software, or when having someone do that work for you.

The $44,500 Parking Ticket
Here is a humorous example: A driver in Italy received a parking fine of US$44,500, inflated by late fees and interest, because his parking infraction was recorded with the year 208 C.E. When the fine was challenged, the city corrected the mistake and reentered the ticket into the computer with the actual year, 2008. Seriously? No one, from design through coding and testing of the software, thought that the software needed a check to verify that a parking ticket was written after the invention of the automobile, or at least no earlier than 1900?
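The kind of check that paragraph asks for can be very small. Here is a minimal sketch in Python, not the city's actual software: the function name and the 1900 cutoff (taken from the suggestion above) are assumptions made for illustration.

```python
from datetime import date

# Assumption for illustration: no valid ticket can predate this year.
EARLIEST_PLAUSIBLE_YEAR = 1900


def validate_ticket_date(issued, today=None):
    """Return the date unchanged if it is plausible, otherwise raise ValueError."""
    today = today or date.today()
    if issued.year < EARLIEST_PLAUSIBLE_YEAR:
        raise ValueError(
            f"ticket year {issued.year} is before {EARLIEST_PLAUSIBLE_YEAR}"
        )
    if issued > today:
        raise ValueError(f"ticket date {issued} is in the future")
    return issued
```

With a check like this at data entry, a ticket dated in the year 208 would be rejected immediately instead of accruing 18 centuries of late fees.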

Hacking by DTMF
By now, almost everyone has heard stories of websites having their security broken and customer or patient data being stolen with tricks like cross-site scripting and SQL injection. Other types of attacks mentioned in the press, such as denial-of-service (DoS) attacks, are designed to overwhelm a website so that legitimate users cannot reach it.

Few people realize that the same thing can happen with phone systems, regardless of whether they are DTMF- or speech-driven.

At one of the many now-common security conferences held around the world, a security researcher demonstrated that a certain bank’s automated phone system could be hacked with just a touch-tone phone. Once broken, the phone system provided account information for the bank’s other customers. In this demonstration, the bank’s system did not have enough safeguards in place to deal with error conditions, so instead of responding with, “That is an invalid entry,” or, “The system has encountered an exception,” the IVR used text-to-speech to recite the entire error message returned from the back-end system, which contained the information for some other client.

What surprised the audience at the conference is that the researcher’s “attacks” were effectively the phone versions of the same attacks used to break websites or to break operating systems and install malware or turn personal computers into zombies to do the bidding of some unseen puppet master. In the simplest case, all he did was enter more digits than required when prompted for an account number.
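Two safeguards would have blunted the attack described above: rejecting input of the wrong length before it reaches the back end, and never reading a raw back-end error to the caller. The sketch below shows both in Python. It is not the bank's system; the 10-digit account length, the function names, and the `lookup_account` callable are all hypothetical.

```python
import logging

logger = logging.getLogger("ivr")

# Hypothetical value: the article does not say how long an account number is.
ACCOUNT_NUMBER_LENGTH = 10
INVALID_ENTRY_PROMPT = "That is an invalid entry."
SYSTEM_ERROR_PROMPT = "The system has encountered an exception."


def is_valid_account_entry(digits):
    """Accept only input that is exactly the expected number of ASCII digits."""
    return (
        len(digits) == ACCOUNT_NUMBER_LENGTH
        and digits.isascii()
        and digits.isdigit()
    )


def handle_account_entry(digits, lookup_account):
    """Return the text the IVR should speak for this caller input.

    lookup_account stands in for the back-end call; it is an assumption.
    """
    if not is_valid_account_entry(digits):
        return INVALID_ENTRY_PROMPT
    try:
        return lookup_account(digits)
    except Exception:
        # Keep the details in the operator log; speak only a generic prompt.
        logger.exception("back-end lookup failed")
        return SYSTEM_ERROR_PROMPT
```

The important design choice is that the caller hears one of a fixed set of prompts on any failure, so nothing the back end puts in an error message can leak through text-to-speech.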

In other cases, the researcher created the equivalent of a DoS attack on the IVR system, locking everyone out of their accounts without needing hundreds of phones making simultaneous calls into the IVR.

$440 Million Lost in 45 Minutes
Not every fault with software is due to a bug in the code. In August 2012, a Wall Street trading firm lost approximately $440 million in roughly 45 minutes through a misstep when releasing its computer-based trading software.

A new version of the software, which automates some trades on the New York Stock Exchange, was close to being released. As the company’s staff bundled the software to move it from the lab to the production systems, the component that drives the automated testing of the software was accidentally included as well. When the bundle was connected to the NYSE, the testing component launched and started making trades (each a complete purchase and sale of the same stock) at a rate of 40 trades per second. Since the testing component wasn’t programmed to care about the purchase or sale prices, it typically sold at a loss on each purchase. Those losses added up to around $440 million. Because the company was financially responsible for the trades, the loss threatened it with bankruptcy.

No follow-up was published explaining how the test component came to be included in the production bundle; however, the general consensus is that there was no fault in any of the software. Instead, the problem was a process that neither validated the contents of the bundled software nor tested the software on the NYSE system before the start of the trading session.
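A process check of the kind described above could be as simple as comparing the bundle against an approved list before release. The following Python sketch is only an illustration under assumed conventions: the directory names, file paths, and function name are hypothetical and not taken from the trading firm's process.

```python
from pathlib import PurePosixPath

# Hypothetical convention: anything under these directories is test-only.
TEST_ONLY_DIRS = {"test", "tests", "test_harness"}


def find_unexpected_files(bundle_files, approved_manifest):
    """Return bundle paths that are not approved or that sit in a test-only directory."""
    approved = set(approved_manifest)
    problems = []
    for path in bundle_files:
        parts = set(PurePosixPath(path).parts)
        if path not in approved or parts & TEST_ONLY_DIRS:
            problems.append(path)
    return sorted(problems)
```

A release script would refuse to ship whenever this function returns a non-empty list, turning "someone should have noticed" into an automatic gate.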

Great U.S. Northeast Blackout of 2003
While this is a textbook example of the domino effect, because many separate problems led to the disaster, ultimately a bug in the alarm software used in a utility’s control room prevented the electric grid operators from responding in a timely manner. A more detailed look showed the bug to be what is known as a “race condition,” where two or more operations try to use the same data at the same time and the result depends on which one gets there first. So many alarms were coming into the central computer at once from all the other problems that the software had no safe way to deal with them, which triggered the race condition.
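To make the term concrete, here is a generic Python sketch of the usual defense: a lock that lets only one thread at a time update shared alarm data. It is not the blackout software, whose code is not shown in this article; the class and function names are invented for illustration.

```python
import threading


class AlarmLog:
    """Collects alarms from many threads; a lock serializes every update."""

    def __init__(self):
        self._lock = threading.Lock()
        self._alarms = []
        self.count = 0

    def record(self, alarm):
        # Without the lock, two threads could interleave the read-modify-write
        # on count and lose an update. That interleaving is the race condition.
        with self._lock:
            self._alarms.append(alarm)
            self.count += 1


def flood(log, n_threads=8, alarms_per_thread=10_000):
    """Simulate a burst of alarms arriving from several sources at once."""

    def worker(source):
        for i in range(alarms_per_thread):
            log.record((source, i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log.count
```

Calling `flood(AlarmLog())` should always report exactly `n_threads * alarms_per_thread` alarms, because the lock guarantees that no update is lost however the threads are scheduled.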

To be fair, race conditions are among the most difficult problems to design, code and test against, and sudden “high-impact” conditions like those leading up to the 2003 blackout are hard to imagine during the meetings in which hardware and software systems are being designed.

Where Does This Leave Us?
The reality about computer hardware and software is that there are many, many more things that can go wrong than can go right, and it is everyone’s job, from design through coding and testing, to make sure the right thing happens as often as possible. The same is true for human processes where a typo, a forgotten step, or “the person who usually does this is on vacation” leads to mistakes. So, no matter how many times people say testing is a “necessary evil,” these and other stories clearly show that testing is necessary, and it is certainly not evil when it prevents disasters.
