Saturday 20 August 2016

The cost of unresolved bugs

Most projects deliver solutions with bugs. When projects are working with tight and mandatory deadlines there might be a lot of known and documented defects just prior to a release. Other projects might be in a position where they can postpone their release and bring down the number of bugs. Think a bit about the consequences and cost of those defects compared to actually delivering a less bug-ridden solution a bit later.

The cost of any unresolved bug includes at least the following tasks: 
  • Impact analysis (money, time, material)
  • Workaround analysis and testing
  • SOP documentation
  • Re-planning. 
On top of this there is at least some documentation that has to be updated twice (once for the version with the bug, and again when the bug has been fixed). There might be additional costs that are less visible: a bug might prevent progress on other tasks in the project, and users might need re-training on top of the SOP documentation in order to follow the new SOP. And finally there is the grey zone - what if we didn't analyse and understand the workaround completely?
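To illustrate, here is a minimal back-of-the-envelope sketch of what a single unresolved bug might cost. The task list mirrors the one above, but every hour estimate and the hourly rate are purely hypothetical placeholders, not figures from any real project.

```python
# Back-of-the-envelope cost of one unresolved bug.
# All hour estimates and the rate below are hypothetical placeholders.
HOURLY_RATE = 100  # assumed blended rate, pick your own currency and number

tasks_hours = {
    "impact analysis (money, time, material)": 8,
    "workaround analysis and testing": 16,
    "SOP documentation": 6,
    "re-planning": 4,
    "documentation update, bug version": 3,
    "documentation update, after the fix": 3,
    "user re-training": 10,
}

total_hours = sum(tasks_hours.values())
print(f"Estimated effort: {total_hours} hours (~{total_hours * HOURLY_RATE} at the assumed rate)")
```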

Nevertheless, what might seem like a perfectly sound and easy decision can actually add more work, more cost and more uncertainty to a given release.

Tuesday 10 May 2016

Well-written test cases – or not. A few thoughts on test design

One of those recurring discussions when preparing for test is “how much detail is required in the test cases?”. The SMEs participating from the line of business know their business, they know the business rules and they know all the “sore spots” – those alternative test cases that need to be run in order to find some of the nasty bugs. So why in the world should they spend time writing detailed test cases when they can basically test from the library within their memory?


On the other side of the table there is usually a single test manager trying to argue for “real” test cases – documented test cases that follow the basic guidelines from IEEE 610 with real preconditions, expected values and a nice list of things to do when a tester is executing the test case.

And that’s where the discussion usually ends. SMEs reluctant to spend all their energy writing long test cases and test managers eager to have good project documentation.
 
There is logic to avoiding the massive overhead of detailed test cases in some projects when the right SME participants are on board. And there is another reason to do detailed test cases in most projects – compliance. The whole idea of detailed test cases – having a baseline that can be used for consistent runs and reruns of tests, for reproducing bugs and for documenting “what actually happened” – is enough to justify the effort. And that brings us back to how detailed the test cases should be.

In most projects it might be enough to have test cases with a few steps and some check points for those steps. Why bother doing step-by-step test cases when the testers build up knowledge about the application within a short while? Test cases that are only headlines are not sufficient either, simply because they leave too much room for interpretation, effectively making it impossible to report on test coverage and progress.

One thing that seems to work when the discussion about the level of detail comes up is the “good example”. This should be prepared by the test manager and used as a template by those working with test case documentation, backed by solid arguments about why a certain level of detail is necessary. Reproducibility, compliance, internal guidelines, company policy and business criticality are all valid reasons to ask for a certain level of detail in test cases. The test manager should drive the process towards a common agreement on detailed test cases and an understanding of why this level is necessary.
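As a minimal sketch of what such a “good example” could look like - expressed here as a simple data structure rather than any particular tool's format, with field names and content that are entirely my own - consider:

```python
from dataclasses import dataclass, field

@dataclass
class TestStep:
    action: str    # what the tester does
    expected: str  # the check point for this step

@dataclass
class TestCase:
    case_id: str
    title: str
    preconditions: list[str]
    steps: list[TestStep]
    traces_to: list[str] = field(default_factory=list)  # requirement references

# Hypothetical example at the suggested level of detail: a few steps, each with a check point.
example = TestCase(
    case_id="TC-042",
    title="Create and post an invoice for an existing customer",
    preconditions=["Tester has the 'billing' role", "Customer C-1001 exists with an open account"],
    steps=[
        TestStep("Open the customer card for C-1001", "Customer details are shown"),
        TestStep("Create a new invoice with one order line", "Invoice is saved in status 'Draft'"),
        TestStep("Post the invoice", "Status changes to 'Posted' and an invoice number is assigned"),
    ],
    traces_to=["REQ-BILL-12"],
)
```

The point is not the format but the level: every step has an explicit check point, and the case traces back to something the project recognises.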

Friday 25 March 2016

Top 10 audit topics for test cases

So, a quarter of 2016 has passed - almost - and your projects are probably busy, summer deadlines approaching, lots of testing to take care of at the moment. And then - the quiet summer holiday period is also within sight. Maybe it's worth considering what to do during those hot but quiet weeks this year.

An idea could be to prepare for an audit of the test cases. A bit of structured housekeeping. So here are 10 suggestions for things to look at to get rid of the crap and dig out the gems for future testing purposes.


  1. Discover and discard. Have a critical look at test cases that are not going to create future value. My own favorite - test cases with zero (0) steps. The ones that were created with the best intentions but for some reason never made it beyond "design". Delete them or archive them in a way that they don't make any noise in future reporting (see the sketch after this list).
  2. Find the gems. Look at your test cases from the opposite angle. Which test cases were the "favourites" in recent projects and test runs? Identify those, especially the ones that are actually known to have detected defects, and make sure they are documented and accessible for future testing purposes.
  3. Filter through the remaining 80%. OK, so you're now off to a good start, blanks gone and gems identified - only "the rest" remains. Unless you have a very strict approach to test case documentation you are probably still left with around 80% of the test cases you started out with. What to do now? For starters you could take a look at business criticality - which test cases actually make sense to keep for future testing from a strict business point of view? They should be the next in the "save-for-later" bucket.
  4. Chop through duplicates and variations of the same theme. Chances are you don't need six test cases that aim at testing the same functionality in slightly different ways. If you do, consider some kind of bundling. If you don't, make a choice and clear away the remaining test cases. 
  5. Find the blank spots. This is where it gets interesting. If you have spent the time so far going through documented test cases you should by now also have had a few "eureka moments". You should have a few Post-it notes or a short list of things that should be somewhere in the test documentation but are nowhere to be found. Hunt down those responsible and find out what actually happened. And finally make sure that the blanks are filled with adequate documentation where needed.
  6. Traceability. Don't start re-doing business requirements now. It is too late. Instead make sure that you have some kind of adequate, high-level overview so that you can still see which areas have test coverage and which areas are left uncovered. Chances are that there is way too much work to be done, but at least you now know where the potential gaps are for future projects.
  7. Take a break. No, not that kind of break. Instead spend a short time bothering the DevOps team, the support team, first line support, the incident manager - or all of them, depending on who's actually available right now. Don't bother them with long lists about test cases and requirements coverage - instead get their input on bugs and errors detected recently and then analyse whether it is feasible to include some of that in future testing. Or keep a bucket list to present to the test analysts later.
  8. Check the consistency of regression test packages. Are you adequately covered with a selection of test case packages for different regression test purposes? Is your smoke test good? As in robust, focused and bulletproof. Can it be run in a short time? Are you genuinely happy with the efficiency of the test cases for smoke testing purposes?
  9. Test data pitfalls. By now you should be somewhere between concerned and desperate if you have covered bullets 1 through 8. There is a lot of work to do. Now consider if you have the right test data to support testing. Don't, however, do another 8-point list for test data, because then you will be stuck with a lifetime project for that alone. Instead look at test data from a helicopter perspective. Remember this is the audit, not a "fix-the-world" project. Where are the big loopholes in terms of test data, given that you now know which tests actually create future value for you and the organisation you are with? A favorite topic is always testing that had to be de-scoped due to test data issues. Some types of test data are difficult to work with in real life, like data that is consumed due to one-time usage only. Maybe now is the right time to think about possible solutions. Or simply document risk and impact a bit better, given that your knowledge should now be better.
  10. Sanity check. Lean back and enjoy a world that is less messy and better understood. And therefore a bit easier to communicate. Last topic on the list is to do a short sanity check. Can you actually report to project stakeholders based on the test case packages that you have decided to keep? Is the coverage sufficient? Where are the weak areas? Are there any ways to strengthen the weak areas? Improvements? Priorities? Congratulations. You now know what to include in your project activities for the remaining part of 2016 - and beyond. 
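For topics 1 and 2 in particular, the hunt can often be scripted. Here is a minimal sketch, assuming your test management tool can export test cases to CSV; the file name and column names (step_count, linked_defects, id, title) are hypothetical and need to be mapped to whatever your tool actually exports.

```python
import csv

# Read a hypothetical CSV export of all test cases from the test management tool.
with open("testcases_export.csv", newline="", encoding="utf-8") as f:
    cases = list(csv.DictReader(f))

# Topic 1: test cases that never made it beyond "design" - zero steps.
zero_step = [c for c in cases if int(c.get("step_count") or 0) == 0]

# Topic 2: the gems - test cases linked to at least one detected defect.
gems = [c for c in cases if int(c.get("linked_defects") or 0) > 0]

print(f"Candidates for deletion or archiving: {len(zero_step)}")
print(f"Gems to keep, document and make accessible: {len(gems)}")
for c in gems:
    print(f"  {c.get('id')} - {c.get('title')} ({c.get('linked_defects')} linked defects)")
```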

Tuesday 19 January 2016

2016 - for testing it's that kind of year, again, again

2016 is progressing as planned and most projects are up to speed again after the holiday season. Deadlines are approaching and a lot of testing is taking place. It feels familiar and safe, but wait, something feels a bit different this year.

That's because 2016 is a leap year. One of those years with one more day in the calendar. A full extra day for testing. Oh joy. However, that extra day should also prompt a bit of scrutiny from you, dear tester, test manager, QA specialist or whatever title allows you to spend most of your time on testing.


Remember the last leap year? 2012. Cloud was perhaps the most hyped field within IT. Microsoft had just spent the previous years pushing Azure to a lot of strategic customers across the entire planet when disaster struck: on 29 February 2012 a leap-day bug took Azure down for many customers. This of course prompted a lot of jokes; within my organisation the joke was "Office 364" for some time. It probably also meant that Microsoft received a lot of fan mail from various lawyers.

To Microsoft this was a PR disaster because it was felt by so many end users in so many different places at the same time. This, of course, coupled with Microsoft promising that it was safe to move business-critical platforms to the cloud. Well, only if it was a "normal" year.

Leap year bugs are a problem since the root cause can be difficult to spot before the problem occurs in real life. It's one of those side effects of a side effect. So take a look at your test plans. If you plan to go live with your project during February - then panic a bit. Even if you have releases after the 29th, do a little brainstorm to find out if that extra day will affect any functionality you have in your project scope - like end of month/quarter/year. Or simply try to figure out what will happen this year on the 28th and 29th of February, and on the 1st of March. And whether end-of-March will be affected.
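As a minimal sketch of the kind of thing worth brainstorming about, here is the classic trap in miniature - naive "add a year" logic next to end-of-month logic that respects leap years. The functions are illustrative only and not taken from any specific system.

```python
from datetime import date, timedelta

def naive_one_year_later(d: date) -> date:
    # The classic trap: bump the year and keep day and month as they are.
    return d.replace(year=d.year + 1)

def last_day_of_month(year: int, month: int) -> date:
    # End-of-month logic that has to respect leap years.
    first_of_next = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return first_of_next - timedelta(days=1)

print(last_day_of_month(2016, 2))                 # 2016-02-29, not the 28th
print(naive_one_year_later(date(2016, 2, 28)))    # works fine
try:
    naive_one_year_later(date(2016, 2, 29))       # the side effect of a side effect
except ValueError as error:
    print("Leap year bug:", error)                # day is out of range for month
```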

That extra day in February is nice since it is a free extra day in most project plans, but it will slap you in the face unless you test for it and know that everything works according to specification or assumption. Wikipedia has a short list of known leap year bugs for inspiration to get you started.

If you don't remember what you've done for the past 4 years in terms of development and testing, have a cup of coffee with your favorite friend - the portfolio manager - and ask if she has a list of projects that have gone live in that period. Or maybe the deployment manager, or DevOps. Actually, right now might be a good time for a few cups of productive coffee. All in the name of defect prevention.

Thursday 7 January 2016

Performance testing - a real world experience and day-zero considerations

Welcome to 2016. Why not kick off the new year with an old story. Some years back I was the test manager for a large project aiming at launching a business-critical system. The test strategy recommended a few structured performance test activities aiming at proving that the system would actually be able to deal with the expected number of users, that peak log-on (during the mornings) could be handled and that system resources would be freed up as users log off.

All of these recommendations were approved by project management and a separate team was set up to design, implement and execute the necessary tests. This would be done late in the project phase, which as such is totally normal: do the testing at a point where the software is sufficiently mature to actually be able to trust and use the test results for a go/no-go decision. So far, so good.

Since this was a project that would implement a solution replacing an existing one, we didn't have to guess too much about user behaviour for normal use. Just look in the log files and find the patterns. A large part of the testing was designed around this knowledge.

Then we consulted the implementation team to figure out how they expected to roll out the solution to the organisation. We returned with the knowledge that it would be a "big bang" implementation. There were no alternatives, so we also needed this as a scenario. How would the solution scale and behave on the first day when everybody had an email in their inbox saying "please log on to this brand new and super good system"?

No problems so far. The organisation was located in two different time zones, which took some of the expected peak load off, so we didn't have to test the cruel "100% of users at the same time" scenario. Emails to different parts of the organisation could be sent out to groups of users at, say, 10-15 minute intervals to avoid a tidal wave of concurrent log-ons. A good and pragmatic idea, and it was agreed in the project and executed by the implementation team.
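As a rough sketch of the arithmetic behind that staggering - with entirely hypothetical numbers, not the figures from the actual project - the idea was that waves of invitations spread the log-on peak over time:

```python
# Hypothetical staggered-rollout model: invitation waves spread the log-on peak.
USERS_PER_WAVE = 500       # assumed size of each email group
WAVE_INTERVAL_MIN = 15     # assumed minutes between invitation emails
LOGON_WINDOW_MIN = 30      # assumed minutes over which a wave's users actually log on
WAVES = 10

logons_per_minute = {}
for wave in range(WAVES):
    start = wave * WAVE_INTERVAL_MIN
    for minute in range(start, start + LOGON_WINDOW_MIN):
        logons_per_minute[minute] = logons_per_minute.get(minute, 0) + USERS_PER_WAVE / LOGON_WINDOW_MIN

peak = max(logons_per_minute.values())
print(f"Staggered peak: ~{peak:.0f} log-ons/minute")
print(f"Everyone at once: {USERS_PER_WAVE * WAVES} users hitting the system in the same window")
```

As the rest of the story shows, a model like this only holds as long as the invitations actually arrive in those controlled waves.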


The one thing we didn't take into account was how organisations, and especially middle management, work. Middle managers tend to send a lot of mails around these days, in ways not always known or controlled by a project like ours. So in the real world we succeeded with our performance testing but failed on day-zero.

As soon as middle management started to get the "Important information - log on to this new system" email they did what they always do with this kind of information - passed it on. Not only to their own organisation but across the organisation, using different mail groups that would hit 30, 50 or 100 people at a time. They were used to this in their daily operational life, and to them this was just another operational morning.

The result was that the peaks of log-ons were completely different from what we had expected and planned - and tested. Not to the extent that there was a complete meltdown, but there were short outages during the first couple of hours - and of course some angry and concerned users who needed feedback and assurance that they could trust the system, which was mission critical for them.

Lesson learned: think a bit outside the box. Not always the worst case scenario, but closer than you might think. Even though you have a lot of knowledge to build on, always consider performance testing for day-zero scenarios as something truly special. First impressions last, especially for real-life users.