Archive for May, 2015

Dog bone approach to testing

Friday, May 22nd, 2015

We testers like our testing “shapes”. For example, consider the Testing Pyramid:

[Figure: the Testing Pyramid]

Well then, here is my contribution.  This is the dog bone approach to testing.

[Figure: the dog bone testing shape]

It is shaped like a dog bone because it is fat (more effort and resources) on both ends and skinny (less effort and resources) in the middle.  The three boxes represent:

Developer Testing

Unit Testing, Test-Driven Development, Behavior-Driven Development, component and inter-component testing

BUFT: Big Up-Front Test

End-to-end functional testing and non-functional testing (see more here)

TiP: Test in Production

  • Using real (organic) and synthetic transactions and data in production (a minimal synthetic-transaction sketch follows this list)
  • This is about Discovery – what we cannot know before contact with production
  • It includes customer experience as part of the quality assessment
  • (yeah, I went back to using TiP instead of DDQ…)
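To make the “synthetic transactions” idea concrete, here is a minimal sketch of a production probe. The endpoint URL, the expected response field, and the latency budget are all hypothetical placeholders (nothing here comes from this post); a real TiP probe would run on a schedule and feed its results into whatever production monitoring and alerting you already have.

```python
# Hypothetical synthetic-transaction probe: exercise one user-visible path
# in production and check both correctness and latency.
import time

import requests

PROBE_URL = "https://example.com/api/checkout/health"  # hypothetical endpoint
LATENCY_BUDGET_SECONDS = 2.0                           # hypothetical budget

def run_synthetic_transaction():
    """Hit the endpoint once; return pass/fail checks plus elapsed time."""
    start = time.monotonic()
    response = requests.get(PROBE_URL, timeout=10)
    elapsed = time.monotonic() - start

    checks = {
        "status_ok": response.status_code == 200,
        "body_has_status_field": "status" in response.json(),  # hypothetical field
        "within_latency_budget": elapsed <= LATENCY_BUDGET_SECONDS,
    }
    return checks, elapsed

if __name__ == "__main__":
    checks, elapsed = run_synthetic_transaction()
    print(f"elapsed={elapsed:.2f}s checks={checks}")
```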

Note that whenever we find a problem downstream that we could have found upstream, we roll that information back upstream. For example, if we find a bug in production that could have been caught by a unit test, we add that unit test and then assess why we missed it in the first place.
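As a toy illustration of that roll-back step (the function, the bug, and the values below are hypothetical, invented just for this sketch): suppose a production incident traced back to a price calculation that truncated fractional cents. The regression test we add upstream, so the bug can never silently return, might look like this:

```python
# Hypothetical regression test added after a production bug: a discount
# calculation truncated fractional cents instead of rounding them.
import unittest

def apply_discount(price_cents: int, percent: float) -> int:
    """Discounted price in cents, rounded to the nearest cent."""
    return round(price_cents * (100 - percent) / 100)

class TestApplyDiscount(unittest.TestCase):
    def test_production_incident_rounding(self):
        # 15% off 249 cents is 211.65 cents; the buggy code returned 211.
        self.assertEqual(apply_discount(249, 15), 212)

if __name__ == "__main__":
    unittest.main()
```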

In summary, we want strong up-front testing to prevent defects where testing is cheap and high value (the developer testing); but as test costs rise with environment and complexity (BUFT), we do only as much testing as we need to gain confidence, so we can dip our toes into production. It is in production that we learn the things we could not learn pre-production.

Lies, damn lies, and statistics – Your guide to misleading correlations.

Sunday, May 10th, 2015

We should trust data. We certainly should trust data more than we trust the HiPPO. But diving head-first into the data looking for answers, when we ourselves do not yet know the questions, can be risky. After all, Goal, Question, Metric (GQM) was promoted (in that order) over 20 years ago (here is the original GQM paper).

I said “risky”, but I did not say wrong. As an advanced technique, the “data wallow” can surface insights we did not even know we were looking for. But the risk is there that you will find a misleading correlation.

There are three types of misleading correlations: 

1. Confounding factors

The first can be understood through the maxim “correlation does not necessarily imply causation”. For example, the following two measures are correlated:

  • Use of sunscreen
  • Incidences of death due to drowning

[Figure: sunscreen usage vs. deaths by drowning (illustrative chart, not real data)]

Increased sunscreen usage is correlated with increased death by drowning. There is a correlation. But of course sunscreen usage does not CAUSE drowning. Instead, the two share a common cause: warm sunny weather, which results in both increased sunscreen usage and increased time spent swimming. The warm sunny weather is the confounding factor behind the sunscreen/drowning correlation.
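To make the mechanism concrete, here is a small simulation sketch (the numbers are entirely made up, chosen only to illustrate the structure): sunscreen use and drownings each depend on sunny weather but not on each other, yet they still come out strongly correlated, and the correlation disappears once we control for the weather.

```python
# Sketch of a confounding factor: "sunny_days" drives both variables, so
# sunscreen use and drownings correlate even though neither causes the other.
import numpy as np

rng = np.random.default_rng(0)
n_weeks = 200

sunny_days = rng.uniform(0, 7, size=n_weeks)                   # the confounder
sunscreen_use = 10 * sunny_days + rng.normal(0, 5, n_weeks)    # depends only on weather
drownings = 0.5 * sunny_days + rng.normal(0, 1, n_weeks)       # depends only on weather

# The raw correlation looks impressive even though there is no causal link.
print("corr(sunscreen, drownings) =",
      np.corrcoef(sunscreen_use, drownings)[0, 1])

# Controlling for the confounder (correlating the residuals after removing
# the weather effect) makes the association essentially vanish.
resid_sunscreen = sunscreen_use - np.poly1d(np.polyfit(sunny_days, sunscreen_use, 1))(sunny_days)
resid_drownings = drownings - np.poly1d(np.polyfit(sunny_days, drownings, 1))(sunny_days)
print("corr after controlling for weather =",
      np.corrcoef(resid_sunscreen, resid_drownings)[0, 1])
```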

2. Spurious correlation

A world of gratitude goes to Tyler Vigen for educating the world about these. Here is but one example (unlike the sunscreen chart above, the chart below IS actual data).

[Figure: one of Tyler Vigen's spurious-correlation charts]

These are real statistical correlations, but they are meaningless in every practical sense; the series are simply correlated by chance. The number of things being measured in the world is so huge that some random correlations will always emerge.
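A quick sketch shows how easily this happens. Generate a pile of completely independent random time series and the best-matching pair will usually correlate surprisingly well, purely by chance (the number and length of the series below are arbitrary choices for the illustration):

```python
# Sketch of spurious correlation: among many unrelated random series,
# some pair will correlate strongly purely by chance.
import itertools

import numpy as np

rng = np.random.default_rng(1)
n_series, n_years = 200, 10          # arbitrary: 200 unrelated 10-year "trends"
series = rng.normal(size=(n_series, n_years))

# Find the best-correlated pair among all series.
best = max(
    (abs(np.corrcoef(series[i], series[j])[0, 1]), i, j)
    for i, j in itertools.combinations(range(n_series), 2)
)
print(f"best |correlation| among {n_series} unrelated series: {best[0]:.2f} "
      f"(series {best[1]} vs {best[2]})")
```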

3. Type I errors (Alpha risk)

A Type I error is a false positive. When designing a controlled experiment (or trial) to establish causation, we choose a significance level (called alpha) for the experiment. Using the common significance level of alpha = 5%, if the data we observe would only be seen in 4% (or 4.9%, or 4.99%) of cases where there is NO causation, we conclude there IS indeed causation and report that. But this still means there is a 4% (or 4.9%, or 4.99%) chance that we are wrong in concluding there is causation, and that is a false positive, or Type I error. Of course, lowering alpha for an experiment gives us more confidence in a positive result (the technical name for this is “specificity”), but at the cost that we might miss some true correlations.
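One way to see the alpha risk directly is to simulate many experiments in which there is, by construction, no real difference between the groups and count how often a standard significance test flags a difference anyway (the sample sizes and the choice of a t-test below are arbitrary; any reasonable setup shows the same roughly 5%):

```python
# Sketch of Type I (alpha) risk: run many "A/A" experiments with NO real
# difference between groups and count how often p < alpha anyway.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha = 0.05
n_experiments, n_per_group = 10_000, 100

false_positives = 0
for _ in range(n_experiments):
    group_a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    group_b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)  # same distribution
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < alpha:
        false_positives += 1

# Expect roughly 5% of experiments to come out "significant" by chance alone.
print(f"false positive rate: {false_positives / n_experiments:.3f}")
```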

Call for feedback

I am sure the stats pros and data folks out there can provide me with any corrections or clarifications I might need to the above.  Please feel free to do so in the comments.