Posts Tagged ‘data’

Why inspect and adapt?

Tuesday, November 10th, 2015

In a recent talk I gave on Scrum I highlighted the power of inspect and adapt cycles.  Or as the first Agile Principle puts it:

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

First I talked about a non-agile project that did NOT inspect and adapt.   On such a project, every month (or week, or day) we add some good stuff (valuable features and functionality) and some bad stuff (bugs, worthless features, poorly implemented functionality).  Conceptually it looks like this:

image

But when we inspect at the end of every cycle (month, week, day) and adapt based on that inspection, we find what works and double down on it, and we find what does not work and eliminate or fix it.  Conceptually it then looks like this:

image

And we would rather release something that looks like the second graph and not the first.  That is the power of inspect and adapt.

Lies, damn lies, and statistics – Your guide to misleading correlations.

Sunday, May 10th, 2015

We should trust data.  We certainly should trust data more than we trust the HiPPO.  But diving head-first into the data looking for answers when we do not yet know the questions can be risky.  After all, Goal, Question, Metric (GQM) was promoted (in that order) over 20 years ago (here is the original GQM paper).

I said “risky” but I did not say wrong.   As an advanced technique the “data wallow” can find insights we did not even know we were looking for.   But the risk is that you will find a misleading correlation.

 

There are three types of misleading correlations: 

1. Confounding factors

The first can be understood through the maxim “correlation does not necessarily imply causation”.   For example, the following two data series are correlated:

  • Use of sunscreen
  • Incidences of death due to drowning

image

Increased sunscreen usage is correlated with increased deaths by drowning.  But of course sunscreen usage does not CAUSE drowning.  Instead, both have a common cause: warm sunny weather, which results in both increased sunscreen usage and increased time spent swimming.  The sunny weather is the confounding factor behind the sunscreen/drowning correlation.
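To see how easily a confounder manufactures a correlation, here is a small simulation sketch; the variable names and numbers are mine, purely for illustration.  Sunshine drives both sunscreen use and drowning risk, so the two downstream measures correlate even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(42)
days = 365

# The confounding factor: how sunny/warm each day is.
sunshine = rng.uniform(0, 1, days)

# Sunscreen use and drownings each follow the sunshine (plus noise);
# neither one depends on the other.
sunscreen_use = 100 * sunshine + rng.normal(0, 10, days)
drownings = 5 * sunshine + rng.normal(0, 0.5, days)

# Yet the two downstream measures correlate strongly anyway.
r = np.corrcoef(sunscreen_use, drownings)[0, 1]
print(f"correlation(sunscreen use, drownings) = {r:.2f}")  # roughly 0.9
```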

2. Spurious correlation

A world of gratitude goes to Tyler Vigen for educating the world about these.  Here is but one example (unlike the sunscreen chart above, the chart below IS actual data).

image

These are real statistical correlations, but they are meaningless in every practical sense: the series are correlated purely by chance.  With the sheer number of measurements in the world, some random correlations will always emerge.
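You can reproduce the effect yourself.  Here is an illustrative sketch (all data randomly generated) that creates a couple hundred completely unrelated random walks and then goes fishing for the strongest correlation among them:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_series, n_points = 200, 12  # e.g. 200 unrelated yearly statistics

# Independent random walks: no real relationship exists anywhere here.
series = rng.normal(size=(n_series, n_points)).cumsum(axis=1)

# Go fishing: check every pair and keep the strongest correlation found.
best_r, best_pair = 0.0, None
for i, j in combinations(range(n_series), 2):
    r = abs(np.corrcoef(series[i], series[j])[0, 1])
    if r > best_r:
        best_r, best_pair = r, (i, j)

n_pairs = n_series * (n_series - 1) // 2
print(f"best |r| among {n_pairs} pairs: {best_r:.2f}")  # routinely > 0.95
```

With roughly 20,000 pairs to choose from, a near-perfect match is almost guaranteed, which is exactly why cherry-picked correlations look so convincing.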

3. Type I errors (Alpha risk)

A Type I error is a false positive.  When designing a controlled experiment (or trial) to establish causation, we choose a significance level (called alpha) for the experiment.  Using the common significance level of alpha = 5%, if the data we observe would arise in fewer than 5% of cases where there is NO real effect, we conclude there IS indeed an effect and report that.  But by that very construction, about 5% of experiments run on effects that do not exist will still clear the bar, and each of those conclusions is a false positive, or Type I error.   Of course lowering alpha for an experiment gives us more confidence in a positive result (the technical name for this is “specificity”), but at the cost that we might miss some true effects.
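To make the false-positive rate concrete, here is a small simulation sketch (illustrative only, using SciPy): both groups are always drawn from the same distribution, so any “significant” result it reports is by definition a Type I error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n_experiments, n_samples = 0.05, 10_000, 50

false_positives = 0
for _ in range(n_experiments):
    # Both samples come from the SAME distribution: there is no real effect.
    a = rng.normal(0, 1, n_samples)
    b = rng.normal(0, 1, n_samples)
    _, p_value = stats.ttest_ind(a, b)
    if p_value < alpha:  # we would (wrongly) report a difference
        false_positives += 1

print(f"false positive rate: {false_positives / n_experiments:.3f}")
# Lands near alpha = 0.05, exactly as the theory says it should.
```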

Call for feedback

I am sure the stats pros and data folks out there can provide me with any corrections or clarifications I might need to the above.  Please feel free to do so in the comments.

Data-Driven in one image

Wednesday, February 18th, 2015

 

image

Photo credit: http://www.filmchronicles.com/star-trek-nemesis/

Why Windows 10 will succeed where Windows 8 failed –Data Driven Quality

Sunday, February 15th, 2015

image

You can’t sort of A/B test your way before the product launches, because you don’t have it in users’ hands yet. You need to use your product intuition to make the right choices. You make these choices and people are paying you to make them.

Steven Sinofsky, former President of the Windows Division at Microsoft

D11 Conference, May 2013

 

image

It starts with everyone in an organization having the curiosity, having the questions, trying to test out hypotheses, trying to gain insights, and then taking actions. You need to have a data culture inside of your organization.  …this is perhaps the most paramount thing inside of Microsoft. The thing we need to do better on is to be able to learn from our customers and the data they exhaust … and continuously improve our products and services, that is job number one by far.

Satya Nadella, Microsoft CEO

Executive Keynotes from SQL Server 2014 Launch, April 2014

(I cannot find the quote recorded anywhere, but I personally wrote it down while viewing his keynote for the SQL Server Launch. Some oblique references here, here and here)

Those pesky customers

Sunday, February 8th, 2015

image

It is funny that whenever I teach (or lecture, or cajole) software engineering teams about the necessity of customer centricity, I almost inevitably get challenged with the example of Steve Jobs’ Apple.   Jobs did not care about focus groups, and he did not task his teams with deeply mining customer data… he just told them what was right and they did it.  Sometimes they got an Apple Lisa out of it, but it was from Jobs’ mind that the iPod, iPhone, and MacBook Air sprang forth.

The exception that does not really prove the rule

Well I do not know precisely what to do with Mr. Jobs’ legacy in this context.  I could try to argue how much of his innovation came on the shoulders of others, the original Mac UI and mouse paradigm having started at Xerox PARC and such… but that is not really relevant.  There is no denying that Jobs had the ability to tap into the latent needs of his customers.

image

Latent need?  Yup, that is when you give customers what they desperately want, but they do not even know they want it.  For example, Henry Ford is famous for saying:

If I had asked my customers what they wanted they would have said a faster horse.

Nobody got rich delivering faster horses… or at least not as rich as Mr. Ford got delivering automobiles to the middle and working classes.

I am the decider

So maybe Jobs and Ford are exceptional.   That does not change the need to understand your customers (with hard data).   Just recently I was talking to a software entrepreneur who had just launched a new app targeted at fitness trainers and the gyms they work at.   He told me he was dismayed to find that the app was missing all sorts of features that trainers and gym managers wanted and needed to make it useful to them.   I started to suggest that this was good news: the limited release gave him precious data about his customers.  His response surprised me.  He said that the ex-Microsoft guys who work for him had suggested a beta release to collect data, but he HATED that idea.  This was his vision, he told me, and it was that vision he wanted to see realized, resulting in great success for the app.

Hmmmm…

I am thrilled to see ex *Microsoft* guys associated with customer centricity and data-driven quality.  This is a stark change from just a short while ago, and one that I can humbly claim to have been some part of.   And I guess I should not be shocked that every entrepreneur fancies himself the next Jobs, complete with Jobsian vision.  But I also hope that such entrepreneurs can learn that perhaps they are not, and that data about customers, their problems, and what it looks like to thrill those customers with the perfect solution is perhaps the most powerful tool for success.

Culture of data and experimentation

image

In closing, consider the case of my current boss, Jeff Bezos.  Bezos certainly had (and has) a vision for Amazon, but he said early on (and later codified) that Amazon will focus on what is right for the customer.  Certainly there are many “Jeff projects” that go to production without a focus group, based purely on Jeff’s vision, but there are always metrics, known up front by everyone, by which we will assess the idea, change it, or scrap it altogether.  I remember the early days of Amazon Instant Video when we first launched streaming.  Jeff was sure that the free preview should play the first n minutes of the movie, so that is what we implemented.  The data quickly showed that this is not what people wanted; they wanted the traditional preview/trailer.  Sure, we all knew that would be the case… or did we?  Without first trying the idea we truly did not know… perhaps this was what people had been waiting for, and it would be Amazon’s advantage?  It wasn’t, and that is fine… this is the culture of experimentation, driven by real users and real data, that a company needs to embrace to succeed.

TiP is misunderstood – perhaps DDQ is Better

Monday, January 12th, 2015

I spent a long time talking to folks about the merits of a conscientious Testing in Production (TiP) strategy.  But I knew TiP had a bad rap.  I even shared the story of how some would mischaracterize it as a common and costly technical malpractice.

While evangelizing TiP, my Microsoft colleagues and I would happily post this picture wherever we could:

image

Yet I knew the original poster was not so enthused with TiP.  The comments on it assumed TiP was not a conscientious and risk-mitigated strategy, but instead devs behaving badly:

Then blame all issues on QA -_-

That’s our motto here. Doesn’t work to well in practice, actually.

Now I have returned to Amazon after spending 6 years at Microsoft.  From the following it looks like I have some education to do.

image

On the other hand, who can argue with Data-Driven Quality (DDQ)?  (Except maybe a HiPPO.)  DDQ is also more expansive than TiP, leveraging all data streams, whether from production, customer research, or pre-release engineering.  So TiP was fun, but DDQ is the future.

Who is the HiPPO?

Thursday, January 8th, 2015

image

HiPPO stands for Highest Paid Person’s Opinion

HiPPO-driven decision making is the opposite of data-driven decision making.  The “highest paid” person may be your boss, or the VP with his eye on the project, or even the CEO.  But no matter how many of those big bucks they are pulling down, it turns out that two thirds of decisions made without data are the wrong ones.

When promoting data-driven decision making at Microsoft we distributed thousands of these little hippo squeeze toys.  Perhaps by squeezing our hippo toy, we are reminded to constrain our HiPPO from making rash decisions without data (somewhat like a voodoo doll).

But another, kinder way to look at the HiPPO is as the person who has final responsibility for product feature decisions, and as a reminder to get that person the data they need to make the data-driven decisions.

Either way, it reminds us that data trumps intuition.

 

[and remember the hippo is considered to be the most dangerous animal in Africa (not counting the mosquito)] :-)

Better software through SCIENCE!

Saturday, November 29th, 2014

The scientific method is the time-proven way we have learned the very principles that govern the universe.  It can be summarized by the following sequential steps:

image

  • Ask a question
  • Construct a hypothesis
  • Test the hypothesis with an experiment
  • Analyze results of the experiment
  • Determine whether hypothesis was correct
  • Answer the question

Testability

A hypothesis is a suggested explanation of observed phenomena.  Given such an explanation, one can then make predictions about those phenomena under certain conditions.

But for a hypothesis to be truly useful it should be a specific, testable prediction about what you expect to happen.

For example, Galileo might ask the question "Is the speed of a falling object dependent on its mass?" The hypothesis Galileo formed was that two objects of different mass, dropped from a height, would strike the ground at different times if falling speed depended on mass. The hypothesis differs from the original question in that it predicts an experimental outcome that can be tested. The experiment in turn yielded data indicating nearly simultaneous impact with the ground, and the analysis concluded that falling speed does not depend on mass.
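For completeness, the kinematics agree with Galileo's data: ignoring air resistance, the fall time of a dropped object depends only on the drop height h and the gravitational acceleration g. The mass never appears in the result, which is exactly the outcome the experiment supported.

```latex
% Fall time from constant-acceleration kinematics (no air resistance).
% Note that the mass m appears nowhere in the result.
h = \frac{1}{2} g t^2 \quad\Longrightarrow\quad t = \sqrt{\frac{2h}{g}}
```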

Data-driven quality

In my previous blog post (Data-Driven Quality explained – part 1: questions? what questions?), I introduced DDQ.  DDQ is the application of the scientific method to software quality. The steps of the scientific method map onto the DDQ model as seen below.  Instead of the nature of the universe, though, we are interested in answering questions about the quality of our software.

image

Applied to software

For example, we might be interested in learning why fewer and fewer folks are using Internet Explorer as their web browser.

Based on some preliminary research, we may hypothesize that users abandon IE when they encounter web pages that do not function properly in it but work better in other browsers.

We might then configure IE to malfunction on a select set of popular pages and assess whether IE abandonment rates are higher for users of those pages.

Um….no…. that would be pretty stupid.  

So we take a cue from social scientists here. Social scientists do not send out crack teams equipped with highly addictive narcotics to supply certain neighborhoods so that they can contrast the effects with other neighborhoods.   They instead find existing populations that already exhibit the attributes they need for comparison.

In our case, we would compare users of pages known to malfunction in IE against users who do not encounter such pages, and see whether the first group abandons IE at a significantly higher rate.
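In code, that observational comparison might look something like the following sketch (the counts are invented for illustration; I use a standard two-proportion z-test from statsmodels):

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented counts: cohort A hit known-broken pages, cohort B did not.
abandoned = [840, 610]          # users in each cohort who later abandoned IE
cohort_sizes = [10_000, 10_000]

# One-sided test: does cohort A abandon at a higher rate than cohort B?
stat, p_value = proportions_ztest(abandoned, cohort_sizes, alternative='larger')

print(f"abandonment: {abandoned[0]/cohort_sizes[0]:.1%} vs "
      f"{abandoned[1]/cohort_sizes[1]:.1%}, p = {p_value:.2g}")
if p_value < 0.05:
    print("Users who hit malfunctioning pages abandon IE significantly more often.")
```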

If confirmed, we can then dig in, identify the chief offenders of browser compatibility, and fix them… then re-assess.

Software quality

Testability, data-driven decisions, answering questions… these should all sound familiar to any software professional as good practices.  DDQ and the scientific method are a powerful way to apply them to your software.

Data-Driven Quality explained – part 1: questions? what questions?

Monday, November 24th, 2014

The "dictionary" definition of Data-Driven Quality (DDQ) is:

Application of data science techniques to a variety of data sources to assess and drive software quality.

But it is really about questions and answers, specifically using data to find those answers. Trying to derive insights from data without knowing what you are looking for can be a source of new discoveries, but more often it will yield mirages instead of insights, such as the following image [Source: tylervigen.com, used under CopyLeft license CC 4.0]

(if I only had such data in 1983 I could have wasted even more of my quarter-fueled youth)

So what questions then? These are the questions to ask about your software:

image

In this diagram "it" is your software (service or product). The questions are divided into the three layers identified by the categories on the left:

  • Business value: Does your software contribute to the bottom line and/or strategic goals?
  • User experience: Does your software delight customers and beat the competition?
  • System Health: Does your software work as designed?

Each category layer depends on the layer beneath it. Consider that it is difficult to build a good user experience on a slow, error-riddled product. And ultimately it is the top layer, Business value, that we care about. This leads to the "trick question" I will sometimes ask software testers and SDETs: What is your job? To which I answer:

Your job is to enable the creation of software that delivers value to the business. This is the job of the tester. It also happens to be the job of the developer, program manager, designer, manager, etc.

(I explore this idea a bit more here if you are interested) If you think about it, you might want to add to the above statement that you create this business value through:

  • An experience that delights the users
  • The production of high-quality software

True. It also turns out that these are, respectively, User Experience and System Health, the layers we depend on to build Business Value. An interesting note about that word "quality": "high-quality software" above means system health, meaning that it does not break. But as Brent Jensen likes to ask, which is the higher-quality software:

  • A. Perfect bug-less software that people do not use (or perhaps worse, they hate to use)
  • B. Quirky software with a few glitches making millions happy (and making happy $millions)

If you believe the answer is B, then DDQ will appeal to you with its "Q" for "quality" happily spanning the pyramid above and not just system health. DDQ is about a confluence of what has been called Business Intelligence (BI) and quality. They are not really different things.

Asking the right questions is an important start, but it is only one piece of the DDQ puzzle. DDQ works in an environment of iterative improvement (same as Agile). The faster we can spin around these cycles, the faster we improve our software. This is the DDQ Model:

image

I will leave it as an exercise to the reader to map the above to the scientific method. (I may help you out and do this in a future blog post.)

With our questions understood, the next step is to understand the data sources we can use to answer them. I will close by sharing a list of some of the types of data we can use below. You will note much of this data comes from production or near-production (think of private betas or internal dogfood). Production is a great environment to get data from, as it is the most realistic environment for our software, with real users doing real user things.

Business value (Is it successful?)

  • Acquisition: adoption of a new feature, new users, unique users
  • Retention: market share, session duration, repeat use
  • Monetization: purchases, conversion, ad revenue; minus COGS and support costs

User experience (Is it useable? valuable?)

  • Usage data: feature use, task completion rates
  • Feedback (2nd person): user ratings, user reviews
  • Sentiment analysis (3rd person): from Twitter, forums

System health (Is it available? reliable? performant?)

  • Infrastructure data: memory, CPU, network
  • Application data: errors, latency, MTTR (Mean Time to Recovery), compliance
  • Test cases (run as part of pre-production test passes, or in production as monitors): correctness, performance
  • Engineering metrics (pre-production): code coverage, code churn, delivery cadence
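To ground one row of the list above, here is a minimal sketch computing a task completion rate (one of the usage-data measures under user experience) from a hypothetical event log; the field names and events are invented for illustration.

```python
# Hypothetical clickstream events; the schema is invented for illustration.
events = [
    {"user": "u1", "task": "checkout", "event": "started"},
    {"user": "u1", "task": "checkout", "event": "completed"},
    {"user": "u2", "task": "checkout", "event": "started"},
    {"user": "u3", "task": "checkout", "event": "started"},
    {"user": "u3", "task": "checkout", "event": "completed"},
]

# Task completion rate: of the users who started the task, how many finished?
started = {e["user"] for e in events if e["event"] == "started"}
completed = {e["user"] for e in events if e["event"] == "completed"}

rate = len(completed & started) / len(started)
print(f"checkout task completion rate: {rate:.0%}")  # 67% on this toy data
```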

Not covered

As you can see in the DDQ model, there is plenty more to cover. Besides the other boxes in that model, here are some other things that were NOT covered in this blog post:

  • How to determine specific scenarios to frame your questions for your software
  • How this fits into a comprehensive software development life cycle, and specifically the impact on BUFT (Big Up-Front Testing)
  • Impact on team roles and responsibilities. Who does what?
  • The future of the Tester/SDET role
  • What do you need to know about actual Data Science?
  • Tools
  • Dashboards, and actionability
  • Examples :-)

Further reading

If you want to learn more, I recommend the following:

  • My former Microsoft colleague Steve Rowe has a great series of posts on DDQ
  • Adding to the acronym soup, but definitely on target with DDQ, my frequent co-conspirator Ken Johnston explains EaaSy and MVQ.
  • Although I had not quite tightened up the questions into the neat pyramid model above, I do fill in some of the blanks left by this blog post in my Roadmap to DDQ talk.

Finally, I wish to acknowledge and thank Monica Catunda, who collaborated with me on much of this material and co-delivered the Create Your Roadmap to Data-Driven Quality one-day workshop at STPCon, November 2014, in Denver.