An often-quoted figure holds that 85% of data science projects fail. So how do you avoid being part of that statistic? Here are a few common traps that data scientists can avoid in order to deliver the impact our organizations are looking for.
1. Move beyond predictions
Predictive modeling is probably the first thing that comes to mind when people think “Data Science.” It is undoubtedly one of the strengths of data science, especially in the frequent cases where the outcome is out of our control and predicting it is all we can do. But why limit data science to predictions?
For example, should we simply accept that customers will churn and make retention offers to those most at risk? Or should we understand why people are likely to churn and make them happier customers in the first place?
We need to move beyond building predictive models to uncovering the underlying drivers. This is easier said than done, given that finding the root causes of an open-ended problem is much more complex than building a model. But if you want to shape the future instead of being shaped by it, you need to discover what drives your problem.
2. Do you know what you want to know?
When turning a business problem into a data science use case, the first question is often, “What is my target variable?” This isn’t as trivial a question as you may think.
Common analytics use cases often have multiple angles. Take insurance claims, for example. We would want to know which claims are low risk overall and can be fast-tracked. We would also want to know which ones need to go through triage with another insurer, and which ones are likely not covered. Each of these objectives typically has different drivers, and with traditional methods, exploring five use cases requires five times the effort. That does not scale if the goal is sustainable business impact.
3. What holds true today won’t be relevant tomorrow
The volatility of the pandemic surfaced a well-known problem with simply recalibrating models on up-to-date data. Recalibration only allows your models to correctly interpret the information presented to them, that is, the information encoded in the features that the data scientist provides. But what about the information that the data scientist discarded or ignored because, in the past, it did not matter?
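To make the limitation concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy records, the column names (in_store, remote, bought), and the deliberately crude majority-vote "model". It compares three options on new data: keeping the old model, refitting it on the same feature, and re-running feature selection over all columns. Accuracy is measured on the same rows used for fitting, so the numbers only illustrate the mechanism.

```python
# Toy records, invented for illustration: (in_store, remote, bought).
COLS = ("in_store", "remote", "bought")
old = [dict(zip(COLS, r)) for r in
       [(1, 0, 1), (1, 1, 1), (1, 0, 1), (0, 1, 0), (0, 0, 0), (0, 1, 0)]]
new = [dict(zip(COLS, r)) for r in
       [(1, 0, 0), (1, 1, 1), (0, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 1)]]
CANDIDATES = ["in_store", "remote"]

def fit(rows, feature):
    """Majority-vote lookup: feature value -> most common outcome."""
    model = {}
    for value in {r[feature] for r in rows}:
        labels = [r["bought"] for r in rows if r[feature] == value]
        model[value] = int(sum(labels) * 2 >= len(labels))
    return model

def accuracy(model, rows, feature):
    """Share of rows whose outcome matches the model's lookup."""
    hits = sum(model.get(r[feature], 0) == r["bought"] for r in rows)
    return hits / len(rows)

def best_feature(rows, candidates):
    """Pick the single feature whose fitted model scores highest."""
    return max(candidates, key=lambda f: accuracy(fit(rows, f), rows, f))

# Feature selection done once, in the past.
chosen = best_feature(old, CANDIDATES)

# Option 1: keep the old model as it is.
stale = accuracy(fit(old, chosen), new, chosen)
# Option 2: recalibrate, i.e. refit on new data with the same feature.
recalibrated = accuracy(fit(new, chosen), new, chosen)
# Option 3: repeat feature selection on the new data.
rechosen = best_feature(new, CANDIDATES)
rediscovered = accuracy(fit(new, rechosen), new, rechosen)

print(f"old feature: {chosen}, new feature: {rechosen}")
print(f"stale={stale:.2f} recalibrated={recalibrated:.2f} "
      f"rediscovered={rediscovered:.2f}")
```

In this toy setup, refitting helps somewhat, but only re-examining the previously ignored column recovers the signal that appears in the new data.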
As different as the above problems may seem, one approach offers a solution to all of them: leveraging AI to generate hypotheses at scale. If you generate millions of hypotheses, you can evaluate them against any number of aspects of the problem and discover which ones actually influence the underlying behavior. And you can repeat the process, from scratch, again and again as soon as new data is available. Moreover, each time you do this, you can encode the relevant hypotheses and use them to build predictive models covering phenomena that have only just appeared in the data.
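As a rough illustration of the idea, and not a description of any particular product or algorithm, the sketch below enumerates simple candidate hypotheses and scores every one of them against several targets at once. The toy data, the column names, the choice of "above median" flags with pairwise ANDs as the hypothesis space, and the difference-in-rates score are all assumptions made for this example. A real system would search a far larger space and guard against spurious findings.

```python
from itertools import combinations
from statistics import mean, median

# Toy customer records, invented for illustration only.
COLUMNS = ("tenure", "tickets", "discount", "churned", "upsold")
RAW = [
    (1, 6, 0, 1, 0), (2, 5, 1, 1, 0), (3, 7, 0, 1, 0), (4, 1, 1, 0, 0),
    (10, 0, 1, 0, 1), (12, 2, 0, 0, 1), (15, 1, 1, 0, 1), (20, 6, 0, 1, 1),
]
rows = [dict(zip(COLUMNS, r)) for r in RAW]
FEATURES = ("tenure", "tickets", "discount")
TARGETS = ("churned", "upsold")

def generate_hypotheses(rows, features):
    """Return {name: predicate}: 'feature above median' flags and pairwise ANDs."""
    base = {}
    for f in features:
        cut = median(r[f] for r in rows)
        base[f"{f} > {cut}"] = lambda r, f=f, cut=cut: r[f] > cut
    hypotheses = dict(base)
    for (n1, p1), (n2, p2) in combinations(base.items(), 2):
        hypotheses[f"({n1}) AND ({n2})"] = (
            lambda r, p1=p1, p2=p2: p1(r) and p2(r)
        )
    return hypotheses

def effect(rows, predicate, target):
    """Target rate where the hypothesis holds minus the rate where it does not."""
    hit = [r[target] for r in rows if predicate(r)]
    miss = [r[target] for r in rows if not predicate(r)]
    if not hit or not miss:
        return None  # the hypothesis does not split the data
    return mean(hit) - mean(miss)

def rank(rows, hypotheses, target):
    """Sort hypotheses by absolute effect on one target, strongest first."""
    scored = [(name, effect(rows, p, target)) for name, p in hypotheses.items()]
    scored = [(name, e) for name, e in scored if e is not None]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# One pool of hypotheses, evaluated against every target.
hypotheses = generate_hypotheses(rows, FEATURES)
top = {t: rank(rows, hypotheses, t)[0] for t in TARGETS}
for target, (name, e) in top.items():
    print(f"{target}: {name} (effect {e:+.2f})")
```

The hypotheses are generated once and reused for each target, so adding another question means adding another target column rather than starting a new modeling effort. Rerunning the script on fresh data repeats the search from scratch.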