Five Common Mistakes in Analytic Projects

June 1, 2009

Managing projects is often challenging. Developing predictive models can be very challenging. Managing projects that develop analytic models can present some especially difficult challenges. In this post, I’ll describe some of the most common mistakes that occur when managing analytic projects.
Managing projects involving analytics can be difficult.
Mistake 1. Underestimating the time required to get the data. This is probably the most common mistake in modeling projects. Getting the data required for analytic projects usually requires a special request to the IT department. Any special requests made to IT departments can take time. Usually, several meetings are required between the business owners of the analytic problem, the statisticians building the models, and the IT department in order to decide what data is required and whether it is available. Once there is agreement on what data is required, then the special request to the IT department is made and the wait begins. Project managers are sometimes under the impression that good models can be built without data, just as statisticians are sometimes under the impression that modeling projects can be managed without a project plan.

Mistake 2. There is not a good plan for deploying the model. There are several phases in a modeling project. In one phase, data is acquired from the IT department and the model is built, usually by a statistician. In the next phase, the model is deployed, which is usually the responsibility of the IT department. Deployment requires providing the model with the appropriate data, post-processing the scores produced by the model to compute the associated actions, and then integrating these actions into the required business processes. Deploying a model is in many cases as complicated as building it, or more so, and requires a plan. A good standards-compliant architecture can help here. It is often useful for the statistician to export the model as PMML, so that it can then be imported by the application used in the operational system.
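
The three deployment steps described above (supplying the model with data, turning scores into actions, and handing the actions to a business process) can be sketched in a few lines of Python. This is only an illustration: the field names, the scoring formula, the 0.5 cutoff, and the action labels are all hypothetical, and the `score` function is a stand-in for a model that would in practice be imported, for example from PMML.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass
class Deployment:
    """Wires together the three steps of a deployed model."""
    prepare: Callable[[Record], List[float]]  # operational data -> model inputs
    score: Callable[[List[float]], float]     # model inputs -> score
    to_action: Callable[[float], str]         # score -> business action

    def run(self, record: Record) -> str:
        features = self.prepare(record)
        return self.to_action(self.score(features))


def prepare(record: Record) -> List[float]:
    # Hypothetical fields; missing values default to zero.
    return [record.get("visits", 0.0), record.get("purchases", 0.0)]


def score(features: List[float]) -> float:
    # Hypothetical stand-in for the imported model.
    visits, purchases = features
    return min(1.0, 0.05 * visits + 0.2 * purchases)


def to_action(s: float) -> str:
    # Hypothetical post-processing rule with an assumed cutoff of 0.5.
    return "make_offer" if s > 0.5 else "no_offer"


deployment = Deployment(prepare, score, to_action)
action = deployment.run({"visits": 4, "purchases": 2})
```

Keeping the three steps separate makes it easier to plan who owns each one: the data preparation and the integration of actions typically sit with IT and operations, while the scoring step comes from the statistician.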

Mistake 3. Working backwards, instead of starting with an analytic strategy. To say it another way: first, decide on an analytic strategy; then, check that the data that is available supports the analytic strategy; then, make sure that there are modelers (or statisticians) available to develop the models; and, finally, make sure that the modelers have the right (software) tools. The most important factor affecting the success of an analytic project is choosing the right analytic project and approaching it in the right way. This is a matter of analytic strategy. Once the right project is chosen, the success of the project is most dependent on the data that is available; next on the talent of the modeler who is developing the models; and then on the software that is used. In general, companies new to modeling proceed in precisely the opposite direction. First, they buy software they don’t need (for many problems open source analytic software works just fine). Then, when the IT staff has trouble using the modeling software, they hire a statistician to build models. Next, once a statistician is on board, someone looks at the data and realizes (often) that the data will not support the model required. Finally, much later, the business owners of the problem realize they started with the wrong analytic problem. This is usually because they didn’t start with an analytic strategy.

Mistake 4. Trying to build the perfect model. Another common mistake is trying to build the perfect statistical model. Usually, the impact of a model will be much higher if a model that is good enough is deployed and then a process is put in place that: i) reviews the effectiveness of the model frequently with the business owner of the problem; ii) refreshes the model on a regular basis with the most recent data; and, iii) rebuilds the model on a periodic basis with the lessons learned from the reviews.

Mistake 5. The predictions of the model are not actionable. This was the subject of a recent post about an approach that I call the SAMS methodology. Recall that SAMS is an abbreviation for Scores/Actions/Measures/Strategy. From this point of view, the model is evaluated not just by its accuracy but also by measures that directly support a specified strategy. For example, the strategy might be to increase sales by recommending another product after an initial product is selected. Here the relevant measure might be the incremental revenue generated by the recommendations. The action would be to present up to three additional products to the shopper. The scores might range from 1 to 1000, with the three highest-scoring products presented. This is a simple example. Unfortunately, in most of the projects that I have been involved with, determining the appropriate actions and measures has required an iterative process to get right.
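
To make the recommendation example concrete, here is a minimal Python sketch of the score-to-action step and of the measure. The product names, the scores, and the revenue figures are invented for illustration, and measuring incremental revenue as the difference between a group that sees recommendations and one that does not is an assumption, not something prescribed above.

```python
from typing import Dict, List


def recommend(scores: Dict[str, int], selected: str, k: int = 3) -> List[str]:
    """Action: present up to k products with the highest scores (1 to 1000),
    excluding the product the shopper has already selected."""
    candidates = {p: s for p, s in scores.items() if p != selected}
    # Sort by descending score; break ties by name so the result is stable.
    ranked = sorted(candidates, key=lambda p: (-candidates[p], p))
    return ranked[:k]


def incremental_revenue(with_recs: float, without_recs: float) -> float:
    """Measure: revenue with recommendations minus revenue without them."""
    return with_recs - without_recs


# Hypothetical scores for a shopper who has just selected a tent.
scores = {"tent": 910, "stove": 640, "lantern": 875, "boots": 300, "map": 720}
shown = recommend(scores, selected="tent")

# Hypothetical average revenue per session for the two groups.
lift = incremental_revenue(with_recs=52.0, without_recs=48.5)
```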

Please share in the comments below any lessons you have learned building analytic models. I would like to expand this list over time to include many of the common mistakes that occur in analytic projects.

The image above is from www.flickr.com/photos/doobybrain/360276843 and is available under a Creative Commons license.


In Analytics, It’s the Actions that Matter

April 28, 2009

In this note, let’s define analytics as the analysis of data in order to take actions. (This is a narrow definition of analytics, but one that is useful here.) If you don’t have day-to-day work experience with analytics, it is easy to have the mistaken impression that analytics is only about data and statistical models.

Although understanding data and developing statistical models is certainly an important component of an analytic project, this is just one aspect of analytics. This aspect includes cleaning data, enriching data, exploring data, developing features, building models, validating models, and iterating the process. From a broad perspective, this is a process in which the input is data and the output is a statistical model. When most people think of modeling, this is what they think of. For many analytic projects, this is just a small part of what is required for a successful engagement.

The second aspect of analytics is what I am concerned with in this note. This is the aspect of analytics concerned with:

  • developing an appropriate score for a statistical model;
  • using the score to define useful actions;
  • determining which measures are best for evaluating the effectiveness of these actions;
  • tracking these measures (often with a dashboard) and making sure that they advance the strategic objectives of the company or organization.

One way to remember this is using the mnemonic SAMS for Scores, Actions, Measures and Strategies.

For example, with a response model, often a threshold is used. If the score from the response model is above the threshold, an offer is made (this is the action); if not, no offer is made.
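
As a small illustration of the threshold rule, the following Python sketch selects the customers whose response score is above the threshold. The customer identifiers, the scores, and the threshold value of 0.5 are hypothetical.

```python
from typing import Dict, List


def customers_to_offer(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the customers whose score is strictly above the threshold;
    everyone else receives no offer."""
    return [customer for customer, s in scores.items() if s > threshold]


# Hypothetical response-model scores.
scores = {"c1": 0.82, "c2": 0.40, "c3": 0.55}
offers = customers_to_offer(scores, threshold=0.5)
```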

Here are some examples of SAMS:

| Model | Score | Action | Measure | Strategy |
|---|---|---|---|---|
| on-line response model | likelihood to respond to an offer | display the offer to the visitor that has the highest likelihood of response and available inventory | revenue per day generated by the web site | increase revenue from a website by improving targeting of offers |
| fraud model | likelihood that a transaction is fraudulent | approve, decline, or obtain more information | detection and false positive rates | reduce costs and improve customer experience by lowering fraud rates |
| data quality model | likelihood that a data source has data quality problems | if the score is above a threshold, manually investigate the data to check whether there is in fact a data quality problem | detection and false positive rates | improve operational efficiencies by detecting data quality problems more quickly |
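
For the fraud and data quality examples, the measures can be computed once the outcomes are known. The Python sketch below assumes one common pair of definitions: the detection rate is the fraction of true problem cases that were flagged, and the false positive rate is the fraction of legitimate cases that were flagged. Other definitions are in use, so treat these, along with the sample scores and the threshold, as assumptions.

```python
from typing import Sequence, Tuple


def detection_and_false_positive_rates(
    scores: Sequence[float],
    is_problem: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Flag every case scoring above the threshold, then compare the flags
    with the known outcomes."""
    flagged = [s > threshold for s in scores]
    true_pos = sum(f and y for f, y in zip(flagged, is_problem))
    false_pos = sum(f and not y for f, y in zip(flagged, is_problem))
    positives = sum(is_problem)
    negatives = len(is_problem) - positives
    detection = true_pos / positives if positives else 0.0
    false_positive = false_pos / negatives if negatives else 0.0
    return detection, false_positive


# Hypothetical scores and known outcomes for six transactions.
scores = [0.9, 0.2, 0.3, 0.6, 0.1, 0.7]
is_fraud = [True, False, True, False, False, True]
detection, false_positive = detection_and_false_positive_rates(
    scores, is_fraud, threshold=0.5
)
```

With this sample, two of the three fraudulent transactions are flagged and one of the three legitimate ones is flagged. Tracking both rates together matters because lowering the threshold raises both.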

A successful analytics project requires a careful study of what actions are possible; of the possible actions, which can be deployed into operational systems; and how the systems can be instrumented so that the data required to compute the required measures is available.

The organizational challenge when developing and deploying analytics is that four groups must work together to complete a successful analytic project:

  • The IT group must provide the required data to build the model.
  • The analytics group must build the appropriate models and develop the appropriate scores.
  • The operations group must decide which actions are possible and how these actions can be integrated with current systems and business processes.
  • An executive sponsor must make sure that the measures have strategic relevance and that the three groups above collaborate effectively.