CRISP-DM

This is a new one for me: CRISP-DM is a formalised process for carrying out data analytics on a given project. As part of an MSc course, I had to apply CRISP-DM to a problem at an energy company.

CRISP-DM

Business understanding

The primary business objective is to reduce churn. To do that, we need to identify the categories of customers most at risk of churn. We need to carry out a segmentation exercise of existing customers and, as the brief says, look at the last three years of data to examine past customers. In fact, the behaviour of customers who have left is vital, as it provides the raw data that will allow data analytics techniques to be used to infer their motives for leaving Western Power.

Some additional objectives may be to upsell: to encourage single supply customers to adopt dual fuel, for instance; to encourage direct debit payment, as that reduces costs; and perhaps to encourage lengthier fixed contracts at fixed prices that are hedged against future spot prices on the energy market. These additional objectives are worthy of separate data analytics exercises in their own right. Reducing the risk that the project fails to reduce overall churn, by using well-researched data analytics methods, and finishing the project on time and on budget will, perhaps, provide justification for further data analytics projects.

Another minor point, but one that may contribute to the success of the project, is its soft side. Organisations the size of Western Power often have internal communications departments. These can be engaged to communicate the project through internal news and presentations. Promoting the project in a transparent way will encourage goodwill. Of course, it must be acknowledged that the data around the project and its outputs will be extremely commercially sensitive, so measures such as having staff sign non-disclosure agreements will be necessary, as well as good security controls around artefacts such as code and documentation.

Business objectives

Other business objectives are to enable upsell: encouraging single supply customers to adopt dual fuel; encouraging customers to switch to direct debit, which reduces the cost of billing them; encouraging customers to adopt paperless billing; and asking customers to switch from prepayment meters to contracts. All these measures reduce the cost to serve. Reducing the cost to serve is an indirect objective, but an important strategic one.

The exercise will be deemed successful if churn is stabilised or, preferably, reduced. The project will be successful if the study finishes on time and on budget.

The energy industry faces many headwinds that make business far from straightforward: the pace of competition; the price of the raw product, namely gas spot prices and energy generation costs; the rules from the regulators, which mean, for instance, that it is simple for customers to switch and that the cost of switching is paid by the energy supplier; the political situation, which has resulted in windfall taxes from central government; and the encouragement of customers to become more efficient and use less energy.

Data understanding

It is imperative to understand the business domain. This understanding can be built up through experience: by working on the data for many years, by working for similar clients, or by working on similar technical projects in industries such as insurance or mobile phones, where services are billed, the customer can switch and the customer relationship is distant (via call centre, website and mail). However, the duration of this project is six months. The most obvious step is therefore to make sure the project has sponsorship from senior management. This will mean that it is taken seriously by managers whose business domain experience is critical to the success of the project. Sponsorship at that level means that managers can be relied upon to support the project and are expected to devote their time and energy to it.

Potentially useful data include customer demographics such as social class (ABCDE socio-economic classification, n.d.): for example, are readers of certain newspapers more likely to use switching sites? This could be an important and valid way to classify customers. Such classifications are normally used by the press for advertising, for producing news articles, or to communicate segmentation to a wider audience such as politicians or the general public. Data analytics has greater scope if it is allowed to produce bespoke segmentation from the data, choosing its own input and output parameters.

We would hope to interpret bill payment behaviour: are "direct debiters" more likely to switch? We could also observe how customers joined. For example, if they came from a switching site in the first place, what is their propensity to switch supplier again?
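
A first look at these questions needs nothing more than churn rates grouped by attribute. The sketch below assumes hypothetical records of (payment method, acquisition channel, churned). The values are illustrative only and are not Western Power data.

```python
from collections import defaultdict

# Hypothetical customer records: (payment_method, joined_via, churned)
customers = [
    ("direct_debit", "switching_site", True),
    ("direct_debit", "switching_site", True),
    ("direct_debit", "direct", False),
    ("direct_debit", "direct", False),
    ("quarterly_bill", "switching_site", True),
    ("quarterly_bill", "direct", False),
    ("prepayment", "direct", False),
    ("prepayment", "switching_site", False),
]


def churn_rate_by(records, key_index):
    """Return {attribute value: fraction of those customers who churned}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for record in records:
        value = record[key_index]
        totals[value] += 1
        if record[2]:  # the churned flag
            leavers[value] += 1
    return {value: leavers[value] / totals[value] for value in totals}


print(churn_rate_by(customers, 0))  # grouped by payment method
print(churn_rate_by(customers, 1))  # grouped by acquisition channel
```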

Data preparation

Data preparation is the subject of the third law of data mining (Khabaza, 2010): data preparation is more than half of every data mining process. Based on informal estimates, the project plan should therefore allocate fifty to eighty percent of the estimated resources and time to it.

First, data from various sources, such as databases, billing systems, web logs and the call centre CRM, should be aggregated during the selection stage. To improve the performance of large queries (in SQL, say), the data should be optimised by placing indexes on key fields and normalising it appropriately, for example to third normal form. Discard data that is not needed. Back up the data and apply good stewardship to it. It is normally appropriate for DBAs to be involved at this stage so that the correct policies and procedures are put in place around the data.
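
Here is a minimal sketch of the selection and aggregation step. It uses Python's built-in sqlite3 module as a stand-in for the enterprise database, and the billing and crm_calls tables and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical extracts from two source systems, both keyed on customer_id.
cur.execute("CREATE TABLE billing (customer_id INTEGER, payment_method TEXT, amount REAL)")
cur.execute("CREATE TABLE crm_calls (customer_id INTEGER, reason TEXT)")

# Index the join key so large joins do not need full table scans.
cur.execute("CREATE INDEX idx_billing_customer ON billing (customer_id)")
cur.execute("CREATE INDEX idx_calls_customer ON crm_calls (customer_id)")

cur.executemany("INSERT INTO billing VALUES (?, ?, ?)", [
    (1, "direct_debit", 80.0), (1, "direct_debit", 75.0), (2, "quarterly_bill", 210.0),
])
cur.executemany("INSERT INTO crm_calls VALUES (?, ?)", [
    (1, "complaint"), (2, "tariff_query"), (2, "complaint"),
])

# One row per customer: total billed and number of call-centre contacts.
rows = cur.execute("""
    SELECT b.customer_id, b.total_billed, COALESCE(c.calls, 0) AS calls
    FROM (SELECT customer_id, SUM(amount) AS total_billed
          FROM billing GROUP BY customer_id) AS b
    LEFT JOIN (SELECT customer_id, COUNT(*) AS calls
               FROM crm_calls GROUP BY customer_id) AS c
      ON b.customer_id = c.customer_id
    ORDER BY b.customer_id
""").fetchall()
print(rows)  # [(1, 155.0, 1), (2, 210.0, 2)]
```

Each source is aggregated per customer before the join. Joining billing lines directly to call records would multiply rows and inflate the totals.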

Cleaning the data could be a significant task. Data will come from a variety of different sources and systems, so there will most likely be encoding issues, collation issues and outliers. Consider the points raised earlier: an energy company will most likely have legacy systems, so data may be stored in upper case, in ASCII and with an unusual collation order. Some systems may be the result of company reorganisation, such as mergers or divestments. Other systems, such as the web logs, may be more modern, with data in Unicode. Safe type conversion will need to be put in place to avoid errors in later stages. There may also be erroneous records, for instance where a call centre assistant has entered an incorrect postcode. When cleaning the data, the sources should be pulled into the database and cleansed there, so that the data analysis is carried out from a single consistent database that follows the ACID principles; this is the format data stage. The assumption is that a relational database is used, but if the data is very sparse, for instance, other techniques such as NoSQL may be used. It is important to work with the skills of the IT functions and suppliers, so if, for instance, Oracle is the enterprise's choice of database, the work should be carried out in that realm.
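
Some of these cleaning steps can be sketched as small, testable functions. The postcode pattern is a simplified assumption about the UK format, not a full validator, and the record layout is hypothetical.

```python
import re
import unicodedata

# Simplified UK postcode shape (an assumption; not an authoritative validator).
POSTCODE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


def clean_name(raw: str) -> str:
    """Normalise a legacy upper-case name to NFC Unicode and title case."""
    return unicodedata.normalize("NFC", raw).strip().title()


def safe_float(raw):
    """Convert to float, returning None instead of raising on bad input."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def clean_postcode(raw: str):
    """Upper-case and collapse spaces; return None if the shape looks wrong."""
    candidate = " ".join(raw.upper().split())
    return candidate if POSTCODE.match(candidate) else None


record = {"name": "  JOHN SMITH ", "balance": "12.50", "postcode": "ba1 1aa"}
cleaned = {
    "name": clean_name(record["name"]),
    "balance": safe_float(record["balance"]),
    "postcode": clean_postcode(record["postcode"]),
}
print(cleaned)  # {'name': 'John Smith', 'balance': 12.5, 'postcode': 'BA1 1AA'}
```

`str.title()` is naive: it turns "MCDONALD" into "Mcdonald". Real name cleaning would therefore need exception lists.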

Another characteristic of the energy industry is the relatively recent rollout of smart meters. A known issue in energy supply is that billing is based on forecasts and estimates of usage, which smart meters can replace with actual readings. Whether a customer has had a smart meter installed may become an interesting variable in the analysis of churn. It is probably not useful yet, but it will be once adoption is more widespread.

Data modelling

A wide variety of predictive modelling techniques could be selected. Analysing a commercial organisation's data is seldom a task that has not been carried out before for a similar organisation or one with similar characteristics. This is good, as it reduces the risk to the project: techniques can be reused from other applications that have been peer reviewed, and we can use algorithms and techniques that have been validated mathematically. A literature review of techniques used in similar projects, "Application of data mining techniques in customer relationship management: A literature review and classification" (Ngai, Xiu, & Chau, 2009), suggests that for customer retention we might select classification, clustering or forecasting techniques. The algorithms suggested are decision trees, neural networks or k-nearest neighbours. Since the chief objective is to take a given customer and assess the risk of losing their business, we need to place them within a manageable set of segments. We could give these segments names such as "fickle" for the customer who reads the money sections of the newspapers every Sunday night or is signed up to the MoneySavingExpert site, "laid backs" for customers who do not make the time to change, and "loyals" for those who are perhaps too busy with their lives to change energy supplier.

Some valuable segmentation examples can be found in the credit card industry (Linoff & Berry, 2011). "Transactors" are customers who pay the balance in full every month. "Revolvers" make charges every month but do not pay the full balance. "Convenience" customers use the service to make large purchases and pay them off over a few months. It is important to note that the least profitable segment here is the "transactors". This should be a consideration when we identify customers in the Western Power exercise, as it may be pragmatic to lose certain customers, namely the least profitable ones. Yet we need to be mindful of regulation, under which we may be obliged to continue supplying poorer prepayment customers and the elderly.
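
The credit card segments can be expressed as simple rules. This is a sketch only: the monthly statement layout and the rule boundaries are my own assumptions, not definitions taken from Linoff & Berry.

```python
from typing import List, Tuple

# Each month: (new_charges, statement_balance, amount_paid). Hypothetical layout.
Month = Tuple[float, float, float]


def card_segment(history: List[Month]) -> str:
    """Assign a cardholder to one of three segments using assumed rules."""
    pays_in_full = all(paid >= balance for _, balance, paid in history)
    charges_every_month = all(charges > 0 for charges, _, _ in history)
    if pays_in_full:
        return "transactor"   # clears the balance every month
    if charges_every_month:
        return "revolver"     # keeps spending but carries a balance
    return "convenience"      # occasional large purchase paid off over months


print(card_segment([(100, 100, 100), (50, 50, 50)]))                 # transactor
print(card_segment([(100, 100, 20), (80, 160, 20)]))                 # revolver
print(card_segment([(900, 900, 300), (0, 600, 300), (0, 300, 300)]))  # convenience
```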

We will be using predictive analytics, but there are two major classes of modelling here: undirected and directed. Undirected techniques, such as clustering based on demographics with behavioural data overlaid, contrast with directed techniques, such as decision trees, which use one or more target variables to guide the technique through data processing. In directed techniques there is a definite notion of "best". The mobile phone handset pricing plan example in Linoff & Berry (2011) is a study that performs undirected data mining using directed techniques.
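
To make the undirected case concrete, here is plain k-means clustering (Lloyd's algorithm) on two hypothetical attributes, annual consumption in MWh and tenure in years. Churn flags are then overlaid on the resulting clusters. The data and the choice of k = 2 are illustrative assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns a cluster index per point.
    Centroids start at the first k points so the result is reproducible."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels


# Hypothetical (annual MWh, tenure years) per customer, plus known churn flags.
points = [(3.1, 1.0), (12.0, 9.0), (2.9, 0.5), (3.3, 1.5), (11.5, 8.0), (12.4, 10.0)]
churned = [True, False, True, False, False, False]

labels = kmeans(points, k=2)
rates = {}
for cluster in sorted(set(labels)):
    flags = [flag for flag, label in zip(churned, labels) if label == cluster]
    rates[cluster] = sum(flags) / len(flags)  # overlay behaviour on the clusters
print(labels, rates)
```

The clustering never sees the churn flags. That is what makes it undirected: churn is only examined afterwards, cluster by cluster.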

"Predict Customer Churn" (Li, 2017) provides a practical example of using logistic regression, decision trees and random forests to predict customer churn, written in the R language. We could take this model and re-code it to suit the variables present in Western Power's data, such as "dual fuel user", "business customer", "direct debit" and "green sources". The last of these indicates that the customer has chosen to have their energy from green sources such as biomass. Supplying a particular customer with only green energy is of course physically impossible, but I assume that revenue from such customers is passed to green energy suppliers in the long run.
However, a bespoke R solution is not advisable. A company will not normally write its own CRM system; it will commission a vendor such as SAP to deploy an instance of its product. In the same way, Western Power could deploy IBM SPSS Modeler, for instance, and obtain guidance from the IBM SPSS Modeler CRISP-DM guide hosted on the INSEAD data analytics GitHub page (IBM, 2011). An agreement covering support and consultancy can then be arranged with the vendor.
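
Whatever tool is eventually used, the idea of re-coding a churn model around Western Power's variables can be sketched in a few lines. This is not Li's R code. It is a from-scratch logistic regression fitted by batch gradient descent, and the training rows are hypothetical.

```python
import math
from typing import List, Sequence

# Feature order (hypothetical binary flags): dual_fuel, business_customer,
# direct_debit, green_sources.


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[Sequence[float]], y: List[int], lr: float = 0.5, epochs: int = 2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(bias + sum(w * x for w, x in zip(weights, xi))) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def predict_proba(weights, bias, x) -> float:
    """Estimated probability that customer x churns."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical history: 1 = churned.
X = [[1, 0, 1, 0], [1, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 1],
     [0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
y = [0, 0, 1, 1, 0, 1, 0, 1]

weights, bias = fit_logistic(X, y)
print(predict_proba(weights, bias, [0, 0, 0, 0]))  # no dual fuel, no direct debit
print(predict_proba(weights, bias, [1, 0, 1, 0]))  # dual fuel, direct debit
```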

Another model to consider is collaborative filtering, which can take an item-based or a user-based approach. Here we compare a customer with other customers who are "near" to them, nearness being based on the geometric distance between customers plotted as points on a set of axes. A familiar example is retail websites that suggest future purchases based on past purchases. A form of segmentation has taken place based on various criteria the customer meets. For example, the fact that a customer has bought a Rolling Stones album and a Pink Floyd one suggests that they may like Jimi Hendrix.
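
Here is a minimal user-based collaborative filtering sketch. It uses Euclidean distance between hypothetical purchase vectors to find the nearest customers, then scores the items they bought that the target customer has not.

```python
import math

# Hypothetical purchase matrix: 1 = bought the album.
ALBUMS = ["Rolling Stones", "Pink Floyd", "Jimi Hendrix", "Abba"]
purchases = {
    "ann": [1, 1, 1, 0],
    "bob": [1, 1, 1, 0],
    "cara": [0, 0, 0, 1],
    "target": [1, 1, 0, 0],
}


def recommend(user, data, k=2):
    """Rank items the user lacks by how many of the k nearest users bought them."""
    others = [u for u in data if u != user]
    nearest = sorted(others, key=lambda u: math.dist(data[user], data[u]))[:k]
    scores = {}
    for i, album in enumerate(ALBUMS):
        if data[user][i] == 0:  # only score items not already bought
            scores[album] = sum(data[u][i] for u in nearest)
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("target", purchases))  # ['Jimi Hendrix', 'Abba']
```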

One of the above approaches, or a combination of them, could be used to form the data model for the Western Power exercise.
Other methods that could be used, but are most likely not suited, are Bayesian techniques. These use probability to decide that, because you are a certain type of customer, you are either at high risk of churn or not. They are used in spam filtering, where an email is compared with known legitimate and spam emails to decide how to classify it. This is a little too "binary" for the Western Power exercise, where we need to segment customers into perhaps five to fifteen segments.
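
Bayes' rule shows how a single piece of evidence updates churn risk, and why thresholding the result yields a yes/no answer. The probabilities below are hypothetical.

```python
def posterior_churn(prior: float, p_evidence_given_churn: float,
                    p_evidence_given_stay: float) -> float:
    """Bayes' rule: P(churn | evidence) = P(evidence | churn) P(churn) / P(evidence)."""
    numerator = p_evidence_given_churn * prior
    evidence = numerator + p_evidence_given_stay * (1 - prior)
    return numerator / evidence


# Hypothetical: 15% base churn; 60% of leavers vs 20% of stayers joined via a switching site.
p = posterior_churn(prior=0.15, p_evidence_given_churn=0.6, p_evidence_given_stay=0.2)
label = "high risk" if p >= 0.5 else "low risk"
print(round(p, 3), label)  # 0.346 low risk
```

The evidence more than doubles the estimated risk, yet the threshold still reports a single yes/no label.
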
Another class of methods is neural networks: networks of artificial neurons can be trained by showing them examples of low-risk and high-risk churn customers. Once trained, the network can predict whether a customer is at risk of churn or not. In the Western Power case this is probably not a useful technique, as it is more suited to finding exceptions, such as the rare fraudulent transaction in its typical application in the banking industry.
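
Training an artificial neuron on labelled examples can be sketched with the classic perceptron rule: a single neuron with a step activation. The two binary features (pays by direct debit, made several complaints last year) and the labels are hypothetical.

```python
def predict(w, b, x):
    """Step activation: 1 (churn risk) if the weighted sum is positive."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


def train_perceptron(X, y, epochs=20, lr=1.0):
    """Perceptron rule: nudge the weights only when a training example is misclassified."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            update = lr * (yi - predict(w, b, xi))
            w = [wj + update * xj for wj, xj in zip(w, xi)]
            b += update
    return w, b


# Hypothetical examples: [direct_debit, many_complaints] -> 1 = high churn risk
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
y = [0, 0, 1, 1, 1, 0]

w, b = train_perceptron(X, y)
print(predict(w, b, [1, 1]))  # 1
```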

For this exercise, I think it is important to try out both k-nearest neighbours and the method described by Li (2017) in the "Predict Customer Churn" example. Li's method relies on a suite of attributes, such as broadband connection, multiple lines and other telecom customer data. Western Power may not have the luxury of holding so many attributes consistently across its entire customer base. With k-nearest neighbours, by contrast, we can tune the value of k to get a good idea of whether a customer belongs to a particular segment. Using different attributes of a customer, we could then establish whether they are at high risk of churn or not.
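
A minimal k-nearest-neighbours sketch on two hypothetical attributes (tenure in years and complaints in the last year) shows how the prediction for the same customer can change with k.

```python
import math
from collections import Counter

# Hypothetical labelled customers: ((tenure_years, complaints_last_year), churn risk)
training = [
    ((0.5, 3), "high"), ((1.0, 2), "high"), ((0.8, 4), "high"),
    ((6.0, 0), "low"), ((8.0, 1), "low"), ((5.5, 0), "low"), ((2.0, 1), "low"),
]


def knn_predict(point, data, k):
    """Majority vote among the k training customers nearest to `point`."""
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


for k in (1, 3, 5, 7):
    print(k, knn_predict((2.5, 1), training, k))
# 1 low
# 3 high
# 5 high
# 7 low
```

In practice the attributes should first be scaled to comparable ranges. Otherwise the attribute with the largest numeric range dominates the distance. The value of k would then be chosen by testing against customers whose outcome is known.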

The model will be tested using a manageable set of customers whose outcome we know. We will "roll back" the clock and observe the customers' activity to understand their behaviour. Hopefully the right data model and inferences will mean that the model passes and that we can scale it up. If not, we will go back to further research; we will have built plenty of "slack" into the project plan to allow for this.
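
Rolling back the clock amounts to a time-based holdout. Fix a cutoff date, use only information available before it, and check predictions against who actually left afterwards. The cutoff, horizon, records and the stand-in rule below are all hypothetical.

```python
from datetime import date, timedelta

CUTOFF = date(2016, 1, 1)        # the "rolled back" clock (hypothetical)
HORIZON = timedelta(days=182)    # did the customer leave within about six months?

# (customer_id, complaints logged before the cutoff, date left or None if still a customer)
history = [
    ("c1", 3, date(2016, 3, 10)),
    ("c2", 0, None),
    ("c3", 2, date(2016, 5, 1)),
    ("c4", 0, date(2016, 12, 1)),
    ("c5", 4, None),
]


def actual_churn(left_on):
    """True if the customer left inside the evaluation window after the cutoff."""
    return left_on is not None and CUTOFF <= left_on < CUTOFF + HORIZON


def predicted_churn(complaints):
    """Stand-in model (hypothetical rule); the trained model would go here."""
    return complaints >= 2


tp = sum(1 for _, c, left in history if predicted_churn(c) and actual_churn(left))
fp = sum(1 for _, c, left in history if predicted_churn(c) and not actual_churn(left))
fn = sum(1 for _, c, left in history if not predicted_churn(c) and actual_churn(left))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, round(precision, 2), recall)  # 2 1 0 0.67 1.0
```

Customers who had already left before the cutoff would be excluded from the snapshot, since they were not at risk on that date.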

Another important test, to be conducted during periodic reviews, is to compare the churn numbers with the baseline. This lets us measure the success of the model and will, hopefully, show that we have met our objective of reducing churn.
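
Comparing against the baseline needs a consistent definition of churn. The sketch assumes churn is the number of customers lost divided by the average customer base over the period, which is one common convention. The figures are hypothetical.

```python
def churn_rate(lost: int, customers_start: int, customers_end: int) -> float:
    """Churn over a period: customers lost / average customer base."""
    return lost / ((customers_start + customers_end) / 2)


# Hypothetical figures for the baseline year and the year after the project.
baseline = churn_rate(lost=12_000, customers_start=100_000, customers_end=96_000)
after = churn_rate(lost=9_000, customers_start=96_000, customers_end=95_000)
relative_change = (after - baseline) / baseline
print(f"baseline {baseline:.1%}, after {after:.1%}, change {relative_change:+.1%}")
```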

The references I read were very helpful, but I could have used more of them :-(

References

ABCDE socio-economic classification. (n.d.). Retrieved from Nielsen Admosphere: http://www.nielsen-admosphere.eu/products-and-services/tv-audience-measurement-in-the-czech-republic/abcde-socioeconomic-classification/
Hung, S.-Y., Yen, D. C., & Wang, H.-Y. (2006). Applying data mining to telecom churn management. Expert Systems with Applications, 515-524.
IBM. (2011). IBM SPSS Modeler CRISP-DM guide. Retrieved from INSEAD data analytics GitHub: https://inseaddataanalytics.github.io/INSEADAnalytics/CRISPDM.pdf
Khabaza, T. (2010, April 1). Nine laws of data mining. Retrieved December 2017, from Data Mining & Predictive Analytics: http://khabaza.codimension.net/index_files/9laws.htm
Li, S. (2017, November). Predict Customer Churn - Logistic Regression, Decision Tree and Random Forest. Retrieved from datascienceplus: https://datascienceplus.com/predict-customer-churn-logistic-regression-decision-tree-and-random-forest/
Linoff, G. S., & Berry, M. J. (2011). Data Mining Techniques. Indianapolis: Wiley.
Ngai, E., Xiu, L., & Chau, D. (2009). Application of data mining techniques in customer relationship management: A literature review and classification. Expert Systems with Applications, 36.
Z. Xiaobin, G. F. (2009). Customer-Churn Research Based on Customer Segmentation. 2009 International Conference on Electronic Commerce and Business Intelligence (pp. 443-446). Beijing: IEEE.