Both LR and DT were found to be usable methods to study customer churn in the electricity industry.

A broader study of customer churn using other ML methods would be a natural next step.

Xie et al. (2009) used balanced RF to study customer churn, which reduced noise and helped with the class imbalance often found in customer churn data. Such a more refined tree-based model would be an interesting topic to study and to compare with the simple decision tree model used in this thesis.
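
As a rough illustration of the balancing idea behind balanced RF, the sketch below draws the training sample for a single tree so that churners and non-churners are equally represented, following the general description in Chen et al. (2004). It is not the improved method of Xie et al. (2009), which adds further refinements, and it is not part of the models built in this thesis. The function name and the toy labels are hypothetical.

```python
import random


def balanced_bootstrap(labels, rng):
    """Return row indices for one tree: equal draws (with replacement) per class."""
    churners = [i for i, y in enumerate(labels) if y == 1]
    stayers = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(churners), len(stayers))  # size of the smaller class
    # Bootstrap the minority class, then draw the same number from the majority.
    return rng.choices(churners, k=n) + rng.choices(stayers, k=n)


# Hypothetical toy labels: 5 churners (1) among 100 customers.
labels = [1] * 5 + [0] * 95
rng = random.Random(0)
sample = balanced_bootstrap(labels, rng)
n_churn = sum(labels[i] for i in sample)
print(len(sample), n_churn)  # 10 rows, 5 of them churners
```

Each tree in the forest would be grown on a fresh sample of this kind, so that no single tree is dominated by the majority class.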

There are many other research opportunities in customer churn as well. More complex data with more variables and larger datasets could be used. It is important to remember that the results of this study are based on a single dataset received from the electricity company. The limitations of the data make it difficult to generalize the results, so more research into the topic is needed.

A further research topic is to extend the time span of the data to several years. Ballings and Van den Poel (2012) found that extending the time span beyond 5 years does not significantly increase the accuracy of churn prediction, but a dataset covering 1-5 years could be a topic for future research.

Finally, customer churn modelling could be used at electricity companies to increase customer retention by predicting which customers are at risk of leaving the company and focusing retention efforts on them. Lima et al. (2009) suggested adding more domain knowledge to customer churn modelling, for example the number of calls to the customer helpdesk. As acquiring new customers is more expensive than retaining existing ones, these types of models have the potential to increase profits for the companies.
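
A minimal sketch of how such targeting could work with a fitted LR model is given below: each customer is scored with the logistic function and the customers with the highest predicted churn probability are selected for retention efforts. The feature names (helpdesk calls and years as a customer), the coefficients, the intercept and the customer records are all hypothetical and are not taken from the model or data of this thesis.

```python
import math


def churn_probability(features, coefficients, intercept):
    """Logistic regression prediction: p = 1 / (1 + exp(-(b0 + b . x)))."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def retention_targets(customers, coefficients, intercept, top_k):
    """Return the top_k (customer id, probability) pairs, highest risk first."""
    scored = [
        (cid, churn_probability(x, coefficients, intercept))
        for cid, x in customers.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical customers: [helpdesk calls, years as a customer].
customers = {"A": [0, 1.0], "B": [3, 0.2], "C": [1, 0.5]}
coefficients = [0.8, -1.0]  # hypothetical fitted coefficients
intercept = -1.5            # hypothetical fitted intercept

targets = retention_targets(customers, coefficients, intercept, top_k=2)
for cid, p in targets:
    print(f"{cid}: predicted churn probability {p:.2f}")
```

In practice the number of customers to target would depend on the cost of a retention offer relative to the value of a retained customer.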

References

Ahn, J. et al. (2006) ‘Customer churn analysis: Churn determinants and mediation effects of partial defection in the Korean mobile telecommunications service industry’, Telecommunications Policy, 30(10–11), pp. 552–568.

Athanassopoulos, A. D. (2000) ‘Customer satisfaction cues to support market segmentation and explain switching behavior’, Journal of Business Research 47, pp. 191–207.

Bishop, C. M. (2006), ‘Pattern Recognition and Machine Learning’, Springer.

Bolance, C. and Guillen, M. (2016) ‘Predicting Probability of Customer Churn in Insurance’, Research Gate. DOI: 10.1007/978-3-319-40506-3_9

Burez, J. and Van den Poel, D. (2009) ‘Handling class imbalance in customer churn prediction’, Expert Systems with Applications, Volume 36, Issue 3, Part 1, pp. 4626-4636. Available at: https://doi.org/10.1016/j.eswa.2008.05.027

Burez, J. and Van den Poel, D. (2007) ‘CRM at a pay-TV company: Using analytical models to reduce customer attrition by targeted marketing for subscription services’, Expert Systems with Applications, Volume 32, pp. 277–288. Available at: https://doi.org/10.1016/j.eswa.2005.11.037

Cawley, G. C. et al. (2010) ‘On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation’, Journal of Machine Learning Research 11.

Chen, C. et al. (2004) ‘Using random forests to learn imbalanced data’, Technical Report 666. Statistics Department, University of California at Berkeley.

Council of European Energy Regulators CEER (2017). Retail Markets Monitoring Report. [www document]. [Accessed 30 August 2020]. Available at: https://www.ceer.eu/documents/104400/6122966/Retail+Market+Monitoring+Report/56216063-66c8-0469-7aa0-9f321b196f9f

Coussement, K. and De Bock, K. (2013) ‘Customer churn prediction in the online gambling industry: The beneficial effect of ensemble learning’, Journal of Business Research 66, pp. 1629–1636.

Coussement, K. et al. (2016) ‘A comparative analysis of data preparation algorithms for customer churn prediction: A case study in the telecommunication industry’, Decision Support Systems, Volume 95, pp. 27-36.

Cramer, J. S. (2003) ‘The Origins of Logistic Regression’, Tinbergen Institute Working Paper No. 2002-119/4.

Dankers, F. J. W. M. et al. (2018) ‘Prediction Modeling Methodology’, Fundamentals of Clinical Data Science, pp. 101-120.

Chakrabarti et al. (2006) ‘Data Mining Curriculum: A proposal’, ACM SIGKDD.

De Ville, B. (2013) ‘Decision trees’, WIREs Comput Stat 2013, 5:448–455. DOI: 10.1002/wics.1278

Drummond, C. and Holte, R. C. (2003). ‘Class imbalance, and cost sensitivity: Why under-sampling beats over-sampling’, In: Workshop on learning from imbalanced data sets II, international conference on machine learning.

Elitedatascience. (2017) Overfitting in machine learning: What it is and how to prevent it. [www document]. [Accessed 1 August 2020]. Available at: https://elitedatascience.com/overfitting-in-machine-learning

European Energy Markets Observatory (2010): ‘Energy, utilities and chemicals’, 2009 and Winter 2009/2010 Data Set Twelfth Edition.

Fawcett, T. (2006) ‘An introduction to ROC analysis’, Pattern Recognition Letters, Volume 27, Issue 8, pp. 861-874.

Gur Ali, O. and Arıtürk, U. (2014) ‘Dynamic churn prediction framework with more effective use of rare event data: The case of private banking’, Expert Systems with Applications 41, pp. 7889–7903.

Hosmer, D. W. and Lemeshow, S. (2000) ‘Applied Logistic Regression’, 2nd Ed. Chapter 5. New York, NY: John Wiley and Sons. pp. 160–164.

Huigevoort, C. (2015) ‘Customer churn prediction for an insurance company’, Eindhoven University of Technology, research.tue.nl

Idris, A. et al. (2013) ‘Intelligent churn prediction in telecom: Employing mRMR feature selection and rotboost based ensemble classification’, Applied Intelligence 39, pp. 659–672.

Keaveney, S. M. and Parthasarathy, M. (2001). ‘Customer switching behavior in online services: An exploratory study of the role of selected attitudinal, behavioral, and demographic factors’, Journal of the Academy of Marketing Science, 29(4), 374–390.

Kim, H. and Yoon, C. (2004) ‘Determinants of Subscriber Churn and Customer Loyalty in the Korean Mobile Telephony Market’, Telecommunications Policy 28, pp. 751-765.

Lazarov, V. and Capot, M. (2007) ‘Churn Prediction’, Bus. Anal. Course. TUM Comput. Sci, 2007 - Citeseer.

Lima, E., Mues, C. and Baesens, B. (2009) ‘Domain knowledge integration in data mining using decision tables: Case studies in churn prediction’, Journal of the Operational Research Society 60, pp. 1096–1106.

Magdalena et al. (2017) ‘Information as potential key determinant in switching electricity suppliers’, Zeitschrift für Betriebswirtschaft; Heidelberg Vol. 87, Iss. 2, pp. 263-290. DOI: 10.1007/s11573-016-0821-9

Mathworks (2020), Available at: https://se.mathworks.com/?s_tid=gn_logo

Narasimha, M. M. and Susheela, D. V. (2015) ‘Introduction to Pattern Recognition and Machine Learning’, pp. 1-5.

Neslin, S. A. et al. (2006) ‘Defection Detection: Measuring and Understanding the Predictive Accuracy of Customer Churn Models’, Journal of Marketing Research, Vol. XLIII, pp. 204–211. Available at: https://doi-org.ezproxy.cc.lut.fi/10.1509/jmkr.43.2.204

Olle, G. (2014) ‘A Hybrid Churn Prediction Model in Mobile Telecommunication Industry’, Int. J. E-Educ. E-Bus. E-Manag. ELearn.

Peng, J. (2002) ‘An Introduction to Logistic Regression Analysis and Reporting’, The Journal of Educational Research 96(1), pp. 3-14. DOI: 10.1080/00220670209598786

Powers, D. M. W. (2008) ‘Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation’, Journal of Machine Learning Technologies 2(1), pp. 37–63.

Pribil, J. and Polejova, M. (2017) ‘A Churn Analysis Using Data Mining Techniques: Case of Electricity Distribution Company’, Proceedings of the World Congress on Engineering and Computer Science Vol I WCECS.

Risselada, H. et al. (2010) ‘Staying power of churn prediction models’, Journal of Interactive Marketing 24, pp. 198–208.

Sabbeh, F. S. (2018) ‘Machine-Learning Techniques for Customer Retention: A Comparative Study’, International Journal of Advanced Computer Science and Applications, 9(2). DOI: 10.14569/ijacsa.2018.090238

Vafeiadis, T. et al. (2015) ‘A comparison of machine learning techniques for customer churn prediction’, Simulation Modelling Practice and Theory, Volume 55, pp. 1-9. Available at: https://doi.org/10.1016/j.simpat.2015.03.003

Verbeke, W. et al. (2011) ‘Building comprehensible customer churn prediction models with advanced rule induction techniques’, Expert Systems with Applications 38, pp. 2354–2364.

Wang, Y. et al. (2009) ‘A recommender system to avoid customer churn: A case study’, Expert Systems with Applications, Volume 36, Issue 4, pp. 8071-8075. Available at: https://doi.org/10.1016/j.eswa.2008.10.089

Weiss, G. M. (2004) ‘Mining with rarity: A unifying framework’, SIGKDD Explorations, 6(1), pp. 7–19.

Wei, C. P. and Chiu, I. (2002) ‘Turning telecommunications call details to churn prediction: a data mining approach’, Expert Systems with Applications, Volume 23, Issue 2, pp. 103–112.

Witten, I. H. et al. (2016) ‘Data Mining: Practical Machine Learning Tools and Techniques’, Morgan Kaufmann.

Xie, Y. et al. (2009) ‘Customer churn prediction using improved balanced random forests’, Expert Systems with Applications 36, pp. 5445–5449.

Zimek, A. and Filzmoser, P. (2018) ‘There and back again: Outlier detection between statistical reasoning and data mining algorithms’, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery.

Appendices

Appendix 1. Logistic Regression Model