
Data quality management process

Many scholars discuss the process steps for data quality management. Baesens et al. (2013) state that most programs include four processes: data quality definition, measurement, analysis and improvement. Similarly, Cappiello et al. (2018) identified four stages for data quality management: defining the data quality dimensions, measuring the chosen dimensions, analyzing the root causes of data quality issues, and finding improvements. This research applies the same approach. First, the dimensions to be assessed are identified and defined. Then, the most important measures are found through a literature review, and lastly the data is analyzed. The scope of this research does not include data quality improvement.
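
To make these steps concrete, the following is a minimal sketch of how the define, measure and analyze stages could be wired together for a tabular dataset. The dimension names, rules, threshold and example data are illustrative assumptions, not the metrics used later in this thesis.

```python
# Minimal sketch of the define-measure-analyze steps (improvement is out of scope here).
# Dimension names, rules and the threshold are illustrative placeholders.
import pandas as pd

def completeness(df: pd.DataFrame, column: str) -> float:
    """Share of non-missing values in a column."""
    return df[column].notna().mean()

def validity(df: pd.DataFrame, column: str, allowed: set) -> float:
    """Share of values that fall within an allowed set."""
    return df[column].isin(allowed).mean()

# 1) Define: choose the dimensions and the attributes they apply to.
dimensions = {
    ("customer_id", "completeness"): lambda df: completeness(df, "customer_id"),
    ("rating_grade", "validity"): lambda df: validity(df, "rating_grade", {"A", "B", "C", "D"}),
}

# 2) Measure: compute a score per attribute and dimension.
data = pd.DataFrame({"customer_id": [1, 2, None], "rating_grade": ["A", "X", "B"]})
scores = {key: rule(data) for key, rule in dimensions.items()}

# 3) Analyze: flag scores below a threshold as candidates for root-cause analysis.
issues = {key: score for key, score in scores.items() if score < 0.95}
print(scores)
print(issues)
```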

Baesens et al. (2013) and Cappiello et al. (2018) name data quality definition as the first step, in which the appropriate dimensions should be identified. Granese et al. (2015) likewise suggest that quality assessment starts with choosing the appropriate data quality dimensions. The chosen (most important) attributes are then measured and scored against each of these dimensions. The data quality assessment is conducted using business rules and data profiling, and the quality scores can be aggregated at function or enterprise level. (Granese et al. 2015) Batini and Scannapieca (2006) agree that the data quality assessment process should start from selecting the dimensions to be measured. Cappiello et al. (2018), in contrast, argue that identifying the data quality dimensions and their control methods is mostly done by experts, who might be biased. The control methods are therefore adopted on the basis of expected and evident data quality problems, and they are less effective in dealing with unobserved problems. As a result, the measured quality values for the monitored dimensions may overestimate or underestimate the quality of the process outcome. (Cappiello et al. 2018) In this thesis, the most important dimensions are chosen based on the regulatory demands. Whether other dimensions should be included as well is not discussed.
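
To illustrate how rule-based quality scores could be rolled up to function or enterprise level, the sketch below aggregates hypothetical per-attribute pass rates. The attribute-to-function mapping, the scores and the simple unweighted averaging are illustrative assumptions, not a formula taken from the cited sources.

```python
# Hypothetical per-attribute pass rates from business-rule checks (share of records passing).
# The attribute-to-function mapping and the averaging scheme are illustrative assumptions.
attribute_scores = {
    "customer_id": 0.99,
    "rating_grade": 0.93,
    "exposure_amount": 0.97,
    "default_flag": 0.95,
}

functions = {
    "customer_data": ["customer_id", "rating_grade"],
    "exposure_data": ["exposure_amount", "default_flag"],
}

# Aggregate to function level as the mean of the attribute scores.
function_scores = {
    name: sum(attribute_scores[a] for a in attrs) / len(attrs)
    for name, attrs in functions.items()
}

# Aggregate to enterprise level as the mean of the function scores.
enterprise_score = sum(function_scores.values()) / len(function_scores)
print(function_scores, round(enterprise_score, 3))
```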

Granese et al. (2015) also suggest that the most important attributes for the specific business area should be identified at the beginning of the data quality assessment process. In their opinion, the size and complexity of the data population of a large financial institution make complete data quality assessments of all attributes impractical. Thus, the attributes required to be measured should be identified as well. (Granese et al. 2015)

Olson (2003) argued that most incorrect values can be identified if enough effort is devoted to searching for them. There are two options for finding incorrect data: reverification and analysis. Reverification means manually tracing information back to the original source and checking every value. Even then, not all errors can be identified, since wrong values can be introduced again during the reverification process. In practice, this process is excessively time consuming and expensive for most organizations. Additionally, reverification is not always possible for all data if the data no longer exists in the source systems. As a monitoring process before data use, it would almost certainly violate the timeliness requirements. Selective reverification can be used as a monitoring technique, so that only a small sample of records is reverified. (Olson 2003) To conclude, even if most of the incorrect data could be identified, doing so is not always economically feasible and involves a trade-off with timeliness requirements. Heinrich et al. (2018) agree that inadequate measuring can lead to excessive costs; the metrics applied should be economically efficient enough to use in practice (Heinrich et al. 2018). Even if errors are best discovered through manual inspection of values, such inspection is so time-consuming that it is not sensible when datasets are immensely large. Thus, the best data quality metrics identify as many quality issues as possible in the least amount of time.
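
As an illustration of selective reverification, the sketch below draws a small random sample of record identifiers for manual checking against the source and estimates an error rate from the outcomes. The sample size, the simulated check results and the simple proportion estimator are illustrative assumptions, not part of Olson's (2003) description.

```python
# Minimal sketch of selective reverification: check only a random sample of records manually
# and extrapolate an error rate. Sample size, seed and simulated outcomes are placeholders.
import random

def select_sample(record_ids, sample_size=100, seed=42):
    """Pick a random subset of record identifiers for manual reverification."""
    rng = random.Random(seed)
    return rng.sample(record_ids, min(sample_size, len(record_ids)))

def estimated_error_rate(check_results):
    """Share of manually checked records found to be incorrect."""
    return sum(1 for is_correct in check_results if not is_correct) / len(check_results)

record_ids = list(range(10_000))
sampled_ids = select_sample(record_ids)

# In practice each sampled record would be compared manually against its source;
# here the outcomes are simulated (True = value confirmed correct).
simulator = random.Random(0)
check_results = [simulator.random() > 0.05 for _ in sampled_ids]
print(len(sampled_ids), round(estimated_error_rate(check_results), 3))
```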

3 CREDIT RISK MODELLING

The aim of credit risk modelling is to determine the regulatory capital needed to compensate for potential losses (Baesens et al. 2010). Regulation requires banking institutions to evaluate the credit risks they are exposed to, and the institutions aim to identify as accurately as possible the credit risks resulting from possible defaults on loans. (Baesens et al. 2013) The global financial crisis that began in 2007 showed that some banks were not able to adequately aggregate their risk exposures. Regulators then increased the requirements to ensure that risk data aggregation and risk reporting are integrated into all risk activities. Risk data aggregation means the activities to define, collect and process risk data so that a bank complies with its risk reporting requirements and is able to quantify its risk tolerance. (Bank for International Settlements 2013) The regulation affects the capital requirements and solvency of financial institutions. Regulators have also increased their attention to data quality issues in credit risk management, since the modelling is based on banks' internal data. Data quality is thus closely monitored in credit risk modelling. (Baesens et al. 2013)

Managing data quality is essential for meeting the regulatory demands. Prorokowski and Prorokowski (2015) remark that the financial industry is rapidly becoming more regulated, and that financial institutions should therefore concentrate on developing their risk data aggregation processes. They argue banks need to implement new tools and find efficient ways to achieve high standards. New regulation requires banks to improve their data aggregation processes and to establish clear frameworks. These improvements would allow banks to recover more easily from future episodes of financial distress. (Prorokowski and Prorokowski 2015) Gupta and Kulkarni (2016) show that data quality issues can have a notable impact on key risk numbers and cause inaccuracies in risk reports. Inconsistencies in data structures and formats, and the absence of common data systems and terminology across companies, cause challenges for risk data aggregation. They name identifying data quality problems and understanding their root causes as a critical part of complying with the regulatory requirements. (Gupta and Kulkarni 2016)

For regulatory reasons, ensuring data quality has been a primary concern for almost every bank. Many organizations have increased the number of staff and taken short-term clean-up initiatives to improve data quality. Banking institutions are also developing their data quality frameworks in order to respond to the regulatory demands. (Robert Morris Associates 2017) The aim of the case company is to establish an effective data quality management framework as part of its credit risk modelling projects.

This chapter first presents what internal ratings based models are and how the modelling projects are carried out. It then explains the history of regulation for internal ratings based approaches and the data quality requirements posed by the European Central Bank (ECB). The focus is on the components and dimensions of data quality that need to be assessed and monitored, but the chapter also touches upon the requirements for a data quality management framework, responsibilities and reporting in order to give a comprehensive understanding of data quality management. IT system requirements are not included in this study.