
4. IMPLEMENTATION AND RESULTS

4.2 Reliability engineering case study

4.2.1 Data preparation

The dataset for this case study contains the failure data of 74 pumps used in an industrial paper plant in Finland. For each pump, the starting date, the times between the start and each failure, the times between the start and each maintenance, and the cumulated lifetime are recorded. A few entries of the main dataset are shown in Table 12.

Table 12. The dataset for pump failure times

ID | Events (date, days since start)
1  | Started 10/23/1995 (0); Cumulated lifetime 11/7/2016 (2718)
2  | Started 11/21/1995 (0); Failure 1/14/1998 (381); Failure 10/6/2000 (511); Cumulated lifetime 11/7/2016 (1948)
3  | Started 11/24/1995 (0); Failure 7/6/2004 (1231); Cumulated lifetime 11/7/2016 (2954)
4  | Started 11/27/1995 (0); Failure 7/31/2000 (826); Failure 11/9/2000 (835); Failure 2/23/2010 (2070); Cumulated lifetime 11/7/2016 (3131)

The data

The failure times are the times at which the pumps failed and stopped functioning. The maintenance policy has been mostly corrective maintenance. The quality of the maintenance has been near perfect, so after each failure, all of the pump's parts have been changed, and the pumps are assumed to be as good as new (AGAN) after each maintenance.

The censoring in the data is random right censoring. As shown in Figure 30 and Table 12, each pump is started at a random time and monitored until a specific date. The cumulated lifetime is the age of the pump at the time data collection ended. This value shows how long the pump had been working from its starting time without having failed yet; therefore, there is a censored record after this period for the next failure.

Figure 30. The process and timeline of data collection

Figure 30 shows the timeline for the instances of pumps, their start times, their failure times, and the last date of data recording. Pumps are installed at separate times and may or may not have failures during the data recording period. Since all of the pumps are manufactured with the same mechanical design and material, the time of their setup does not have any effect on their failure times. Therefore, the starting times of all instances can be aligned by shifting them to the left. Figure 31 shows the timelines after shifting.

Figure 31. The pump failure timelines after alignment

Based on the assumption of the similarity of the pumps, all the pumps can be treated as instances of one pump. The timeline will then look like Figure 32.

Figure 32. The failure timeline for an instance of the pump

Table 13 shows some statistics about the dataset. The total number of data points is 74 instances. The number of instances for each failure, the number of censored values for each failure, and the percentage of non-censored data points are presented in the table. The percentage of non-censored values is calculated by dividing the number of observed instances of each failure by the number of instances of the preceding failure (74 pumps for the first failure).

Table 13. Number of censored and not censored failures

                  Instances   Censored   Not Censored %
Number of pumps       74
Failure 1             70          4           94.59%
Failure 2             57         16           77.14%
Failure 3             30         27           52.63%
Failure 4             12         17           43.33%
Failure 5              8          4           66.67%
Failure 6              2          6           25.00%
Failure 7              1          1           50.00%
Failure 8              1          0          100.00%
Failure 9              0          1            0%
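The calculation behind the percentage column can be sketched as follows. The instance counts are copied from Table 13, and the rule (observed instances of failure i divided by the instances at risk, i.e. those that reached failure i−1) is an interpretation that reproduces most rows of the table.

```python
# Sketch: recomputing the "not censored %" of Table 13.  The percentage
# for failure i is interpreted as the number of pumps that reached
# failure i divided by the number at risk (pumps that reached failure
# i-1; all 74 pumps for failure 1).
instances = [70, 57, 30, 12, 8, 2, 1, 1, 0]   # observed counts, failures 1..9
at_risk = [74] + instances[:-1]               # denominator for each failure

pct = [round(100 * n / d, 2) if d else 0.0 for n, d in zip(instances, at_risk)]
print(pct[0], pct[2])  # failure 1: 94.59, failure 3: 52.63
```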

The time from the starting of the pump to each failure is named the total time to failure (TTTF) in this study. The data is rearranged so that each failure time of a pump is associated with the corresponding failure number, i.e., the time to failure (TTF) value for each failure occurrence is calculated. The TTF between every two consecutive failures is obtained by subtracting their TTTF values. A simple Excel formula has been used to calculate the subtraction. The first 15 rows of the rearranged data are shown in Table 14.

Table 14. Pre-processed dataset
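The Excel subtraction described above can be sketched in Python. The function name and its handling of unrecorded times (stored as 0 in the database, see the missing-value discussion below) are illustrative; the TTTF values in the example are pump 4's failure times from Table 12.

```python
# Sketch: deriving TTF (time between consecutive failures) from TTTF
# (time from start to each failure).  Unrecorded failure times are
# stored as 0 in the database; a TTF cannot be differenced across such
# a gap, so both affected cells become "N/R".
def to_ttf(tttf):
    ttf = []
    prev = 0                # the pump starts at day 0
    for t in tttf:
        if t == 0:          # unrecorded failure time
            ttf.append("N/R")
            prev = None
        elif prev is None:  # previous failure time unknown
            ttf.append("N/R")
            prev = t
        else:
            ttf.append(t - prev)
            prev = t
    return ttf

print(to_ttf([826, 835, 2070]))  # pump 4 -> [826, 9, 1235]
```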

In addition to censoring, several data points in the dataset have missing values. When a failure time is missing, the corresponding TTF cannot be calculated, so those data points are treated as missing values. An "N/R" value is placed in the dataset for cells with a missing value, as shown in Table 14.

The censored time for each TTF value is moved to a column called CF#, where # corresponds to the number of the censored TTF. These values are shown in columns CF1 to CF9 of Table 15. For each row, the TTF values after the last failure are impossible values, meaning that it is not possible to have a TTF 𝑖 + 1 when there is no TTF 𝑖. Therefore, these values are marked with an asterisk (∗) as not available, so the software can detect and handle them as filtered values, not missing values.

Table 15. Censored failure times for each TTF

ID   CF1    CF2   CF3    CF4    CF5   CF6   CF7   CF8   CF9
7    *      *     *      1      *     *     *     *     *
8    *      *     *      2283   *     *     *     *     *
9    *      *     909    *      *     *     *     *     *
10   *      *     *      *      *     197   *     *     *
11   *      *     1932   *      *     *     *     *     *
12   *      193   *      *      *     *     *     *     *
13   *      *     *      *      *     *     *     *     169
14   2585   *     *      *      *     *     *     *     *
15   *      *     1212   *      *     *     *     *     *
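Placing each censored time into its CF# column can be sketched as follows. The helper function is hypothetical, but the row layout matches the pattern of Table 15: a pump with two observed failures, for instance, puts its censored time in CF3 (as in the row for pump 9).

```python
# Sketch: each pump contributes one censored interval, placed in column
# CF#, where # is the number of the first unobserved failure.  All
# other cells are structurally impossible and receive the filter
# marker "*".
N_COLS = 9

def cf_row(n_failures, censored_time):
    """A pump with n_failures observed failures is censored at failure
    n_failures + 1, so its censored time goes to CF(n_failures + 1)."""
    row = ["*"] * N_COLS
    row[n_failures] = censored_time   # 0-based index -> column CF(n_failures+1)
    return row

print(cf_row(2, 909))  # -> ['*', '*', 909, '*', '*', '*', '*', '*', '*']
```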

The variables TTF7, TTF8, CF7, and CF9 each contain only one data point, and there is no data point for CF8. Therefore, it is not possible to use them for creating the model, and they are omitted from the dataset.

Missing values

The missingness of failure times in the dataset has two causes. Some failure times have not been recorded for an unknown reason; these are shown with the number 0 in the database. The other failure data are missing because the failure has not happened at all (in the case of first failures) or the previous failure has not happened yet (failure 𝑖 + 1 when failure 𝑖 has not happened). This missingness could be modeled using the censoring concept of survival analysis, but Bayesian missing-value models give a better perspective and can be integrated into the structural learning process as well (see section 3.1.7); therefore, the missing-value concept is used to address this issue.

The type of missingness should be characterized to determine the method for handling the missing values. The first group of missing values, the values that were not recorded, cannot be assigned to any specific class. For the second group, since the missingness of failure 𝑖 + 1 depends on the missingness of failure 𝑖, they can be classified as missing at random (MAR). Assuming that the missingness is MAR, Structural Expectation Maximization is chosen as the missing-value estimation method, which has shown good results in the literature (Friedman, 1998).
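Before estimation, the two kinds of missingness have to be kept apart in the data representation. A minimal sketch follows; the encoding convention (NaN for truly missing, "*" passed through as a filter marker) is an assumption for illustration, not the exact format of the software used in the study.

```python
import math

# Sketch: unrecorded failure times (stored as 0 in the database) become
# NaN, i.e. genuinely missing values to be estimated under the MAR
# assumption, while the "*" filter marker for structurally impossible
# cells is passed through so the learning software can skip those cells.
def encode(cell):
    if cell == 0:
        return math.nan   # unrecorded -> missing, to be estimated (MAR)
    return cell           # observed TTF or "*" filter marker

row = [826, 0, "*", "*"]
encoded = [encode(c) for c in row]
print(encoded)
```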

4.2.2 Discretization of variables and machine learning of the