

2. ONLINE ADVERTISEMENT AND ATTRIBUTION MODELING

2.3 Markov chains in attribution modeling

Markov chains were first applied to digital marketing attribution modeling by Archak et al. in the context of search engine advertising (Archak et al. 2010). Later, Anderl et al. proposed applying first- and higher-order Markov chains to the channel attribution problem (Anderl et al. 2014; Anderl et al. 2016). According to Anderl et al., the Markov chain methodology meets the most important requirements for an attribution model: objectivity, predictive accuracy, versatility, interpretability, robustness and algorithmic efficiency.

In order to apply Markov chains to a channel attribution problem, each customer journey needs to be presented as a sequence of touchpoints with the marketing channels (c_1, c_2, …, c_n) that the customer has encountered before the conversion. The Markov chain treats these marketing channels as states (s_1, s_2, …, s_n) in the customer journey

S = {s_1, s_2, …, s_n}    (1)

and combines the states into the transition matrix W with the transition probabilities between each of the states

w_ij = P(X_t = s_j | X_{t−1} = s_i),   0 ≤ w_ij ≤ 1,   ∑_{j=1}^{N} w_ij = 1 ∀ i    (2)

where w_ij is the transition probability to the next state s_j given the current state s_i. Each transition probability w_ij is between 0 and 1, and the transition probabilities out of any single state sum to 1. A Markov chain is a sequence of states that represents an individual customer journey, and a Markov graph M = {S, W} is a representation of all the states and the transition probabilities between them. (Anderl et al. 2014)
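As an illustration, the minimal Python sketch below shows one way to estimate the transition matrix of equation (2) from journey data. The function name transition_probabilities and the list-of-lists input format are illustrative assumptions, not part of the cited methodology: the sketch simply counts the observed transitions and normalises each row so that the probabilities leaving every state sum to 1.

```python
from collections import defaultdict

def transition_probabilities(journeys):
    """Estimate the first-order transition matrix W of equation (2).

    journeys: list of customer journeys, each an ordered list of state
    names, e.g. ["START", "S1", "S2", "CONVERSION"].
    Returns a nested dict w[s_i][s_j] whose rows sum to 1.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for journey in journeys:
        # count every observed transition s_i -> s_j along the journey
        for current_state, next_state in zip(journey, journey[1:]):
            counts[current_state][next_state] += 1
    w = {}
    for state, successors in counts.items():
        total = sum(successors.values())          # transitions leaving `state`
        w[state] = {s: n / total for s, n in successors.items()}
    return w
```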

As a simple example, if the marketing mix of a company included three channels C1, C2 and C3, the states included in the Markov chain would be S1, S2 and S3. For modeling purposes, three more states (START, CONVERSION and NULL) are needed to fully represent the customer journey and to calculate the conversion probabilities between every channel. The START state represents the start of the customer journey, the CONVERSION state a successful conversion, and the NULL state the end of a customer journey that has not ended in a successful conversion within the observation time window. The transition probability w_ij is the probability that a contact with state i is followed by a contact with state j. In this simple example the company would have a dataset of three customer journeys, as presented in Table 2. Picture 5 shows an example Markov graph with the transition probabilities calculated from the three customer journeys presented in Table 2.

Table 2. Example customer journeys

Journey name   Journey states
Journey 1      START, S1, S2, S3, CONVERSION
Journey 2      START, S2, NULL
Journey 3      START, S1, S2, NULL
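Reusing the transition_probabilities function sketched above on the three journeys of Table 2 reproduces the transition probabilities of the example graph in Picture 5 (the values in the comment are expected results, rounded to two decimals):

```python
journeys = [
    ["START", "S1", "S2", "S3", "CONVERSION"],   # Journey 1
    ["START", "S2", "NULL"],                     # Journey 2
    ["START", "S1", "S2", "NULL"],               # Journey 3
]

for state, successors in transition_probabilities(journeys).items():
    for target, probability in successors.items():
        print(f"P({state} -> {target}) = {probability:.2f}")

# Expected probabilities (rounded):
#   P(START -> S1) = 0.67    P(START -> S2) = 0.33
#   P(S1 -> S2) = 1.00
#   P(S2 -> S3) = 0.33       P(S2 -> NULL) = 0.67
#   P(S3 -> CONVERSION) = 1.00
```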

The order of the Markov chain defines how many preceding states are considered when calculating the transition probability (Anderl et al. 2014). A first-order Markov chain takes into account only the current state and the probabilities of moving from that state to any other state. A second-order Markov chain looks back one state, so it takes into account the current state and the state before it. A third-order Markov chain looks back two states, and so on. The transition probability of a k-order Markov chain is calculated as follows:

w_ij = P(X_t = s_t | X_{t−1} = s_{t−1}, X_{t−2} = s_{t−2}, …, X_{t−k} = s_{t−k})    (3)

As the order of the Markov chain increases, the number of independent parameters and the complexity of the model grow exponentially, so the modeling task becomes more computationally intensive and the risk of overfitting the model increases. Although higher-order Markov chains tend to model the customer journeys more accurately, at some point the model becomes too complex for real-world datasets, which are usually limited in size (Anderl et al. 2016).
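The same counting idea extends to a k-order chain: the conditioning state simply becomes the tuple of the k most recent states. The sketch below (higher_order_transitions is a hypothetical helper, not taken from the cited papers) also makes the exponential growth concrete, since with N distinct states there are roughly N^k possible history tuples whose rows must be estimated.

```python
from collections import defaultdict

def higher_order_transitions(journeys, k=2):
    """Estimate transition probabilities of a k-order Markov chain (equation 3):
    the next state is conditioned on the tuple of the k preceding states."""
    counts = defaultdict(lambda: defaultdict(int))
    for journey in journeys:
        for t in range(k, len(journey)):
            history = tuple(journey[t - k:t])     # the k states before X_t
            counts[history][journey[t]] += 1
    return {
        history: {s: n / sum(successors.values()) for s, n in successors.items()}
        for history, successors in counts.items()
    }
```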

Picture 5. Example of a Markov graph with transition probabilities

To use the Markov chain technique for attribution modeling, Anderl et al. (2014) propose a removal effect analysis that determines the change in the probability of reaching the CONVERSION state from the START state when one of the marketing channels is removed. This change in the overall conversion probability models the effect of each marketing channel in the company's marketing mix. For example, the removal effect of channel S1 in the example customer journeys of Table 2 can be calculated by determining the probability of reaching the conversion when channel S1 is removed from the model:

P(conversion after removing S1) = P(START → S2 → S3 → CONVERSION)
= 0.33 · 0.33 · 1 = 0.11    (4)

This means that if channel S1 is removed from the marketing mix, only 11% of all customer journeys still convert. With the S1 channel intact, 33% of all customer journeys convert, so removing S1 retains only 0.11/0.33 = 0.33 of the conversions. The removal effect of channel S1 is therefore 1 − 0.33 ≈ 0.67, meaning that about 67% of all conversions would be lost if channel S1 were removed.

From the Markov graph in Picture 5 it is easy to see that every customer journey that leads to a conversion uses channels S2 and S3, so their removal effect is 1: removing either channel would lose all the conversions. Once all the removal effects are known, the credit each channel should get from the total amount of conversions can be calculated from the channel's removal effect relative to the removal effects of the other channels. For example, channel S1's attribution coefficient is 0.67 / (1 + 1 + 0.67) ≈ 0.25, which implies that about 25% of all conversions should be attributed to channel S1. It is good to note that this is a highly simplified example, as there are usually hundreds or even thousands of different customer journeys, especially if the company uses a high number of marketing channels to drive traffic to its website.
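The removal effect calculation above can be verified with a short script. The sketch below is a minimal illustration, not the reference implementation of Anderl et al.: it hard-codes the transition probabilities calculated from Table 2 (as shown in Picture 5), computes the conversion probability by simple fixed-point iteration, re-routes transitions into a removed channel to the NULL state (one common convention for channel removal), and normalises the removal effects into attribution shares.

```python
W = {  # transition probabilities of the example graph (Picture 5)
    "START": {"S1": 2 / 3, "S2": 1 / 3},
    "S1": {"S2": 1.0},
    "S2": {"S3": 1 / 3, "NULL": 2 / 3},
    "S3": {"CONVERSION": 1.0},
}

def conversion_probability(w, iterations=200):
    """Probability of eventually reaching CONVERSION from START, found by
    iterating p(s) = sum_j w[s][j] * p(j) with p(CONVERSION)=1, p(NULL)=0."""
    states = set(w) | {t for successors in w.values() for t in successors}
    p = {s: 0.0 for s in states}
    p["CONVERSION"] = 1.0
    for _ in range(iterations):
        for s in w:                       # absorbing states keep their values
            p[s] = sum(prob * p[t] for t, prob in w[s].items())
    return p["START"]

def remove_channel(w, channel):
    """Drop a channel: delete its outgoing row and re-route transitions
    into it to the NULL state."""
    reduced = {}
    for state, successors in w.items():
        if state == channel:
            continue
        row = {}
        for target, prob in successors.items():
            target = "NULL" if target == channel else target
            row[target] = row.get(target, 0.0) + prob
        reduced[state] = row
    return reduced

base = conversion_probability(W)                      # about 0.33 for the example
removal = {
    c: 1 - conversion_probability(remove_channel(W, c)) / base
    for c in ("S1", "S2", "S3")
}
attribution = {c: r / sum(removal.values()) for c, r in removal.items()}
print(removal)       # expected approx. {'S1': 0.67, 'S2': 1.0, 'S3': 1.0}
print(attribution)   # expected approx. {'S1': 0.25, 'S2': 0.375, 'S3': 0.375}
```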

Calculating the first-order Markov chain results is a fairly simple task, but the complexity increases with higher-order Markov chains. Higher-order Markov models allow the removal effect to be calculated for states that represent channel sequences, and therefore take the order of the touchpoints with the marketing channels into account. In such cases the effect of a single marketing channel is calculated as the mean of the removal effects of all the states that have that specific marketing channel as the last channel in the sequence.
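As a brief illustration of this averaging step, the hypothetical helper below assumes the removal effects of a higher-order model are available as a dict keyed by state tuples (for example ("S1", "S2") for the sequence S1 followed by S2); the effect of a single channel is then the mean over all states ending in that channel.

```python
def channel_effect(state_removal_effects, channel):
    """Mean removal effect over all higher-order states whose last
    channel in the sequence is `channel`."""
    effects = [effect for state, effect in state_removal_effects.items()
               if state[-1] == channel]
    return sum(effects) / len(effects)
```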

The Markov chain was selected as the analysis method for this thesis due to its good prediction accuracy (Anderl et al. 2016; Alblas 2018), its algorithmic efficiency, which allows continuous recalculation of the model, and the ease of implementation with data from a widely used data collection service (Google Analytics) and the ChannelAttribution package for the R programming language.

3. DATA COLLECTION AND ANALYSIS