


2.1.4 Fairness in recommender systems

In recommender systems, we must recognize the critical role of personalization. The core of recommendation is that the items best suited for one user may not be the best for other users. It is also essential to note that recommender systems exist to facilitate transactions. Many recommendation applications therefore involve multiple stakeholders and may give rise to fairness issues for more than one group of participants (Abdollahpouri, Burke, and Mobasher 2017). Recent research in recommender systems has identified diversity as an essential part of recommendation quality. However, algorithms trained to increase diversity among recommended items can result in biased treatment of users. The concept of user fairness, that is, satisfying each user by recommending items according to their interests, or, in a group setting, ensuring that every recommended package contains something for every member of the group, has so far been largely ignored.

In order to suggest items to a user, recommender systems learn from prior user interactions and the choices users have previously made, such as giving ratings (for movies, products, etc.). The effectiveness of such an algorithm is usually judged by the accuracy of the items the system predicts for the user, which can be evaluated against the user's interests in the predicted items (Leonhardt et al. 2018). User fairness becomes critical once we consider the effect recommendations have on users. On the one hand, it can be unfair to disregard the preferences of a specific set of users while trying to improve diversity in recommendations. On the other hand, it is equally undesirable if there is no element of novelty in the items recommended to the user (Leonhardt et al. 2018). A more severe situation can be observed on a job site, where user behavior can create a feedback loop: a less confident user who clicks on lower-salary jobs will subsequently be presented only with lower-salary jobs, even if they are qualified for higher-salary positions.

We consider two major sources of unfair distributions in recommender systems.

The first, and more obvious, is the constantly changing distribution caused by the recommendation of items to users, which results in unfairness when some items are rarely recommended. The second is caused by the algorithms intended to address unfairness in the marketplace. Much of the prior research is concerned with enhancing personal recommendations through individual diversity, whose goal is to give distinguishable recommendations to end-users, and through combined diversity, which focuses on enhancing item diversity among all users. Individual and combined diversity can thus be used to enhance fairness for users and items, respectively: individual diversity provides each user with a diverse set of recommendations, while combined (aggregated) diversity spreads exposure across the item catalogue. In creating fairness, one needs to examine fairness factors so that the system does not unfairly discriminate against a specific group of users (Leonhardt et al. 2018). There is substantial literature in machine learning regarding fairness and discrimination in data mining and information retrieval systems. Fairness is implemented in different forms that relate to different goals, such as securing fair outcomes for diverse groups, equal accuracy, or equal false positive/false negative rates.

Basic Fairness Definitions

There are different definitions of fairness, depending on how we interpret the satisfaction of user u with a recommended item i (Sacharidis 2019). Under fairness proportionality, one can assume that user u is satisfied with recommended item i if u can see item i among the top-ranked suggested items.

According to this definition, user u finds the recommended package fair if there are at least some items in the package that the user likes. On the other hand, user u is satisfied with item i if u rates i at least as highly as the other group members do; a package in which every user has enough such items is called envy-free in the group. In short, the user is satisfied with the recommended package if there are m items in the package that the user likes. The above-mentioned definitions are given by Herreiner and Puppe (2009) and Serbos et al. (2017).
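To make these two definitions concrete, the following sketch checks both notions for a package recommended to a group. The rating data, the like-threshold, and the parameter m are hypothetical illustrations, not values taken from the cited works.

```python
# A minimal sketch of the two package-fairness notions above.
# Assumptions (not from the source): ratings are dicts mapping item -> score,
# a user "likes" an item if their score exceeds LIKE_THRESHOLD, and m is the
# number of qualifying items required for the user to consider the package fair.

LIKE_THRESHOLD = 3.5  # hypothetical cutoff on a 1-5 rating scale

def is_proportional(package, ratings, m):
    """Proportionality: the user likes at least m items in the package."""
    liked = sum(1 for item in package if ratings.get(item, 0) > LIKE_THRESHOLD)
    return liked >= m

def is_envy_free(package, user, group_ratings, m):
    """Envy-freeness: the user rates at least m package items at least as
    highly as every other group member rates them."""
    count = 0
    for item in package:
        mine = group_ratings[user].get(item, 0)
        others = [group_ratings[v].get(item, 0) for v in group_ratings if v != user]
        if all(mine >= o for o in others):
            count += 1
    return count >= m

# Example: a package of three movies recommended to a two-person group.
group_ratings = {
    "alice": {"m1": 5.0, "m2": 2.0, "m3": 4.0},
    "bob":   {"m1": 3.0, "m2": 4.5, "m3": 4.0},
}
package = ["m1", "m2", "m3"]
print(is_proportional(package, group_ratings["alice"], m=1))  # True: likes m1, m3
print(is_envy_free(package, "alice", group_ratings, m=2))     # True: m1 and m3
```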

In advanced statistical and modeling algorithms, the quality of fair results is normally tied to minimizing bias against a single user or a group of users. Fairness through awareness (Dwork et al. 2012) ensures indifference to sensitive characteristics (e.g. age, race, gender) as a strategy for balancing outcomes.

Hardt et al. (2016) recommend that true positive rates be equal across all groups, which results in equal opportunity. When items are ranked in a particular order, the proportion of protected individuals appearing in every prefix of the ranking must exceed a given minimum to pass the statistical test of representativeness described by Zehlike et al. (2017). Items at different ranking positions receive different levels of attention: lower-ranked items receive far less attention than higher-ranked items. One solution to balance the opportunity for items to receive attention is to re-rank them after the scores are computed (Singh and Joachims 2018).
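As an illustration of the prefix condition, the sketch below checks whether every prefix of a ranking contains at least a minimum share of protected items. This is a simplification: it replaces the statistical test of Zehlike et al. (2017) with a fixed proportion, and the item data are hypothetical.

```python
# Simplified prefix check inspired by ranked group fairness (Zehlike et al.
# 2017). Assumption: instead of their statistical test, we require a fixed
# minimum share of protected items in every prefix of the ranking.

def satisfies_prefix_share(ranking, is_protected, min_share):
    protected_so_far = 0
    for k, item in enumerate(ranking, start=1):
        protected_so_far += is_protected[item]
        if protected_so_far / k < min_share:
            return False
    return True

is_protected = {"a": 0, "b": 1, "c": 0, "d": 1}
print(satisfies_prefix_share(["b", "a", "d", "c"], is_protected, 0.4))  # True
print(satisfies_prefix_share(["a", "c", "b", "d"], is_protected, 0.4))  # False: first prefix has no protected item
```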

Such re-ranking minimizes unfairness measures. The issue handled here is that items in a recommended list often have almost the same relevance score; in these situations, an important decision has to be made regarding the top-ranked positions. One potential solution is amortized fairness, proposed by Biega, Gummadi, et al. (2018): the accumulated relevance predicted for an item should be proportional to the accumulated attention it receives across a series of rankings.
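A minimal sketch of this amortized bookkeeping follows. The logarithmic position-bias model, the L1-style unfairness measure, and the example data are assumptions for illustration, not the exact formulation of Biega, Gummadi, et al. (2018).

```python
import math
from collections import defaultdict

# Sketch of amortized (equity-of-attention) bookkeeping across a series of
# rankings. Assumption: attention received at rank k decays as 1/log2(k+1),
# a common position-bias model. The unfairness measure below is an L1 gap
# between normalized cumulative attention and normalized cumulative relevance.

attention = defaultdict(float)   # accumulated attention per item
relevance = defaultdict(float)   # accumulated relevance per item

def position_attention(rank):           # rank is 1-based
    return 1.0 / math.log2(rank + 1)

def observe_ranking(ranking, scores):
    """Record one delivered ranking: `ranking` is an ordered list of item ids,
    `scores` maps item id -> relevance for this query."""
    for k, item in enumerate(ranking, start=1):
        attention[item] += position_attention(k)
        relevance[item] += scores[item]

def amortized_unfairness():
    total_a = sum(attention.values()) or 1.0
    total_r = sum(relevance.values()) or 1.0
    items = set(attention) | set(relevance)
    return sum(abs(attention[i] / total_a - relevance[i] / total_r) for i in items)

# Two near-tied items that swap positions across successive rankings:
observe_ranking(["a", "b"], {"a": 0.51, "b": 0.50})
observe_ranking(["b", "a"], {"a": 0.51, "b": 0.50})
print(round(amortized_unfairness(), 4))  # close to 0: attention mirrors relevance
```

Alternating near-tied items across successive rankings keeps cumulative attention proportional to cumulative relevance, which is exactly what a single fixed ranking would fail to do.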

Fairness in Collaborative Filtering

Fairness in computing has been discussed in terms of equality and justice for ages. In machine learning, fairness is discussed as the absence of discriminatory judgment. Almost every recommender system depends on data, and it has been observed that data-driven systems can produce statistical unfairness or bias. This can be due to several reasons, chiefly bias in the training data, bias in user behavior (users reacting differently to certain groups of items), and bias already present in the underlying data (Altenburger et al. 2017).

The most dominant recommendation technique, collaborative filtering (Wang et al. 2015), takes user behavior as input and ignores user demographics and item attributes. However, this does not mean that fairness can be ignored; consider, for example, a recommender system suggesting job opportunities to job seekers. The ultimate goal of such a system should be to ensure that male and female users with similar qualifications receive recommendations of jobs with the same rank and salary. The system would need to guard against biases in its recommendation output, even if the biases arise entirely from behavioral differences: for example, male users might be more inclined to click on high-paying jobs. Tackling such biases is difficult if we cannot establish a shared global ranking over items.
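As an illustration of the kind of audit such a system would need, the sketch below compares the average salary of jobs recommended to two user groups. All names and data are hypothetical and serve only to show the disparity check the text describes.

```python
# Hypothetical audit: compare recommended-job salaries across demographic
# groups. Nothing here comes from the cited papers; it only illustrates the
# output check described above for a job recommender.

from statistics import mean

def group_salary_gap(recommendations, salaries, group_of):
    """recommendations: user -> list of recommended job ids;
    salaries: job id -> salary; group_of: user -> group label.
    Returns the mean recommended salary per group."""
    per_group = {}
    for user, jobs in recommendations.items():
        per_group.setdefault(group_of[user], []).extend(salaries[j] for j in jobs)
    return {g: mean(vals) for g, vals in per_group.items()}

recs = {"u1": ["j1", "j2"], "u2": ["j2", "j3"]}
salaries = {"j1": 90_000, "j2": 70_000, "j3": 50_000}
groups = {"u1": "A", "u2": "B"}
print(group_salary_gap(recs, salaries, groups))  # group A averages higher than group B
```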

Fair Ranking

A fair ranking has a certain set of characteristics, including individual fairness, which means similar items are treated consistently. There should also be an adequate presence of items belonging to different groups, such as disadvantaged or protected groups, limiting the harm done to members of those groups. In fair ranking, two main lines of work have been proposed and widely discussed: attention-based and probability-based measures (Castillo 2019). Attention-based measures quantify the attention that different items receive through signals such as click-through rates, or the consideration they might get through proxies such as the probability that an item will be considered relevant (and probably clicked). Probability-based measures assume there is a random process generating the ranking; one such measure is proposed by Yang and Stoyanovich (2017). It compares the distribution of groups in ranking prefixes against the overall distribution (via KL-divergence) and averages the differences in a discounted way, similar to Normalized Discounted Cumulative Gain (NDCG), which is widely used in information retrieval (Järvelin and Kekäläinen 2002).
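A minimal sketch of such a discounted KL-divergence measure follows, in the spirit of the rKL measure of Yang and Stoyanovich (2017). The prefix step, the smoothing constant, and the example data are assumptions, not their exact formulation.

```python
import math

# Sketch of a discounted KL-divergence ranking-fairness measure in the spirit
# of Yang and Stoyanovich (2017). Assumptions: prefixes are evaluated every
# `step` positions, and a small epsilon smooths empty-group probabilities.

EPS = 1e-9

def group_distribution(items, group_of, groups):
    counts = {g: EPS for g in groups}
    for item in items:
        counts[group_of[item]] += 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def kl(p, q):
    return sum(p[g] * math.log2(p[g] / q[g]) for g in p)

def discounted_kl(ranking, group_of, step=2):
    """Average KL-divergence between each prefix's group distribution and the
    overall distribution, with a logarithmic position discount."""
    groups = set(group_of.values())
    overall = group_distribution(ranking, group_of, groups)
    total, norm = 0.0, 0.0
    for i in range(step, len(ranking) + 1, step):
        prefix = group_distribution(ranking[:i], group_of, groups)
        total += kl(prefix, overall) / math.log2(i + 1)
        norm += 1
    return total / norm if norm else 0.0

group_of = {"x1": "a", "x2": "a", "x3": "b", "x4": "b"}
print(round(discounted_kl(["x1", "x2", "x3", "x4"], group_of), 3))  # about 0.315: group "a" monopolizes the top
print(round(discounted_kl(["x1", "x3", "x2", "x4"], group_of), 3))  # 0.0: interleaving matches the overall distribution
```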

Multi-sided Fairness

Fairness can have a multi-sided purpose in different recommendation settings, especially in cases where we want fair outcomes for the multiple parties involved (Burke 2017). These stakeholders are divided into three categories: consumers C, providers P, and the platform or system S.

Recommendations are delivered to consumers; they are the main end-users.

The provider's job is to anticipate consumer demand and supply what is needed; in the case of recommendations, the provider offers the items being recommended. Finally, the platform that built the recommender system bridges the gap between consumers and providers and, as a result, obtains the financial benefit. Recommendation processes within such multi-sided platforms can give rise to questions of multi-sided fairness. There are three classes of systems, characterized by the fairness issues that arise relative to these groups: consumers (C-fairness), providers (P-fairness), and both (CP-fairness).

Fairness in Groups

Group recommendation has motivated important research efforts because of its potential to benefit a group of users. The idea is to increase the satisfaction of every member of the group while minimizing an unfairness measure between members. For example, given a group G of users, we would like to recommend a package P of items to the members of G. We examine two distinct perspectives of fairness. One is that each user gets what they are looking for in the package, meaning there are sufficiently many items in the package that the user likes compared to items they do not like; this is called fairness proportionality. The second perspective is envy-freeness: for every member of the group, there are enough items in the package that they like at least as much as the other members do (Serbos et al. 2017).
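One simple way to construct such a package is sketched below: greedily add the item most valued by the currently least-satisfied member. This heuristic and its data are illustrative assumptions, not the algorithm of Serbos et al. (2017).

```python
# Greedy sketch: build a package of size k that lifts the satisfaction of the
# currently least-satisfied group member at every step. A heuristic
# illustration only, not the method of Serbos et al. (2017).

def greedy_fair_package(candidates, group_ratings, k):
    package = []
    satisfaction = {u: 0.0 for u in group_ratings}
    pool = set(candidates)
    for _ in range(k):
        worst = min(satisfaction, key=satisfaction.get)           # least-happy user
        best = max(pool, key=lambda i: group_ratings[worst].get(i, 0.0))
        package.append(best)
        pool.remove(best)
        for u in satisfaction:                                    # update everyone
            satisfaction[u] += group_ratings[u].get(best, 0.0)
    return package, satisfaction

group_ratings = {
    "alice": {"m1": 5.0, "m2": 1.0, "m3": 3.0},
    "bob":   {"m1": 1.0, "m2": 5.0, "m3": 3.0},
}
package, sat = greedy_fair_package(["m1", "m2", "m3"], group_ratings, k=2)
print(package, sat)  # each member gets one highly rated item
```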

We have seen multiple approaches for incorporating fairness into recommender systems. Fair ranking concerns the predicted rating an item receives based on user behavior. Diversity focuses on variety, from both the user and the item point of view. Attention-based measures, in turn, quantify the attention a user gives to an item, which reflects the user's interest. When evaluating these different approaches, we must have a good understanding of the problem at hand. All of these approaches help us understand not only the causes of unfairness but also pave the way toward unbiased recommendations.