
Collecting annotated eye blood vessel images using crowdsourcing

Milad Nikrou

Master's thesis

School of Computing
Computer Science

August 2018


UNIVERSITY OF EASTERN FINLAND, Faculty of Science and Forestry, Joensuu School of Computing

Computer Science

Student, Milad Nikrou: Collecting annotated eye blood vessel images using crowdsourcing

Master's Thesis, 54 p., 3 appendices (12 p.)

Supervisor of the Master's Thesis: PhD Ana Gebejes
August 2018

Abstract:

Detection of blood vessels in the human eye is an important objective in several fields and applications: it can be used for medical purposes as well as in areas such as eye tracking and biometric authentication. This creates a great need for accuracy in image processing algorithms, and to achieve this accuracy the algorithms must be trained.

Proper and accurate datasets are needed for training these algorithms. This thesis concentrates on gathering such datasets using crowdsourcing, by asking a crowd to annotate the blood vessels in images of the scleral region of the human eye. In this thesis, 25 crowd users annotated 21 images in an online web application implemented exclusively for this work. The blood vessels are modeled as graphs, so the outputs of the annotations are graphs. This makes it possible to compare and evaluate the annotations with graph matching algorithms. In addition, this thesis suggests a graph edit distance technique for evaluating the output graphs, demonstrates the evaluation process for the collected annotated images, and presents the results of this evaluation.

Keywords: human eye, vessel, crowdsourcing, graphs, graph matching, graph edit distance.


Foreword

This thesis was done at the School of Computing, University of Eastern Finland, during the spring of 2018.

I would like to thank my parents for supporting me unconditionally my whole life.

After I earned my bachelor's degree in 2009, there was a gap of about seven years before I continued my studies at the master's level. During this period my mother and my father always motivated me to continue my studies and told me not to be afraid of a new experience in my life. I think I owe them all the successes in my life.

I also want to thank Ana Gebejes, who guided me as my supervisor throughout this thesis and helped me in different stages of this work. Finally, I want to thank the people who agreed to help me in the data collection stage of my thesis and dedicated their time to the annotation tasks for this study.


List of abbreviations

CS    Crowdsourcing
GED   Graph edit distance


Contents

1 INTRODUCTION
1.1 Aim of the study
2 BACKGROUND
2.1 Eye
2.2 Crowdsourcing
2.2.1 Examples of crowdsourcing business
2.2.2 Benefits
2.2.3 Challenges
2.2.4 Crowdsourcing in scientific research
2.3 Graph matching
2.3.1 Graph matching techniques
2.3.2 Graph Edit Distance
3 FRAMEWORK
3.1 Graph matching method
3.1.1 Selected GED method
3.2 Selecting images
3.3 Participants
3.4 Web application
3.4.1 Interface
3.4.2 Backend
3.4.3 GED calculation
4 RESULTS
5 CONCLUSIONS
5.1 Future work
References

Appendices

Appendix 1: Munkres' algorithm for the assignment problem (2 pages)
Appendix 2: Trained user manual (5 pages)
Appendix 3: Untrained user manual (5 pages)


1 INTRODUCTION

Our eyes are among the most important parts of our body: we perceive 80 percent of our impressions through our visual system, and our eyes help protect us from danger (Kaplan, 2015). There are many applications for detecting different parts of the human eye using computers. Medical technologies for diagnosing disorders related to the human eye are growing dramatically; disorders and health problems like diabetes and high blood pressure can be detected by different eye examinations. The importance of the eye creates a great need for computer-based tools for detecting changes in its different parts.

Developments in medical computational technologies are opening possibilities for automatic medical image processing, which would allow algorithms to understand medical images. Detection of blood vessels in the eye can also be utilized in diagnosing different diseases. Apart from medical uses, there are other important applications of scleral blood vessel detection, including eye tracking and biometric authentication. However, to ensure accuracy, these image processing algorithms must be trained. In practice, training requires medical doctors to label medical images by hand, but obtaining large enough label sets is usually expensive and time-consuming. This objective can be reached more efficiently by crowdsourcing (CS) (Leifman et al., 2015); for this reason, this work is based on the concept of crowdsourcing, which is discussed in section 2.2.

In this thesis, we treat blood vessel networks as complex structures that can be modeled with graphs (section 2.3). By annotating scleral blood vessels as graphs, we can use graph matching techniques to compare annotated images and analyze their differences (more details in section 3.1). These annotations can be used, for example, for training blood vessel detection algorithms in machine learning. In this research we collect annotated data from an expert user and a crowd (see section 3.3), then compare and analyze the graphs collected from the annotated images using a graph matching technique called graph edit distance (GED) (Gao et al., 2010) (details in chapter 4).

A web application was developed for data collection, and all annotations are done online through this application. An evaluation procedure is performed to process the annotated images and collect the most accurate annotations based on the graph edit distance technique. Section 3.2 gives an overview of the collected image database, and detailed information about the implemented web application is given in section 3.4. Finally, the collected data of the experiment, including samples of annotated images and evaluation results, can be found in chapter 4. Conclusions and future work are presented in chapter 5.

1.1 Aim of the study

The main goal of this study is to investigate whether crowdsourcing can be used for gathering large datasets of annotated images. The annotations are evaluated with a graph matching method to find out whether they contain reliable and consistent data. The research questions to be addressed are:

1. How can the quality of crowd-annotated images be evaluated?

2. How can we decrease the number of erroneous, random and inaccurate annotations?


2 BACKGROUND

This study is focused on using crowdsourcing for collecting annotated images of blood vessels in the human eye. Because of the structure of blood vessels, the annotated vessels are modeled as graphs, and the quality of the annotations is evaluated based on graph matching. The background chapter is divided into three sections, focused on the human eye, crowdsourcing, and graph matching. Some general information about the human eye is given in section 2.1. Different aspects of crowdsourcing are described in section 2.2. Two successful examples of crowdsourcing in business are presented in section 2.2.1. The benefits and challenges of crowdsourcing are discussed in sections 2.2.2 and 2.2.3. Section 2.2.4 explains the use of crowdsourcing in scientific research and related work.

Section 2.3 is focused on the graph data structure. Different techniques of graph matching are briefly reviewed in section 2.3.1. Finally, the graph edit distance technique, which is used as the evaluation method for the annotated images in this thesis, is described in section 2.3.2.

2.1 Eye

The eye is part of the human visual system and one of the most complicated parts of the human body. The human eye consists of three layers. The outer layer consists of the cornea and the sclera. The cornea is responsible for the refraction and transfer of light to the lens. The sclera (Figure 2.1), also called the white of the eye, is a connective tissue coat that protects the eye from internal and external forces. The sclera contains small blood vessels that are responsible for nourishment (Willoughby et al., 2010). In this study, these connected blood vessels are modeled as unlabeled, undirected graphs (see section 2.3).


Figure 2.1. Sclera (Heiting, 2018)

2.2 Crowdsourcing

The concept of crowdsourcing was first introduced by Jeff Howe in Wired magazine in 2006. Howe mentioned four examples in his article as crowdsourcing models: Threadless.com, InnoCentive.com, Amazon's Mechanical Turk and iStockphoto.com (Brabham, 2013). In his weblog (www.crowdsourcing.com), Howe uses two definitions for crowdsourcing:

“The White Paper Version: Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.” (Howe, 2010)

“The Soundbyte Version: The application of Open Source principles to fields outside of software.” (Howe, 2010)

Brabham (2013) describes crowdsourcing as follows:

“An online, distributed problem-solving and production model that leverages the collective intelligence of online communities to serve specific organizational goals. Online communities, also called crowds, are given the opportunity to respond to crowdsourcing activities promoted by the organization, and they are motivated to respond for a variety of reasons.” (Brabham, 2013)

Doan, Ramakrishnan and Halevy (2011) proposed the following definition for crowdsourcing:

“We say that a system is a CS system if it enlists a crowd of humans to help solve a problem defined by the system owners, and if in doing so, it addresses the following four fundamental challenges: How to recruit and retain users? What contributions can users make? How to combine user contributions to solve the target problem? How to evaluate users and their contributions?” (Doan et al., 2011)

In some definitions there is sometimes confusion between the open source concept and crowdsourcing. Opensource.com offers a definition of the term: “The term open source refers to something people can modify and share because its design is publicly accessible. The term originated in the context of software development to designate a specific approach to creating computer programs.” This definition makes it clear that crowdsourcing is different from open source.

Crowdsourcing is a Web 2.0 phenomenon, and the term is a compound of two words, crowd and sourcing. Crowdsourcing can be described as outsourcing a task to a group of anonymous individuals through an open invitation, mostly over the Internet. The anonymous individuals who perform the task are called the crowd; they can be experts and professionals in different fields, or beginners and ordinary people. In the crowdsourcing process many individuals may work on a task at the same time, and the crowdsourcing organization, which can be called the crowdsourcer, finally selects the best outputs (Schenk and Guittard, 2011).

The third definition, presented by Doan, Ramakrishnan and Halevy (2011), is the closest to what is performed in this study: a task is defined, and randomly selected anonymous individuals (the crowd) are asked to perform the task in an online environment.

2.2.1 Examples of crowdsourcing business

Threadless (Threadless.com), mentioned as an example in Jeff Howe's article in Wired magazine, is an online clothing company that sells silk-screened graphic T-shirts. Members of the Threadless online community can create their own designs using templates available on the website and upload them to a gallery. The other members of the community can score the designs on a 0 to 5 rating scale. The designs with the highest scores are printed at the company's Chicago headquarters and sold to the community in the website's online store. The winners receive $2,000 in cash and a $500 gift certificate as a reward. The company profits from this procedure since it only prints shirts that are already in demand among its customers (Brabham, 2013).

InnoCentive (innocentive.com) is another example of crowdsourcing. Companies can post challenging scientific research and development problems on the InnoCentive website and offer cash prizes to people who solve them. Companies benefit from the fast and low-cost solutions proposed by members of InnoCentive's online community. Amazon's Mechanical Turk (mturk.com) service helps organizations crowdsource tasks at low cost to an online community of workers, covering tasks that human beings can do more efficiently than computers (Brabham, 2013).

2.2.2 Benefits

There are several benefits to crowdsourcing. It simplifies access to talent that cannot be found in any other way; in other words, crowdsourcing eases the process of finding individuals with certain skills. It also makes it possible to find people who can perform tasks that are difficult or impossible for machines. It is also possible to have different crowd groups perform different tasks. Crowdsourcing helps the crowdsourcer follow market trends and simplifies organisational processes. Different groups of crowdsourcers, including small businesses, large businesses, non-profit organizations, scientists and researchers, artists and even single individuals, can benefit from crowdsourcing (Grier, 2013).

2.2.3 Challenges

There are different problems in the crowdsourcing process that have been addressed by scholars. All the people who form the crowd in a crowdsourcing process have some kind of motivation for taking part in it. Knowing the intentions of the crowd can be challenging, and it is important for the crowdsourcer to know what motivates the crowd to participate before launching a crowdsourcing effort. There can be different motivations; in most studies the common motivations mentioned are growing creative skills, improving a resume for future employment, earning money, experiencing the challenge of finding a solution to a problem, communicating with other creative individuals, finding new friends, keeping oneself busy when bored, or just having fun (Brabham, 2013).

Another difficulty with crowdsourcing can be legal issues, which can arise for different reasons. One issue is that in the crowdsourcing process there is no clear boundary between a professional individual and an amateur. In a business context, common concerns about crowdsourcing are copyright problems and intellectual property. To protect both crowdsourcing organizations and the crowd from legal problems, all websites that contain user-generated content need to have terms of use and Digital Millennium Copyright Act (DMCA) statements, and all these statements and policies should be easy to find and understand. The crowdsourcing firm should always have clear rules to prevent the crowd from submitting content that originates from another party (Brabham, 2013).

2.2.4 Crowdsourcing in scientific research

Collecting large data sets for scientific research can sometimes be challenging. Crowdsourcing can be used as a data collection method in scientific research. Target crowd groups in scientific research can be non-professional individuals as well as professionals and scientists.

The tasks or questions in scientific methods can be shared remotely through electronic channels like online communities and email, and the collection of results can also be managed remotely (Buecheler et al., 2010). Several research cases have utilized crowdsourcing for large-scale data collection. Rudoy, Goldman, Shechtman and Zelnik-Manor (2012) proposed a method for crowdsourced gaze data collection. In their study they acquired gaze direction data from a large crowd using a self-reporting mechanism; they applied the technique to a data set of videos and demonstrated that the outputs are similar to traditional gaze tracking. Eskenazi (2013), in Crowdsourcing for Speech Processing, discussed crowdsourcing applications in data collection for speech processing, describing different web-based technologies for developing online applications to collect audio.

Leifman, Swedish, Roesch and Raskar (2015) proposed a technique for labeling medical images using crowdsourcing. In their approach they used a web application with two different user interfaces for different labeling tasks, and they had two types of crowd: experts and crowd-workers. They also illustrated a validation approach designed to cope with noisy ground-truth data and with inconsistent input from both experts and crowd-workers (Leifman et al., 2015).

2.3 Graph matching

Graphs as a data structure can be widely used for representing complex data like road networks or blood vessels. Graphs are used in different fields including mathematics, pattern recognition, machine vision and geography. By representing data as graphs, the task of finding similarities between two datasets can be converted into a problem called graph matching.

Graphs are generally represented by two main elements: points, called nodes, and links connecting the nodes, called edges. Graph matching is the process of finding structural similarities between two graphs (Riesen and Bunke, 2009). The graph matching problem is NP-hard, so the main focus of research in graph matching is on optimizing the current algorithms to obtain more efficient algorithms in terms of speed and accuracy (Caetano et al., 2009).

2.3.1 Graph matching techniques

Several different algorithms have been introduced to give approximate solutions to the graph matching problem, each with its own applications. Some of these approaches are briefly explained below.


Feature extraction and embedding is one approach used for graph matching. Cheong and colleagues (Cheong et al., 2009), in Measuring the Similarity of Geometric Graphs, use it to find similarities between geometric graphs in 2D space. They applied this method to shape matching problems like the recognition of symmetries in molecules. In their technique they defined a heuristic distance function called the landmark distance, which uses landmarks, sets of nodes collected from each graph. The Earth Mover's distance is then used to find similarities between two geometric graphs. The selection of landmarks for each graph is the main step of their technique. However, their approach is not efficient, and it cannot be used for matching graphs with different numbers of nodes (Armiti and Gertz, 2014; Cheong et al., 2009).

Graph spectra are another approach, commonly used in pattern recognition, for graph matching. Umeyama (1988) illustrates an approximate solution to the undirected and directed weighted graph matching problem using a node-to-node adjacency matrix and the Hungarian algorithm. Cho, Lee and Lee (2010) introduced a random-walk interpretation of graph matching: they introduced an association graph whose nodes are candidate correspondences and whose edges are pairwise compatibilities between candidate correspondences, utilizing the PageRank algorithm and Sinkhorn normalization. Their algorithm is not efficient and is memory consuming, since the size of the adjacency matrix grows at a quartic rate with respect to the graph size (Armiti and Gertz, 2014; Umeyama, 1988; Cho et al., 2010).

Another approach commonly used in graph matching is continuous optimization. Graph matching is inherently a discrete optimization problem, but with this technique it can be cast into a continuous, nonlinear optimization problem, after which an optimization algorithm must be found to solve it. One method based on this technique utilizes relaxation labeling. The main concept is that each node of one of the graphs can be labeled from a discrete set of labels, and that label defines the correspondence of this node with a node of the other graph. For each node there is a vector of probabilities for each candidate label, computed from node attributes, node connectivity and other available information. Armiti and Gertz (2013) introduced a probabilistic technique for graph matching that works by selecting the top-k most similar pairs of nodes from both graphs as seeds for the match and then iteratively expanding the match using a probabilistic voting scheme. A limitation of this technique is that the match between two nodes is estimated only from the similarity of their direct neighbors, and non-neighboring nodes are not taken into account (Armiti and Gertz, 2014; Conte et al., 2004).

Graph edit distance (GED) is another approach used for inexact graph matching. Inexact graph matching is based on finding a distortion or variation between two graphs where an exact match may not exist. The main idea of GED is to search for the sequence of edit operations that transforms one graph into another with the minimum cost. Graph edit distance can be computed for both attributed and unattributed graphs, and for directed and undirected graphs. Moreover, GED is not complicated to implement (Armiti and Gertz, 2013; Armiti and Gertz, 2014; Gao et al., 2010). GED is explained in detail in the next section.

2.3.2 Graph Edit Distance

Graph edit distance (GED) is one of the most flexible and error-tolerant techniques for graph matching. The main idea is to compute the total cost of the edit operations needed to transform one graph so that it is identical to another. GED extends the string edit distance, which is used to determine the similarity between two strings and is calculated from the minimum number of insertions, deletions, and substitutions required to transform one string into the other (Riesen, 2015; Ristad and Yianilos, 1998). In this research we represent a graph as g = (V, E), where V is the set of nodes (vertices) and E is the set of edges.
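To make that extension concrete, the following is a minimal Python sketch of the string edit distance with unit costs; the function name and example are our own illustration, not code from the cited sources:

def string_edit_distance(s, t):
    """Minimum number of insertions, deletions and substitutions
    (all with cost 1) needed to transform string s into string t."""
    n, m = len(s), len(t)
    # dist[i][j] = edit distance between the prefixes s[:i] and t[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all characters of s[:i]
    for j in range(m + 1):
        dist[0][j] = j  # insert all characters of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution
    return dist[n][m]

print(string_edit_distance("vessel", "vassal"))  # 2 substitutions -> 2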

Let us consider two graphs g1 = (V1, E1) and g2 = (V2, E2). The idea is to transform g1 into g2 with a set of standard edit operations, namely insertions, deletions, and substitutions of both nodes and edges. Assuming ε refers to the empty node, let us denote the substitution of two nodes u ∈ V1 and v ∈ V2 by (u → v), the deletion of node u ∈ V1 by (u → ε), and the insertion of node v ∈ V2 by (ε → v). We use the same notation for edge edit operations (Riesen, 2015).

We can define an edit path λ(g1, g2) as a set of n edit operations {e1, . . . , en} that transforms g1 into g2; a subset of λ is called a partial edit path. Since the edge edit operations are implied by the node edit operations, we can assume that the edit path λ(g1, g2) contains only node edit operations (Riesen, 2015).

(17)

11

For example, let us define two undirected and labeled graphs g1 and g2 as shown in Figure 2.2:

Figure 2.2. Graphs g1 and g2.

The edit path between them can be defined as:

λ = {(u5 → ε), (u4 → ε), (u1 → v1), (u2 → v2), (u3 → v3)}.

Figure 2.3. The sequence of edit operations transforming g1 into g2: (u5 → ε), (u4 → ε), (u1 → v1), (u2 → v2), (u3 → v3).

Let γ(g1, g2) denote the set of all complete edit paths between g1 and g2, let c denote the cost function for each node operation ei, and let λmin denote the edit path with minimum cost in γ. We can then define the graph edit distance as follows (Riesen, 2015):

$$d_{\lambda_{\min}}(g_1, g_2) = \min_{\lambda \in \gamma(g_1, g_2)} \sum_{e_i \in \lambda} c(e_i)$$

Equation 2.1. Graph edit distance

The cost function c defines the strength c(ei) of the edit operation ei. Since there are infinitely many edit paths between two graphs, we need some conditions on the cost function to limit the size of γ(g1, g2) to a finite number of edit paths. These conditions are (Riesen, 2015):

(18)

- Non-negativity: c(e) ≥ 0 for all node and edge edit operations e.
- Triangle inequality: c(u → w) ≤ c(u → v) + c(v → w); c(u → ε) ≤ c(u → v) + c(v → ε); c(ε → v) ≤ c(ε → u) + c(u → v).
- Symmetry: c(e) = c(e⁻¹), where e⁻¹ denotes the inverse edit operation to e.

Considering two graphs g1 with n nodes and g2 with m nodes, the number of possible edit paths between g1 and g2 is O(m^n). Defining a proper cost function is essential in graph matching algorithms based on the edit distance technique (Riesen, 2015).

The graph edit distance is often computed using a tree search algorithm that explores all possible mappings of the edges and nodes between the two graphs; typically an A*-based search algorithm utilizing heuristics is used (Riesen and Bunke, 2009; Riesen, 2015).

As mentioned in section 1.1, this research is based on collecting annotated images of blood vessels in the human eye, and we represent these annotations as graphs, so the graph edit distance is used to evaluate the quality of the annotations. The next chapter describes the method used in this research.


3 FRAMEWORK

This chapter is focused on the research method and the data collection part of the thesis, and it contains four sections. As mentioned before, in this research the annotated blood vessels in the eye are represented as graphs, and a graph edit distance is used to compare and evaluate the annotated images. The previous chapter reviewed the background of the graph edit distance; the GED method used in this research is explained in section 3.1. Section 3.2 is focused on the original image database and the sub-database selected for this research. The selected participants, which in the context of crowdsourcing are called the crowd, are described in section 3.3. The technical details of the web application developed exclusively for this research are given in section 3.4.

3.1 Graph matching method

The blood vessel images annotated by the crowd provide us with graphs that are unlabeled and undirected. Unlabeled means that the nodes are not represented by labels; instead, the output is a geometric graph, which means each node has (x, y) attributes indicating the pixel coordinates of the node. The graph matching method used in this research is based on the technique proposed by Riesen and Bunke (2009). This technique is based on Munkres' algorithm for the assignment problem and is explained in the following section.

3.1.1 Selected GED method

The basic idea of the assignment problem is to find an optimal assignment of the elements of two sets A and B that have the same cardinality. Assuming A and B both have size n, we can define an n × n cost matrix C, where element c_ij contains the cost of assigning the i-th element of A to the j-th element of B. The assignment problem can then be defined as finding a permutation p = p1, ..., pn that minimizes $\sum_{i=1}^{n} c_{ip_i}$. The assignment problem can also be viewed as a bipartite graph matching problem (Riesen and Bunke, 2009).
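To illustrate the definition (this is a reference implementation for tiny inputs, not the method used in this thesis), the optimal assignment can be found by simply trying every permutation; the function and the example matrix are our own:

import itertools

def brute_force_assignment(cost):
    """Minimize the sum of cost[i][p[i]] over all permutations p.
    O(n!) time, so only usable for very small n; Munkres' algorithm,
    introduced next, solves the same problem in polynomial time."""
    n = len(cost)
    best_perm, best_cost = None, float('inf')
    for p in itertools.permutations(range(n)):
        c = sum(cost[i][p[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = p, c
    return best_perm, best_cost

# Example: assign 3 workers (rows) to 3 tasks (columns).
cost = [[4, 2, 8],
        [4, 3, 7],
        [3, 1, 6]]
print(brute_force_assignment(cost))  # ((0, 2, 1), 12)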


James Munkres (1957) introduced an algorithm that solves the assignment problem in polynomial time. This algorithm is called the Kuhn–Munkres algorithm or the Munkres assignment algorithm, and it can be found in Appendix 1.

Riesen and Bunke (2009) proposed a cost matrix that makes it possible to use this method for the graph edit distance. Assuming g1 = (V1, E1) with V1 = {u1,...,un} is the source graph and g2 = (V2, E2) with V2 = {v1,...,vm} is the target graph, we can define the cost matrix C as follows:

$$
\mathbf{C} =
\begin{pmatrix}
c_{1,1} & c_{1,2} & \cdots & c_{1,m} & c_{1,\varepsilon} & \infty & \cdots & \infty \\
c_{2,1} & c_{2,2} & \cdots & c_{2,m} & \infty & c_{2,\varepsilon} & \ddots & \vdots \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \ddots & \infty \\
c_{n,1} & c_{n,2} & \cdots & c_{n,m} & \infty & \cdots & \infty & c_{n,\varepsilon} \\
c_{\varepsilon,1} & \infty & \cdots & \infty & 0 & 0 & \cdots & 0 \\
\infty & c_{\varepsilon,2} & \ddots & \vdots & 0 & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \infty & \vdots & \ddots & \ddots & 0 \\
\infty & \cdots & \infty & c_{\varepsilon,m} & 0 & \cdots & 0 & 0
\end{pmatrix}
$$

Equation 3.1. Riesen & Bunke cost matrix

In this matrix, c_ij corresponds to the cost of a node substitution c(ui → vj), c_iε corresponds to the cost of a node deletion c(ui → ε), and c_εj corresponds to the cost of a node insertion c(ε → vj) (Riesen and Bunke, 2009).

The rows of the matrix correspond to the nodes of g1 and the columns to the nodes of g2. As we can see in the matrix, the upper left corner contains the costs of all possible node substitutions, the diagonal of the upper right corner contains the costs of all possible node deletions, and the diagonal of the lower left corner contains the costs of all possible node insertions.

Since each node can be deleted or inserted at most once, every off-diagonal element of the upper right and lower left parts is set to ∞. The lower right corner of the cost matrix is set to zero, since substitutions of the form (ε → ε) should not have any cost.

With this cost matrix we can use Munkres' algorithm to compute the minimum cost of transforming graph g1 into graph g2 (Riesen and Bunke, 2009).


Since the annotated blood vessels are represented as undirected geometric graphs, the substitution cost used in this experiment is the Euclidean distance. The Euclidean distance (mathworld.wolfram.com) measures the distance between two points p1 with coordinates (x1, y1) and p2 with coordinates (x2, y2), and it is defined as:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Equation 3.2. Euclidean distance

3.2 Selecting images

The images for the experiment were collected from a database called the SPectral Eye vidEo Database (SPEED). This database was collected as part of the doctoral dissertation of Ana Gebejes at the University of Eastern Finland. The study focused on the possibility of using the SPEED database in multiple computer-vision applications, such as spectral-reflectance-based classification and segmentation, reflectance-based object detection, illumination-independent object tracking, and temporal spectral analysis, as well as in other eye-related areas of research like medicine, biometrics, and eye/vision-based studies (Gebejes, 2017).

The SPEED database is still growing, and it is publicly available to all researchers. The images were collected using FD spectral video and a Nuance EX (CRi, Inc., USA) Liquid Crystal Tunable Filter (LCTF). The database contains 30 fifty-one-channel spectral images, 60 seven-channel spectral images, and 180 seven-channel spectral videos of 30 voluntary subjects (Gebejes, 2017).

We collected a subset of images from SPEED as the experiment data for this research. The subset was extracted from the seven-channel spectral videos, and the final output is a set of RGB images. Since the original images contained extra areas around the eye, they had to be cropped to extract the parts of the sclera with the most visible blood vessels. In the following, two images are taken as examples to show the process of extracting the scleral areas:

Original image 1:


Figure 3.1. Original image

The selected areas are shown by red rectangles, and the final extracted images can be seen in Figure 3.2 and Figure 3.3:

Figure 3.2. Extracted image 1


Figure 3.3. Extracted image 2

And we can see the same process for another image:

Figure 3.4. Original image


Figure 3.5. Extracted image 1

Figure 3.6. Extracted image 2

A subset of 21 images was extracted for the experiment. The images were selected based on difficulty and on the different positions of the pupil. In the web application, participants are asked to annotate the images. The images are shown in order of difficulty, meaning that images with blood vessels that are easier to recognize and annotate are shown first, and the more complicated images are shown last. The developed application is described in section 3.4.

3.3 Participants

Based on the purpose of the research and the challenges of the annotation task, there are two main types of participants in this research: the expert and the crowd. Since these participants are defined as users in the designed web site, we will refer to them as users.

One expert user is defined in the system. We consider the expert-annotated images to be perfect annotations, and they are used for two purposes: as images for annotation training, and as the ground truth for evaluating the crowd annotations. The crowd users are divided into two groups: trained and untrained. Trained users are a subset of users who are trained with the expert-annotated images. In total there are 25 crowd users: 15 trained and 10 untrained. For trained users, the first three images are used for training. The primary evaluation is explained later.

Two different interfaces are designed in the website for trained and untrained users. Users are shown 21 images and asked to annotate the longest, most visible blood vessel in each image. Annotation here simply means drawing lines. For trained users, the first three images are shown with expert annotations, and they are asked simply to follow the lines drawn by the expert user. In the following, an example of the original image and the expert-annotated image is shown:


Figure 3.7. Original image

The expert-annotated image as it is shown in the website to a trained crowd user:

Figure 3.8. Expert annotated image in the website (shown for trained user)


We can see the expert annotation as orange lines in Figure 3.8. The lines are drawn by connecting points, which are placed by left-clicking on the image.

The application performs an evaluation at this stage: users are only allowed to draw lines with a limited number of points, which is calculated from the expert-annotated image, so if the user annotates the image with a different number of points, the system prevents the annotation from being saved. In Figure 3.8 the number of allowed points is 6, and we can see in the image that the expert user has done the annotation task using 6 points. The same image is shown without the expert annotation to untrained users, as in Figure 3.9; the orange expert-annotation lines are not shown to untrained users.

Figure 3.9. Image shown for untrained user
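As a hedged sketch of the point-count check described above (the helper name and signature are our assumptions, not the application's actual code):

def can_save_annotation(user_points, expert_points):
    """An annotation may be saved only if the user placed exactly as
    many points as the expert annotation of the same image; e.g. for
    the image in Figure 3.8, len(expert_points) == 6."""
    return len(user_points) == len(expert_points)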

Two different user manuals were designed, one for each crowd user group; the manuals can be found in Appendix 2 and Appendix 3. The crowd users were selected arbitrarily from a variety of available individuals: university students and teachers, colleagues, friends and family.


3.4 Web application

As mentioned earlier, a web application was developed as part of this thesis for collecting annotated data. It is hosted on University of Eastern Finland servers and can be found at https://cs.uef.fi/VesselAnnotation. The website is implemented using the Python programming language and the Django framework, with MySQL as the database for storing the data. The interface is mostly built with JavaScript and jQuery.

The application has three main parts: the admin site, the drawing page and the dashboard. In the admin site the users and groups are defined. There are four different user groups: admin, expert, trained power users and untrained power users. The trained and untrained power users are the aforementioned crowd users of the experiment. The user groups have different permission levels; for example, only the admin group has access to the dashboard of the website.

3.4.1 Interface

The most important part of the website is the draw page, which shows the images to the users and enables them to perform the annotation task.

Figure 3.10. Draw page


As explained before, users are asked to draw lines on the most visible blood vessels in the images. At the top of each image, the number of points the user is allowed to use for drawing is shown, and the blue text indicates the number of points the user has already used. The image has two modes, draw and select: the user draws lines in draw mode and can delete drawn points in select mode:

Figure 3.11. Selecting modes

Users can start drawing by left-clicking on the image. Once clicked, a yellow point is drawn on the image, and as the user moves the mouse, a line is drawn to the mouse location on the image.

Figure 3.12. Drawing lines

The user can stop drawing by right-clicking. Since the outputs of the annotations are graphs, all drawn points should be connected together. Drawn points can be deleted in select mode. The full instructions for using the draw page are given as user manuals in Appendix 2 for trained users and Appendix 3 for untrained users.

3.4.2 Backend

When the user clicks the save button, the application saves the annotation as a graph in the database. The data format for saving graphs is JSON: each graph is represented by an array of edges, and each edge has a start and an end node. Each node consists of x and y coordinates indicating the spatial location of the drawn point. In the following we can see a sample annotated image and the saved JSON data:


Figure 3.13. Sample annotation

Saved JSON data:

{
  "edges": [
    { "start": { "x": 228.5, "y": 332 }, "end": { "x": 232.5, "y": 287 } },
    { "start": { "x": 232.5, "y": 287 }, "end": { "x": 260.5, "y": 234 } },
    { "start": { "x": 260.5, "y": 234 }, "end": { "x": 264.5, "y": 99 } },
    { "start": { "x": 264.5, "y": 99 }, "end": { "x": 283.5, "y": 44 } },
    { "start": { "x": 283.5, "y": 44 }, "end": { "x": 316.5, "y": 19 } }
  ]
}

As we can see in Figure 3.13 and the saved JSON data, the graph is formed from 5 edges. The JSON data is saved as text in a table in the database.
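As an illustrative sketch (this helper is hypothetical; the application's own parsing code is not shown in the thesis), the stored JSON text can be parsed back into the node list needed for the GED calculation:

import json

def load_graph_nodes(annotation_json):
    """Collect the unique node coordinates from a saved annotation,
    read from the JSON text stored in the database."""
    data = json.loads(annotation_json)
    nodes = []
    for edge in data["edges"]:
        for point in (edge["start"], edge["end"]):
            xy = (point["x"], point["y"])
            if xy not in nodes:  # endpoints are shared between edges
                nodes.append(xy)
    return nodes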

The database contains 10 tables; the most important ones are the Images and ImageAnnotations tables, shown in the partial database diagram in Figure 3.14:


Figure 3.14. Partial database diagram

The auth_user table contains information about the users, including the expert and crowd users. The images used in the experiment are saved in the Images table with a given display order for each image. The drawn graphs are saved as text in the annotation_json field of the ImageAnnotations table.
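For illustration, a Django model sketch consistent with the tables and the annotation_json field named above might look as follows; the exact field names and types (other than annotation_json) are assumptions, since the application's models are not listed in this thesis:

from django.db import models
from django.contrib.auth.models import User  # backs the auth_user table

class Image(models.Model):
    file = models.ImageField(upload_to='images/')
    display_order = models.IntegerField()  # order shown to users

class ImageAnnotation(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    image = models.ForeignKey(Image, on_delete=models.CASCADE)
    annotation_json = models.TextField()  # drawn graph as JSON text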

3.4.3 GED calculation

Calculating the graph edit distance is one of the most important parts of the experiment. In this research each graph drawn by a trained or untrained user is compared to the graph drawn by the expert user: for each output graph, the GED is calculated between that graph and the expert's graph for the same image. The calculated distance is in pixels. As explained in section 3.1.1, to calculate the GED between two graphs, a cost matrix is formed first. The code in this section assumes a small Graph and Node structure, sketched below, followed by the Python function for creating the cost matrix.
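The snippets call g.size(), g.get_nodes(), node.equals() and the x and y attributes of nodes; the following is a minimal sketch of such Graph and Node helpers (our own illustration, since the application's actual classes are not shown in the thesis):

class Node:
    """A drawn point, identified by its pixel coordinates."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def equals(self, other):
        return self.x == other.x and self.y == other.y

class Graph:
    """An unlabeled, undirected geometric graph g = (V, E)."""
    def __init__(self, nodes, edges=None):
        self.nodes = nodes        # list of Node objects (V)
        self.edges = edges or []  # list of (Node, Node) pairs (E)

    def size(self):
        return len(self.nodes)

    def get_nodes(self):
        return self.nodes

The cost matrix function itself is the following: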

def create_cost_matrix(g1, g2):
    n = g1.size()  # number of nodes in g1
    m = g2.size()  # number of nodes in g2
    cost_mat = [[0 for i in range(n + m)] for j in range(n + m)]

    nodes1 = g1.get_nodes()
    nodes2 = g2.get_nodes()

    # upper left corner: node substitution costs
    for i in range(n):
        for j in range(m):
            cost_mat[i][j] = substitute_cost(nodes1[i], nodes2[j])

    # lower left corner: node insertions are not allowed in this
    # experiment, so all insertion costs are infinite
    for i in range(m):
        for j in range(m):
            cost_mat[i + n][j] = float('inf')

    # upper right corner: node deletions are likewise not allowed
    for i in range(n):
        for j in range(n):
            cost_mat[j][i + m] = float('inf')

    return cost_mat

The function accepts two graphs of size n as parameters; the output is a matrix with 2n rows and 2n columns. As explained in section 3.3, users must draw graphs with exactly the number of points that the expert user used for drawing the graph, so the input graphs of this function always have the same number of nodes (the same graph size). For the same reason, the costs for the insertion and deletion operations are set to infinity, which effectively means there are no insertion and deletion operations in this experiment. As a result, all elements in the upper right and lower left corners of the cost matrix are set to infinity, while the lower right corner remains zero (see section 3.1.1). The substitution cost is calculated with the Euclidean distance, as we can see in the following code:

import math

def substitute_cost(node1, node2):
    if node1.equals(node2):
        return 0
    return euclidean_distance(node1, node2)

def euclidean_distance(node1, node2):
    return math.sqrt((node1.x - node2.x)**2 + (node1.y - node2.y)**2)

Let's take two graphs g1 and g2 as an example:

Figure 3.15. Sample graphs g1 and g2

The coordinates of the nodes of these two graphs (in no particular order) are:

g1 = {(12, 123), (89, 155), (174, 160), (235, 141), (172, 209)}

g2 = {(7, 120), (82, 156), (175, 155), (159, 203), (239, 152)}

The cost matrix created for these two graphs (with substitution costs rounded to one decimal) is:

$$
C = \begin{pmatrix} D & \infty \\ \infty & 0 \end{pmatrix}, \qquad
D = \begin{pmatrix}
5.8 & 77.4 & 166.1 & 167.4 & 228.8 \\
89.2 & 7.1 & 86.0 & 84.9 & 150.0 \\
171.7 & 92.1 & 5.1 & 45.5 & 65.5 \\
229.0 & 153.7 & 61.6 & 98.1 & 11.7 \\
187.5 & 104.4 & 54.1 & 14.3 & 88.0
\end{pmatrix}
$$

Equation 3.3. Calculated cost matrix: D holds the pairwise Euclidean distances between the nodes of g1 (rows) and g2 (columns), while the ∞ blocks are the disallowed deletions/insertions and the 0 block the (ε → ε) substitutions

The final GED is calculated using a module provided by Brian M. Clapper (http://software.clapper.org/munkres/index.html) that implements the Munkres algorithm. The code for comparing graphs and calculating the GED is the following:

1. def compareGraphs(g1, g2):
2.     m = Munkres()
3.     cost_matrix = create_cost_matrix(g1, g2)
4.     index = m.compute(cost_matrix)
5.     costs = [cost_matrix[i][j] for i, j in index]
6.     distance = sum(costs) / g1.size()
7.     return distance

As we can see in the code, m is an instance of the Munkres class. In line 4, m.compute(cost_matrix) returns a list of (row, column) tuples that describe the lowest-cost path through the matrix. In line 5, the costs at these selected positions are collected into an array, and finally, in line 6, the GED is calculated. The division by g1.size() (the number of nodes) is done to obtain a normalized GED that is comparable to GEDs calculated for graphs of different sizes. The edit distance between g1 and g2 (the cost to transform g1 into g2) is 8.8. This number is in pixels and indicates the difference between the two graphs. The Git repository for the Python code of the aforementioned Munkres module can be found at https://github.com/bmc/munkres.
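As a usage sketch reproducing the 8.8-pixel distance for the sample graphs of Figure 3.15 (assuming the Graph and Node helpers sketched earlier, and the module imported with from munkres import Munkres):

coords1 = [(12, 123), (89, 155), (174, 160), (235, 141), (172, 209)]
coords2 = [(7, 120), (82, 156), (175, 155), (159, 203), (239, 152)]
g1 = Graph([Node(x, y) for x, y in coords1])
g2 = Graph([Node(x, y) for x, y in coords2])
print(round(compareGraphs(g1, g2), 1))  # approximately 8.8 (pixels)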

4 RESULTS

In this chapter the results of the research are explained and analyzed. As explained in chapter 3, 25 users (15 trained and 10 untrained) annotated 21 images. Since the first 3 images were used as training for the trained users, they are left out of the results. It is important to note that the GED for each user annotation is calculated by comparing the user annotation with the expert annotation of the same image; in other words, the GED is a relative value that indicates the difference between a user annotation and the expert annotation. Two images are taken as examples to compare trained and untrained annotations with the expert annotation.

Image 1 and its expert annotation:

Figure 4.1. Original image 1
Figure 4.2. Expert annotation of image 1

We can see one sample trained user annotation and one sample untrained user annotation for the same image in Figure 4.3 and Figure 4.4:


Figure 4.3. Trained annotation of image 1
Figure 4.4. Untrained annotation of image 1

The calculated GED between the expert-annotated graph and the sample trained user annotation is 19.07, while the calculated GED for the untrained user annotation is 80.19. We can see that the trained user has a better result for this image. The calculated GEDs of all trained users for this image are illustrated in the chart in Figure 4.5:


Figure 4.5. Trained users' GEDs for sample image 1 (y-axis: calculated GED; x-axis: users)

The calculated GED of each trained user compared to the expert user is shown by a blue bar. The average GED of the trained users for this image is 39.74, shown by a vertical green line in the chart. We can see in the chart that more than 50% of the trained users performed better than the average.

Calculated GEDs of all untrained users for the same image are shown by blue bars in Figure 4.6:


Figure 4.6. Untrained users' GEDs for sample image 1 (y-axis: calculated GED; x-axis: users)

The average GED of the untrained users for this image is 52.76; only 4 untrained users performed worse than this average. For this image, the trained users' GEDs show an improvement of 32.77% compared to the untrained users'.

We can see the same results for another image:

Figure 4.7. Original image 2
Figure 4.8. Expert annotation of image 2



Figure 4.9. Trained annotation of image 2
Figure 4.10. Untrained annotation of image 2

The calculated GED for the trained user is 14.25 and for the untrained user 23.43; the numbers show a 64.47% improvement for the trained user. We can see the GED charts for all trained and untrained users in Figure 4.11 and Figure 4.12:

Figure 4.11. Trained users' GEDs for sample image 2 (y-axis: calculated GED; x-axis: users)

Figure 4.12. Untrained users' GEDs for sample image 2 (y-axis: calculated GED; x-axis: users)

The average GED for the trained users is 24.4 and for the untrained users 28.84; the numbers indicate an 18.19% improvement for the trained users on this image.

For image 2, 73% of the trained users and 70% of the untrained users performed better than the average GED of their group. Also, the average GED of both the trained and the untrained group is lower than the corresponding average for image 1. From these numbers it can be concluded that image 2 was a simpler image to annotate for users in both groups.

In both images there is at least one user in each group whose calculated GED differs greatly from the average GED of all users, for example t_user_12 among the trained users and u_user_4 among the untrained users for image 1. This might be caused by different reasons like tiredness, lack of time or even lack of motivation, as mentioned in section 2.2.3. To obtain a more accurate dataset, the results from these users can be removed from the final dataset.

We can see the average GEDs of the trained users for all images in Figure 4.13. The average GED is shown by a blue bar for each image. The numbers on the vertical axis give the order in which the images were shown to the users; for example, 4 is the 4th image shown. As mentioned earlier in this chapter, the first three images are left out of the evaluation, so the numbers start from 4.


Figure 4.13. Trained users' average GEDs for all images (y-axis: image number; x-axis: average GED)

The average calculated GED of the trained users over all images is 39.83. Figure 4.14 illustrates the average GEDs of the untrained users for all images.

Figure 4.14. Untrained users' average GEDs for all images (y-axis: image number; x-axis: average GED)


The average calculated GED of the untrained users over all images is 49.97. Over all images, the trained users show an improvement of 17.92% compared to the untrained users, and we can conclude that the training process had a positive effect on the performance of the users. The average GEDs of both groups for image 15 differ greatly from the overall average, which indicates that the annotation task for this image was really challenging and that it was a difficult image to annotate. Image 15 is shown in Figure 4.15.

Figure 4.15. Image number 15

The users were asked to annotate image number 15 with 7 points. However, there are many visible blood vessels in the image, which makes it really complicated to annotate, because the user is unsure which vessel to choose for the annotation.

5 CONCLUSIONS

In this thesis, the concepts of crowdsourcing and graph matching were first reviewed, and the contribution of these two concepts to this thesis was explained, including the benefits and challenges of using them (chapter 2). The method of the thesis was then explained. In this work, twenty-five random individuals (the crowd) were asked to annotate blood vessels in 21 images of the scleral region of the human eye.

A web application was implemented as part of this thesis for collecting the annotations.

There was one expert user in the system. The expert-annotated images were considered perfect annotations and were used as the ground truth for the evaluation part of the thesis. The crowd users were divided into two groups: trained and untrained users.

There was a specific training process for the trained users: they were asked to annotate the first three images simply by following the lines already drawn by the expert user, and these 3 images were later left out of the results. The blood vessel annotations were modeled as graphs (chapter 3).

An evaluation process was proposed in this study to measure the quality of the crowd annotations by comparing them to the expert-annotated images and expressing the difference as a numeric value. This evaluation was done with the graph edit distance technique implemented in the web application. From the results it can be seen that the training process had a positive effect on the annotation task. In the results chapter (see chapter 4), the evaluation results of two sample images annotated by sample users from each group were analyzed; in both images the trained users had better results. The average GED of all users in each group was then shown for each image (Figure 4.13, Figure 4.14), and this result also indicates an improvement for the trained users. Trained users generally performed 17.92% better than untrained users, which shows that a training process can improve the performance of the users and, as a result, make the collected annotation dataset more accurate.

5.1 Future work

Some improvements can be made in future work to obtain bigger datasets containing more accurate results, and to increase the flexibility of the whole process. In this thesis I proposed a graph edit distance technique for measuring the difference between the expert user's annotation and the crowd users' annotations. One limitation of this technique is that it only works for comparing graphs of the same size; with some enhancements, the algorithm could also be applied to graphs of different sizes.

There could be a live evaluation while the user is annotating the image: for example, the GED could be calculated when the user tries to save the image, and if the result is bigger than a certain predefined threshold, the application would prevent the user from saving the annotation and ask the user to try a better annotation. However, this live evaluation might affect the performance of the application.
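A minimal sketch of this idea, reusing the compareGraphs function from section 3.4.3 with an assumed threshold value, could look like:

# Hypothetical live check: reject an annotation whose GED to the
# expert annotation exceeds a predefined threshold.
MAX_ALLOWED_GED = 50.0  # threshold in pixels; an assumed value

def accept_annotation(user_graph, expert_graph):
    return compareGraphs(user_graph, expert_graph) <= MAX_ALLOWED_GED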

Another future direction could be to use a bigger set of crowd users, or bigger vessel networks. For a big vessel network, one could define a procedure that divides the image into multiple parts, annotates each part separately, and then concatenates the annotated parts to reconstruct the original image.

In the results chapter we saw that some images were challenging to annotate. This could be avoided by improving the training process or by having more precise expert annotations: for example, in image 15 in chapter 4 we saw that users were asked to annotate the image with only 7 points although there were more visible blood vessels to annotate, which confused the users. We also saw some users who performed really badly on most images, possibly because of frustration or lack of motivation, so offering incentives or prizes for the annotation tasks could enhance the overall results.

REFERENCES

Armiti, A. and Gertz, M., 2013, November. Efficient geometric graph matching using vertex embedding. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 224-233). ACM.

Armiti, A. and Gertz, M., 2014, June. Geometric graph matching and similarity: a probabilistic approach. In Proceedings of the 26th International Conference on Scientific and Statistical Database Management (p. 27). ACM.

Brabham, D. C., 2013. Crowdsourcing. Cambridge, Massachusetts; London, England: The MIT Press.

Buecheler, T., Sieg, J.H., Füchslin, R.M. and Pfeifer, R., 2010, August. Crowdsourcing, Open Innovation and Collective Intelligence in the Scientific Method: A Research Agenda and Operational Framework. In ALIFE (pp. 679-686).

Caetano, T.S., McAuley, J.J., Cheng, L., Le, Q.V. and Smola, A.J., 2009. Learning graph matching. IEEE transactions on pattern analysis and machine intelligence, 31(6), pp.1048-1058.

Cheong, O., Gudmundsson, J., Kim, H.S., Schymura, D. and Stehn, F., 2009, June. Measuring the similarity of geometric graphs. In International Symposium on Experimental Algorithms (pp. 101-112). Springer, Berlin, Heidelberg.

Cho, M., Lee, J. and Lee, K.M., 2010, September. Reweighted random walks for graph matching. In European Conference on Computer Vision (pp. 492-505). Springer, Berlin, Heidelberg.

Doan, A., Ramakrishnan, R. and Halevy, A.Y., 2011. Crowdsourcing systems on the world-wide web. Communications of the ACM, 54(4), pp.86-96.

Eskenazi, M. 2013. Crowdsourcing for speech processing: Applications to data collection, transcription and assessment. Chichester: Wiley.

Gao, X., Xiao, B., Tao, D. and Li, X., 2010. A survey of graph edit distance. Pattern Analysis and applications, 13(1), pp.113-129.

Heiting, G., 2018. Sclera (White of the Eye). [online] All About Vision. Available at: https://www.allaboutvision.com/resources/sclera.htm [Accessed 8 Jul. 2018].

Gebejes, A., 2017. Spectral video: Application in human eye analysis and tracking. Joensuu: University of Eastern Finland.

Grier, D. A. 2013. Crowdsourcing for dummies. Chichester [England]: Wiley.

Howe, J., 2010. Crowdsourcing.com. [online] Available at: http://crowdsourcing.com/ [Accessed 8 Jul. 2018].

Kaplan M., 2015. The Secrets in Their Eyes: Transforming the Lives of People with Cognitive, Emotional, Learning, Or Movement Disorders Or Autism by Changing the Visual Software of the Brain. Jessica Kingsley Publishers. Chapter 1.

Leifman, G., Swedish, T., Roesch, K. and Raskar, R., 2015, August. Leveraging the crowd for annotation of retinal images. In Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE (pp. 7736-7739). IEEE.

Munkres, J., 1957. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1), pp.32-38.

Riesen, K. and Bunke, H., 2009. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision computing, 27(7), pp.950-959.

Riesen, K., 2015. Structural pattern recognition with graph edit distance. Advances in Computer Vision and Pattern Recognition. Springer, Cham.

Ristad, E.S. and Yianilos, P.N., 1998. Learning string-edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5), pp.522-532.

Rudoy, D., Goldman, D.B., Shechtman, E. and Zelnik-Manor, L., 2012. Crowdsourcing gaze data collection. arXiv preprint arXiv:1204.3367.

Schenk, E. and Guittard, C., 2011. Towards a characterization of crowdsourcing practices. Journal of Innovation Economics & Management, (1), pp.93-107.


Umeyama, S., 1988. An eigendecomposition approach to weighted graph matching problems. IEEE transactions on pattern analysis and machine intelligence, 10(5), pp.695-703.

Weisstein, E.W. Distance. From MathWorld: A Wolfram Web Resource. http://mathworld.wolfram.com/Distance.html

Willoughby, C., Ponzin, D., Ferrari, S., Lobo, A., Landau, K. and Omidi, Y., 2010. Anatomy and physiology of the human eye: effects of mucopolysaccharidoses disease on structure and function - a review. Clinical & Experimental Ophthalmology, 38, pp.2-11.


Appendix 1: Munkres’ algorithm for the assignment problem

Input: A cost matrix C with dimensionality n
Output: The minimum cost node or edge assignment

1: For each row r in C, subtract its smallest element from every element in r
2: For each column c in C, subtract its smallest element from every element in c
3: For all zeros z_i in C, mark z_i with a star if there is no starred zero in its row or column
4: STEP 1:
5: for each column containing a starred zero do
6:     cover this column
7: end for
8: if n columns are covered then GOTO DONE else GOTO STEP 2 end if
9: STEP 2:
10: if C contains an uncovered zero then
11:     Find an arbitrary uncovered zero Z0 and prime it
12:     if there is no starred zero in the row of Z0 then
13:         GOTO STEP 3
14:     else
15:         Cover this row, and uncover the column containing the starred zero; GOTO STEP 2
16:     end if
17: else
18:     Save the smallest uncovered element e_min; GOTO STEP 4
19: end if
20: STEP 3: Construct a series S of alternating primed and starred zeros as follows:
21: Insert Z0 into S
22: while in the column of Z0 there exists a starred zero Z1 do
23:     Insert Z1 into S
24:     Replace Z0 with the primed zero in the row of Z1; insert Z0 into S
25: end while
26: Unstar each starred zero in S and replace all primes with stars; erase all other primes and uncover every line in C; GOTO STEP 1
27: STEP 4: Add e_min to every element in covered rows and subtract it from every element in uncovered columns; GOTO STEP 2
28: DONE: Assignment pairs are indicated by the positions of starred zeros in the cost matrix


Appendix 2: Trained user manual

Welcome!

In this application you are asked to annotate the eye blood vessels in the images that are shown to you. Several images will be shown to you, and in each image you should annotate the longest, most visible blood vessel that you can identify. By annotating we simply mean drawing lines!

The first three images are for training, meaning that they are already annotated by an expert and you should only follow the lines drawn by the expert (the orange lines). At the top of each image the number of points that may be used for drawing is shown. To start annotating, please follow these steps:

Go to the DRAW page:

You will see the first image there. At the top of the image you can see the number of points you are allowed to use for drawing, as well as the orange lines representing the expert user's annotation. The blue text indicates the number of points you have used for drawing so far.
