Peer-to-Peer Networks

Peer-to-peer (P2P) systems differ from traditional client/server (C/S) and cloud computing systems in that every node can act both as a server and as a client. Schollmeier [47] defines P2P networks as distributed systems in which the participants share a part of their resources with other peers in the network, and other participants access these resources directly, without passing through intermediary entities.

These resources can be, for example, processing power, storage capacity or network bandwidth. Figure 1 presents the logical topologies of the peer-to-peer and client/server architectures.

FIGURE 1 Peer-to-peer and client/server architectures

2.2.1 History of peer-to-peer networks

From the late 1960s to the 1970s, the Defense Advanced Research Projects Agency (DARPA) of the United States Department of Defense ran a research project to develop a new kind of computer network. The design goal was a military network capable of functioning even if a large number of its nodes or communication links were lost, whether due to an attack on the infrastructure or for other reasons.

The resulting network was named ARPANET. It was the basis for the current internet, and to some extent it still defines the nature of the internet. In the 1990s the rapid growth of the internet and the arrival of the World Wide Web moved the internet towards a client/server architecture, in which the network has a small number of dedicated servers and a large number of clients using services from those servers. The clients became second-class citizens of the internet, as they usually connected through slow, firewalled modem connections and were assigned only a temporary, dynamic IP address. This created a very clear distinction between servers and clients. The trend has reversed somewhat since the turn of the millennium, owing to the emergence of peer-to-peer networks and of faster internet connections with static IP addresses. The first P2P networks at the turn of the millennium were used mostly for sharing copyright-protected files, such as music, but since then many legitimate uses for P2P networks have emerged, for example the Skype Voice-over-IP telephony application and the Bitcoin P2P currency system. Even DARPA is again using P2P networks on the battlefield, thus closing the circle [4].

2.2.2 Hybrid Peer-to-Peer Networks

P2P networks come in two main flavors: hybrid and pure P2P networks [47]. The hybrid P2P architecture, as the name implies, is a hybrid of the client/server and pure peer-to-peer models, trying to combine the best parts of both. In a hybrid P2P network the network is managed by a server, which holds a database of the resources held by the network nodes. The nodes connect to the server and report their resources to the database. A node looking for a resource queries the server, and the server replies with information about the nodes holding the queried resource. After receiving the reply, the node can establish a direct connection to the node(s) holding the resource and request the resource.
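The report, query and direct-request steps described above can be sketched as follows. This is a minimal in-memory illustration, not the protocol of any particular network: the names `IndexServer`, `Peer` and `fetch`, the addresses, and the use of plain method calls in place of network connections are all assumptions made for the example.

```python
from collections import defaultdict


class IndexServer:
    """Hypothetical central index: maps resource names to the peers holding them."""

    def __init__(self):
        self._holders = defaultdict(set)

    def report(self, peer_address, resource_names):
        """A peer reports the resources it holds to the database."""
        for name in resource_names:
            self._holders[name].add(peer_address)

    def query(self, name):
        """Return the addresses of peers holding `name` (sorted for determinism)."""
        return sorted(self._holders.get(name, set()))


class Peer:
    """Hypothetical peer holding some resources (name -> content)."""

    def __init__(self, address, resources):
        self.address = address
        self.resources = dict(resources)

    def join(self, server):
        """Connect to the server and report the local resources."""
        server.report(self.address, self.resources.keys())

    def serve(self, name):
        """Answer a direct request from another peer; None if not held."""
        return self.resources.get(name)


def fetch(name, server, peers_by_address):
    """Ask the server who holds `name`, then request it directly from a holder.

    The dictionary lookup stands in for opening a direct connection.
    """
    for address in server.query(name):
        content = peers_by_address[address].serve(name)
        if content is not None:
            return address, content
    return None


server = IndexServer()
alice = Peer("10.0.0.1:6000", {"song.mp3": b"abc"})
bob = Peer("10.0.0.2:6000", {"paper.pdf": b"xyz"})
peers = {p.address: p for p in (alice, bob)}
for p in peers.values():
    p.join(server)

result = fetch("paper.pdf", server, peers)
missing = fetch("unknown.bin", server, peers)
```

Note that only the lookup passes through the server; the resource itself travels between the peers. The sketch also makes the weakness of the model visible: without the `IndexServer` instance no peer can locate anything.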

The best-known hybrid P2P network was the original Napster network. As the majority of the files shared over the network were copyrighted material, and Napster operated without permission from the copyright holders, Napster was quickly sued for facilitating copyright infringement, and the network was later shut down by US authorities. This case made the drawbacks of the hybrid model very clear: it was easy for the authorities to shut down the network by unplugging the servers that organized the search. It could be argued that in the Napster case there was a moral justification for shutting down the network. In other cases, however, such as military networks or networks used by dissidents in countries without free speech rights, a third party might actively try to shut down the network without such justification. This has been the case in China, and more recently in Iran, where the authorities have been actively trying to shut down P2P communication networks used by pro-democracy activists.

2.2.3 Pure Peer-to-Peer Networks

Pure peer-to-peer networks, on the other hand, drop the server altogether and run completely on nodes with equal rights and responsibilities. The nodes of the network do not have predetermined roles such as server or client, but are in an equal position to the other nodes and can take different roles based on the requirements of the network. The nodes of the network, called peers, are usually connected to a few other peers, most commonly using TCP connections.

The connections form an overlay network topology on top of the physical network connecting the nodes [40]. Nodes of the network can forward messages between other nodes and can make resources available to other nodes of the network. The resources can be, for example, files, computing capacity, bandwidth, storage space or location data. If a node wants to use a resource from the network, it can act as a client and send a request to its neighbor nodes, which can in turn act as routers and forward the request further, or act as servers and send a reply to the requesting client.

Oram [40] lists several benefits of pure P2P networks, some of which also hold for hybrid networks. The most prominent one is resiliency to attacks and network failures. Due to the distributed nature of the network, there is no single point of failure, and the tasks of failed nodes can be delegated to other nodes of the network. P2P networks are also inherently scalable. As more peers join the network, the new peers provide more computing capacity and bandwidth to the network without the need to install more servers. This also lowers the hardware costs associated with setting up the service, as no expensive servers or datacenter space are needed.

As opposed to traditional client/server network architectures, P2P networks do not have a single point of authority, and while this makes P2P networks robust, it is also the source of many problems in using these networks. As no party in the network has a global view of the network topology, or of the location of resources in the network, discovering resources and routing messages in the network becomes problematic. Commonly in pure peer-to-peer networks the nodes only have knowledge of their neighbor nodes, i.e. the nodes to which they have established connections. To find a resource, a node has to send a query to its neighbors, which then forward the query to their neighbors according to the resource discovery algorithm the nodes are using.
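One simple way to picture neighbor-to-neighbor query forwarding is flooding with a hop limit. The sketch below is an illustrative assumption, not one of the algorithms discussed in Chapter 3: the function name `flood_query`, the `ttl` parameter, the sample overlay and the message-counting rule are all hypothetical, and the overlay is simulated with a dictionary instead of real connections.

```python
from collections import deque


def flood_query(neighbors, holdings, start, resource, ttl):
    """Flood a query from `start` through the overlay, limited to `ttl` hops.

    neighbors: dict node -> list of neighbor nodes (each node knows only these)
    holdings:  dict node -> set of resource names held by that node
    Returns (set of nodes that hold the resource, number of messages sent).
    """
    seen = {start}          # nodes that have already received the query
    hits = set()
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if hops_left == 0:
            continue        # hop limit reached, do not forward further
        for nb in neighbors[node]:
            messages += 1   # every forwarded copy costs one message
            if nb in seen:
                continue    # duplicate copy, dropped by the receiver
            seen.add(nb)
            if resource in holdings.get(nb, set()):
                hits.add(nb)
            queue.append((nb, hops_left - 1))
    return hits, messages


# Hypothetical five-node overlay; only node E holds the resource.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
holdings = {"E": {"file.iso"}}

near = flood_query(overlay, holdings, "A", "file.iso", ttl=2)
far = flood_query(overlay, holdings, "A", "file.iso", ttl=3)
```

With a hop limit of 2 the query from A never reaches E, while a limit of 3 finds it at the cost of more messages, several of which are duplicates. This trade-off between reach and traffic is one reason resource discovery without a global view is difficult.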

As there is no central authority in the network, it is complicated for the network nodes to find out whether the information they receive from other nodes is trustworthy. If trust issues are not taken care of, rogue nodes can intentionally reply to resource requests with corrupted data, or send misleading and erroneous control messages that hamper the functionality of the network. Rogue nodes can be countered with several techniques, for example a web of trust design, in which a user designates their friends as trusted, and these friends in turn designate their own friends as trusted.

This, combined with the removal of ousted rogue nodes and of the nodes on the path of trust to the rogue node, makes the network very difficult for rogue nodes to infiltrate. The technique does require the network users to actually know each other, which usually is not the case in P2P networks. To eliminate this requirement, a large number of architectures that automatically rate the trustworthiness of nodes have been suggested [35].
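The web of trust idea can be illustrated with a small sketch. It rests on several assumptions that the text does not specify: trust is treated as fully transitive, the "path of trust" is taken to be the shortest chain of designations from the evaluating user to the rogue node, and the names `trust_paths`, `oust` and the sample users are hypothetical.

```python
from collections import deque


def trust_paths(designations, root):
    """Breadth-first walk of trust designations starting from `root`.

    designations: dict user -> list of users that user designates as trusted
    Returns a dict mapping every transitively trusted user to the chain of
    designations leading from `root` to that user.
    """
    paths = {root: [root]}
    queue = deque([root])
    while queue:
        user = queue.popleft()
        for friend in designations.get(user, []):
            if friend not in paths:
                paths[friend] = paths[user] + [friend]
                queue.append(friend)
    return paths


def oust(designations, root, rogue):
    """Return a copy of the designations without `rogue` and without the
    intermediate users on root's trust path to it (assumed policy)."""
    path = trust_paths(designations, root).get(rogue)
    if path is None:
        return dict(designations)   # rogue was never trusted by root
    removed = set(path[1:])         # keep root itself
    return {
        user: [f for f in friends if f not in removed]
        for user, friends in designations.items()
        if user not in removed
    }


# Hypothetical designations: alice trusts bob and carol, bob trusts dave,
# and dave has (mistakenly) designated the rogue node mallory.
designations = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": [],
    "dave": ["mallory"],
    "mallory": [],
}

before = set(trust_paths(designations, "alice"))
pruned = oust(designations, "alice", "mallory")
after = set(trust_paths(pruned, "alice"))
```

In this example alice initially trusts all five users; once mallory is ousted, bob and dave are removed along with it, and only carol remains trusted. The harshness of this policy shows why vouching for an unknown node is costly, which is what makes infiltration difficult.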

Joining a pure P2P network also presents challenges. When a node is outside the network, it has no knowledge of the other nodes of the network, and thus an external source of node names is required. Research on solving the joining problem has been carried out in our research group [56].

Significant research effort has been invested in solving these problems, and several routing and resource discovery algorithms have been suggested in the literature. These algorithms are discussed in more detail in Chapter 3.