• Ei tuloksia

2 Clustering

2.3 Distance

Distance is a very core thing when performing Geolocation clustering. Geolocation coordinates are commonly present in WGS84 format i.e., latitude and longitude. The unit for the WGS84 system is degree (º). The result of converting 1º latitude to kilo-meters depends on how far that latitude is from north or south. On the other hand, we have Finland uniform coordinate system also known as KKJ which deals with the GPS values in eastings and northings. The unit for easting and northing is measured in me-ters (m).

2.3.1 Bird distance

Bird distance also known as the Great Circle distance, is an angled distance from point A to point B. The haversine formula is used for calculating the great circle distance.

Bird distance suits well with the WGS84 coordinates because we can project them on the map i.e., Google maps. The first parameter of the point is the latitude, and the second parameter of the point is considered as the longitude of the given location. As the Earth is nearly spherical it is meaningful to project the points on a map with a very less error rate of 1% on average providing a good approximation of the distance be-tween two points.

2.3.2 Euclidean distance

The STEMI dataset contains geolocations in the WGS84 coordinate system that are passed to the clustering algorithm for clustering. Euclidean distance is meaningless if we pass WGS84 coordinates i.e., latitude and longitude. To use Euclidean distance, the WGS84 coordinates should be converted to KKJ format. KKJ is a Finland uniform coordinate system. In KKJ format the coordinates are labeled as easting and northing.

Figure 3. KKJ Finland Uniform Coordinate System.

Figure 3 shows the projection of KKJ coordinates on a map. Northing and easting increase from bottom to top and left to right respectively. Northing and easting are measured in meters and make sense when passed in the Euclidean distance formula.

Performing clustering with KKJ coordinates and using distance function as Euclidean can give good results and also helps in projecting KKJ coordinates on a map.

2.3.3 Road distance

Road distance can also be used as a distance function for clustering but when working with a large number of datasets there are limitations like time and performance. It’s possible to incorporate a real road network into the clustering algorithm but calculating distance and time from each point to each centroid where our dataset contains thou-sands of locations it’s very time-consuming. It could take days or months to perform a single experiment. The real road network has its perks like traffic and other road obstacles are taken into account and we get the most accurate results but it’s not

real-Figure 4. Road distance and time retrieved by Google Maps between two locations.

Figure 4, the route is created from Joensuu to Kuopio using Google Maps navigation.

It provides us the shortest route between two locations by taking traffic, road obstacles, accidents, construction, and speed limits into account. As we can see, there are two routes and Google Maps has already chosen the shortest route which will save us an extra 50 km of traveling. These results are always realistic, and most people rely on navigation maps nowadays when traveling.

2.3.4 Travel estimates

To run the optimization with real-time and accurate road network results, we propose to use an overhead graph [Mariescu-Istodor and Fränti, 2021] to estimate the travel distance and travel time between two locations.

The optimization is based on clustering and uses an overhead graph to obtain fast and accurate navigation calculations. The overhead graph always returns the fastest route from point A to point B. The Distance Matrix API is used for calculating the distance and time. Overhead graph size depends on the nodes, for this research a graph of size 512 nodes is used. The overhead graph first creates the bird distance from point A to point B and then it is scaled to the overhead of the nearest nodes resulting in the true road network path. Node sizes vary from 2 to 1024.

Figure 5. Diagram of the proposed method.

Figure 5, the pre-processing step produces the overhead graph needed for the actual optimization. The Euclidean distance is used when performing clustering usually, however, this is not an ideal choice when moving in the real world. This happens for many reasons such as road curvature, different speed limits, and topography. The over-head graph uses the distance matrix API and computes all pairwise overover-heads. The travel distance and travel time are then calculated. The results are faster and much reliable except the road traffic is not taken into the account with this overhead graph approach.

Figure 6. Four overhead graphs with varying node sizes.

Figure 6 shows four example overhead graphs with varying sizes computed based on the patient locations (original dataset) in Finland. In this way, we can estimate travel times in constant O (1) time. We proceeded with the 512 nodes graph size for our research.

Travel distance and travel time when using an overhead graph of node size 512 are beneficial because the distance and time values are very accurate and similar to real road network values. Comparing our optimization results in the end by using real road network values was done for double verification that the algorithm is implemented correctly and does not contain any kind of bugs.

Figure 7. Travel distance and time estimated using the overhead graph (road path given for reference)

In figure 7, the route is generated using an overhead graph from Joensuu to Kuopio.

We can see that the results are much similar and contain a very small error rate. The black route is retrieved by an overhead graph that contains navigation instructions. The blue route is the bird distance from Joensuu to Kuopio.

LIITTYVÄT TIEDOSTOT