
Large Random Systems

Antti Kemppainen and Kalle Kytölä


Contents

Foreword

Chapter I. Introduction
1. Random permutations as large random systems
1.1. Sorting algorithms
1.2. Interface in a disordered material
1.3. Shufflings of a deck of cards
2. Random walk
2.1. The simple random walk on d-dimensional lattice
2.2. Recurrence and transience of simple random walk

Chapter II. Percolation
1. Bond percolation on hypercubic lattice
Monotone coupling for percolation with different parameters
Phase transition by a zero-one law
Non-percolation for small p
Percolation for large p in two dimensions
Proof of non-triviality of the critical point
2. Percolation on a regular infinite tree

Chapter III. Law of iterated logarithm
1. Auxiliary estimates
2. Upper bound
3. Lower bound
4. Proof of Theorem III.1

Chapter IV. Weak convergence on the real line
1. The idea and definition of weak convergence
2. Equivalent characterizations of weak convergence
3. Tightness
4. Weak convergence with characteristic functions

Chapter V. Curie–Weiss model
1. Definition and key properties of the Curie–Weiss model
2. Analysis of the Curie–Weiss model

Chapter VI. Weak convergence on metric spaces
1. Weak convergence of probability measures
Equivalent characterizations of weak convergence
2. A criterion for verifying weak convergence
3. Metrizability of weak convergence

Chapter VII. Tightness and Prokhorov's theorem
1. Tightness and precompactness of probability measures
2. Prokhorov's theorem
The direct half of Prokhorov's theorem
The converse half of Prokhorov's theorem
3. Weak convergence in countable product of finite sets using cylinders

Chapter VIII. Random walks and Brownian motion
1. Brownian motion
Defining properties of Brownian motion
2. Probability measures on a space of continuous functions
Finite dimensional distributions and Borel sigma algebra
Compactness and tightness on the space of continuous functions
3. Proof of Donsker's theorem
Tightness of scaled random walks
Brownian motion as the limit of scaled random walks

Chapter IX. Ising model: correlation inequalities
1. Ising model on finite graphs
Definition of the Ising model
Monte Carlo Markov chain sampling using Glauber dynamics
2. Griffiths' correlation inequality
3. FKG inequality for Ising model

Chapter X. Ising model: thermodynamical limit
1. Preparations for infinite volume limits
Increasing sequence of finite subgraphs
The space for weak convergence
2. Infinite volume limit with free boundary conditions
3. Infinite volume limit with plus boundary conditions
4. Phase transition in the Ising model

Chapter XI. Interacting particle systems
1. Construction of interacting particle systems
Poisson process
Prelude: Construction of Markov jump processes on finite state spaces
Construction of interacting particle system
2. Markov semigroups of interacting particle systems
Infinitesimal generators
3. Another example: totally asymmetric simple exclusion process

Chapter XII. The voter model
1. Voter model: alternative construction and dual process
2. Only trivial stationary measures in low dimension
3. Nontrivial stationary measures in high dimension

Appendix A. Probability theory fundamentals
1. Measure spaces and probability spaces
2. Random variables and their laws
Laws of real valued random variables
3. Integrals, expected values and convergence theorems

Appendix B. Product measures
1. Product sigma-algebra
2. Products of finitely many σ-finite measures
3. Products of infinitely many probability measures

Appendix C. Finite state space Markov processes
1. Markov chains
2. Continuous time Markov processes

Appendix D. Couplings
1. Coupling of probability measures
2. Coupling and order
3. Holley's criterion

Appendix E. Zero-one laws
1. Tail sigma-algebra and Kolmogorov 0–1 law
2. Borel–Cantelli lemmas

Appendix F. Large deviations
1. The idea of large deviations
2. Simple large deviations examples

Appendix G. Calculus facts
1. Stirling approximation
2. Multi-dimensional Gaussian distribution

Appendix H. Background in topology
1. Topological properties of the real line
2. Metric space topology
Basic concepts of metric space topology
Borel probability measures
Compactness and sequential compactness
3. Space of continuous functions
Metric on the space of continuous functions
Arzelà–Ascoli theorem
4. Countable products of discrete spaces
Metric on countable product of discrete spaces
Cylinder sets

Index

Bibliography


Foreword

These lecture notes are primarily intended for the regular Master's level course Large Random Systems at Aalto University. The notes grew out of lectures of courses given by us at the University of Helsinki in 2014 and at Aalto University in 2015, 2017, and 2019.

One of the principal aims of the course is to learn to apply probability theory to interesting probabilistic models. We thus assume familiarity with measure-theoretic probability as well as various undergraduate level topics in mathematics. To facilitate the study of the main content of the lectures, we nevertheless also recall some of the needed background in the Appendices. We prefer to include a large number of different models in the lectures, which means that none of the models can be studied in great depth: we devote no more than two lectures to any given model, and can therefore only establish some basic results about each one. We refer the interested reader to more specialized texts for further results. Besides treating specific models, the course contains some development of general theory, in particular related to weak convergence and tightness.

The notes are still in a very preliminary and incomplete form, and it is our goal to gradually improve and extend them. The notes will in particular be frequently updated during the current course. Please send all feedback about mistakes, misprints, needs for clarification, etc. to Kalle Kytölä (kalle.kytola@aalto.fi).


Lecture I

Introduction

In this introductory lecture we will discuss two examples:

• Random permutations: different applications

• Random walk: recurrence and transience depending on dimension

The two examples are treated in different ways.

For random permutations the objective is not to prove any theorems, but rather to illustrate how such apparently very simple random objects are relevant to modeling various interesting phenomena. Along the way, as we describe the modeling, we also state a few known mathematical results relevant to the analysis of the models.

For random walks, instead, our focus is on proving Pólya's theorem: a core result which shows a qualitative difference in the long-time behavior of the random walk depending on the dimensionality of the space. Some applications of random walks and Pólya's theorem will be encountered later in this course, and the reader can probably come up with many others.

The examples in this lecture are thus intended as introductions to the topic of large random systems from the perspectives of modeling and of mathematical analysis.

1. Random permutations as large random systems

Let us first discuss random permutations, and explore the variety of situations in which they may arise.

As a preparation, we recall some facts about permutations, and fix some notation.

A permutation of a set X is a bijective function σ: X → X. The set of permutations of X is denoted by S(X), and it naturally has the structure of a group: the group operation is the composition of functions. We are commonly interested in permutations of a finite set X, and if the number of elements of that set is n, it is conventional to choose X = {1, 2, . . . , n} for simplicity. For this case, we use the special notation
\[
S_n = \Big\{ \sigma \colon \{1, \ldots, n\} \to \{1, \ldots, n\} \;\Big|\; \sigma \text{ bijective} \Big\}. \tag{I.1}
\]
The group S_n is called the symmetric group on n symbols. It is a finite group: the number of different permutations of n symbols is
\[
\# S_n = n! = n \cdot (n-1) \cdots 2 \cdot 1. \tag{I.2}
\]
Since S_n is a finite group, there is one particularly natural probability measure on it: the uniform measure, which associates probability 1/n! to each permutation σ ∈ S_n. If σ is a random variable with values in S_n whose law is this uniform measure on S_n, we say that σ is a uniform random permutation of n symbols.
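For numerical experimentation, uniform random permutations are easy to sample. The following minimal Python sketch (the function name uniform_permutation is our choice) implements the Fisher–Yates shuffle, which outputs each of the n! orderings with probability 1/n!:

```python
import random

def uniform_permutation(n):
    """Sample a uniform random permutation sigma of {1, ..., n} by the
    Fisher-Yates shuffle; sigma(i) is represented as perm[i - 1]."""
    perm = list(range(1, n + 1))
    for i in range(n - 1, 0, -1):
        j = random.randint(0, i)       # uniform position among the first i + 1
        perm[i], perm[j] = perm[j], perm[i]
    return perm

print(uniform_permutation(10))
```

The standard library call random.shuffle implements the same algorithm.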


To get a feeling for uniform random permutations, here are two problems that the reader should try to solve.

Exercise I.1 (Cycle decomposition of a random permutation).
Let σ be a uniformly distributed random permutation of the set {1, 2, . . . , n}. Compute the following quantities about its cycle decomposition.¹

(a) Let L be the length of the cycle that contains the element 1. What is the distribution of L, i.e., the probabilities P[L = ℓ]? Calculate also E[L].

(b) Let S be the number of cycles in the cycle decomposition. Calculate E[S].

(c) What is the probability that elements 1 and 2 belong to the same cycle?

Exercise I.2 (Fixed point in a random permutation).

(a) Let E_1, . . . , E_n ⊂ Ω be events. Prove the inclusion–exclusion formula:
\[
P\Big[\bigcup_{j=1}^{n} E_j\Big]
= \sum_{\substack{J \subset \{1,\ldots,n\} \\ J \neq \emptyset}} (-1)^{\#J + 1}\, P\Big[\bigcap_{j \in J} E_j\Big]
= \sum_{1 \le j_1 \le n} P[E_{j_1}]
- \sum_{1 \le j_1 < j_2 \le n} P[E_{j_1} \cap E_{j_2}]
+ \sum_{1 \le j_1 < j_2 < j_3 \le n} P[E_{j_1} \cap E_{j_2} \cap E_{j_3}]
- \cdots .
\]

(b) What is the probability that a uniformly distributed random permutation of the set {1, 2, . . . , n} has a fixed point, i.e., a cycle of length 1? Compute the limit of this probability as n → ∞.

Hint: In part (a), you may want to use indicator random variables and consider the complementary event. In part (b), set E_j = {the point j is a fixed point}.
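For readers who like to experiment, candidate answers to both exercises can be checked by simulation. A minimal Monte Carlo sketch, reusing the uniform_permutation sampler from the sketch above (all names ours):

```python
def cycle_lengths(perm):
    """Cycle lengths in the decomposition of perm (a 1-based list,
    i.e. perm[i - 1] = sigma(i))."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, i = 0, start
            while not seen[i]:
                seen[i] = True
                i = perm[i] - 1
                length += 1
            lengths.append(length)
    return lengths

n, trials = 100, 10_000
sum_L, sum_S, fixed = 0, 0, 0
for _ in range(trials):
    perm = uniform_permutation(n)
    lens = cycle_lengths(perm)
    # length L of the cycle containing the element 1:
    i, L = 0, 0
    while True:
        i = perm[i] - 1
        L += 1
        if i == 0:
            break
    sum_L += L
    sum_S += len(lens)
    fixed += 1 if min(lens) == 1 else 0

print("estimate of E[L]:", sum_L / trials)            # Exercise I.1(a)
print("estimate of E[S]:", sum_S / trials)            # Exercise I.1(b)
print("estimate of P[fixed point]:", fixed / trials)  # Exercise I.2(b)
```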

1.1. Sorting algorithms

A common task in computer programming is to sort a list of n elements, often for n very large. Computer scientists know various sorting algorithms for this purpose, among them, e.g., "quicksort", "mergesort", "insertion sort", etc. Roughly speaking, a sorting algorithm is a procedure which makes pairwise comparisons between elements of the list, and then rearranges the order of the list according to the results of the comparisons, so that eventually the list becomes completely sorted, i.e., its elements appear in increasing order.

The effectiveness of an algorithm is measured by its use of computational resources, mainly by the amount of processor time needed before the algorithm outputs the final sorted list (one could also consider other aspects, such as memory requirements). For sorting algorithms, the required processor time is usually well approximated by the number C of pairwise comparisons that were needed; see, e.g., Figure I.1(b).

The number C of comparisons depends on the input, i.e., the original list provided to the algorithm, which was to be sorted. In the absence of any further information about the input (or simply to analyze the algorithm on typical inputs), it is reasonable to assume that the input list is equally likely to be in any possible order, and thus represented by a uniform random permutation of n elements. We thus model the input as being random.

¹ Recall: A permutation can be written as a composition of disjoint cycles so that each element appears in exactly one cycle, and up to the order of the cycles this cycle decomposition is unique.


Figure I.1. Visualizations of quicksort performance.

(a) A good measure of the efficiency of the quicksort algorithm is the expected number m_n of needed pairwise comparisons as a function of the number n of elements to be sorted. Quicksort's behavior m_n ≈ 2n log(n) is nearly optimal for sorting algorithms, and it is in fact not easy to distinguish the plot from linear growth.

(b) The number of pairwise comparisons probes the actual wall-clock running time of the sorting algorithm well. Shown here is data from 1000 quicksorts of lists of n = 4000 elements. An essentially linear relationship between the number of comparisons and the computation time can be seen (the outliers above the main cloud are probably due to other tasks using processor time besides just the sorting).

(c) The number of needed pairwise comparisons is quite well concentrated around its expected value m_n, and has a slightly skewed distribution. Shown here is a histogram of the number of pairwise comparisons from 40000 quicksorts of lists of n = 500 elements. The expected number of comparisons is m_n ≈ 4806 and the standard deviation is √v_n ≈ 324.

Although there may be nothing random about the behavior of the algorithm for a given input, quantities such as the required number of comparisons depend on the random input, and as such become random; see, e.g., Figure I.1(c).

To give some concreteness to the above discussion of sorting-algorithm performance, in Example I.1 below we briefly consider the average-case performance of the widely used quicksort algorithm. The interested reader will find more details, both about sorting algorithms in general and specifically about the analysis of quicksort, in the excellent book [Knu97].


Example I.1 (Quicksort).
Quicksort is a recursive algorithm, the simplest variant of which is informally described as follows:

The input to the algorithm QuickSort is a list of elements a = (a_1, . . . , a_n) in some set X with a total order relation ≤.

If the input list contains no more than one element, then just output the list itself. Specifically, for a one-element list (a_1) return QuickSort((a_1)) := (a_1), and for an empty list return QuickSort(∅) := ∅.

Otherwise the input list contains more than one element. Then choose one element a_r from the list, and compare it to all other elements: the other elements a_j with a_j ≤ a_r form a list a_− of length at most n−1, and those with a_j > a_r form another list a_+ of length at most n−1.

Apply quicksort to the (shorter) sublists a_− and a_+, to get the sorted sublists QuickSort(a_−) and QuickSort(a_+).

Output the list constructed from the sorted sublist a_−, the element a_r, and the sorted sublist a_+:
\[
\mathrm{QuickSort}(a) := \big( \mathrm{QuickSort}(a_-);\ a_r;\ \mathrm{QuickSort}(a_+) \big).
\]

As the comparison element a_r one could take the first element a_1, but it is often safer to actually take a randomly chosen element of the list.
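For concreteness, the variant described above can be implemented in a few lines. The following Python sketch (names ours) also counts the pairwise comparisons made on a given input, which is the quantity C_n analyzed below:

```python
import random

def quicksort(a, counter):
    """Sort list a as in Example I.1, with a uniformly random pivot a_r.
    counter[0] accumulates the number of pairwise comparisons."""
    if len(a) <= 1:
        return a
    r = random.randrange(len(a))
    pivot = a[r]
    rest = a[:r] + a[r + 1:]
    counter[0] += len(rest)                    # a_r compared to all others
    minus = [x for x in rest if x <= pivot]    # the sublist a_-
    plus = [x for x in rest if x > pivot]      # the sublist a_+
    return quicksort(minus, counter) + [pivot] + quicksort(plus, counter)

# One sample of C_n for a random input of n = 500 distinct elements:
data = random.sample(range(10**6), 500)
c = [0]
assert quicksort(data, c) == sorted(data)
print("number of comparisons:", c[0])          # fluctuates around m_500
```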

Consider the performance of the quicksort algorithm above for an input of n elements in a uniformly random order, with the simplifying assumption that the list cannot contain two equal elements. Denote the random number of comparisons by C_n. We can first of all ask about the expected number of comparisons needed,
\[
m_n = E[C_n],
\]
which represents the average-case performance of quicksort. The behavior of m_n as a function of the input size n is illustrated in Figure I.1(a). It is also important to know how big the random fluctuations around the average case are, and for this purpose one can compute the variance
\[
v_n = \mathrm{Var}(C_n).
\]

The numbers m_n and v_n can be calculated as solutions to recursive equations. The recursion for m_n, for example, is the following. To sort a list of n elements, the algorithm first needs to compare one chosen element a_r to the n−1 other elements. Then we obtain two sublists, of lengths k and n−1−k, where k ∈ {0, 1, . . . , n−1} is uniformly random, because the comparison element was the (k+1):st smallest with probability 1/n. The total number of comparisons is n−1 plus the number of comparisons needed to sort the sublists, and the expected value is
\[
m_n = n - 1 + \frac{1}{n} \sum_{k=0}^{n-1} \big( m_k + m_{n-1-k} \big). \tag{I.3}
\]

With initial conditions m_0 = 0 and m_1 = 0, the solution to this recursion is (verify by yourself)
\[
m_n = 2 (n+1) \sum_{j=1}^{n} \frac{1}{j} - 4n, \tag{I.4}
\]
whose asymptotic behavior for large n is
\[
m_n \sim 2 n \log(n). \tag{I.5}
\]
Figure I.1(a) contains a plot of m_n as a function of n.

By recursive analysis one can also show that the variance is asymptotically v_n ∼ c × n², where c ≈ 0.4202. In particular, for large n, the typical random fluctuations of C_n around the expected value m_n ≈ const. × n log(n) are on a scale √v_n ≈ const. × n. Since √v_n / m_n → 0, the random number of comparisons C_n is well concentrated around the average-case value m_n. Figure I.1(c) illustrates the distribution of C_n.


Exercise I.3 (Expected number of comparisons for quicksort, exactly).
Check that m_n given by (I.4) satisfies the recursion (I.3).

Exercise I.4 (Expected number of comparisons for quicksort, asymptotically).
Check that the asymptotic behavior of m_n given by (I.4) is as stated in (I.5); more precisely, show that
\[
\lim_{n \to \infty} \frac{m_n}{n \log(n)} = 2.
\]
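As a numerical sanity check on (I.3)–(I.5), one can tabulate the recursion and compare it with the closed form and the limit; a short Python sketch (ours):

```python
import math

N = 4000
m = [0.0] * (N + 1)        # m[0] = m[1] = 0
running = 0.0              # maintains m[0] + ... + m[n-1]
for n in range(1, N + 1):
    m[n] = n - 1 + 2.0 * running / n     # recursion (I.3), symmetrized
    running += m[n]

# Closed form (I.4): m_n = 2 (n+1) H_n - 4 n, with H_n the harmonic sum.
H = sum(1.0 / j for j in range(1, N + 1))
print(m[N], 2 * (N + 1) * H - 4 * N)     # agree up to floating-point rounding

# Ratio in Exercise I.4; the convergence to 2 is quite slow.
print(m[N] / (N * math.log(N)))          # about 1.66 at N = 4000
```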

1.2. Interface in a disordered material

As our second example of random permutations in interesting applications, we will discuss a model of a disordered material containing an interface.

The model we will consider is visualized in Figure I.2, and is formally described as follows. Consider the unit square S = [0,1] × [0,1] in the plane. Let Z_j = (X_j, Y_j), j = 1, . . . , n, be n independent random points uniformly distributed in the square S. Our object of interest will be a certain directed path γ from the bottom left corner (0,0) ∈ S to the top right corner (1,1) ∈ S, both of whose coordinates are non-decreasing functions, i.e., the direction of the path is restricted to the north-east quadrant. We want this path γ to go through as many of the random points Z_1, . . . , Z_n ∈ S as possible. There may not be a unique such path, but the maximal number ℓ of points on any such path is well defined, given the locations of the points. Optimal paths γ in various samples of random points are drawn in the illustrations of Figure I.2.

This is a simple model used in the physics of disordered materials. The easiest interpretation is that the square S represents a piece of some material, and the random points Z_1, . . . , Z_n ∈ S represent impurities in this material. The material is considered to be weaker at the locations of the impurities, so it is easier to break there: the energy needed to break the material along a path γ could have a negative contribution from each of the impurities on γ, so that it requires less energy to break along paths with many impurities. The optimal interface γ is then the fracture that will actually be formed when the material is torn apart, and the number of impurities on it is related to the amount of energy needed to tear the material. Alternatively, the interface could model a directed polymer in a disordered environment, or a domain wall between two phases enforced by boundary conditions. The reader can find more about modeling disordered materials and interfaces in them, for example, in [KZ87].

The maximal number of points that the directed path can pass through depends on the randomly chosen points Z_1, . . . , Z_n, and is therefore itself a random variable, which we denote by L_n to emphasize the underlying choice of n random points. In the disordered-material interpretation this random number represents the energy advantage that the interface can achieve by making use of the randomly located impurities. We will next argue that L_n arises in a simple way from random permutations.

Fix the random points Z_1 = (X_1, Y_1), . . . , Z_n = (X_n, Y_n). It is convenient to order these points by the values of their x-coordinates, so let us agree to relabel them so that 0 < X_1 < X_2 < · · · < X_n < 1 (note that the x-coordinate values are almost surely distinct, so no equalities arise with probability one). Then there is no reason for the y-coordinates to be in any specific order; instead, their rearrangement relative to the x-ordering defines a permutation σ ∈ S_n, determined by the condition that Y_j is the σ(j):th smallest among Y_1, . . . , Y_n. A directed path can pass through the points Z_{j_1}, . . . , Z_{j_k} with j_1 < · · · < j_k exactly when σ(j_1) < · · · < σ(j_k), so L_n is the length of the longest increasing subsequence of the permutation σ, which is moreover uniformly distributed in S_n.
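This connection also yields an efficient way to simulate L_n: sort the points by their x-coordinates and compute the longest increasing subsequence of the y-values. A minimal Python sketch using patience sorting (function names ours):

```python
import bisect
import random

def max_points_on_path(points):
    """L_n for a set of planar points: sort by x-coordinate, then compute
    the length of the longest increasing subsequence of the y-values."""
    ys = [y for (x, y) in sorted(points)]
    tails = []          # tails[k] = smallest possible last y-value of an
                        # increasing subsequence of length k + 1
    for y in ys:
        k = bisect.bisect_left(tails, y)
        if k == len(tails):
            tails.append(y)
        else:
            tails[k] = y
    return len(tails)

n = 1000
pts = [(random.random(), random.random()) for _ in range(n)]
print(max_points_on_path(pts))    # one sample of L_n
```

Each call produces one sample of L_n; averaging over many independent samples gives an estimate of E[L_n].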
