• Ei tuloksia

Autumn2012 Lecture2:MathematicalPreliminariesJyrkiKivinen Information-TheoreticModeling

N/A
N/A
Info
Lataa
Protected

Academic year: 2022

Jaa "Autumn2012 Lecture2:MathematicalPreliminariesJyrkiKivinen Information-TheoreticModeling"

Copied!
103
0
0

Kokoteksti

(1)

Outline Calculus Probability Inequalities

Information-Theoretic Modeling

Lecture 2: Mathematical Preliminaries

Jyrki Kivinen

Department of Computer Science, University of Helsinki

Autumn 2012

(2)

Outline Calculus Probability Inequalities

Lecture 2: Mathematical Preliminaries

(3)

Outline Calculus Probability Inequalities

1 Calculus

Limits and Convergence Convexity

2 Probability

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

3 Inequalities

Jensen’s Inequality Gibbs’s Inequality

(4)

Outline Calculus Probability Inequalities

1 Calculus

Limits and Convergence Convexity

2 Probability

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

3 Inequalities

Jensen’s Inequality Gibbs’s Inequality

(5)

Outline Calculus Probability Inequalities

1 Calculus

Limits and Convergence Convexity

2 Probability

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

3 Inequalities

Jensen’s Inequality Gibbs’s Inequality

(6)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

1 Calculus

Limits and Convergence Convexity

2 Probability

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

3 Inequalities

Jensen’s Inequality Gibbs’s Inequality

(7)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Exponent Function

-10 -5 0 5 10

-10 -5 0 5 10

exp x

Exponent function exp : R→ R+, expk =ek =

k

z }| { e×e×. . .×e:

multiplicative growth (nuclear reaction, “interest on interest”, ...)

expx·expy = exp(x+y) Derivative dexpx

dx = expx.

(8)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Exponent Function

-10 -5 0 5 10

-10 -5 0 5 10

exp x

Exponent function exp : R→ R+, expk =ek =

k

z }| { e×e×. . .×e:

multiplicative growth (nuclear reaction, “interest on interest”, ...) expx·expy = exp(x+y)

Derivative dexpx

dx = expx.

(9)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Exponent Function

-10 -5 0 5 10

-10 -5 0 5 10

exp x

Exponent function exp : R→ R+, expk =ek =

k

z }| { e×e×. . .×e:

multiplicative growth (nuclear reaction, “interest on interest”, ...) expx·expy = exp(x+y) Derivative dexpx

dx = expx.

(10)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Examples: Logarithm

-10 -5 0 5 10

-10 -5 0 5 10

exp x ln x

Natural logarithm ln : R+→R, ln expx =x:

time to grow tox, number of digits (log10).

General (basea) logarithm, logaax =x: logax= 1 lnalnx

(11)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Examples: Logarithm

-10 -5 0 5 10

-10 -5 0 5 10

exp x ln x

Natural logarithm ln : R+→R, ln expx =x:

time to grow tox, number of digits (log10).

General (basea) logarithm, logaax =x: logax= 1 lnalnx

(12)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Logarithm Function

-10 -5 0 5 10

-10 -5 0 5 10

exp x ln x

lnxy = lnx+ lny

lnxr =rlnx ln 1

x =−lnx lnx

y = lnx−lny lnx≤x−1 with equality if and only ifx = 1

(NB: doesn’t work with logax ifa6=e)

dlnx dx = 1

x

(13)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Logarithm Function

-10 -5 0 5 10

-10 -5 0 5 10

exp x ln x

lnxy = lnx+ lny lnxr =rlnx

ln 1

x =−lnx lnx

y = lnx−lny lnx≤x−1 with equality if and only ifx = 1

(NB: doesn’t work with logax ifa6=e)

dlnx dx = 1

x

(14)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Logarithm Function

-10 -5 0 5 10

-10 -5 0 5 10

exp x ln x

lnxy = lnx+ lny lnxr =rlnx ln 1

x =−lnx

lnx

y = lnx−lny lnx≤x−1 with equality if and only ifx = 1

(NB: doesn’t work with logax ifa6=e)

dlnx dx = 1

x

(15)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Logarithm Function

-10 -5 0 5 10

-10 -5 0 5 10

exp x ln x

lnxy = lnx+ lny lnxr =rlnx ln 1

x =−lnx lnx

y = lnx−lny

lnx≤x−1 with equality if and only ifx = 1 (NB: doesn’t work with logax ifa6=e)

dlnx dx = 1

x

(16)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Logarithm Function

-10 -5 0 5 10

-10 -5 0 5 10

exp x ln x x-1

lnxy = lnx+ lny lnxr =rlnx ln 1

x =−lnx lnx

y = lnx−lny lnx≤x−1 with equality if and only ifx = 1

(NB: doesn’t work with logax ifa6=e)

dlnx dx = 1

x

(17)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Logarithm Function

-10 -5 0 5 10

-10 -5 0 5 10

exp x ln x x-1

lnxy = lnx+ lny lnxr =rlnx ln 1

x =−lnx lnx

y = lnx−lny lnx≤x−1 with equality if and only ifx = 1

(NB: doesn’t work with logax ifa6=e)

dlnx dx = 1

x

(18)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Limits and Convergence

A sequence of values (xi : i ∈N) convergesto limit L, limi→∞xi =L, iff for any >0 there exists a numberN ∈N such that

|xi −L|< for all i ≥N .

f(x) has alimit L asx approaches c, limx→cf(x) =L, (from above c+/belowc) iff for any >0 there exists a number δ >0 such that

|f(x)−L|< for all





c <x <c+δ ‘above’ c−δ <x <c ’below’ 0<|x−c|< δ —

(19)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Limits and Convergence

A sequence of values (xi : i ∈N) convergesto limit L, limi→∞xi =L, iff for any >0 there exists a numberN ∈N such that

|xi −L|< for all i ≥N .

f(x) has alimit L asx approaches c, limx→cf(x) =L, (from abovec+/belowc) iff for any >0 there exists a number δ >0 such that

|f(x)−L|< for all





c <x <c+δ ‘above’

c−δ <x <c ’below’

0<|x−c|< δ —

(20)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Example: Logarithm Again

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1

0 0.2 0.4 0.6 0.8 1 x ln x

Even thoughxlnx is undefined atx = 0, we have(by l’Hˆopital’s rule):

x→0lim+xlnx = 0 .

(21)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Convexity

Functionf : X →Ris said to be convexiff for any x,y ∈ X and any 0≤t≤1 we have

f(tx+ (1−t)y)≤tf(x) + (1−t)f(y) .

0 2 4 6 8 10

-4 -2 0 2 4

exp x

Function f isstrictly convexiff the above inequality holds strictly (‘<’ instead of ‘≤’) when 0<t <1.

Function f is (strictly)concaveiff the above holds for −f.

(22)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Convexity

Functionf : X →Ris said to be convexiff for any x,y ∈ X and any 0≤t≤1 we have

f(tx+ (1−t)y)≤tf(x) + (1−t)f(y) .

0 2 4 6 8 10

-4 -2 0 2 4

exp x

Functionf is strictly convexiff the above inequality holds strictly (‘<’ instead of ‘≤’) when 0<t<1.

Function f is (strictly)concaveiff the above holds for −f.

(23)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Convexity

Functionf : X →Ris said to be convexiff for any x,y ∈ X and any 0≤t≤1 we have

f(tx+ (1−t)y)≤tf(x) + (1−t)f(y) .

0 2 4 6 8 10

-4 -2 0 2 4

exp x

Functionf is strictly convexiff the above inequality holds strictly (‘<’ instead of ‘≤’) when 0<t<1.

Functionf is (strictly) concaveiff the above holds for −f.

(24)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Convexity and Derivatives

Theorem

If functionf has a second derivativef00, andf00 is non-negative (≥0) for all x, then f is convex. Iff00 is positive (>0) for all x, thenf is strictlyconvex.

0 2 4 6 8 10

-4 -2 0 2 4

exp x

ex is convex!

Example: f0(x) = dexpx

dx = expx ⇒f00(x) = expx >0. Hence exp is strictly convex.

(25)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Convexity and Derivatives

Theorem

If functionf has a second derivativef00, andf00 is non-negative (≥0) for all x, then f is convex. Iff00 is positive (>0) for all x, thenf is strictlyconvex.

0 2 4 6 8 10

-4 -2 0 2 4

exp x

ex is convex!

Example: f0(x) = dexpx

dx = expx

⇒f00(x) = expx >0. Hence exp is strictly convex.

(26)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Convexity and Derivatives

Theorem

If functionf has a second derivativef00, andf00 is non-negative (≥0) for all x, then f is convex. Iff00 is positive (>0) for all x, thenf is strictlyconvex.

0 2 4 6 8 10

-4 -2 0 2 4

exp x

ex is convex!

Example: f0(x) = dexpx

dx = expx ⇒f00(x) = expx >0.

Hence exp is strictly convex.

(27)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Convexity and Derivatives

Theorem

If functionf has a second derivativef00, andf00 is non-negative (≥0) for all x, then f is convex. Iff00 is positive (>0) for all x, thenf is strictlyconvex.

0 2 4 6 8 10

-4 -2 0 2 4

exp x

ex is convex!

Example: f0(x) = dexpx

dx = expx ⇒f00(x) = expx >0. Hence exp is strictly convex.

(28)

Outline Calculus Probability Inequalities

Limits and Convergence Convexity

Convexity and Derivatives

Theorem

If functionf has a second derivativef00, andf00 is non-negative (≥0) for all x, then f is convex. Iff00 is positive (>0) for all x, thenf is strictlyconvex.

0 2 4 6 8 10

-4 -2 0 2 4

exp x

ex is convex!

Example: f0(x) = dexpx

dx = expx ⇒f00(x) = expx >0. Hence exp is strictly convex.

(29)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability

A.N. Kolmogorov, 1903–1987

(30)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

1 Calculus

Limits and Convergence Convexity

2 Probability

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

3 Inequalities

Jensen’s Inequality Gibbs’s Inequality

(31)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability Space

A probability space (Ω,F,P) is defined by

thesample space Ω whose elements are called outcomes ω, a sigma algebraF of subsets of Ω, whose elements are called eventsE, and

a measure P which determines the probabilities of events, P : F →[0,1].

MeasureP has to satisfy the probability axioms: P(E)≥0 for all E ∈ F,P(Ω) = 1, and P(E1∪E2∪. . .) =P

iP(Ei) if (Ei) is a countable sequence ofdisjoint events.

These axioms imply the usual rules of probability calculus, e.g., P(A∪B) =P(A) +P(B)−P(A∩B),P(Ω\E) = 1−P(E), etc.

(32)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability Space

A probability space (Ω,F,P) is defined by

thesample space Ω whose elements are called outcomes ω,

a sigma algebraF of subsets of Ω, whose elements are called eventsE, and

a measure P which determines the probabilities of events, P : F →[0,1].

MeasureP has to satisfy the probability axioms: P(E)≥0 for all E ∈ F,P(Ω) = 1, and P(E1∪E2∪. . .) =P

iP(Ei) if (Ei) is a countable sequence ofdisjoint events.

These axioms imply the usual rules of probability calculus, e.g., P(A∪B) =P(A) +P(B)−P(A∩B),P(Ω\E) = 1−P(E), etc.

(33)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability Space

A probability space (Ω,F,P) is defined by

thesample space Ω whose elements are called outcomes ω, a sigma algebraF of subsets of Ω, whose elements are called eventsE, and

a measure P which determines the probabilities of events, P : F →[0,1].

MeasureP has to satisfy the probability axioms: P(E)≥0 for all E ∈ F,P(Ω) = 1, and P(E1∪E2∪. . .) =P

iP(Ei) if (Ei) is a countable sequence ofdisjoint events.

These axioms imply the usual rules of probability calculus, e.g., P(A∪B) =P(A) +P(B)−P(A∩B),P(Ω\E) = 1−P(E), etc.

(34)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability Space

A probability space (Ω,F,P) is defined by

thesample space Ω whose elements are called outcomes ω, a sigma algebraF of subsets of Ω, whose elements are called eventsE, and

a measure P which determines the probabilities of events, P : F →[0,1].

MeasureP has to satisfy the probability axioms: P(E)≥0 for all E ∈ F,P(Ω) = 1, and P(E1∪E2∪. . .) =P

iP(Ei) if (Ei) is a countable sequence ofdisjoint events.

These axioms imply the usual rules of probability calculus, e.g., P(A∪B) =P(A) +P(B)−P(A∩B),P(Ω\E) = 1−P(E), etc.

(35)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability Space

A probability space (Ω,F,P) is defined by

thesample space Ω whose elements are called outcomes ω, a sigma algebraF of subsets of Ω, whose elements are called eventsE, and

a measure P which determines the probabilities of events, P : F →[0,1].

MeasureP has to satisfy the probability axioms: P(E)≥0 for all E ∈ F,P(Ω) = 1, and P(E1∪E2∪. . .) =P

iP(Ei) if (Ei) is a countable sequence ofdisjointevents.

These axioms imply the usual rules of probability calculus, e.g., P(A∪B) =P(A) +P(B)−P(A∩B),P(Ω\E) = 1−P(E), etc.

(36)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability Space

A probability space (Ω,F,P) is defined by

thesample space Ω whose elements are called outcomes ω, a sigma algebraF of subsets of Ω, whose elements are called eventsE, and

a measure P which determines the probabilities of events, P : F →[0,1].

MeasureP has to satisfy the probability axioms: P(E)≥0 for all E ∈ F,P(Ω) = 1, and P(E1∪E2∪. . .) =P

iP(Ei) if (Ei) is a countable sequence ofdisjointevents.

These axioms imply the usual rules of probability calculus, e.g., P(A∪B) =P(A) +P(B)−P(A∩B),P(Ω\E) = 1−P(E), etc.

(37)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Venn Diagrams

0000000 0000000 0000000 0000000 0000000 0000000 0000000 0000000 0000000 0000000 0000000

1111111 1111111 1111111 1111111 1111111 1111111 1111111 1111111 1111111 1111111 1111111

A B

A B

(38)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability Calculus

1 The conditional probabilityof event B given that event A occurs is defined as

P(B |A) = P(A∩B)

P(A) for A such thatP(A)>0.

2 P(A∩B) =P(A)·P(B |A) =P(B)·P(A|B) .

3 Bayes’ rule: P(B |A) = P(A|B)·P(B)

P(A) .

4 Chain rule: P(∩Ni=1Ei) =

N

Y

i=1

P(Ei | ∩ij−1=1Ej)

=P(E1)·P(E2 |E1)·P(E3 |E1∩E2)·. . .

·P(EN |E1∩. . .∩EN−1) .

(39)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability Calculus

1 The conditional probabilityof event B given that event A occurs is defined as

P(B |A) = P(A∩B)

P(A) for A such thatP(A)>0.

2 P(A∩B) =P(A)·P(B |A) =P(B)·P(A|B) .

3 Bayes’ rule: P(B |A) = P(A|B)·P(B)

P(A) .

4 Chain rule: P(∩Ni=1Ei) =

N

Y

i=1

P(Ei | ∩ij−1=1Ej)

=P(E1)·P(E2 |E1)·P(E3 |E1∩E2)·. . .

·P(EN |E1∩. . .∩EN−1) .

(40)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability Calculus

1 The conditional probabilityof event B given that event A occurs is defined as

P(B |A) = P(A∩B)

P(A) for A such thatP(A)>0.

2 P(A∩B) =P(A)·P(B |A) =P(B)·P(A|B) .

3 Bayes’ rule: P(B |A) = P(A|B)·P(B) P(A) .

4 Chain rule: P(∩Ni=1Ei) =

N

Y

i=1

P(Ei | ∩ij−1=1Ej)

=P(E1)·P(E2 |E1)·P(E3 |E1∩E2)·. . .

·P(EN |E1∩. . .∩EN−1) .

(41)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Probability Calculus

1 The conditional probabilityof event B given that event A occurs is defined as

P(B |A) = P(A∩B)

P(A) for A such thatP(A)>0.

2 P(A∩B) =P(A)·P(B |A) =P(B)·P(A|B) .

3 Bayes’ rule: P(B |A) = P(A|B)·P(B) P(A) .

4 Chain rule:

P(∩Ni=1Ei) =

N

Y

i=1

P(Ei | ∩ij−1=1Ej)

=P(E1)·P(E2 |E1)·P(E3 |E1∩E2)·. . .

·P(EN |E1∩. . .∩EN−1) .

(42)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

Technically, a random variable is a (measurable) function X : Ω→Rfrom the sample space to the reals.

The probability measureP on Ω determines the distribution of X: PX(A) = Pr[X ∈A] =P({ω : X(ω)∈A}) ,

whereA⊆R.

It is often more natural to relabel the outcomes and denote them, for instance, by letters,A,B,C,..., or words red,black, ... In practice, we often forget about the underlying probability space Ω, and just speak of random variableX and its distribution PX.

(43)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

Technically, a random variable is a (measurable) function X : Ω→Rfrom the sample space to the reals.

The probability measureP on Ω determines the distribution of X: PX(A) = Pr[X ∈A] =P({ω : X(ω)∈A}) ,

whereA⊆R.

It is often more natural to relabel the outcomes and denote them, for instance, by letters,A,B,C,..., or words red,black, ... In practice, we often forget about the underlying probability space Ω, and just speak of random variableX and its distribution PX.

(44)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

Technically, a random variable is a (measurable) function X : Ω→Rfrom the sample space to the reals.

The probability measureP on Ω determines the distribution of X: PX(A) = Pr[X ∈A] =P({ω : X(ω)∈A}) ,

whereA⊆R.

It is often more natural to relabel the outcomes and denote them, for instance, by letters,A,B,C,..., or words red,black, ...

In practice, we often forget about the underlying probability space Ω, and just speak of random variableX and its distribution PX.

(45)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

Technically, a random variable is a (measurable) function X : Ω→Rfrom the sample space to the reals.

The probability measureP on Ω determines the distribution of X: PX(A) = Pr[X ∈A] =P({ω : X(ω)∈A}) ,

whereA⊆R.

It is often more natural to relabel the outcomes and denote them, for instance, by letters,A,B,C,..., or words red,black, ...

In practice, we often forget about the underlying probability space Ω, and just speak of random variableX and its distribution PX.

(46)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

The distribution of a random variable canalwaysbe represented as acumulative distribution function(cdf)FX(x) = Pr[X ≤x].

In addition:

A discreterandom variableX with countable alphabetX has a probability mass function(pmf) pX such that

Pr[X =x] =pX(x).

A continuous random variableY has aprobability density function(pdf) fY such that Pr[Y ∈A] =R

AfY(x)dy.

There are alsomixed random variables that are neither discrete nor continuous. They don’t have a pmf or pdf, but they do have a cdf. We often omit the subscriptsX,Y, . . .and write p(x),f(y), etc.

(47)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

The distribution of a random variable canalwaysbe represented as acumulative distribution function(cdf)FX(x) = Pr[X ≤x].

In addition:

A discreterandom variableX with countable alphabetX has a probability mass function(pmf) pX such that

Pr[X =x] =pX(x).

A continuous random variableY has aprobability density function(pdf) fY such that Pr[Y ∈A] =R

AfY(x)dy.

There are alsomixed random variables that are neither discrete nor continuous. They don’t have a pmf or pdf, but they do have a cdf. We often omit the subscriptsX,Y, . . .and write p(x),f(y), etc.

(48)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

The distribution of a random variable canalwaysbe represented as acumulative distribution function(cdf)FX(x) = Pr[X ≤x].

In addition:

A discreterandom variableX with countable alphabetX has a probability mass function(pmf) pX such that

Pr[X =x] =pX(x).

A continuousrandom variable Y has aprobability density function(pdf) fY such that Pr[Y ∈A] =R

AfY(x)dy.

There are alsomixed random variables that are neither discrete nor continuous. They don’t have a pmf or pdf, but they do have a cdf. We often omit the subscriptsX,Y, . . .and write p(x),f(y), etc.

(49)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

The distribution of a random variable canalwaysbe represented as acumulative distribution function(cdf)FX(x) = Pr[X ≤x].

In addition:

A discreterandom variableX with countable alphabetX has a probability mass function(pmf) pX such that

Pr[X =x] =pX(x).

A continuousrandom variable Y has aprobability density function(pdf) fY such that Pr[Y ∈A] =R

AfY(x)dy.

There are alsomixed random variables that are neither discrete nor continuous. They don’t have a pmf or pdf, but they do have a cdf.

We often omit the subscriptsX,Y, . . .and write p(x),f(y), etc.

(50)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

The distribution of a random variable canalwaysbe represented as acumulative distribution function(cdf)FX(x) = Pr[X ≤x].

In addition:

A discreterandom variableX with countable alphabetX has a probability mass function(pmf) pX such that

Pr[X =x] =pX(x).

A continuousrandom variable Y has aprobability density function(pdf) fY such that Pr[Y ∈A] =R

AfY(x)dy.

There are alsomixed random variables that are neither discrete nor continuous. They don’t have a pmf or pdf, but they do have a cdf.

We often omit the subscriptsX,Y, . . . and write p(x),f(y), etc.

(51)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

Since random variables are functions, we can define more random variables as functions of random variables: iff is a function, andX andY are r.v.’s, thenf(X) : Ω→Ris a r.v.,X +Y is a r.v., etc.

Example: Let r.v.X be the outcome of a die.

The pmf of X is given bypX(x) = 1/6 for all x ∈ {1,2,3,4,5,6}.

The pmf of r.v. X2 is given bypX2(x) = 1/6 for all x ∈ {1,4,9,16,25,36}.

!

In particular, a pmfpX is a function, and hence, pX(X) is also a random variable. Further, pX2(X),lnpX(X), etc. are random variables.

(52)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

Since random variables are functions, we can define more random variables as functions of random variables: iff is a function, andX andY are r.v.’s, thenf(X) : Ω→Ris a r.v.,X +Y is a r.v., etc.

Example: Let r.v.X be the outcome of a die.

The pmf of X is given bypX(x) = 1/6 for all x ∈ {1,2,3,4,5,6}.

The pmf of r.v. X2 is given by pX2(x) = 1/6 for all x ∈ {1,4,9,16,25,36}.

!

In particular, a pmfpX is a function, and hence, pX(X) is also a random variable. Further, pX2(X),lnpX(X), etc. are random variables.

(53)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

Since random variables are functions, we can define more random variables as functions of random variables: iff is a function, andX andY are r.v.’s, thenf(X) : Ω→Ris a r.v.,X +Y is a r.v., etc.

Example: Let r.v.X be the outcome of a die.

The pmf of X is given bypX(x) = 1/6 for all x ∈ {1,2,3,4,5,6}.

The pmf of r.v. X2 is given by pX2(x) = 1/6 for all x ∈ {1,4,9,16,25,36}.

!

In particular, a pmfpX is a function, and hence, pX(X) is also a random variable. Further, pX2(X),lnpX(X), etc. are random variables.

(54)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Random Variables

Since random variables are functions, we can define more random variables as functions of random variables: iff is a function, andX andY are r.v.’s, thenf(X) : Ω→Ris a r.v.,X +Y is a r.v., etc.

Example: Let r.v.X be the outcome of a die.

The pmf of X is given bypX(x) = 1/6 for all x ∈ {1,2,3,4,5,6}.

The pmf of r.v. X2 is given by pX2(x) = 1/6 for all x ∈ {1,4,9,16,25,36}.

!

In particular, a pmfpX is a function, and hence, pX(X) is also a random variable. Further, pX2(X),lnpX(X), etc. are random variables.

(55)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Multivariate Distributions

The probabilistic behavior of two or more random variables is described by multivariate distributions.

Thejoint distributionof r.v.’s X andY is PX,Y(A,B) = Pr[X ∈A ∧ Y ∈B]

=P({ω : X(ω)∈A,Y(ω)∈B}) .

For each multivariate distributionPX,Y, there are uniquemarginal distributionsPX andPY such that

PX(A) =PX,Y(A,R), PY(B) =PX,Y(R,B) ,

pmf:pY(y) = X

x∈X

pX,Y(x,y) pdf:fY(y) = Z

R

fX,Y(x,y)dx .

(56)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Multivariate Distributions

The probabilistic behavior of two or more random variables is described by multivariate distributions.

Thejoint distributionof r.v.’s X andY is PX,Y(A,B) = Pr[X ∈A ∧ Y ∈B]

=P({ω : X(ω)∈A,Y(ω)∈B}) .

For each multivariate distributionPX,Y, there are uniquemarginal distributionsPX andPY such that

PX(A) =PX,Y(A,R), PY(B) =PX,Y(R,B) ,

pmf:pY(y) = X

x∈X

pX,Y(x,y) pdf:fY(y) = Z

R

fX,Y(x,y)dx .

(57)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Multivariate Distributions

The probabilistic behavior of two or more random variables is described by multivariate distributions.

Thejoint distributionof r.v.’s X andY is PX,Y(A,B) = Pr[X ∈A ∧ Y ∈B]

=P({ω : X(ω)∈A,Y(ω)∈B}) .

For each multivariate distributionPX,Y, there are uniquemarginal distributionsPX andPY such that

PX(A) =PX,Y(A,R), PY(B) =PX,Y(R,B) ,

pmf:pY(y) =X

x∈X

pX,Y(x,y) pdf:fY(y) = Z

R

fX,Y(x,y)dx .

(58)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Multivariate Distributions

Theconditional distributionis defined similar to conditional probability:

PY|X(B |A) =PX,Y(A,B)

PX(A) for Asuch that PX(A)>0.

For discrete/continuous variables we have: discrete r.v.’s:

pY|X(y |x) = pX,Y(x,y)

pX(x) , pX(x)>0 , continuousr.v.’s:

fY|X(y|x) = fX,Y(x,y)

fX(x) , fX(x)>0 .

(59)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Multivariate Distributions

Theconditional distributionis defined similar to conditional probability:

PY|X(B |A) =PX,Y(A,B)

PX(A) for Asuch that PX(A)>0.

For discrete/continuous variables we have:

discrete r.v.’s:

pY|X(y |x) = pX,Y(x,y)

pX(x) , pX(x)>0 ,

continuousr.v.’s:

fY|X(y|x) = fX,Y(x,y)

fX(x) , fX(x)>0 .

(60)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Multivariate Distributions

Theconditional distributionis defined similar to conditional probability:

PY|X(B |A) =PX,Y(A,B)

PX(A) for Asuch that PX(A)>0.

For discrete/continuous variables we have:

discrete r.v.’s:

pY|X(y |x) = pX,Y(x,y)

pX(x) , pX(x)>0 , continuousr.v.’s:

fY|X(y|x) = fX,Y(x,y)

fX(x) , fX(x)>0 .

(61)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Independence

VariableX is said to be independent of variableY (X Y) iff PX,Y(A,B) =PX(A)·PY(B) for all A,B⊆R.

This is equivalent to

PX|Y(A|B) =PX(A) for all B such that P(B)>0, and

PY|X(B|A) =PY(B) for all Asuch thatP(A)>0. In words, knowledge about one variable tells nothing about the other. Note that independence is symmetric,X Y ⇔Y X.

(62)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Independence

VariableX is said to be independent of variableY (X Y) iff PX,Y(A,B) =PX(A)·PY(B) for all A,B⊆R. This is equivalent to

PX|Y(A|B) =PX(A) for all B such thatP(B)>0,

and

PY|X(B|A) =PY(B) for all Asuch thatP(A)>0. In words, knowledge about one variable tells nothing about the other. Note that independence is symmetric,X Y ⇔Y X.

(63)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Independence

VariableX is said to be independent of variableY (X Y) iff PX,Y(A,B) =PX(A)·PY(B) for all A,B⊆R. This is equivalent to

PX|Y(A|B) =PX(A) for all B such thatP(B)>0, and

PY|X(B|A) =PY(B) for all Asuch thatP(A)>0.

In words, knowledge about one variable tells nothing about the other. Note that independence is symmetric,X Y ⇔Y X.

(64)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Expectation

Theexpectation(or expected value, or mean) of a discrete random variable is given by

E[X] = X

x∈X

p(x)x .

The expectation of a continuous random variable is given by E[X] =

Z

X

f(x)x dx . In both cases, it is possible thatE[X] =±∞.

E[kX] =kE[X] E[X+Y] =E[X] +E[Y] E[XY] =E[X]E[Y] if X Y

(65)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Expectation

Theexpectation(or expected value, or mean) of a discrete random variable is given by

E[X] = X

x∈X

p(x)x .

The expectation of a continuous random variable is given by E[X] =

Z

X

f(x)x dx .

In both cases, it is possible thatE[X] =±∞.

E[kX] =kE[X] E[X+Y] =E[X] +E[Y] E[XY] =E[X]E[Y] if X Y

(66)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Expectation

Theexpectation(or expected value, or mean) of a discrete random variable is given by

E[X] = X

x∈X

p(x)x .

The expectation of a continuous random variable is given by E[X] =

Z

X

f(x)x dx . In both cases, it is possible thatE[X] =±∞.

E[kX] =kE[X] E[X+Y] =E[X] +E[Y] E[XY] =E[X]E[Y] if X Y

(67)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Expectation

Theexpectation(or expected value, or mean) of a discrete random variable is given by

E[X] = X

x∈X

p(x)x .

The expectation of a continuous random variable is given by E[X] =

Z

X

f(x)x dx . In both cases, it is possible thatE[X] =±∞.

E[kX] =kE[X] E[X+Y] =E[X] +E[Y] E[XY] =E[X]E[Y] if X Y

(68)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

Let X1,X2, . . . be a sequence of independent outcomes of a die, so that pXi(x) = 1/6 for all i ∈N,x ∈ {1,2,3,4,5,6}.

0 0.05 0.1 0.15 0.2 0.25 0.3

1 2 3 4 5 6

E[Xi] =

6

X

x=1

1

6x= 21

6 = 3.5 for all i ∈N.

(69)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

Let X1,X2, . . . be a sequence of independent outcomes of a die, so that pXi(x) = 1/6 for all i ∈N,x ∈ {1,2,3,4,5,6}.

0 0.05 0.1 0.15 0.2 0.25 0.3

1 2 3 4 5 6

E[Xi] =

6

X

x=1

1

6x= 21

6 = 3.5 for alli ∈N.

(70)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

LetSn=Pn

i=1Xnbe the sum of the first n outcomes.

The distribution ofSn is given by

PSn(x) = # of ways to get sum x withn dice 6n

(71)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

LetSn=Pn

i=1Xnbe the sum of the first n outcomes.

The distribution ofSn is given by

PSn(x) = # of ways to get sum x withn dice 6n

(72)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

LetSn=Pn

i=1Xnbe the sum of the first n outcomes.

The distribution ofSn is given by

PSn(x) = # of ways to get sum x withn dice 6n

distribution ofS1

0 0.05 0.1 0.15 0.2 0.25 0.3

1 2 3 4 5 6

(73)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

LetSn=Pn

i=1Xnbe the sum of the first n outcomes.

The distribution ofSn is given by

PSn(x) = # of ways to get sum x withn dice 6n

distribution ofS2

0 0.05 0.1 0.15 0.2 0.25

2 4 6 8 10 12

(74)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

LetSn=Pn

i=1Xnbe the sum of the first n outcomes.

The distribution ofSn is given by

PSn(x) = # of ways to get sum x withn dice 6n

distribution ofS3

0 0.05 0.1 0.15 0.2

4 6 8 10 12 14 16 18

(75)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

LetSn=Pn

i=1Xnbe the sum of the first n outcomes.

The distribution ofSn is given by

PSn(x) = # of ways to get sum x withn dice 6n

distribution ofS4

0 0.02 0.04 0.06 0.08 0.1 0.12 0.14

5 10 15 20

(76)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

LetSn=Pn

i=1Xnbe the sum of the first n outcomes.

The distribution ofSn is given by

PSn(x) = # of ways to get sum x withn dice 6n

distribution ofS5

0 0.02 0.04 0.06 0.08 0.1 0.12

5 10 15 20 25 30

(77)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

LetSn=Pn

i=1Xnbe the sum of the first n outcomes.

The distribution ofSn is given by

PSn(x) = # of ways to get sum x withn dice 6n

distribution ofS10

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

10 20 30 40 50 60

(78)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

LetSn=Pn

i=1Xnbe the sum of the first n outcomes.

The distribution ofSn is given by

PSn(x) = # of ways to get sum x withn dice 6n

distribution ofS20

0 0.01 0.02 0.03 0.04 0.05 0.06

20 40 60 80 100 120

(79)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

LetSn=Pn

i=1Xnbe the sum of the first n outcomes.

The distribution ofSn is given by

PSn(x) = # of ways to get sum x withn dice 6n

distribution of S100

0 0.005 0.01 0.015 0.02 0.025 0.03

100 200 300 400 500 600

(80)

Outline Calculus Probability Inequalities

Probability Space and Random Variables Joint and Conditional Distributions Expectation

Law of Large Numbers

Law of Large Numbers

Source: Wikipedia

Viittaukset

LIITTYVÄT TIEDOSTOT

It is easily seen that all prefix codes are uniquely decodable: each symbol can be decoded as soon as its codeword is read.. Outline Codes

Note that since Shannon-Fano gives E [`(X )] ≤ H(X ) + 1, and Huffman is optimal, Huffman must satisfy the same bound. Jyrki Kivinen

1 We know (think) that the source symbols are generated by a Bernoulli model with parameter p ∈ [0, 1].. 2 We’d like to encode data at

2 MDL Principle Rules &amp; Exceptions Probabilistic Models Old-Style MDL Modern MDL.. Jyrki Kivinen

Encoding Continuous Data Differential Entropy Linear Regression Subset Selection Problem Wavelet Denoising..

To summarize, the prefix property is reasonable from a practical point of view, and from a theoretical point of view can be assumed without increasing input lengths too much....

Since the uncertainty on all kinds of unobserved quantities may be expressed using conditional (posterior) probability distributions given the observed quantities (data), we only

In these models the conditional distribution, not only the conditional expectation (and possibly conditional variance) is speci…ed as a convex combination of (typically)