
Faculty of Technology
Department of Mathematics and Physics
Laboratory of Applied Mathematics

Stochastic approximation methods in stochastic optimization

The topic of this Master's thesis was approved by the council of the Department of Mathematics and Physics on the 25th of October, 2013.

The examiners of the thesis were: Prof. Heikki Haario and PhD Tuomo Kauranne.

The thesis was supervised by Professor Heikki Haario.

In Lappeenranta, 8.11.2013

Ilya Bobylev
Ruskonlahdenkatu 13-15 E6
53850 Lappeenranta
Phone: +3580417061366
Ilya.Bobylev[at]lut.fi

Lappeenranta University of Technology
Faculty of Technology
Department of Mathematics and Physics

Ilya Bobylev

Stochastic approximation methods in stochastic optimization

Thesis for the Degree of Master of Science in Technology

2013
72 pages, 33 figures, 6 tables

Supervisors: Prof. Heikki Haario, PhD Tuomo Kauranne.

Examiners: Prof. Heikki Haario, PhD Tuomo Kauranne.

Key words: stochastic optimization, deterministic equivalents, stochastic quasi-gradient algorithm (SQG), mean-value, value-at-risk (VaR), conditional value-at-risk (CVaR).

Stochastic approximation methods for stochastic optimization are considered. The main methods of stochastic approximation are reviewed: the stochastic quasi-gradient algorithm, the Kiefer-Wolfowitz algorithm together with adaptive rules for them, and the simultaneous perturbation stochastic approximation (SPSA) algorithm. A model and a solution of the retailer's profit optimization problem are suggested, and an application of the SQG-algorithm to optimization problems with objective functions given in the form of an ordinary differential equation is considered.

First of all, I would like to express my respect to the Department of Mathematics for the scholarship during my studies; otherwise my stay in Finland would not have been possible.

Secondly, I have to thank the Finnish atmosphere, peaceful and quiet, which gives wide possibilities for productive study and work.

Finally, my sincere gratitude goes to professors Heikki Haario, Matti Heilio and Tuomo Kauranne for their supervision and advice.

Lappeenranta, 2013.

Ilya Bobylev

CDF - cumulative distribution function.
PDF - probability density function.
SQG - stochastic quasi-gradient algorithm.
VaR - value-at-risk.
CVaR - conditional value-at-risk.
w.r.t. - with respect to.
a.s. - almost sure (convergence).
σ{X_1, X_2, ..., X_n} - the sigma-algebra generated by the distribution of the random variables X_1, X_2, ..., X_n.
MC - Monte Carlo simulation.
MCMC - Markov chain Monte Carlo simulation.
LP - linear programming.
SPSA - simultaneous perturbation stochastic approximation.
SA - stochastic approximation.
KW - Kiefer-Wolfowitz algorithm.
ODE - ordinary differential equation.

Contents

1 Introduction
1.1 Objective of the thesis
1.2 Structure of the thesis
1.3 Scientific novelty

2 Optimization criteria used in stochastic programming
2.0.1 Mean-value criterion
2.0.2 Minimax criterion
2.0.3 Value-at-Risk (VaR) criterion
2.0.4 Conditional Value-at-Risk (CVaR) criterion
2.0.5 Demonstration example

3 Stochastic approximation
3.1 Conditional mathematical expectation
3.2 Discrete time martingales
3.3 Convergence of the supermartingales
3.4 Convergence of the random variables
3.5 Stochastic quasi-gradient method
3.6 Kiefer-Wolfowitz algorithm
3.7 Recipes for step sizes
3.7.1 Kesten Rule
3.7.2 Adaptive rule of Uryasev
3.8 Simultaneous perturbation stochastic approximation

4 Application problem: Electricity Retailer Profit Optimization
4.2 Solution with mean-value criterion
4.2.1 Deterministic equivalent
4.2.2 SQG-solution
4.3 Solution with CVaR criterion
4.3.1 Deterministic equivalent
4.3.2 SQG-solution
4.4 Numerical examples
4.5 Analysis of the results

5 Application problem: SQG-algorithm for the objective functions in ODE-form
5.1 Problem formulation
5.2 Solution technique
5.2.1 Equation in variations
5.2.2 Application of the equation in variations in the considered case
5.3 Numerical examples
5.4 Analysis of the results

6 Conclusion and main results

References

7 Appendices
7.1 Implementation of the stochastic approximation algorithms
7.1.1 Implementation of the SQG-algorithm with explicitly defined gradient
7.1.2 Implementation of the SQG-algorithm with explicitly defined gradient and adaptive Kesten rule
7.1.3 Implementation of the SQG-algorithm with explicitly defined gradient and adaptive Uryasev rule
7.1.4 Implementation of the Kiefer-Wolfowitz algorithm
7.1.5 Implementation of the Kiefer-Wolfowitz algorithm with adaptive Kesten rule
7.1.6 Implementation of the Kiefer-Wolfowitz algorithm with adaptive Uryasev rule
7.1.7 Implementation of the SPSA algorithm
7.1.8 The function which calculates the finite difference approximation of the stochastic quasi-gradient
7.1.9 The function displaying the convergence rate
7.1.10 Electricity retailer profit optimization - solution with mathematical expectation

List of tables

1 Numerical values
2 Discrete distribution of the random demand
3 Discrete distribution of the random price
4 Measurements of the reaction
5 Measurements of the reaction
6 Calculation results

List of figures

1 Illustration of the solution
2 Convergence of the random variables in mean-square
3 Convergence of the random variables almost sure
4 Illustration of the SQG-algorithm
5 Illustration of the classical gradient descent method
6 Illustration of the stochastic gradient method
7 Optimization function
8 Small step size. Stochastic Gradient Descent (left). Stochastic Gradient Descent with adaptive Kesten rule (right)
9 Kiefer-Wolfowitz algorithm (left). Kiefer-Wolfowitz algorithm with adaptive Kesten rule (right)
10 Stochastic Gradient Descent (left). Stochastic Gradient Descent with adaptive Kesten rule (right)
11 Kiefer-Wolfowitz algorithm (left). Kiefer-Wolfowitz algorithm with adaptive Kesten rule (right)
12 Stochastic Gradient Descent (left). Stochastic Gradient Descent with adaptive Uryasev rule (right)
13 Kiefer-Wolfowitz algorithm (left). Kiefer-Wolfowitz algorithm with adaptive Uryasev rule (right)
14 Comparison of the SQG and SPSA convergence traces
15 Illustration of the SPSA convergence
16 Distribution of the imbalance
17 Solution with mean-value criterion for the discrete distribution
18 Solution with mean-value criterion for the continuous distribution
19 SQG-algorithm for the continuous distribution
21 Solution with CVaR criterion for the continuous distribution
22 SQG-solution
23 The scheme of the solution technique
24 Distribution of the parameters estimated by MCMC for the first example
25 Objective function and the trace by the algorithm for the first test-case
26 Convergence of the method for the first test-case
27 Distribution of the parameters estimated by MCMC for the second test-case
28 SQG with equation in variations enhancement for the second example
29 Convergence of the SQG-algorithm for the second example
30 Kiefer-Wolfowitz algorithm for the second test-case
31 Convergence of the KW-algorithm for the second example
32 SPSA-algorithm for the second example
33 Convergence of the SPSA-algorithm for the second test-case

1 Introduction

Optimization problems under uncertainty have been intensively studied by many researchers during the last 50 years. In fact, mathematical models based on stochastic theory are closer to real life than deterministic ones, because the random parameters considered in the objective function or in the constraints reflect the nature of uncertainty during decision making.

Stochastic models are widely used in insurance, financial and portfolio optimization, where the behavior of the market cannot be precisely forecasted, or in aerospace technologies, where it is required to obtain robust results.

Generally the solution of stochastic optimization problems proceeds as follows: first the researcher attempts to derive a deterministic equivalent and use a deterministic method of linear or nonlinear optimization, depending on the case. If the cost function is quite complicated, but the gradient can be explicitly found for a given realization of the random parameters, or successfully approximated with finite differences, then stochastic approximation (SA) will be a good choice as a solution technique. Finally, if this method fails (for example, the objective function is multimodal, or contains both discrete and continuous arguments), then Monte Carlo simulation, genetic algorithms, simulated annealing or other methods of global optimization are used.

Robustness, simplicity and computation speed make stochastic approximation a very popular algorithm for a wide class of optimization problems; therefore the development of SA is an important and demanded research area.

1.1 Objective of the thesis

The main purpose of the thesis is to study and develop stochastic approximation algorithms which are used in stochastic optimization.

1.2 Structure of the thesis

The material of the thesis is organized as follows: the first 3 chapters are devoted to definitions and the theoretical background of stochastic approximation, and are mainly needed to prepare the reader for the application problems that are studied in chapters 4-5.

1.3 Scientific novelty

The thesis contains several new theoretical results: deterministic equivalents of the optimization problems with mean-value and CVaR criteria for the electricity retailer profit optimization problem (see the theorems in the corresponding section), and an interesting computational idea: the equation in variations used in the SQG-algorithm for objective functions given in ODE-form.

The first result gives an opportunity to substitute the solution of the stochastic problem with a deterministic one and instantly get the result using the well-known simplex method for linear programming.

The second result significantly increases the computational speed of the stochastic approximation procedure for the considered problem.

2 Optimization criteria used in stochastic programming

Let Φ(u, X) further in this section denote a cost function, where u ∈ U ⊂ R^n is an optimization strategy and X is a random vector with realizations belonging to the set E ⊆ R^m.

2.0.1 Mean-value criterion

Historically the first was the mean-value criterion, which can be defined as follows:

$$\min_{u \in U} E[\Phi(u, X)]. \qquad (1)$$

Intuitively, the criterion means choosing the strategy which minimizes the average losses represented by the objective function.

The main properties of the mean-value criterion that allow one to derive a solution and build effective algorithms are:

Linearity:

$$E[aX + bY + c] = aE[X] + bE[Y] + c, \qquad (2)$$

where X, Y are random variables and the parameters a, b, c ∈ R;

Simple representation for a discrete distribution with a finite number of realizations:

$$E[X] = \sum_{i=1}^{n} x_i p_i, \qquad (3)$$

where x_i, i = 1, ..., n are realizations of the random vector X, and p_i, i = 1, ..., n are the corresponding probabilities, so that P(X = x_i) = p_i.

Unfortunately, together with such attractive properties the mean-value criterion has a number of cases when it fails. To get a basic idea, one can imagine a situation where decisions about the condition of the patients in a clinic are made based on the average temperature among them. The nonsense is obvious: it is possible to have the criterion value 36.6, while all the patients are about to die, the first half with ∼34°C and the rest with ∼40°C.

2.0.2 Minimax criterion

The minimax criterion comes from game theory and in statistical analysis means choosing the strategy which attains the minimum of losses in the worst possible case:

$$\min_{u \in U} \max_{X \in E} \Phi(u, X). \qquad (4)$$

The problem statement can be treated as a game against an aggressive nature, all the time forming the worst possible case for the decision maker. The positive side of a strategy chosen according to minimax is the assurance that the losses will always be below the forecasted ones. The drawbacks are obvious: such a strategy can be too cautious and wasteful, and often unachievable. For example, in the book [4] the chapter devoted to optimization of the aircraft runway area contains a joke that in order to solve the problem with the minimax criterion it is required to concrete the whole globe together with the oceans and seas.

2.0.3 Value-at-Risk (VaR) criterion

To show the idea behind the VaR criterion let us define two concepts: the probability function P_φ(u) and the quantile function φ_α(u) [4]:

$$P_\varphi(u) = P\{\Phi(u, X) \le \varphi\}, \qquad (5)$$

$$\varphi_\alpha(u) = \min\{\varphi : P_\varphi(u) \ge \alpha\}. \qquad (6)$$

The probability function represents the probability that the cost function Φ(u, X) does not exceed the level φ for a chosen fixed strategy u, while the quantile function indicates the corresponding minimal level. Finally, to get the VaR-strategy, the following optimization task has to be solved:

$$\min_{u \in U} \varphi_\alpha(u). \qquad (7)$$

Unfortunately, VaR lacks some desirable theoretical properties. Firstly, it does not preserve convexity, i.e. convexity of the objective function w.r.t. the strategy does not always result in convexity of the quantile function. Secondly, there are quite a small number of cases when it is possible to derive the gradient of the VaR w.r.t. the strategy explicitly. This makes the solution process of VaR problems extremely complicated, and therefore Monte Carlo simulation is often used. One approach to deal with the quantile criterion is to build an upper-value estimate, based on a confidence set.

The quantile function can be represented as the solution of the following minimax problem:

$$\varphi_\alpha(u) = \min_{E \in \mathcal{E}_\alpha} \max_{x \in E} \Phi(u, x),$$

where $\mathcal{E}_\alpha$ is a family of confidence sets: $P(E) \ge \alpha$, $E \in \mathcal{E}_\alpha$. Obviously, if instead of the optimal confidence set E an arbitrary one Ê is used, then an upper estimate of the optimal solution will be found. Based on that observation many application problems have been solved [23, 24, 25].

2.0.4 Conditional Value-at-Risk (CVaR) criterion

The criterion was created as an enhancement to VaR and represents the average rate of losses exceeding the level φ_α(u):

$$\psi_\alpha(u) = E[\Phi(u, X) \mid \Phi(u, X) \ge \varphi_\alpha(u)] = \frac{1}{1-\alpha} \int_{\Phi(u, X(\omega)) \ge \varphi_\alpha(u)} \Phi(u, X(\omega)) \, dP(\omega). \qquad (8)$$

According to [7], the minimization of CVaR over the strategy u ∈ U equals the solution of the following optimization problem:

$$\psi^* = \min_{(u, \varphi) \in U \times R} F_\alpha(u, \varphi), \qquad (9)$$

where

$$F_\alpha(u, \varphi) = \varphi + \frac{1}{1-\alpha} \cdot E[\max\{\Phi(u, X) - \varphi, 0\}]. \qquad (10)$$

This result opens wide possibilities for CVaR-optimization via well-known techniques of mean-value minimization.

2.0.5 Demonstration example

Let us consider a simple example to show the difference between the foregoing criteria in practice. Assume that the cost function is defined as

$$\Phi(u, X) = u^2 + X \qquad (11)$$

and the random parameter is normally distributed, X ∼ N(0, 1).

Mean-value solution:

$$\min_{u \in R} \{E[u^2 + X]\}. \qquad (12)$$

Since E[X] = 0, the deterministic equivalent is

$$\min_{u \in R} u^2 \qquad (13)$$

and the corresponding solution is u* = 0, E[Φ(u*, X)] = 0.

Minimax solution:

$$\min_{u \in R} \max_{X} \{u^2 + X\}. \qquad (14)$$

The normal distribution has a continuous CDF, therefore ∀ C ∈ R: P(X > C) > 0. As a result, the inner maximum is unbounded and the problem does not have a solution.

VaR-solution:

$$\min_{u \in R} \{\varphi_\alpha(u)\}. \qquad (15)$$

The probability function looks as follows: P_φ(u) = P(u² + X ≤ φ) = P(X ≤ φ − u²) = F_X(φ − u²), where F_X is the CDF of X.

The quantile function: φ_α(u) = min{φ : P_φ(u) ≥ α} = min{φ : F_X(φ − u²) ≥ α}.

By definition the CDF is a monotonically non-decreasing function, therefore

$$F_X(\varphi - u^2) \ge \alpha \iff \varphi - u^2 \ge x_\alpha \implies \varphi \ge u^2 + x_\alpha \implies \varphi_\alpha(u) = u^2 + x_\alpha,$$

where x_α denotes the α-quantile of N(0, 1). Finally, the deterministic equivalent assumes the form shown below:

$$\min_{u \in R} \{u^2 + x_\alpha\}. \qquad (16)$$

As can be clearly seen: u* = 0, φ_α(u*) = x_α.

CVaR-solution: Considering the previous VaR solution,

$$\psi_\alpha(u) = E[\Phi(u, X) \mid \Phi(u, X) \ge \varphi_\alpha(u)] = E[u^2 + X \mid u^2 + X \ge u^2 + x_\alpha] = u^2 + E[X \mid X \ge x_\alpha] = u^2 + y_\alpha,$$

where y_α is the α-CVaR of N(0, 1). Finally, the deterministic equivalent assumes the following form:

$$\min_{u \in R} \{u^2 + y_\alpha\} \qquad (17)$$

and the solution: u* = 0, ψ_α(u*) = y_α.

It is worth noting that in this particular example the strategy u* = 0 is optimal simultaneously for the mean-value, VaR, and CVaR criteria.
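The deterministic conclusions above can also be checked numerically by simple Monte Carlo sampling. The short sketch below is an illustration added here, not part of the thesis; the sample size, the grid of strategies and the level α = 0.95 are assumptions. It estimates the mean-value, VaR and CVaR criteria for Φ(u, X) = u² + X and confirms that all three are minimized near u = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.95
X = rng.standard_normal(100_000)          # realizations of X ~ N(0, 1)

def cost(u, x):
    """Cost function Phi(u, X) = u^2 + X from the demonstration example."""
    return u ** 2 + x

def mean_value(u):
    return np.mean(cost(u, X))

def var(u):
    # phi_alpha(u): smallest level phi with P(Phi(u, X) <= phi) >= alpha
    return np.quantile(cost(u, X), alpha)

def cvar(u):
    # Rockafellar-Uryasev representation (9)-(10): phi + E[max(Phi - phi, 0)] / (1 - alpha)
    losses = cost(u, X)
    phi = np.quantile(losses, alpha)
    return phi + np.mean(np.maximum(losses - phi, 0.0)) / (1.0 - alpha)

for crit in (mean_value, var, cvar):
    us = np.linspace(-2.0, 2.0, 81)
    best = us[np.argmin([crit(u) for u in us])]
    print(crit.__name__, "optimal u ~", round(best, 2), "value ~", round(crit(best), 3))
# All three criteria are minimized near u = 0, with values close to 0, x_alpha and y_alpha.
```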

Figure 1: Illustration of the solution (the loss PDF with the mean-value, α-VaR, α-CVaR and maximum loss levels marked).

3 Stochastic approximation

The history of stochastic approximation starts from the article of Robbins and Monro [20], where the authors applied a recursive SA procedure to find a root of an equation observing noisy data. As was discovered, there is no need to know the exact values of the function, because during the solution we are only interested in the direction towards the root. As a result of this observation, the process can work with noisy values instead of the exact data.

Let us review several illustrative application problems from real life where the stochastic approximation algorithms can be quite useful.

During the testing of insecticides there is the problem of determining the maximum dose which guarantees the intensity of the reaction.

The hardness of an iron-copper alloy depends on the time of temperature impact. Let u denote the time, and Y(u) the hardness of the alloy. The computational problem is to determine such u that the hardness of the alloy Y(u) reaches α. It is also a well-known fact that the hardness of the alloy varies over the items.

The sensitivity of an explosive substance to physical impact is measured by hitting. A small item of the substance is punched by a solid object thrown from a fixed height. Some samples will explode and some of them will not. The problem is to determine the critical height for the substance.

A grain field is fertilized by some chemicals. Let u denote the amount of the fertilizer and Y(u) the corresponding amount of grain gathered from the field. It can be clearly seen that a lack of the fertilizer (as well as over-fertilization) leads to a reduction of the crops. Only some optimal amount of chemicals leads to a significant outcome. Moreover, the prolificness of the field changes over the years even if the amount of the fertilizer u remains the same.

The solution of the first three cases can be organized according to the procedure of Robbins and Monro: a tester picks an arbitrary value x_1 from the admissible area, conducts an experiment and observes a realization y(x_1) of the random variable Y(x_1) with expectation M(x_1) = E{Y(x_1)}, where M is some increasing function. The tester also chooses a decreasing sequence a_n = c/n, where c is a positive constant and n is the number of the step. The problem to be solved is to determine such θ that M(θ) = α. For the next experiment he takes x according to the rule:

$$x_{n+1} = x_n - \frac{c}{n}\,\bigl(y(x_n) - \alpha\bigr). \qquad (18)$$

To understand the main idea of the method assume α = 0. In this case the previous statement yields

$$x_{n+1} = x_n - \frac{c}{n}\, y(x_n). \qquad (19)$$

If y(x_n) > 0, then x_{n+1} < x_n, and y(x_n) < 0 correspondingly leads to x_{n+1} > x_n, meaning that equation (19) looks reasonable, since we are searching for θ : M(θ) = 0.
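A minimal numerical sketch of the Robbins-Monro recursion (19) may make the mechanics concrete. It is an illustration added here: the regression function M(x) = 2x − 1, the noise model and the constant c are assumptions, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_M(x):
    """Noisy observation y(x) of an increasing regression function M(x) = 2x - 1."""
    return 2.0 * x - 1.0 + rng.standard_normal()

c = 1.0          # positive constant in the step size a_n = c / n
x = 5.0          # arbitrary starting point x_1
for n in range(1, 5001):
    x -= (c / n) * noisy_M(x)   # x_{n+1} = x_n - (c/n) y(x_n), searching for M(theta) = 0

print(x)         # converges towards theta = 0.5, the root of M
```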

The solution of the last problem can be organized with the help of the stochastic quasi-gradient method, which is studied in detail further in the thesis.

Kiefer and Wolfowitz (in the 1-d case) [17] and Blum [21] (in the multidimensional case) applied stochastic approximation to optimize mean-value functionals.

The books of Ermoliev [9] and Powell [22] contain easily understandable proofs and motivation for using stochastic approximation.

Kan and Kibzun [4] have applied stochastic approximation to optimization problems with the VaR criterion.

Spall [19] suggested using random perturbations in the classical KW-procedure to decrease the number of evaluations of the objective function. Moreover, his personal page [19] lists a large number of references regarding applications of SPSA.

Let us list several different areas of science where the SPSA-algorithms have been successfully applied [19] (a brief sketch of the SPSA gradient estimate follows the list):

Aircraft modeling and control
Atmospheric and planetary modeling
Cardiological data analysis
Noise cancellation
Queuing network design
Robot control
Sensor placement
Traffic signal timing or other transportation problems
Underground mine detection
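To make the idea of the random-perturbation estimate concrete, the following sketch (added here as an illustration) runs SPSA on a simple noisy quadratic; the objective, the gain sequences a_k, c_k and the ±1 perturbation distribution are assumptions chosen only for demonstration. The point is that each iteration uses just two evaluations of the objective, independently of the dimension.

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_f(u):
    """Noisy measurement of f(u) = ||u||^2 (illustrative objective)."""
    return float(u @ u) + rng.standard_normal()

u = np.array([5.0, -3.0, 2.0])
for k in range(1, 2001):
    a_k, c_k = 1.0 / k, 1.0 / k ** 0.25
    delta = rng.choice([-1.0, 1.0], size=u.size)           # random +/-1 perturbation
    g_hat = (noisy_f(u + c_k * delta) - noisy_f(u - c_k * delta)) / (2.0 * c_k * delta)
    u = u - a_k * g_hat                                     # SPSA update

print(u)   # approaches the minimizer (0, 0, 0)
```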

Researchers have also investigated the convergence speed and suggested several different versions of the procedure.

The next section contains explanations of the stochastic approximation procedures which are used for the optimization of cost functions.

The presentation is organized as follows: firstly we introduce the conditional expectation, then on its basis some definitions from martingale theory will be shown. Having insight into the martingale convergence theory, we gradually start the main proof regarding the convergence of the SQG-procedure. At the end of the section we describe a relatively new area of stochastic approximation, the SPSA algorithm, and discuss adaptive procedures that practically enhance convergence.

3.1 Conditional mathematical expectation

Let {Ω, F, P} denote a probability space, G a σ-algebra of random events, G ⊆ F, and ξ a random variable with E[ξ] < ∞. A random variable E[ξ | G] is called a conditional mathematical expectation of ξ w.r.t. G if the following conditions are valid:

1. E[ξ | G] is a G-measurable function,

2. for every set A ∈ G,

$$\int_A \xi(\omega)\, P(d\omega) = \int_A E[\xi \mid G](\omega)\, P(d\omega). \qquad (20)$$

The main properties of the conditional mathematical expectation that are used in the next paragraphs are [13]:

1. if ξ = C = const, then

$$E[\xi \mid G] = C \quad (a.s.), \qquad (21)$$

2. if ξ ≤ η (a.s.), then

$$E[\xi \mid G] \le E[\eta \mid G] \quad (a.s.), \qquad (22)$$

3. if the random variable ξ is measurable with respect to the σ-algebra G, then

$$E[\xi \mid G] = \xi \quad (a.s.), \qquad (23)$$

4. the full expectation property:

$$E[E[\xi \mid G]] = E[\xi] \quad (a.s.), \qquad (24)$$

5. linearity: if a, b ∈ R and E[ξ] < ∞, E[η] < ∞, then

$$E[a\xi + b\eta \mid G] = aE[\xi \mid G] + bE[\eta \mid G] \quad (a.s.), \qquad (25)$$

6. if the random variable ξ does not depend on the σ-algebra G, then

$$E[\xi \mid G] = E[\xi] \quad (a.s.), \qquad (26)$$

7. the conditional mathematical expectation with respect to a random variable is defined as the expectation w.r.t. the σ-algebra generated by this variable:

$$E[\xi \mid F_n] = E[\xi \mid \eta], \quad F_n = \sigma\{\eta\}. \qquad (27)$$

To get an intuitive sense of the conditional mathematical expectation, one can imagine E[X | F] as a rough, averaged version of X: on an arbitrary set A ∈ F the random variable X can assume any values, but E[X | F] takes only one fixed value, E[X | A], which is the average value of X on the set A ∈ F.

3.2 Discrete time martingales

Assume that F_0 ⊆ F_1 ⊆ ... ⊆ F is a non-decreasing family of σ-algebras defined on a probability space {Ω, F, P}. For example, if {ξ_n}, n ≥ 0, is a set of random variables given on {Ω, F, P} and F_n^ξ = σ{ξ_0, ..., ξ_n} is the sigma-algebra generated by {ξ_k, k = 0, ..., n}, then {F_n^ξ} is a non-decreasing family of σ-algebras.

Let {X_n, n ≥ 0} be a sequence of random variables defined on {Ω, F, P}. If for all n ≥ 0 X_n is F_n-measurable, then {X_n, F_n} is called a stochastic sequence.

If for all n ≥ 0 X_n is F_{n-1}-measurable, then the sequence {X_n, F_{n-1}} is called a predictable stochastic sequence.

A stochastic sequence {X_n, F_n} with E[|X_n|] < ∞ is called:

a martingale, if E[X_{n+1} | F_n] = X_n (a.s.),

a submartingale, if E[X_{n+1} | F_n] ≥ X_n (a.s.),

a supermartingale, if E[X_{n+1} | F_n] ≤ X_n (a.s.).

3.3 Convergence of the supermartingales

It can be clearly seen that supermartingales are stochastic analogues of non-increasing sequences of real numbers. As a result, supermartingales (as well as submartingales) under some conditions have a limit (which is in fact a random variable). The corresponding statement is shown in the following theorem [9].

Theorem 3.1 (Convergence theorem). If {X_n, F_n} is a supermartingale such that inf_n E[X_n^-] > −∞, where X_n^- = min{X_n, 0}, then

$$\Bigl|\lim_{n \to \infty} X_n\Bigr| < \infty. \qquad (28)$$

From the theorem above it is possible to conclude: if {X_n, F_n} is a non-negative supermartingale, then a.s. (with probability 1) its limit exists.

3.4 Convergence of the random variables

Let {X_1, ..., X_n, ...} be a sequence of random variables defined on the same probability space {Ω, F, P}.

1. X_n → X in probability, if ∀ε > 0: lim_{n→∞} P(|X_n − X| > ε) = 0,

2. X_n → X in mean square, if lim_{n→∞} E{|X_n − X|²} = 0,

3. X_n → X almost surely, if P(ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)) = 1,

4. X_n → X in distribution (or weakly), if lim_{n→∞} F_n(x) = F(x) for every x ∈ R at which F(x) is continuous. Here F_n(x) and F(x) are the cumulative distribution functions (CDF) of the random variables X_n and X respectively.

Let X_n be a sequence of independent random variables distributed as follows: P(X_n = 0) = 1 − 1/n, P(X_n = 1) = 1/n. This sequence converges to zero in mean square and therefore in probability, but does not converge almost surely [26].

Figure 2: Convergence of the random variables in mean-square (a sample path of X_n against the step n, n = 1, ..., 10000).

Let X_n be a sequence of independent random variables distributed as follows: P(X_n = 0) = 1 − 1/n², P(X_n = n) = 1/n². This sequence converges to zero almost surely (by the Borel-Cantelli lemma, since Σ 1/n² < ∞) and therefore in probability, but does not converge in mean square [26].

Figure 3: Convergence of the random variables almost sure (a sample path of X_n against the step n, n = 1, ..., 10000).

As can be seen from the figures, in the first case even after 5000 steps there can be outliers away from the accumulation point, but in the case of almost sure convergence this does not happen: for almost every ω ∈ Ω the values eventually stay within an arbitrarily given epsilon-neighborhood of the limit point.
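Sample paths of the two sequences can also be generated directly; the sketch below is an illustration added here (it assumes the X_n are independent and uses an arbitrary horizon and seed). Counting how many of the late terms are still non-zero reproduces the qualitative difference between the two modes of convergence.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000
n = np.arange(1, N + 1)

# Example 1: P(X_n = 1) = 1/n. Converges in mean square (E[X_n^2] = 1/n -> 0),
# but since the series 1/n diverges, X_n = 1 keeps happening (no a.s. convergence).
x1 = (rng.random(N) < 1.0 / n).astype(float)

# Example 2: P(X_n = n) = 1/n^2. Converges a.s. (Borel-Cantelli, sum 1/n^2 < infinity),
# but E[X_n^2] = 1, so there is no mean-square convergence.
x2 = n * (rng.random(N) < 1.0 / n ** 2)

print("late non-zero terms, example 1:", int(np.count_nonzero(x1[10_000:])))  # typically a few
print("late non-zero terms, example 2:", int(np.count_nonzero(x2[10_000:])))  # typically none
```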

3.5 Stohasti quasi-gradient method

Havingunderstandingoftheforegoingoneptswewillproeeddesribingthestohasti

quasi-gradientalgorithmfortheoptimizationproblemswithmean-valueriterion,whih

is formulated below.

min u∈U F (u), F (u) = E{Φ(u, X)}.

(29)

Here

Φ(u, X)

asbeforedenotes anobjetive funtiondependent onarandom vetor

X

and the strategy

u

whih shouldbehosen from the set

U ⊆ R n

.

If the gradient

∇F (u)

was known we ould use the lassial gradient-desent method, starting from some initial point

u 0

. The main statement of the algorithm is written

below:

u ν+1 = Π U (u ν − ρ ν ∇F (u ν )) ,

(30)

where

Π U (u)

denotes aprojetion operatoronto the set

U

:

Π U (u) = arg min

y∈U

||u − y||.

(31)

The projetion operator are used here to restrit the loation of the points

u 0

,...,

u n

generated by the algorithmtothe admissiblearea

U

.

The problem to nd a projetion of the point onto the set is not always an easy task.

Generally the set

U

has to be losed and onvex in order to have a unique projetion

(it mighthappen that the projetion does not exists at allif the set

U

is open and the

projetion mightbe non-unique in the ase of non-onvex

U

).

The projetion anbe foundinstantlyforthe elementarytypesofthe set

U

. Suhases

are listed below.

1. A non-negative orthant:

U = E + n = {u : u ∈ R n , u j ≥ 0, j = 1, .., n}.

(32)

(25)

Let usdenote

p j

the

j

-th oordinateof the projetion

Π U (u)

, then

p j =

0, u j < 0,

u j , u j ≥ 0.

(33)

2. n-dimensional parallelepiped:

U = {u : u ∈ R n , a j ≤ u j ≤ b j , j = 1, .., n}

(34)

p j =

a j , u j < a j , u j , a j ≤ u j ≤ b j , b j , u j > b j .

(35)

where

p j

as usuall the

j

-thoordinate of the projetion

Π U (u)

.

3. A sphere with the radius

r

:

U = {u : u ∈ R n , ||u|| ≤ r}.

(36)

The projetion an be found as follows:

Π U (u) =

u, u ∈ U,

r

||u|| u, u / ∈ U,

(37)

4. A hyperplane:

U = {u : u ∈ R n , hc, yi = b},

(38)

where

c ∈ E n

,

b ∈ R

.

Π U (u) = u + b − hc, ui

||c|| 2 c.

(39)

5. A halfspae:

U = {u : u ∈ R n , hc, yi ≤ b},

(40)

Π U (u) =

( u, u ∈ U,

u + b−hc,ui ||c|| 2 c, u / ∈ U,

(41)
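A short sketch of these elementary projections is given below; it is an illustration added here, and the function names are arbitrary.

```python
import numpy as np

def project_orthant(u):
    """Projection onto the non-negative orthant (33): clip negative coordinates to zero."""
    return np.maximum(u, 0.0)

def project_box(u, a, b):
    """Projection onto the parallelepiped {a_j <= u_j <= b_j} (35): coordinate-wise clipping."""
    return np.clip(u, a, b)

def project_ball(u, r):
    """Projection onto the ball ||u|| <= r (37): rescale if outside."""
    norm = np.linalg.norm(u)
    return u if norm <= r else (r / norm) * u

def project_hyperplane(u, c, b):
    """Projection onto the hyperplane {<c, u> = b} (39)."""
    return u + (b - c @ u) / (c @ c) * c

def project_halfspace(u, c, b):
    """Projection onto the halfspace {<c, u> <= b} (41): identity inside, hyperplane projection outside."""
    return u if c @ u <= b else project_hyperplane(u, c, b)

u = np.array([3.0, -4.0])
print(project_ball(u, 1.0), project_halfspace(u, np.array([1.0, 0.0]), 2.0))
```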

Considering the remark regarding the projection operator, the gradient descent algorithm can be written as follows:

1. Initialize the step ν = 0, choose the initial point u^0.

2. Compute the next point u^{ν+1} using the previous point u^ν and the equation

$$u^{\nu+1} = \Pi_U\left(u^\nu - \rho_\nu \nabla F(u^\nu)\right). \qquad (42)$$

To find a projection onto the set U use the above-mentioned remarks regarding the projection operator.

3. Check the stopping criterion. If the stopping condition is satisfied, put u^* = u^{ν+1}; otherwise assign ν = ν + 1 and go to step 2.

Let f(u) be a function defined on the set U ⊂ R^n. A vector ∂(u^0) ∈ R^n is called a subgradient of the function f(u) at the point u^0 ∈ U if ∀u ∈ U

$$f(u) \ge f(u^0) + \langle \partial(u^0), u - u^0 \rangle. \qquad (43)$$

Geometrically the formula above means that the graph of the function f(u) is located above the linear function f(u^0) + ⟨∂(u^0), u − u^0⟩, and at the point u^0 the graphs coincide. Obviously, if the function is differentiable there is only one such vector ∂(u^0), equal to the gradient ∇f(u^0). For a non-differentiable function there exists a set of subgradients at the point u^0, which is called the subdifferential set.

From the point of view of optimization methods, if the objective function F(u) is not differentiable, the gradient ∇F(u) is substituted by a subgradient ∂F(u) and the rest of the gradient descent method is used without any changes. Such a method is called the subgradient algorithm.

In the case of a stochastic objective function the exact gradient is unknown, and therefore its stochastic version is used:

$$u^{\nu+1} = \Pi_U\left(u^\nu - \rho_\nu \nabla_u \Phi(u^\nu, x^\nu)\right), \qquad (44)$$

where x^ν, ν = 1, ..., is a realization of X at step ν of the algorithm. Thus x^ν is a random variable, because we do not know the value which X assumes at step ν. Moreover, the point u^ν, ν = 1, ..., is also a random variable, because it depends on x^ν and the previous point u^{ν−1}.

Finally, note that expression (44) describes a random process starting from the initial point u^0, which under some conditions formulated further should lead us to the optimal value of the optimization problem (29).

The stochastic gradient algorithm can be written as follows:

1. Initialize the step ν = 0, choose the initial point u^0.

2. Generate a realization x^ν of the random vector X. Compute the stochastic gradient ∇_u Φ(u^ν, x^ν) using the previous point u^ν and the realization x^ν.

3. Calculate the next point u^{ν+1} using the previous point u^ν and the equation

$$u^{\nu+1} = \Pi_U\left(u^\nu - \rho_\nu \nabla_u \Phi(u^\nu, x^\nu)\right). \qquad (45)$$

To find a projection onto the set U use the mentioned remarks regarding the projection operator.

4. Check the stopping criterion. If the stopping condition is satisfied, put u^* = u^{ν+1}; otherwise assign ν = ν + 1 and go to step 2.

If the function Φ(u^ν, x^ν) is not differentiable with respect to the variable u, the stochastic gradient is substituted by a quasi-gradient ζ^ν satisfying

$$E\left[\zeta^\nu \mid u^0, ..., u^\nu\right] \in \partial_u E\left[\Phi(u^\nu, X)\right]. \qquad (46)$$

Intuitively, the statement above means that the conditional expectation of the stochastic quasi-gradient at the point u^ν with respect to all previous points should coincide almost surely (a.s.) with one of the subgradients of the objective function at the point u^ν (i.e. belong to the subdifferential set). The main statement of the stochastic quasi-gradient method then looks as follows:

$$u^{\nu+1} = \Pi_U\left(u^\nu - \rho_\nu \zeta^\nu\right). \qquad (47)$$
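A compact sketch of the SQG iteration (47) on a toy problem may help to make the recursion concrete. It is an illustration added here: the objective F(u) = E[(u_1 + X)² + u_2²] with X ∼ N(0, 1), the box constraint, the step-size rule and the iteration count are assumptions. The minimizer of this objective is u* = (0, 0).

```python
import numpy as np

rng = np.random.default_rng(4)

def stoch_grad(u, x):
    """Stochastic gradient of Phi(u, x) = (u_1 + x)^2 + u_2^2 with respect to u."""
    return np.array([2.0 * (u[0] + x), 2.0 * u[1]])

def project_box(u, lo=-5.0, hi=5.0):
    """Projection onto the admissible box U = [-5, 5]^2."""
    return np.clip(u, lo, hi)

u = np.array([4.0, -4.0])                 # step 1: initial point u_0
for nu in range(1, 5001):
    x = rng.standard_normal()             # step 2: realization x_nu of X
    rho = 1.0 / nu                        # step size satisfying the usual conditions
    u = project_box(u - rho * stoch_grad(u, x))   # step 3: SQG update (47)

print(u)   # close to the optimum u* = (0, 0)
```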

When we are dealing with a deterministic problem, the step size ρ_{ν+1} can be obtained from the solution of a one-dimensional optimization problem:

$$\rho_{\nu+1} = \arg\min_{\rho} \{F(u^\nu - \rho \nabla_u F(u^\nu))\}, \qquad (48)$$

meaning that in every step we should move along the gradient direction so as to minimize the objective function as much as possible.

In the stochastic version there are several problems which do not allow using the above technique. Firstly, we do not know exactly the expected value of Φ(u, X) for a given u (we know only Φ(u, x)). Secondly, the stochastic gradient at some steps can even give a wrong direction, so that any positive value ρ leads to a worse point than the previous one.

The next figures show the classical gradient descent method and the stochastic version. In the first case we can explicitly find the gradient and, computing an optimal step size, reach the optimum in one step. In the second case we only observe noisy data and get to the optimum of the mean-value criterion by applying the stochastic gradient method, which of course requires more steps.

Figure 5: Illustration of the classical gradient descent method (surface of F(u_1, u_2) = u_1² + u_2²).

Figure 6: Illustration of the stochastic gradient method (surface of F(u_1, u_2) = E{u_1² + u_2² + Xu_1 + Yu_2}).

To prove the convergence a.s. of the algorithm (47), we have to make several assumptions [11]:

the function E[Φ(u, X)] is a convex finite-valued function;

the set U is convex and compact;

the parameters ρ_ν and γ_ν satisfy a.s. (here γ_ν denotes the bias of the quasi-gradient estimate ζ^ν):

$$\rho_\nu \ge 0, \quad \sum_{\nu=0}^{\infty} \rho_\nu = \infty, \quad \sum_{\nu=0}^{\infty} E\left[\rho_\nu |\gamma_\nu| + \rho_\nu^2 \|\zeta^\nu\|^2\right] < \infty. \qquad (49)$$

The first two conditions are classical requirements for the existence and uniqueness of a solution in convex optimization. The last requirements mostly affect the sequence of step sizes. The sequence has to be very special: it should converge to zero, but not too fast, so that the points generated by the algorithm are able to reach the optimum independently of the initial point where the algorithm starts.

In order to highlight the main steps of the proof [12], let us use the following notation:

u* ∈ U is an optimal solution of the problem (29),

F_ν = σ{u^0, ..., u^ν} is the sigma-algebra generated by the random variables {u^0, ..., u^ν}.

The main idea of the proof is to show that under the conditions of the theorem a sequence built from E[‖u^{ν+1} − u*‖² | F_ν] is a non-negative supermartingale, therefore a.s. it has some limit, and then to prove that the limit is 0, meaning u^ν → u* a.s.

The non-expansiveness of the projection implies:

$$\|u^{\nu+1} - u^*\|^2 \le \|u^\nu - u^*\|^2 - 2\rho_\nu \langle \zeta^\nu, u^\nu - u^* \rangle + \rho_\nu^2 \|\zeta^\nu\|^2.$$

Take the conditional expectation w.r.t. F_ν:

$$E[\|u^{\nu+1} - u^*\|^2 \mid F_\nu] \le \|u^\nu - u^*\|^2 - 2\rho_\nu \langle E[\zeta^\nu \mid F_\nu], u^\nu - u^* \rangle + \rho_\nu^2 E[\|\zeta^\nu\|^2 \mid F_\nu].$$

By the conditions of the theorem:

$$\gamma_\nu + \langle E[\zeta^\nu \mid F_\nu], u^\nu - u^* \rangle \ge E[\Phi(u^\nu, X)] - E[\Phi(u^*, X)] \ge 0 \implies \langle E[\zeta^\nu \mid F_\nu], u^\nu - u^* \rangle \ge -\gamma_\nu.$$

The two previous steps together result in:

$$E[\|u^{\nu+1} - u^*\|^2 \mid F_\nu] \le \|u^\nu - u^*\|^2 + 2\rho_\nu \gamma_\nu + \rho_\nu^2 E[\|\zeta^\nu\|^2 \mid F_\nu] \le \|u^\nu - u^*\|^2 + \underbrace{2\rho_\nu |\gamma_\nu| + \rho_\nu^2 E[\|\zeta^\nu\|^2 \mid F_\nu]}_{R_\nu} = \|u^\nu - u^*\|^2 + R_\nu.$$

Assume Y_ν = ‖u^ν − u*‖² + Σ_{k≥ν} R_k; hence

$$E[Y_{\nu+1} \mid F_\nu] \le Y_\nu, \quad Y_\nu \ge 0 \implies Y_\nu \to Y \ \text{(a.s.)}$$

by the convergence theorem for supermartingales.

Recursively, taking the full expectation:

$$E[\|u^{\nu+1} - u^*\|^2] \le \|u^0 - u^*\|^2 + \sum_{k=0}^{\nu} \rho_k^2 E[\|\zeta^k\|^2] - 2\sum_{k=0}^{\nu} \rho_k E[\langle \zeta^k, u^k - u^* \rangle]$$

$$\implies E[\|u^{\nu+1} - u^*\|^2] - \|u^0 - u^*\|^2 - \sum_{k=0}^{\nu} E[R_k] \le 2\sum_{k=0}^{\nu} \rho_k \left(E[\Phi(u^*, X)] - E[\Phi(u^k, X)]\right).$$

Considering the conditions of the theorem, Σ_{k=0}^{∞} E[R_k] < ∞ and E[Φ(u*, X)] ≤ E[Φ(u^k, X)], we have:

$$-\infty < \sum_{k=0}^{\infty} \rho_k \left(E[\Phi(u^*, X)] - E[\Phi(u^k, X)]\right) \le 0,$$

therefore E[Φ(u^k, X)] → E[Φ(u*, X)].

3.6 Kiefer-Wolfowitz algorithm

If the subgradient ∂_u Φ(u, x), or even the gradient ∇_u Φ(u, x), of the function Φ(u, x) w.r.t. the variable u can be explicitly found, then in every step of the SQG-algorithm ∂_u Φ(u, x) or ∇_u Φ(u, x) is usually used as the quasi-gradient estimate; otherwise a finite-difference approximation is employed:

$$\zeta^k = \frac{1}{2\delta_k} \sum_{j=1}^{n} \left[\Phi(u^k + \delta_k e_j, x^k) - \Phi(u^k - \delta_k e_j, x^k)\right] e_j, \qquad (50)$$

where x^k is, as usual, a realization of the random vector X at step k of the algorithm, and e_j, j = 1, ..., n are the unit vectors directed along the coordinate axes.

Considering the formula above, the method of Kiefer and Wolfowitz assumes the following form:

1. Initialize the step k = 0, choose the initial point u^0.

2. Generate a realization x^k of the random vector X. Compute the stochastic quasi-gradient ζ^k via formula (50) using the previous point u^k and the realization x^k.

3. Calculate the next point u^{k+1} using the previous point u^k according to the formula

$$u^{k+1} = \Pi_U\left(u^k - \rho_k \zeta^k\right). \qquad (51)$$

To find a projection onto the set U use the mentioned remarks regarding the projection operator.

4. Check the stopping criterion. If the stopping condition is satisfied, put u^* = u^{k+1}; otherwise assign k = k + 1 and go to step 2.

As can be seen, the algorithm requires 2n evaluations of the objective function Φ(u, X) in every step of the method, where n is the dimension of the control variable u.
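The finite-difference estimate (50) and the Kiefer-Wolfowitz step can be sketched as follows. This is an illustration added here: the noisy objective, the clipping box and the sequences ρ_k, δ_k are assumptions, chosen so that the conditions (52) below hold.

```python
import numpy as np

rng = np.random.default_rng(5)

def phi(u, x):
    """Noisy objective Phi(u, x) = ||u||^2 + x * u_1 (illustrative)."""
    return float(u @ u) + x * u[0]

def kw_quasi_gradient(u, x, delta):
    """Central finite-difference estimate (50): 2n evaluations of Phi per step."""
    n = u.size
    zeta = np.zeros(n)
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        zeta[j] = (phi(u + delta * e, x) - phi(u - delta * e, x)) / (2.0 * delta)
    return zeta

u = np.array([5.0, -5.0])
for k in range(1, 3001):
    x = rng.standard_normal()
    rho_k, delta_k = 1.0 / k, 1.0 / k ** 0.25   # satisfy the step conditions (52)
    u = np.clip(u - rho_k * kw_quasi_gradient(u, x, delta_k), -10.0, 10.0)

print(u)   # approaches the minimizer (0, 0) of E[Phi(u, X)]
```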

The convergence theorem of Kiefer and Wolfowitz [17] and its generalization can be found in [4, 9].

Theorem 3.2. If the conditions below are valid:

the function f(u) = E[Φ(u, X)] has its only extremum at u* ∈ int{U};

the second derivatives of f(u) are continuous and bounded;

the variance D[Φ(u, X)] ≤ C < ∞ for all u ∈ U;

the sequences ρ_k and δ_k satisfy

$$\rho_k \ge 0, \quad \sum_{k=1}^{\infty} \rho_k = \infty, \quad \sum_{k=1}^{\infty} \left(\frac{\rho_k}{\delta_k}\right)^2 < \infty, \quad \sum_{k=1}^{\infty} \rho_k |\delta_k| < \infty; \qquad (52)$$

then the sequence generated by the Kiefer-Wolfowitz algorithm converges to the optimal solution of the mean-value problem almost surely (u^k → u* a.s.).

3.7 Recipes for step sizes

As was mentioned, the step size is a very important feature directly affecting the performance of the method. As often happens, the theoretical results regarding the convergence of the method do not suggest a way to increase performance, and therefore empirical schemes are used. The flip side of such a process is that adaptive algorithms should be theoretically justified, otherwise there exists a risk of facing the special case when an inaccurate adaptive method fails. As an example: a stochastic approximation adaptive rule for the step size has to satisfy the step size conditions to guarantee convergence, otherwise the whole proof must be revised.

Additionally, adaptation should not significantly increase the complexity of the original method and preferably should not have a lot of tuning parameters to be adjusted. Despite these challenges, several successful results regarding adaptive stochastic approximation are known.

3.7.1 Kesten Rule

The first idea to use an adaptive step size in stochastic approximation belongs to Kesten. His method is based on the simple idea that if we are far from the optimum, the errors tend to have the same sign, but when we are getting closer, the errors start to alternate. By the error here we mean the difference between the previous point u^{n-1} and the current point u^n. Kesten suggested the following rule:

$$\rho_{n-1} = \frac{a}{a + K_n - 1}, \qquad (53)$$

where a is a parameter to be calibrated and K_n is a counter of the number of times that the sign of the errors has changed:

$$K_n = \begin{cases} n, & n = 1, 2, \\ K_{n-1} + 1_{\{\langle \epsilon_n, \epsilon_{n-1} \rangle < 0\}}, & n > 2, \end{cases} \qquad (54)$$

where

$$\epsilon_n = u^{n-1} - u^n. \qquad (55)$$
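A sketch of the Kesten rule wrapped around a stochastic gradient step is given below; it is an illustration added here, and the calibration parameter a, the base step and the clipping box are assumed values. It uses the test objective (56) and its stochastic gradient (57) defined just below.

```python
import numpy as np

rng = np.random.default_rng(6)

def stoch_grad(u, x):
    """Stochastic gradient (57) of the test objective (56) for a realization x of X."""
    ind = 1.0 if u[0] ** 2 > u[1] ** 2 else 0.0
    return np.array([2 * u[0] + 2 * u[0] * ind + x,
                     2 * u[1] + 2 * u[1] * (1.0 - ind)])

a = 10.0                            # calibration parameter of the rule (assumed value)
base = 0.1                          # assumed base step, as in the small-step test below
u = np.array([6.0, 6.0])
K, eps_prev = 2, np.zeros(2)
for n in range(1, 2001):
    rho = base * a / (a + K - 1)    # Kesten step size (53)
    u_new = np.clip(u - rho * stoch_grad(u, rng.standard_normal()), -10.0, 10.0)
    eps = u - u_new                 # error epsilon_n = u_{n-1} - u_n  (55)
    if n > 2 and eps @ eps_prev < 0:
        K += 1                      # errors alternate near the optimum: shrink the step (54)
    u, eps_prev = u_new, eps

print(u)  # should end up near the optimum (0, 0)
```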

To assess the performance of the adaptive method, let us have a look at the following example:

$$\min_{u} E\left[u_1^2 + u_2^2 + \max(u_1^2, u_2^2) + u_1 X\right], \quad X \sim N(0, 1). \qquad (56)$$

Obviously, the optimum of the function is u_1 = 0, u_2 = 0, because ∀u_1, u_2: u_1² + u_2² + max(u_1², u_2²) ≥ 0 and all terms u_1² ≥ 0, u_2² ≥ 0, max(u_1², u_2²) ≥ 0.

Figure 7: Optimization function (surface of F(u_1, u_2) = E{u_1² + u_2² + max(u_1², u_2²) + u_1 X}).

The stochastic gradient can be written explicitly in the form:

$$\xi(u, X) = \begin{pmatrix} 2u_1 + 2u_1 I_{\{u_1^2 > u_2^2\}} + X \\ 2u_2 + 2u_2 I_{\{u_1^2 \le u_2^2\}} \end{pmatrix}. \qquad (57)$$

Let us choose the step size ρ_k according to the rule ρ_k = 0.1 · 1/k. In this standard case the steps will be too small, and the adaptive rule will prevent the step size from a rapid drop, enhancing convergence.

On the next figures we can see how the control variables were changing towards the optimum over the steps. The initial value of the step size was chosen the same for the adaptive and non-adaptive methods. In fact, the non-adaptive algorithms even after 1000 iterations were relatively far from the optimum, while the adaptive schemes were able to reach the steady state after 100 steps.

Figure 8: Small step size. Stochastic Gradient Descent (left), Stochastic Gradient Descent with adaptive Kesten rule (right); traces of u_1 and u_2 over 1000 steps.

Figure 9: Kiefer-Wolfowitz algorithm (left), Kiefer-Wolfowitz algorithm with adaptive Kesten rule (right); traces of u_1 and u_2 over 1000 steps.

As can be seen, the test results agree with the theoretical reasoning.

Now let us test the adaptive rule for large step lengths on the same test example, so ρ_k = 10 · 1/k. At the beginning, when k is relatively small, the steps will be too large and the points generated by the algorithms will try to leave the admissible area U, but due to the projection operator they will be returned to the border of the set U. When the step size assumes reasonable values, the procedures will start to converge to the optimum. Here the adaptive rule does not enhance convergence, as it does not decrease the steps faster than in the standard versions.

(35)

0 200 400 600 800 1000

−10

−5 0 5 10

Step(N) u 1

0 200 400 600 800 1000

−10

−5 0 5 10

Step(N) u 2

0 200 400 600 800 1000

−10

−5 0 5 10

Step(N) u 1

0 200 400 600 800 1000

−10

−5 0 5 10

Step(N) u 2

Figure 10: Stohasti GradientDesent (left). Stohasti GradientDesent with adap-

tiveKesten rule (right).

Figure 11: Kiefer-Wolfowitz algorithm (left), Kiefer-Wolfowitz algorithm with adaptive Kesten rule (right); traces of u_1 and u_2 over 1000 steps.

The test examples again agree with the theoretical reasoning. This example also demonstrates the behaviour of the stochastic approximation methods: the convergence traces w.r.t. the coordinate u_2 are smoother than those w.r.t. u_1, as the stochasticity mostly affects the first coordinate due to the term u_1 X in the objective function.

Finally, regarding the Kesten rule we can conclude that it is reasonable to use it for relatively small or middle step lengths, as it combines the constant step size rule (when we are far from the optimum) with the usual step size rule (when we are close to the end).

3.7.2 Adaptive rule of Uryasev

Uryasev [28] suggested a more advanced adaptive rule which can decrease the steps as well as increase them, depending on the generated sequence of points:

$$\rho_{k+1} = \min\left(\bar{\rho},\ \rho_k\, a^{-\langle \xi^{k+1}, \Delta^{k+1} \rangle - \delta \rho_k}\right), \quad \Delta^{k+1} = u^{k+1} - u^k, \quad a > 1, \ \delta > 0. \qquad (58)$$

To understand the meaning of the terms in the rule mentioned above, let us have a look at the main formula of the SQG-algorithm:

$$u^{k+1} = u^k - \rho_k \xi^k. \qquad (59)$$

It would be logical to choose the step that minimizes the function F_k(ρ) w.r.t. ρ:

$$F_k(\rho) = E[\Phi(u^k - \rho \xi^k) \mid F_k]. \qquad (60)$$

A computation of F_k(ρ) is a very difficult task, therefore let us differentiate Φ(u^k − ρξ^k) w.r.t. ρ at the point ρ_k:

$$\partial_\rho \Phi(u^k - \rho \xi^k)\big|_{\rho_k} = -\langle \xi^k, \partial \Phi(u^k - \rho_k \xi^k) \rangle = -\langle \xi^k, \xi^{k+1} \rangle. \qquad (61)$$

Hence −E[⟨ξ^k, ξ^{k+1}⟩ | F_k] ∈ ∂F_k(ρ_k), and the following gradient procedure can be used to modify the step size:

$$\rho_{k+1} = \rho_k + \lambda_k \langle \xi^k, \xi^{k+1} \rangle. \qquad (62)$$

The adaptive procedure (58) comes directly from the previous formula and works as follows: the term ⟨ξ^{k+1}, Δ^{k+1}⟩ gives information on whether or not the minimum of F_k(ρ) w.r.t. ρ has been reached. If −⟨ξ^{k+1}, Δ^{k+1}⟩ > 0, then with high probability we can say that the minimum has not been achieved yet and the step ρ_k will be increased; otherwise it decreases.

it dereases.

Let us test this adaptive rule on the previous model example.
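A minimal sketch of such a test is shown below (added here as an illustration; the constants ρ̄, a, δ, the initial step, the clipping box and the iteration count are assumptions). It plugs the rule (58) into the SQG iteration for the objective (56) with its stochastic gradient (57).

```python
import numpy as np

rng = np.random.default_rng(7)

def stoch_grad(u, x):
    """Stochastic gradient (57) of the test objective (56)."""
    ind = 1.0 if u[0] ** 2 > u[1] ** 2 else 0.0
    return np.array([2 * u[0] + 2 * u[0] * ind + x,
                     2 * u[1] + 2 * u[1] * (1.0 - ind)])

rho_bar, a, delta = 0.2, 2.0, 0.1      # assumed constants: upper bound, base a > 1, delta > 0
rho = 0.1
u = np.array([6.0, 6.0])
xi = stoch_grad(u, rng.standard_normal())
for k in range(2000):
    u_new = np.clip(u - rho * xi, -10.0, 10.0)
    xi_new = stoch_grad(u_new, rng.standard_normal())
    step_dir = u_new - u                                                  # Delta_{k+1}
    rho = min(rho_bar, rho * a ** (-(xi_new @ step_dir) - delta * rho))   # rule (58)
    u, xi = u_new, xi_new

print(u)   # should settle in a small neighbourhood of the optimum (0, 0)
```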
