Faculty of Technology
Department of Mathematics and Physics
Laboratory of Applied Mathematics

Stochastic approximation methods in stochastic optimization

The topic of this Master's thesis was approved by the council of the Department of Mathematics and Physics on the 25th of October, 2013.

The examiners of the thesis were: Prof. Heikki Haario and PhD Tuomo Kauranne.
The thesis was supervised by Professor Heikki Haario.

In Lappeenranta, 8.11.2013

Ilya Bobylev
Ruskonlahdenkatu 13-15 E6
53850 Lappeenranta
Phone: +3580417061366
Ilya.Bobylev[at]lut.fi
Lappeenranta University of Technology
Faculty of Technology
Department of Mathematics and Physics

Ilya Bobylev

Stochastic approximation methods in stochastic optimization

Thesis for the Degree of Master of Science in Technology

2013
72 pages, 33 figures, 6 tables
Supervisors: Prof. Heikki Haario, PhD Tuomo Kauranne.
Examiners: Prof. Heikki Haario, PhD Tuomo Kauranne.
Key words: stochastic optimization, deterministic equivalents, stochastic quasi-gradient algorithm (SQG), mean-value, value-at-risk (VaR), conditional value-at-risk (CVaR).
Stochastic approximation methods for stochastic optimization are considered. The main methods of stochastic approximation are reviewed: the stochastic quasi-gradient (SQG) algorithm, the Kiefer-Wolfowitz algorithm together with adaptive step-size rules for both, and the simultaneous perturbation stochastic approximation (SPSA) algorithm. A model and a solution of the electricity retailer's profit optimization problem are suggested, and an application of the SQG-algorithm to optimization problems with objective functions given in the form of an ordinary differential equation is considered.
First of all, I would like to express my gratitude to the Department of Mathematics for the scholarship during my studies; otherwise my stay in Finland would not have been possible. Secondly, I have to thank the Finnish atmosphere, peaceful and quiet, which gives wide possibilities for productive study and work. Finally, my sincere gratitude goes to professors Heikki Haario, Matti Heiliö and Tuomo Kauranne for their supervision and advice.

Lappeenranta, 2013.
Ilya Bobylev
• CDF - cumulative distribution function.
• PDF - probability density function.
• SQG - stochastic quasi-gradient algorithm.
• VaR - value-at-risk.
• CVaR - conditional value-at-risk.
• w.r.t. - with respect to.
• a.s. - convergence almost sure.
• σ{X_1, X_2, ..., X_n} - sigma-algebra generated by the distribution of the random variables X_1, X_2, ..., X_n.
• MC - Monte Carlo simulation.
• MCMC - Markov chain Monte Carlo simulation.
• LP - linear programming.
• SPSA - simultaneous perturbation stochastic approximation.
• SA - stochastic approximation.
• KW - Kiefer-Wolfowitz algorithm.
• ODE - ordinary differential equation.

1 Introduction
1.1 Objective of the thesis
1.2 Structure of the thesis
1.3 Scientific novelty
2 Optimization criteria used in stochastic programming
2.0.1 Mean-value criterion
2.0.2 Minimax criterion
2.0.3 Value-at-Risk (VaR) criterion
2.0.4 Conditional Value-at-Risk (CVaR) criterion
2.0.5 Demonstration example
3 Stochastic approximation
3.1 Conditional mathematical expectation
3.2 Discrete time martingales
3.3 Convergence of the supermartingales
3.4 Convergence of the random variables
3.5 Stochastic quasi-gradient method
3.6 Kiefer-Wolfowitz algorithm
3.7 Recipes for step sizes
3.7.1 Kesten rule
3.7.2 Adaptive rule of Uryasev
3.8 Simultaneous perturbation stochastic approximation
4 Application problem: Electricity Retailer Profit Optimization
4.2 Solution with mean-value criterion
4.2.1 Deterministic equivalent
4.2.2 SQG-solution
4.3 Solution with CVaR criterion
4.3.1 Deterministic equivalent
4.3.2 SQG-solution
4.4 Numerical examples
4.5 Analysis of the results
5 Application problem: SQG-algorithm for the objective functions in ODE-form
5.1 Problem formulation
5.2 Solution technique
5.2.1 Equation in variations
5.2.2 Application of the equation in variations in the considered case
5.3 Numerical examples
5.4 Analysis of the results
6 Conclusion and main results
References
7 Appendices
7.1 Implementation of the stochastic approximation algorithms
7.1.1 Implementation of the SQG-algorithm with explicitly defined gradient
7.1.2 Implementation of the SQG-algorithm with explicitly defined gradient and adaptive Kesten rule
7.1.3 Implementation of the SQG-algorithm with explicitly defined gradient and adaptive Uryasev rule
7.1.4 Implementation of the Kiefer-Wolfowitz algorithm
7.1.5 Implementation of the Kiefer-Wolfowitz algorithm with adaptive Kesten rule
7.1.6 Implementation of the Kiefer-Wolfowitz algorithm with adaptive Uryasev rule
7.1.7 Implementation of the SPSA algorithm
7.1.8 The function which calculates the finite difference approximation of the stochastic quasi-gradient
7.1.9 The function displaying the convergence rate
7.1.10 Electricity retailer profit optimization - solution with mathematical expectation
List of tables

1 Numerical values
2 Discrete distribution of the random demand
3 Discrete distribution of the random price
4 Measurements of the reaction
5 Measurements of the reaction
6 Calculation results
List of figures

1 Illustration of the solution
2 Convergence of the random variables in mean-square
3 Convergence of the random variables almost sure
4 Illustration of the SQG-algorithm
5 Illustration of the classical gradient descent method
6 Illustration of the stochastic gradient method
7 Optimization function
8 Small step size. Stochastic Gradient Descent (left). Stochastic Gradient Descent with adaptive Kesten rule (right)
9 Kiefer-Wolfowitz algorithm (left). Kiefer-Wolfowitz algorithm with adaptive Kesten rule (right)
10 Stochastic Gradient Descent (left). Stochastic Gradient Descent with adaptive Kesten rule (right)
11 Kiefer-Wolfowitz algorithm (left). Kiefer-Wolfowitz algorithm with adaptive Kesten rule (right)
12 Stochastic Gradient Descent (left). Stochastic Gradient Descent with adaptive Uryasev rule (right)
13 Kiefer-Wolfowitz algorithm (left). Kiefer-Wolfowitz algorithm with adaptive Uryasev rule (right)
14 Comparison of the SQG and SPSA convergence traces
15 Illustration of the SPSA convergence
16 Distribution of the imbalance
17 Solution with mean-value criterion for the discrete distribution
18 Solution with mean-value criterion for the continuous distribution
19 SQG-algorithm for the continuous distribution
21 Solution with CVaR criterion for the continuous distribution
22 SQG-solution
23 The scheme of the solution technique
24 Distribution of the parameters estimated by MCMC for the first example
25 Objective function and the trace by the algorithm for the first test-case
26 Convergence of the method for the first test-case
27 Distribution of the parameters estimated by MCMC for the second test-case
28 SQG with equation in variations enhancement for the second example
29 Convergence of the SQG-algorithm for the second example
30 Kiefer-Wolfowitz algorithm for the second test-case
31 Convergence of the KW-algorithm for the second example
32 SPSA-algorithm for the second example
33 Convergence of the SPSA-algorithm for the second test-case
1 Introduction

Optimization problems under uncertainty have been intensively studied by many researchers during the last 50 years. In fact, mathematical models based on stochastic theory are closer to real life than deterministic ones, because the random parameters considered in the objective function or in the constraints reflect the nature of uncertainty during decision making.

Stochastic models are widely used in insurance, financial and portfolio optimization, where the behavior of the market cannot be precisely forecasted, or in aerospace technologies, where it is required to obtain robust results.

Generally, the solution of a stochastic optimization problem proceeds as follows. Firstly, the researcher attempts to derive a deterministic equivalent and use a deterministic method of linear or nonlinear optimization, depending on the case. If the cost function is quite complicated, but the gradient can be explicitly found for a given realization of the random parameters or successfully approximated with finite differences, then stochastic approximation (SA) will be a good choice as a solution technique. Finally, if this method fails (for example, the objective function is multimodal, or contains both discrete and continuous arguments), then Monte Carlo simulation, genetic algorithms, simulated annealing or other methods of global optimization are used.

Robustness, simplicity and computation speed make stochastic approximation a very popular algorithm for a wide class of optimization problems; therefore the development of SA is an important and demanded research area.
1.1 Objective of the thesis

The main purpose of the thesis is to study and develop stochastic approximation algorithms which are used in stochastic optimization.

1.2 Structure of the thesis

The material of the thesis is organized as follows: the first three chapters are devoted to definitions and the theoretical background of stochastic approximation, and are mainly needed to prepare the reader for the application problems that are studied in chapters 4-5.
1.3 Scientific novelty

The thesis contains several new theoretical results: deterministic equivalents of the optimization problems with mean-value and CVaR criteria for the electricity retailer profit optimization problem (see the theorems in the corresponding section), and an interesting computational idea: the equation in variations used in the SQG-algorithm for objective functions given in ODE-form.

The first result gives an opportunity to substitute the solution of the stochastic problem with a deterministic one and instantly get the result using the well-known simplex method for linear programming.

The second result significantly increases the computational speed of the stochastic approximation procedure for the considered problem.
2 Optimization criteria used in stochastic programming

Further in this section, Φ(u, X) denotes a cost function, where u ∈ U ⊂ R^n is an optimization strategy and X is a random vector with realizations belonging to the set E ⊆ R^m.
2.0.1 Mean-value criterion

Historically the first criterion was the mean-value criterion, which can be defined as follows:

    min_{u∈U} E[Φ(u, X)].    (1)

Intuitively, the criterion means choosing the strategy which minimizes the average losses represented by the objective function.

The main properties of the mean-value criterion that allow one to derive a solution and build effective algorithms are:

• Linearity:

    E[aX + bY + c] = a E[X] + b E[Y] + c,    (2)

where X, Y are random variables and a, b, c ∈ R are parameters;

• A simple representation for a discrete distribution with a finite number of realizations:

    E[X] = Σ_{i=1}^{n} x_i p_i,    (3)

where x_i, i = 1, ..., n are the realizations of the random vector X and p_i, i = 1, ..., n are the corresponding probabilities, so that P(X = x_i) = p_i; a small numerical sketch of this formula is given below.
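To make formula (3) concrete, the following minimal Python sketch (the cost function and the distribution are invented for illustration and are not from the thesis) evaluates the mean-value criterion for a discrete distribution and minimizes it by a crude grid search:

    import numpy as np

    # Hypothetical discrete distribution: realizations x_i with probabilities p_i.
    x = np.array([-1.0, 0.0, 2.0])
    p = np.array([0.25, 0.50, 0.25])

    def phi(u, x):
        """Illustrative cost function Φ(u, X) = (u - X)**2."""
        return (u - x) ** 2

    def mean_value(u):
        # Formula (3): E[Φ(u, X)] = Σ_i Φ(u, x_i) p_i.
        return np.sum(phi(u, x) * p)

    # Crude grid search for min_u E[Φ(u, X)]; here the optimum is u* = E[X] = 0.25.
    grid = np.linspace(-2.0, 2.0, 401)
    u_star = grid[np.argmin([mean_value(u) for u in grid])]
    print(u_star, mean_value(u_star))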
Unfortunately, together with such attractive properties, the mean-value criterion has a number of cases where it fails. To get a basic idea, one can imagine a situation where decisions about the condition of the patients in a clinic are made based on the average temperature among them. The nonsense is obvious: it is possible to have a criterion value of 36.6°C while all the patients are about to die, the first half with ∼34°C and the rest with ∼40°C.
2.0.2 Minimax criterion

The minimax criterion comes from game theory, and in statistical analysis it means choosing the strategy which attains the minimum of the losses in the worst possible case:

    min_{u∈U} max_{X∈E} Φ(u, X).    (4)

The problem statement can be treated as a game against an aggressive nature, all the time forming the worst possible case for the decision maker. The positive side of a strategy chosen according to minimax is an assurance that the losses will always stay below the forecasted level. The drawbacks are also obvious: such a strategy can be too cautious and wasteful, and is often unachievable. For example, in the book [4] the chapter devoted to the optimization of an aircraft runway area contains a joke that in order to solve the problem with the minimax criterion it is required to concrete the whole globe together with the oceans and seas.
2.0.3 Value-at-Risk (VaR) criterion

To show the idea behind the VaR criterion let us define two concepts: the probability function P_φ(u) and the quantile function φ_α(u) [4]:

    P_φ(u) = P{Φ(u, X) ≤ φ},    (5)

    φ_α(u) = min{φ : P_φ(u) ≥ α}.    (6)

The probability function represents the probability that the cost function Φ(u, X) does not exceed the level φ for a chosen fixed strategy u, while the quantile function indicates the corresponding minimal level. Finally, to get the VaR-strategy, the following optimization task has to be solved:

    min_{u∈U} φ_α(u).    (7)

Unfortunately, VaR lacks some desirable theoretical properties. Firstly, it does not preserve convexity, i.e. convexity of the objective function w.r.t. the strategy does not always result in convexity of the quantile function. Secondly, there is quite a small number of cases when it is possible to derive a gradient of the VaR w.r.t. the strategy explicitly. This makes the solution process of VaR problems extremely complicated, and therefore Monte Carlo simulation is often used. One approach to dealing with the quantile criterion is to build an upper-value estimate based on a confidence set.
The quantile function can be represented via the following minimax problem:

    φ_α(u) = min_{E∈E_α} max_{x∈E} Φ(u, x),

where E_α is a family of confidence sets, E ∈ E_α with P(E) ≥ α. Obviously, if instead of the optimal confidence set E* an arbitrary one Ê is used, then an upper estimate of the optimal solution will be found. Based on that observation, many application problems were solved [23, 24, 25].
2.0.4 Conditional Value-at-Risk (CVaR) criterion

The criterion was created as an enhancement of VaR and represents the average rate of the losses exceeding the level φ_α(u):

    ψ_α(u) = E[Φ(u, X) | Φ(u, X) ≥ φ_α(u)] = (1/(1−α)) ∫_{Φ(u,X(w)) ≥ φ_α(u)} Φ(u, X(w)) dP(w).    (8)

According to [7], a minimization of CVaR over the strategy u ∈ U equals the solution of the following optimization problem:

    ψ* = min_{(u,φ)∈U×R} F_α(u, φ),    (9)

where

    F_α(u, φ) = φ + (1/(1−α)) · E[max{Φ(u, X) − φ, 0}].    (10)

This result opens wide possibilities for CVaR-optimization via the well-known techniques of mean-value minimization.
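As a sketch of how (9)-(10) can be exploited in practice, the following Python fragment (a sample-average approximation with the cost function of the demonstration example below; this is our illustration, not the thesis implementation) minimizes F_α(u, φ) jointly over u and φ by plain grid search:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = 0.95
    X = rng.normal(size=20_000)           # sampled realizations of X ~ N(0, 1)

    def phi_cost(u, x):
        """Illustrative cost Φ(u, X) = u**2 + X (the demonstration example below)."""
        return u ** 2 + x

    def F_alpha(u, phi):
        # Sample-average version of formula (10).
        return phi + np.mean(np.maximum(phi_cost(u, X) - phi, 0.0)) / (1.0 - alpha)

    # Joint grid search over (u, φ); with enough samples the minimizer approaches
    # u* = 0 and φ* = x_alpha (the α-quantile of N(0,1), ≈ 1.645 for α = 0.95).
    us = np.linspace(-1.0, 1.0, 41)
    phis = np.linspace(0.0, 3.0, 301)
    best = min((F_alpha(u, f), u, f) for u in us for f in phis)
    print("F* ≈ %.3f at u ≈ %.2f, φ ≈ %.2f" % best)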
2.0.5 Demonstration example

Let us consider a simple example to show the difference between the foregoing criteria in practice. Assume that the cost function is defined as

    Φ(u, X) = u² + X    (11)

and the random parameter is normally distributed, X ∼ N(0, 1).

• Mean-value solution:

    min_{u∈R} {E[u² + X]}.    (12)

Since E[X] = 0, the corresponding deterministic equivalent is

    min_{u∈R} u²    (13)

and the corresponding solution is u* = 0, E[Φ(u*, X)] = 0.
• Minimax solution:

    min_{u∈R} max_X {u² + X}.    (14)

The normal distribution has a continuous CDF, therefore ∀C ∈ R: P(X > C) > 0. As a result, the inner maximum in the previous equation is unbounded and the problem does not have a solution.
• VaR-solution:

    min_{u∈R} {φ_α(u)}.    (15)

The probability function looks as follows:

    P_φ(u) = P(u² + X ≤ φ) = P(X ≤ φ − u²) = F_X(φ − u²),

where F_X is the CDF of X. The quantile function:

    φ_α(u) = min{φ : P_φ(u) ≥ α} = min{φ : F_X(φ − u²) ≥ α}.

By definition, a CDF is a monotonically non-decreasing function, therefore:

    F_X(φ − u²) ≥ α ⟺ φ − u² ≥ x_α ⟹ φ ≥ u² + x_α ⟹ φ_α(u) = u² + x_α,

where x_α is the α-quantile of N(0, 1). Finally, the deterministic equivalent assumes the form shown below:

    min_{u∈R} {u² + x_α}.    (16)

As can be clearly seen, u* = 0, φ_α(u*) = x_α.
• CVaR-solution. Considering the previous VaR solution:

    ψ_α(u) = E[Φ(u, X) | Φ(u, X) ≥ φ_α(u)] = E[u² + X | u² + X ≥ u² + x_α] = u² + E[X | X ≥ x_α] = u² + y_α,

where y_α = E[X | X ≥ x_α] is the α-CVaR of N(0, 1). Finally, the deterministic equivalent assumes the following form:

    min_{u∈R} {u² + y_α}    (17)

and the solution: u* = 0, ψ_α(u*) = y_α.

It is worth noting that in this particular example the strategy u = 0 is optimal simultaneously for the mean-value, VaR and CVaR criteria.
Figure 1: Illustration of the solution. (The loss PDF with the mean-value, α-VaR (≈ 1.96) and α-CVaR (≈ 2.34) levels marked; the maximal loss is unbounded.)
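The closed-form values above are easy to check numerically; the short script below (an illustrative check of ours, not part of the original thesis code) estimates the three criteria for Φ(u, X) = u² + X at u = 0 by simulation:

    import numpy as np
    from statistics import NormalDist

    rng = np.random.default_rng(1)
    alpha = 0.95
    u = 0.0
    loss = u ** 2 + rng.normal(size=1_000_000)   # realizations of Φ(u, X)

    mean_value = loss.mean()                     # ≈ 0
    var_alpha = np.quantile(loss, alpha)         # ≈ x_α = NormalDist().inv_cdf(alpha)
    cvar_alpha = loss[loss >= var_alpha].mean()  # ≈ y_α = E[X | X ≥ x_α]

    print(mean_value, var_alpha, NormalDist().inv_cdf(alpha))
    print(cvar_alpha)  # for α = 0.95: x_α ≈ 1.645, y_α ≈ 2.063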
3 Stochastic approximation

The history of stochastic approximation starts from the article of Robbins and Monro [20], where the authors applied a recursive SA procedure to find a root of an equation while observing noisy data. As was discovered, there is no need to know the exact values of the function, because during the solution we are only interested in the direction towards the root. As a result of this observation, the process can work with noisy values instead of the exact data.
Let us review several illustrative application problems from real life where stochastic approximation algorithms can be quite useful.

• During the testing of insecticides there is a problem of determining the maximum dose which guarantees the intensity of the reaction.

• The hardness of an iron-copper alloy depends on the duration of the temperature impact. Let u denote the time and Y(u) the hardness of the alloy. The computational problem is to determine such u that the hardness of the alloy Y(u) reaches α. It is also a well-known fact that the hardness of the alloy varies over the items.

• The sensitivity of an explosive substance to physical impact is measured by hitting: a small item of the substance is punched by a solid object thrown from a fixed height. Some samples will explode and some of them will not. The problem is to determine the critical height for the substance.

• A grain field is fertilized by some chemicals. Let u denote the amount of the fertilizer and Y(u) the corresponding amount of grain gathered from the field. It can be clearly seen that a lack of the fertilizer (as well as over-fertilization) yields a reduction of the crops; only some optimal amount of chemicals leads to a significant outcome. Moreover, the productivity of the field changes over the years even if the amount of the fertilizer u remains the same.
The solution of the first three cases can be organized according to the procedure of Robbins and Monro: a tester picks an arbitrary value x_1 from the admissible area, conducts an experiment and observes a realization y(x_1) of the random variable Y(x_1) with expectation M(x_1) = E{Y(x_1)}, where M is some increasing function. The tester also chooses a decreasing sequence a_n = c/n, where c is a positive constant and n is the number of the step. The problem to be solved is to determine such θ that M(θ) = α. For the next experiment he takes x_{n+1} according to the rule:

    x_{n+1} = x_n − a_n (y(x_n) − α).    (18)

To understand the main idea of the method assume α = 0. In this case the previous statement yields

    x_{n+1} = x_n − (c/n) y(x_n).    (19)

If y(x_n) > 0, then x_{n+1} < x_n, and y(x_n) < 0 correspondingly leads to x_{n+1} > x_n, meaning that equation (19) looks reasonable, since we are searching for θ : M(θ) = 0.
The solution of the last problem can be organized with the help of the stochastic quasi-gradient method, which is studied in detail further in the thesis.
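A minimal sketch of the Robbins-Monro iteration (18) in Python, with an invented noisy response Y(x) = M(x) + noise for an assumed increasing M, might look as follows:

    import numpy as np

    rng = np.random.default_rng(2)

    def y(x):
        """Noisy observation of the increasing regression function M(x) = 2x - 1."""
        return 2.0 * x - 1.0 + rng.normal(scale=0.5)

    alpha = 0.0   # target level: we search for θ with M(θ) = α, here θ = 0.5
    c = 1.0       # positive constant of the step sequence a_n = c / n
    x = 5.0       # arbitrary starting point x_1

    for n in range(1, 10_001):
        x = x - (c / n) * (y(x) - alpha)   # Robbins-Monro rule (18)

    print(x)      # ≈ 0.5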
Kiefer and Wolfowitz (in the 1d-case) [17] and Blum [21] (in the multidimensional case) applied stochastic approximation to optimize mean-value functionals.

The books of Ermoliev [9] and Powell [22] contain easily understandable proofs and motivation for using stochastic approximation.

Kan and Kibzun [4] have applied stochastic approximation to optimization problems with the VaR criterion.

Spall [19] has suggested using random perturbations in the classical KW-procedure to decrease the number of evaluations of the objective function. Moreover, his personal page [19] lists a large number of references regarding applications of SPSA.

Let us list several different areas of science where the SPSA-algorithms have been successfully applied [19]:

• Aircraft modeling and control
• Atmospheric and planetary modeling
• Cardiological data analysis
• Noise cancellation
• Queuing network design
• Robot control
• Sensor placement
• Traffic signal timing and other transportation problems
• Underground mine detection

Later works studied the convergence speed of the procedure and suggested several different versions of it.
The next section contains explanations of the stochastic approximation procedures which are used for the optimization of cost functions.

The presentation is organized as follows: firstly we introduce the conditional expectation, then on its basis some definitions from martingale theory will be shown. Having insight into the martingale convergence theory, we gradually start the main proof regarding the convergence of the SQG-procedure. At the end of the section we describe a relatively new area of stochastic approximation, the SPSA algorithm, and discuss adaptive procedures that practically enhance convergence.
3.1 Conditional mathematical expectation

Let {Ω, F, P} denote a probability space, G ⊆ F a σ-algebra of random events, and ξ a random variable with E[ξ] < ∞. A random variable E[ξ | G] is called the conditional mathematical expectation of ξ w.r.t. G if the following conditions are valid:

1. E[ξ | G] is a G-measurable function,

2. for every set A ∈ G,

    ∫_A ξ(ω) P(dω) = ∫_A E[ξ | G](ω) P(dω).    (20)
The main properties of the conditional mathematical expectation that are used in the next paragraphs are [13]:

1. if ξ = C = const, then

    E[ξ | G] = C (a.s.),    (21)

2. if ξ ≤ η (a.s.), then

    E[ξ | G] ≤ E[η | G] (a.s.),    (22)

3. if the random variable ξ is measurable with respect to the σ-algebra G, then

    E[ξ | G] = ξ (a.s.),    (23)

4. the full expectation property:

    E[E[ξ | G]] = E[ξ],    (24)

5. linearity: if a, b ∈ R and E[ξ] < ∞, E[η] < ∞, then

    E[aξ + bη | G] = a E[ξ | G] + b E[η | G] (a.s.),    (25)

6. if the random variable ξ does not depend on the σ-algebra G, then

    E[ξ | G] = E[ξ] (a.s.),    (26)

7. the conditional mathematical expectation with respect to a random variable is defined as the expectation w.r.t. the σ-algebra generated by this variable:

    E[ξ | η] = E[ξ | F_η], F_η = σ{η}.    (27)
To get an intuitive sense of the conditional mathematical expectation, one can imagine E[X | F] as a rough, averaged version of X: on an arbitrary set A ∈ F the random variable X can assume many values, but E[X | F] takes only one fixed value E[X | A], which is the average value of X on the set A ∈ F.
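This averaging intuition is easiest to see for a σ-algebra generated by a finite partition; the toy Python sketch below (our illustration, with an arbitrarily chosen partition) replaces X by its average on each partition cell:

    import numpy as np

    rng = np.random.default_rng(3)
    omega = rng.uniform(0.0, 1.0, size=10)      # sample points of Ω = [0, 1]
    X = omega ** 2                              # a random variable X(ω) = ω²

    # Partition Ω into two cells A0 = [0, 0.5), A1 = [0.5, 1]; the σ-algebra F
    # generated by this partition contains only unions of these cells.
    cell = (omega >= 0.5).astype(int)

    # E[X | F] is constant on each cell and equals the average of X there.
    cond_exp = np.empty_like(X)
    for a in (0, 1):
        cond_exp[cell == a] = X[cell == a].mean()

    print(np.column_stack([omega, X, cond_exp]))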
3.2 Discrete time martingales

Assume that F_0 ⊆ F_1 ⊆ ... ⊆ F is a non-decreasing family of σ-algebras defined on a probability space {Ω, F, P}. For example, if {ξ_n}, n ≥ 0 is a set of random variables given on {Ω, F, P} and F_n = σ{ξ_0, ..., ξ_n} is the sigma-algebra generated by {ξ_k, k = 0, ..., n}, then {F_n} is a non-decreasing family of σ-algebras.

Let {X_n, n ≥ 0} be a sequence of random variables defined on {Ω, F, P}. If for all n ≥ 0 X_n is F_n-measurable, then {X_n, F_n} is called a stochastic sequence. If for all n ≥ 0 X_n is F_{n−1}-measurable, then the sequence {X_n, F_{n−1}} is called a predictable stochastic sequence.
A stochastic sequence {X_n, F_n} with E[|X_n|] < ∞ is called:

• a martingale, if E[X_{n+1} | F_n] = X_n (a.s.),
• a submartingale, if E[X_{n+1} | F_n] ≥ X_n (a.s.),
• a supermartingale, if E[X_{n+1} | F_n] ≤ X_n (a.s.).

3.3 Convergence of the supermartingales

It can be clearly seen that supermartingales are stochastic analogues of non-increasing sequences of real numbers. As a result, supermartingales (as well as submartingales) under some conditions have a limit (which is in fact a random variable). The corresponding statement is shown in the following theorem [9].

Theorem 3.1 (Convergence theorem). If {X_n, F_n} is a supermartingale such that inf_n E[X_n^−] > −∞, where X_n^− = min{X_n, 0}, then

    |lim_{n→∞} X_n| < ∞ (a.s.).    (28)

From the theorem above it is possible to conclude: if {X_n, F_n} is a non-negative supermartingale, then a.s. (with probability 1) its limit exists.

3.4 Convergence of the random variables
Let {X_1, ..., X_n, ...} be a sequence of random variables defined on the same probability space {Ω, F, P}.

1. X_n →^P X (in probability) if ∀ε > 0: lim_{n→∞} P(|X_n − X| > ε) = 0;

2. X_n →^{m.s.} X (in mean square) if lim_{n→∞} E{|X_n − X|²} = 0;

3. X_n →^{a.s.} X (almost sure) if P{w ∈ Ω : lim_{n→∞} X_n(w) = X(w)} = 1;

4. X_n →^d X (in distribution, or weakly) if lim_{n→∞} F_n(x) = F(x) for every x ∈ R at which F(x) is continuous. Here F_n(x) and F(x) are the cumulative distribution functions (CDF) of the random variables X_n and X respectively.

Let X_n be a sequence of independent random variables distributed as follows: P(X_n = 0) = 1 − 1/n, P(X_n = 1) = 1/n. This sequence converges to zero in mean square and therefore in probability, but does not converge almost sure [26].
Figure 2: Convergence of the random variables in mean-square.
Let X_n be a sequence of random variables distributed as follows: P(X_n = 0) = 1 − 1/n², P(X_n = n) = 1/n². This sequence converges almost sure and therefore in probability, but does not converge in mean square [26].
Figure 3: Convergence of the random variables almost sure.
As can be seen from the figures, in the first case even after 5000 steps there can be outliers from the accumulation point, but in the case of almost sure convergence this does not happen, as for any w ∈ Ω we cannot have values beyond an arbitrarily given epsilon-neighborhood of the limit point.
3.5 Stochastic quasi-gradient method

Having an understanding of the foregoing concepts, we will proceed by describing the stochastic quasi-gradient algorithm for optimization problems with the mean-value criterion, which is formulated below:

    min_{u∈U} F(u), F(u) = E{Φ(u, X)}.    (29)

Here Φ(u, X) as before denotes an objective function dependent on a random vector X and the strategy u, which should be chosen from the set U ⊆ R^n.

If the gradient ∇F(u) were known, we could use the classical gradient descent method, starting from some initial point u_0. The main statement of the algorithm is written below:

    u_{ν+1} = Π_U(u_ν − ρ_ν ∇F(u_ν)),    (30)

where Π_U(u) denotes the projection operator onto the set U:

    Π_U(u) = arg min_{y∈U} ||u − y||.    (31)

The projection operator is used here to restrict the location of the points u_0, ..., u_n generated by the algorithm to the admissible area U.
The problem of finding the projection of a point onto a set is not always an easy task. Generally, the set U has to be closed and convex in order to have a unique projection (the projection might not exist at all if the set U is open, and it might be non-unique in the case of a non-convex U).

The projection can be found instantly for elementary types of the set U. Such cases are listed below; a small code sketch of these operators follows the list.

1. A non-negative orthant:

    U = E_+^n = {u : u ∈ R^n, u_j ≥ 0, j = 1, ..., n}.    (32)

Let us denote by p_j the j-th coordinate of the projection Π_U(u), then

    p_j = { 0, u_j < 0;  u_j, u_j ≥ 0 }.    (33)

2. An n-dimensional parallelepiped:

    U = {u : u ∈ R^n, a_j ≤ u_j ≤ b_j, j = 1, ..., n},    (34)

    p_j = { a_j, u_j < a_j;  u_j, a_j ≤ u_j ≤ b_j;  b_j, u_j > b_j },    (35)

where p_j as usual is the j-th coordinate of the projection Π_U(u).

3. A sphere with radius r:

    U = {u : u ∈ R^n, ||u|| ≤ r}.    (36)

The projection can be found as follows:

    Π_U(u) = { u, u ∈ U;  (r/||u||) u, u ∉ U }.    (37)

4. A hyperplane:

    U = {u : u ∈ R^n, ⟨c, u⟩ = b},    (38)

where c ∈ R^n, b ∈ R:

    Π_U(u) = u + ((b − ⟨c, u⟩)/||c||²) c.    (39)

5. A halfspace:

    U = {u : u ∈ R^n, ⟨c, u⟩ ≤ b},    (40)

    Π_U(u) = { u, u ∈ U;  u + ((b − ⟨c, u⟩)/||c||²) c, u ∉ U }.    (41)
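For reference, these five projections are one-liners in Python; the sketch below (function names are ours, not from the thesis appendix) implements formulas (33)-(41):

    import numpy as np

    def proj_orthant(u):
        """Projection onto the non-negative orthant, formula (33)."""
        return np.maximum(u, 0.0)

    def proj_box(u, a, b):
        """Projection onto the parallelepiped {a_j <= u_j <= b_j}, formula (35)."""
        return np.clip(u, a, b)

    def proj_ball(u, r):
        """Projection onto the ball ||u|| <= r, formula (37)."""
        norm = np.linalg.norm(u)
        return u if norm <= r else (r / norm) * u

    def proj_hyperplane(u, c, b):
        """Projection onto the hyperplane <c, u> = b, formula (39)."""
        return u + (b - c @ u) / (c @ c) * c

    def proj_halfspace(u, c, b):
        """Projection onto the halfspace <c, u> <= b, formula (41)."""
        return u if c @ u <= b else proj_hyperplane(u, c, b)

    u = np.array([2.0, -3.0])
    print(proj_orthant(u), proj_ball(u, 1.0),
          proj_halfspace(u, np.array([1.0, 0.0]), 1.0))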
Considering the remark regarding the projection operator, a gradient descent algorithm can be written:

1. Initialize the step ν = 0, choose the initial point u_0.

2. Compute the next point u_{ν+1} using the previous point u_ν and the equation:

    u_{ν+1} = Π_U(u_ν − ρ_ν ∇F(u_ν)).    (42)

To find a projection onto the set U, use the above-mentioned remarks regarding the projection operator.

3. Check the stopping criterion. If the stopping condition is satisfied, put u* = u_{ν+1}, otherwise assign ν = ν + 1 and go to step 2.
Let f(u) be a function defined on the set U ⊂ R^n. The vector ∂f(u_0) ∈ R^n is called a subgradient of the function f(u) at the point u_0 ∈ U if ∀u ∈ U:

    f(u) ≥ f(u_0) + ⟨∂f(u_0), u − u_0⟩.    (43)

Geometrically, the formula above means that the graph of the function f(u) is located above the linear function f(u_0) + ⟨∂f(u_0), u − u_0⟩, and at the point u_0 the graphs coincide.

Obviously, if the function is differentiable, there is only one such vector ∂f(u_0), equal to the gradient ∇f(u_0). For a non-differentiable function there may exist at a point u_0 a whole set of subgradients, which is called the subdifferential set (for example, for f(u) = |u| the subdifferential at u_0 = 0 is the interval [−1, 1]).

From the point of view of optimization methods, if the objective function F(u) is not differentiable, the gradient ∇F(u) is substituted by a subgradient ∂F(u), and the rest of the gradient descent method is used without any changes. Such a method is called the subgradient algorithm.
In the case of a stochastic objective function an exact gradient is unknown, and therefore its stochastic version is used:

    u_{ν+1} = Π_U(u_ν − ρ_ν ∇_u Φ(u_ν, x_ν)),    (44)

where x_ν, ν = 1, ..., is a realization of X at the step ν of the algorithm. Thus x_ν is a random variable, because we do not know which value X assumes at the step ν. Moreover, the point u_ν, ν = 1, ..., is also a random variable, because it depends on x_ν and the previous point u_{ν−1}.

Finally, note that the expression (44) describes a random process starting from the initial point u_0, and under some further formulated conditions it should lead us to the optimal value of the optimization problem (29).
The algorithm reads:

1. Initialize the step ν = 0, choose the initial point u_0.

2. Generate a realization x_ν of the random vector X. Compute a stochastic gradient ∇_u Φ(u_ν, x_ν) using the previous point u_ν and the realization x_ν.

3. Calculate the next point u_{ν+1} using the previous point u_ν and the equation:

    u_{ν+1} = Π_U(u_ν − ρ_ν ∇_u Φ(u_ν, x_ν)).    (45)

To find a projection onto the set U, use the mentioned remarks regarding the projection operator.

4. Check the stopping criterion. If the stopping condition is satisfied, put u* = u_{ν+1}, otherwise assign ν = ν + 1 and go to step 2.

If the function Φ(u_ν, x_ν) is not differentiable with respect to the variable u, the stochastic gradient is substituted by a quasi-gradient ζ_ν:

    E[ζ_ν | u_0, ..., u_ν] ∈ ∂_u E[Φ(u_ν, X)].    (46)

Intuitively, the above statement means that the conditional expectation of the stochastic quasi-gradient at the point u_ν with respect to all previous points should coincide almost sure (a.s.) with one of the subgradients of the objective function at the point u_ν (i.e. belong to the subdifferential set). The main statement of the stochastic quasi-gradient method looks as follows:

    u_{ν+1} = Π_U(u_ν − ρ_ν ζ_ν).    (47)
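As an illustration of steps 1-4 and iteration (47), here is a compact Python sketch of the SQG method for the mean-value problem min E[||u − X||²] (the objective, the box U and the step-size choice are ours, for demonstration only):

    import numpy as np

    rng = np.random.default_rng(5)

    def stoch_grad(u, x):
        """Stochastic gradient of Φ(u, x) = ||u - x||² w.r.t. u."""
        return 2.0 * (u - x)

    def proj_box(u, a=-10.0, b=10.0):
        """Projection onto the admissible box U = [a, b]^n, formula (35)."""
        return np.clip(u, a, b)

    u = np.array([8.0, -6.0])                  # initial point u_0
    for nu in range(1, 5001):
        x = rng.normal(loc=[1.0, 2.0])         # realization x_ν of X ~ N((1,2), I)
        rho = 1.0 / nu                         # step sizes satisfying (49)
        u = proj_box(u - rho * stoch_grad(u, x))   # iteration (47)

    print(u)   # ≈ E[X] = (1, 2), the minimizer of E||u - X||²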
When dealing with a deterministic problem, the step size ρ_{ν+1} can be obtained from the solution of a one-dimensional optimization problem:

    ρ_{ν+1} = min_ρ {F(u_ν − ρ ∇_u F(u_ν))},    (48)

meaning that at every step we should move along the gradient direction so as to minimize the objective function as much as possible.

In the stochastic version there are several problems which do not allow using the technique shown above. Firstly, we do not know exactly the expected value of Φ(u, X) for a given u (we know only Φ(u, x)). Secondly, the stochastic gradient at some steps can even give a wrong direction, so that any positive value of ρ leads to a worse point than the previous one.

The next figures show the classical gradient descent method and the stochastic version. In the first case we can explicitly find the gradient, and by computing the optimal step size we reach the optimum in one step. In the second case we only observe noisy data and get the optimum of the mean-value criterion by applying the stochastic gradient method, which of course requires more steps.
Figure 5: Illustration of the classical gradient descent method for F(u_1, u_2) = u_1² + u_2².
Figure 6: Illustration of the stochastic gradient method for F(u_1, u_2) = E{u_1² + u_2² + Xu_1 + Yu_2}.
To prove the convergence a.s. of the algorithm (47) we have to make several assumptions [11]:

• the function E[Φ(u, X)] is a convex finite-valued function;
• the set U is convex and compact;
• the step sizes ρ_ν and the quasi-gradient errors γ_ν satisfy a.s.:

    ρ_ν ≥ 0, Σ_{ν=0}^∞ ρ_ν = ∞, Σ_{ν=0}^∞ E[ρ_ν |γ_ν| + ρ_ν² ||ζ_ν||²] < ∞.    (49)

The first two conditions are classical requirements for the existence and uniqueness of the solution in convex optimization. The last requirements mostly affect the sequence of step sizes. The sequence has to be very special: it should converge to zero, but not too fast, so that the points generated by the algorithm are able to reach the optimum independently of the initial point where the algorithm starts.
In order to highlight the main steps of the proof [12], let us assume the following notation: u* ∈ U is an optimal solution of the problem (29) and F_ν = σ{u_0, ..., u_ν} is the sigma-algebra generated by the random variables {u_0, ..., u_ν}.

The main idea of the proof is to show that under the conditions of the theorem E[||u_{ν+1} − u*||² | F_ν] behaves as a non-negative supermartingale, so that a.s. it has some limit, and then to prove that the limit is 0, meaning u_ν → u* a.s.

• The projection implies:

    ||u_{ν+1} − u*||² ≤ ||u_ν − u*||² − 2ρ_ν ⟨ζ_ν, u_ν − u*⟩ + ρ_ν² ||ζ_ν||².

• Take the conditional expectation w.r.t. F_ν:

    E[||u_{ν+1} − u*||² | F_ν] ≤ ||u_ν − u*||² − 2ρ_ν ⟨E[ζ_ν | F_ν], u_ν − u*⟩ + ρ_ν² E[||ζ_ν||² | F_ν].

• By the conditions of the theorem (the quasi-gradient property (46) with error γ_ν):

    γ_ν + ⟨E[ζ_ν | F_ν], u_ν − u*⟩ ≥ E[Φ(u_ν, X)] − E[Φ(u*, X)] ≥ 0 ⟹ ⟨E[ζ_ν | F_ν], u_ν − u*⟩ ≥ −γ_ν.

• The two previous steps together result in:

    E[||u_{ν+1} − u*||² | F_ν] ≤ ||u_ν − u*||² + 2ρ_ν |γ_ν| + ρ_ν² E[||ζ_ν||² | F_ν] = ||u_ν − u*||² + R_ν,

where R_ν = 2ρ_ν |γ_ν| + ρ_ν² E[||ζ_ν||² | F_ν].

• Assume Y_ν = ||u* − u_ν||² + Σ_{k=ν}^∞ E[R_k | F_ν]; hence

    E[Y_{ν+1} | F_ν] ≤ Y_ν, Y_ν ≥ 0 ⟹ Y_ν → Y (a.s.)

by the convergence theorem for supermartingales.

• Recursively, taking the full expectation:

    E[||u_{ν+1} − u*||²] ≤ ||u_0 − u*||² + Σ_{k=0}^ν E[R_k] − 2 Σ_{k=0}^ν ρ_k (E[Φ(u_k, X)] − E[Φ(u*, X)]).

• Considering the conditions of the theorem, Σ_{k=0}^∞ E[R_k] < ∞ and E[Φ(u*, X)] ≤ E[Φ(u_k, X)], we have

    −∞ < Σ_{k=0}^∞ ρ_k (E[Φ(u*, X)] − E[Φ(u_k, X)]) ≤ 0,

and since Σ_{k=0}^∞ ρ_k = ∞, therefore E[Φ(u_k, X)] → E[Φ(u*, X)].
3.6 Kiefer-Wolfowitz algorithm

If the subgradient ∂_u Φ(u, x) or even the gradient ∇_u Φ(u, x) of the function Φ(u, x) w.r.t. the variable u can be explicitly found, then at every step of the SQG-algorithm ∂_u Φ(u, x) or ∇_u Φ(u, x) is usually used as a quasi-gradient estimate; otherwise a finite-difference approximation is employed:

    ζ_k = (1/(2δ_k)) Σ_{j=1}^n [Φ(u_k + δ_k e_j, x_k) − Φ(u_k − δ_k e_j, x_k)] e_j,    (50)

where x_k is, as usual, a realization of the random vector X at the step k of the algorithm, and e_j, j = 1, ..., n are the unit vectors directed along the coordinate axes.

Considering the formula above, the method of Kiefer and Wolfowitz assumes the following form:
1. Initialize the step k = 0, choose the initial point u_0.

2. Generate a realization x_k of the random vector X. Compute the stochastic quasi-gradient ζ_k via formula (50) using the previous point u_k and the realization x_k.

3. Calculate the next point u_{k+1} using the previous point u_k according to the formula:

    u_{k+1} = Π_U(u_k − ρ_k ζ_k).    (51)

To find a projection onto the set U, use the mentioned remarks regarding the projection operator.

4. Check the stopping criterion. If the stopping condition is satisfied, put u* = u_{k+1}, otherwise assign k = k + 1 and go to step 2.

As can be seen, the algorithm requires 2n evaluations of the objective function Φ(u, X) at every step of the method, where n is the dimension of the control variable u.
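A minimal Python sketch of the Kiefer-Wolfowitz quasi-gradient (50) and iteration (51), again on an invented test objective and with U = R² so that the projection is trivial, could look like this:

    import numpy as np

    rng = np.random.default_rng(6)

    def phi(u, x):
        """Illustrative noisy objective Φ(u, x) with minimizer u* = (0, 0)."""
        return u[0] ** 2 + u[1] ** 2 + x * u[0]

    def kw_quasi_gradient(u, x, delta):
        """Central finite-difference quasi-gradient, formula (50): 2n evaluations."""
        n = len(u)
        zeta = np.zeros(n)
        for j in range(n):
            e = np.zeros(n); e[j] = 1.0
            zeta[j] = (phi(u + delta * e, x) - phi(u - delta * e, x)) / (2.0 * delta)
        return zeta

    u = np.array([5.0, -4.0])
    for k in range(1, 5001):
        x = rng.normal()                        # realization x_k of X ~ N(0, 1)
        rho, delta = 1.0 / k, 1.0 / k ** 0.25   # sequences satisfying (52)
        u = u - rho * kw_quasi_gradient(u, x, delta)   # iteration (51), U = R²

    print(u)   # ≈ (0, 0)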
The convergence theorem of Kiefer-Wolfowitz [17] and its generalization can be found in [4, 9].

Theorem 3.2. If the conditions below are valid:

• the function f(u) = E[Φ(u, X)] has the only extremum u* ∈ int{U};
• the second derivatives of f(u) are continuous and bounded;
• the variance D[Φ(u, X)] ≤ C < ∞ for all u ∈ U;
• the sequences ρ_k and δ_k satisfy

    ρ_k ≥ 0, Σ_{k=1}^∞ ρ_k = ∞, Σ_{k=1}^∞ (ρ_k/δ_k)² < ∞, Σ_{k=1}^∞ ρ_k |δ_k| < ∞,    (52)

then the sequence generated by the Kiefer-Wolfowitz algorithm converges to the optimal solution of the mean-value problem almost sure (u_k → u* a.s.).
3.7 Recipes for step sizes

As was mentioned, the step size is a very important feature directly affecting the performance of the method. As often happens, the theoretical results regarding convergence of the method do not suggest a way to increase performance, and therefore empirical schemes are used. The flip side of such a process is that adaptive algorithms should be theoretically justified, otherwise there exists a risk of facing a special case where an inaccurate adaptive method fails. As an example: a stochastic approximation adaptive rule for the step size has to satisfy the step size conditions to guarantee convergence, otherwise the whole proof must be revised.

Additionally, adaptation should not significantly increase the complexity of the original method, and preferably should not have a lot of tuning parameters to be adjusted. Despite these challenges, several successful results regarding adaptive stochastic approximation are known.
3.7.1 Kesten Rule

The first idea of using an adaptive step size in stochastic approximation belongs to Kesten. His method is based on the simple observation that if we are far from the optimum, the errors tend to have the same sign, but when we are getting closer, the errors start to alternate. Under the error here is understood the difference between the previous point u_{n−1} and the current point u_n. Kesten suggested the following rule:

    ρ_n = a / (a + K_n − 1),    (53)

where a is a parameter to be calibrated and K_n is a parameter which counts the number of times that the sign of the errors has changed:

    K_n = { n, n = 1, 2;  K_{n−1} + 1_{⟨ε_n, ε_{n−1}⟩ < 0}, n > 2 },    (54)

where

    ε_n = u_{n−1} − u_n.    (55)
To assess the performance of the adaptive method let us have a look at the following example:

    min E[u_1² + u_2² + max(u_1², u_2²) + u_1 X], X ∼ N(0, 1).    (56)

Obviously, the optimum of the function is u_1* = 0, u_2* = 0, because for all u_1, u_2 we have u_1² + u_2² + max(u_1², u_2²) ≥ 0, as all the terms u_1² ≥ 0, u_2² ≥ 0, max(u_1², u_2²) ≥ 0.
.−10
−5 0 5 10
−10
−5 0
5 10 0
50 100 150 200 250 300
u 2 u 1
F(u 1 ,u 2 )=E{u 1 2 +u 2 2 +max(u 1 2 ,u 2 2 )+u 1 X}
Figure 7: Optimizationfuntion.
The stochastic gradient can be written explicitly in the form:

    ξ(u, X) = ( 2u_1 + 2u_1 · 1_{u_1² > u_2²} + X,  2u_2 + 2u_2 · 1_{u_1² ≤ u_2²} )ᵀ.    (57)

Let us choose the step size ρ_k according to the rule ρ_k = 0.1 · (1/k). In this standard case the steps will be too small, and the adaptive rule will preserve the step size from the rapid drop, enhancing convergence.
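A sketch of the Kesten rule (53)-(55) applied to this test example in Python (our own illustrative implementation; the calibration parameter a = 10 is an arbitrary choice) is shown below:

    import numpy as np

    rng = np.random.default_rng(7)

    def xi(u, x):
        """Stochastic gradient (57) of the test objective (56)."""
        ind = 1.0 if u[0] ** 2 > u[1] ** 2 else 0.0
        return np.array([2 * u[0] + 2 * u[0] * ind + x,
                         2 * u[1] + 2 * u[1] * (1.0 - ind)])

    a = 10.0                    # calibration parameter of rule (53)
    u = np.array([6.0, 6.0])
    K, eps_prev = 1, None
    for n in range(1, 1001):
        rho = a / (a + K - 1)               # Kesten step size (53)
        u_new = u - 0.1 * rho * xi(u, rng.normal())
        eps = u - u_new                     # error (55): ε_n = u_{n-1} - u_n
        if n <= 2:
            K = n                           # formula (54), starting values
        elif float(eps @ eps_prev) < 0.0:
            K += 1                          # the sign of the error changed
        eps_prev = eps
        u = u_new

    print(u)  # ≈ (0, 0)

The step stays large while the errors keep the same sign (far from the optimum) and decays roughly as a/K_n once the errors start to alternate near the optimum.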
In the next figures we can see how the control variables were changing towards the optimum over the steps. The initial value of the step size was chosen the same for the adaptive and non-adaptive methods. In fact, the non-adaptive algorithms even after 1000 iterations were relatively far from the optimum, while the adaptive schemes were able to reach the steady state after 100 steps.
Figure 8: Small step size. Stochastic Gradient Descent (left). Stochastic Gradient Descent with adaptive Kesten rule (right).
Figure 9: Kiefer-Wolfowitz algorithm (left). Kiefer-Wolfowitz algorithm with adaptive Kesten rule (right).
As can be seen, the test results agree with the theoretical reasoning.

Now let us test the adaptive rule with large step lengths on the same test example, so ρ_k = 10 · (1/k). At the beginning, when k is relatively small, the steps will be too large, and the points generated by the algorithms will try to leave the admissible area U, but due to the projection operator they will be returned to the border of the set U. When the step size assumes reasonable values, the procedures will start converging to the optimum. Here the adaptive rule does not enhance convergence, as it does not decrease the steps faster than the standard versions.
Figure 10: Stochastic Gradient Descent (left). Stochastic Gradient Descent with adaptive Kesten rule (right).
Figure 11: Kiefer-Wolfowitz algorithm (left). Kiefer-Wolfowitz algorithm with adaptive Kesten rule (right).
The test examples again agree with the theoretical reasoning. This example also demonstrates the behaviour of the stochastic approximation methods: the convergence traces w.r.t. the coordinate u_2 are smoother than w.r.t. u_1, as the stochasticity mostly affects the first coordinate due to the term u_1 X in the objective function.

Finally, regarding the Kesten rule we can conclude that it is reasonable to use it for relatively small or middle step lengths, as it combines a constant step size rule (when we are far from the optimum) with the usual diminishing step size rule (when we are close to the end).
3.7.2 Adaptive rule of Uryasev

Uryasev [28] suggested a more advanced adaptive rule which can decrease the steps as well as increase them, depending on the generated sequence of points:

    ρ_{k+1} = min(ρ̄, ρ_k a_k^{−⟨ξ_{k+1}, Δ_{k+1}⟩ − δρ_k}), Δ_{k+1} = u_{k+1} − u_k, a_k > 1, δ > 0.    (58)

To understand the meaning of the terms in the rule mentioned above, let us have a look at the main formula of the SQG-algorithm:

    u_{k+1} = u_k − ρ_k ξ_k.    (59)

It would be logical to choose the step that minimizes the function F_k(ρ) w.r.t. ρ:

    F_k(ρ) = E[Φ(u_k − ρξ_k) | F_k].    (60)

A computation of F_k(ρ) is a very difficult task, therefore let us differentiate Φ(u_k − ρξ_k) w.r.t. ρ at the point ρ_k:

    ∂_ρ Φ(u_k − ρξ_k)|_{ρ_k} = −⟨ξ_k, ∂Φ(u_k − ρ_k ξ_k)⟩ = −⟨ξ_k, ξ_{k+1}⟩.    (61)

Hence −E[⟨ξ_k, ξ_{k+1}⟩ | F_k] ∈ ∂F_k(ρ_k), and the following gradient procedure can be used to modify the step size:

    ρ_{k+1} = ρ_k + λ_k ⟨ξ_k, ξ_{k+1}⟩.    (62)

The adaptive procedure (58) directly comes from the previous formula and works as follows: the term ⟨ξ_{k+1}, Δ_{k+1}⟩ gives information on whether or not the minimum of F_k(ρ) w.r.t. ρ has been reached. If −⟨ξ_{k+1}, Δ_{k+1}⟩ > 0, then with high probability we can say that the minimum has not been achieved yet and the step ρ_k will be increased, otherwise it decreases.
Let us test this adaptive rule on the previous model example, as sketched below.
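A minimal Python sketch of the Uryasev rule (58), reusing the stochastic gradient (57) of the model example (the parameter values ρ̄ = 0.5, a = 1.05, δ = 0.1 are our own illustrative choices), could look as follows:

    import numpy as np

    rng = np.random.default_rng(8)

    def xi(u, x):
        """Stochastic gradient (57) of the test objective (56)."""
        ind = 1.0 if u[0] ** 2 > u[1] ** 2 else 0.0
        return np.array([2 * u[0] + 2 * u[0] * ind + x,
                         2 * u[1] + 2 * u[1] * (1.0 - ind)])

    rho_bar, a, delta = 0.5, 1.05, 0.1   # ρ̄, a_k ≡ a > 1, δ > 0 in rule (58)
    rho = 0.1
    u = np.array([2.0, 2.0])
    g = xi(u, rng.normal())
    for k in range(2000):
        u_new = u - rho * g                     # SQG step (59)
        g_new = xi(u_new, rng.normal())         # ξ_{k+1} at the new point
        d = u_new - u                           # Δ_{k+1} = u_{k+1} - u_k
        rho = min(rho_bar, rho * a ** (-(g_new @ d) - delta * rho))  # rule (58)
        u, g = u_new, g_new

    print(u)  # ≈ (0, 0) up to the remaining noise

While successive quasi-gradients stay aligned (⟨ξ_{k+1}, Δ_{k+1}⟩ < 0), the step grows towards ρ̄; once they start to oppose each other near the optimum, the exponent turns negative and the step shrinks, with the −δρ_k term providing a slow downward drift.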