Lappeenranta University of Technology
Ville Kyrki
LOCAL AND GLOBAL FEATURE EXTRACTION
FOR INVARIANT OBJECT RECOGNITION
Thesis for the degree of Doctor of Science (Technology) to be presented with due permission for public examination and criticism in the Auditorium of the Student Union House at Lappeenranta University of Technology, Lappeenranta, Finland on the 5th of December, 2002, at noon.
Acta Universitatis Lappeenrantaensis
133
Department of Information Technology
Lappeenranta University of Technology
Finland
Reviewers Professor John Illingworth
Centre for Vision, Speech, and Signal Processing
Department of Electronic and Electrical Engineering
University of Surrey
United Kingdom
Professor Jouko Lampinen
Laboratory of Computational Engineering
Department of Electrical and Communications Engineering
Helsinki University of Technology
Finland
Opponent Academy Professor Erkki Oja
Laboratory of Computer and Information Science
Helsinki University of Technology
Finland
ISBN 951-764-698-4
ISSN 1456-4491
Lappeenrannan teknillinen korkeakoulu
Digipaino 2002
Preface
The work presented in this thesis has been carried out at the Laboratory of Information Processing in the Department of Information Technology of the Lappeenranta University of Technology, Finland, during the years 1999–2002.
I want to express my gratitude to my supervisor, Professor Heikki Kälviäinen, for his supportive guidance as well as for providing me the facilities and financial support for my research.
I would like to warmly thank all the other co-authors of the publications, Professor Pasi Fränti, Mr Joni Kämäräinen, Mr Alexey Mednonogov, and Mr Jani Peusaari. In addition, I would like to express my thanks to Professor Jari Porras and Dr Jouni Ikonen for their contribution in the earlier phases of the research on the parallel Hough transform.
I thank the reviewers, Professor John Illingworth and Professor Jouko Lampinen, for their critical reading of the manuscript and valuable comments.
I also wish to thank the colleagues at the Laboratory of Information Processing at Lappeenranta University of Technology for a stimulating atmosphere, and CSC – Scientific Computing for providing parallel computing resources.
The financial support of the East Finland Graduate School on Computer Science and Engineering (ECSE), the KAUTE foundation (Kaupallisten ja teknillisten tieteiden tukisäätiö), the Jenny and Antti Wihuri fund, the Foundation of the Lappeenranta University of Technology, and the Foundation of the Advance in Technology (Tekniikan edistämissäätiö) is gratefully acknowledged.
Finally, I wish to thank my dear wife Anna for her endless encouragement.
Lappeenranta, October 2002
Ville Kyrki
Abstract
Ville Kyrki
LOCAL AND GLOBAL FEATURE EXTRACTION FOR INVARIANT OBJECT RECOGNITION
Lappeenranta, 2002
115 p.
Acta Universitatis Lappeenrantaensis 133
Diss. Lappeenranta University of Technology
ISBN 951-764-698-4
ISSN 1456-4491
Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present in natural environments is an obstacle for effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting, or occlusion.
In this study, invariant object recognition is considered from the viewpoint of feature extraction. The differences between local and global features are studied, with emphasis on Hough transform and Gabor filtering based feature extraction. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both the Hough transform and Gabor filtering. A modified Hough transform technique is also presented where the distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational costs of the Hough transform employing parallel processing and local information are introduced.
Keywords: feature extraction, invariant object recognition, Hough transform, Gabor filtering, computer vision, image analysis
UDC 004.93'1: 004.272.2: 004.932.2
a(k)      Fourier descriptor
g[x]      transformation belonging to group G acting on signal x
H(ρ, θ)   Hough transform accumulator
m_pq      geometric moment of order (p+q)
r(t)      response of a 1-D Gabor filter
r(x, y)   response of a 2-D Gabor filter
T{x}      measurement of signal x
F{f}      Fourier transform of f
η_pq      normalised central moment of order (p+q)
μ_pq      central moment of order (p+q)
φ         moment invariant
ψ(t)      1-D Gabor elementary function
ψ(x, y)   2-D Gabor elementary function
Ψ(f)      1-D Gabor elementary function in the Fourier domain
Ψ(u, v)   2-D Gabor elementary function in the Fourier domain
CRHT   Connective Randomized Hough transform
GEF    Gabor elementary function
GHT    Generalised Hough transform
HT     Hough transform
LFA    Local feature analysis
PCA    Principal component analysis
PE     processing element
RHT    Randomized Hough transform
SIMD   Single-instruction multiple-data
VLSI   Very large scale integration
I. Kyrki, V., Kälviäinen, H., Combination of Local and Global Line Extraction, Journal of Real-Time Imaging, Vol. 6, No. 2, April 2000, pages 79–91.
II. Kyrki, V., Kälviäinen, H., High Precision 2-D Geometrical Inspection, Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, September 3–7, 2000, Vol. 4, pages 779–782.
III. Fränti, P., Mednonogov, A., Kyrki, V., Kälviäinen, H., Content-Based Matching of Line-Drawing Images Using Hough Transform, International Journal on Document Analysis and Recognition, Vol. 3, No. 2, 2000, pages 117–124.
IV. Kyrki, V., Kämäräinen, J., Kälviäinen, H., Invariant Shape Recognition Using Global Gabor Features, Proceedings of the 12th Scandinavian Conference on Image Analysis SCIA 2001, Bergen, Norway, June 11–14, 2001, pages 671–678.
V. Kyrki, V., Kämäräinen, J.-K., Kälviäinen, H., Content-based Image Matching Using Gabor Filtering, Advanced Concepts for Intelligent Vision Systems, ACIVS 2001, Baden-Baden, Germany, July 30 – August 3, 2001, pages 45–49.
VI. Kyrki, V., Peusaari, J., Kälviäinen, H., Intermediate level feature extraction in novel parallel environments, accepted for publication in Machine Vision and Applications.
In this thesis, these publications are referred to as Publication I, Publication II, Publication III, Publication IV, Publication V, and Publication VI.
1 Introduction 1
2 Invariance in Object Recognition 5
2.1 The role of invariance . . . 5
2.2 Variation in natural images . . . 6
2.3 Global features and invariant transforms . . . 7
2.4 Local features and matching . . . 10
2.5 Object recognition in the human visual system . . . 11
3 Hough Transform 15
3.1 Hough transform for line detection . . . 15
3.2 Extending the problem domain . . . 18
3.2.1 More complex shapes . . . 18
3.2.2 Invariant image matching . . . 19
3.3 Computational improvements . . . 20
3.3.1 Probabilistic methods . . . 20
3.3.2 Parallel processing . . . 22
4 Gabor Filtering 27
4.1 Gabor elementary functions . . . 27
4.1.1 One-dimensional time-frequency space . . . 27
4.1.2 Two-dimensional spatial-frequency space . . . 29
4.2 Applications in object recognition . . . 32
5 Discussion 37
Bibliography 41
Publications 51
Introduction
Efforts have been made over many years to equip computers with vision. Object recognition is a classical problem in computer vision that aims to categorise objects into known object classes using a digital image. It can be utilised in a vast number of applications, including industrial, medical, and scientific ones.
Already in 1960, Selfridge and Neisser presented, in the context of character recognition, that formulating recognition as a multi-stage process is beneficial [99]. Today, most recognition systems are based on a sequential model that has at least two main parts: feature extraction and classification. The goal of feature extraction is to extract useful information from an input image, which is then utilised to perform the classification. That is, the digital image is processed in order to find some structure, e.g. edges, which is then used as the criterion on which the categorisation is based. The feature extraction and classification are quite often also composed of several sub-stages.
The human visual system is nowadays thought to be a similar multi-stage model, with several levels of preattentive feature analysis at the early stages of processing [107]. As in many other tasks, the skills of a computer and those of a human are largely complementary in object recognition. Humans can accurately determine the class of an object among hundreds of others almost regardless of the viewpoint, illumination, or partial occlusion. In contrast, in controlled environments computer vision can tirelessly find flaws in complex manufactured products that would not appear to a human observer. The main difference between the capabilities seems to result from the fact that the machine is almost unerring in a controlled environment where it is known what visual features to seek. On the other hand, in a natural environment the viewing conditions vary, which is a remarkable problem for computer vision. For example, the object may be viewed from a different pose, or the lighting may change. Thus, the problem of invariant object recognition is to recognise objects despite these variations.
Not much has changed since 1978, when Granlund [35] referred to feature extraction methods as a bag of tricks, since a method is often chosen based on experience, seemingly without explicit reasons. Nevertheless, the selected feature extraction method has a fundamental effect on the characteristics of a recognition system. Feature extraction can play different parts in invariant recognition, depending on whether the feature itself is invariant, such as the moment invariants [39] or the Fourier-Mellin coefficients [18], or whether the invariance is a characteristic of the system, as in labelled graph matching [65]. Regardless of the role of feature extraction, the following requirements and objectives can be stated: the system should be general enough to recognise a large variety of objects, invariant to natural variations, stable against distortions, and computationally efficient.
The aim of this thesis is to study feature extraction methods that can be applied to invariant object recognition. While, in general, invariant object recognition encompasses variations caused by the three-dimensional structure of the world, this thesis is limited to the study of two-dimensional variations. In addition, the methods presented assume that the object to be recognised has been segmented from the background, and that only a single object is considered at a time. Two well-known feature extraction methods, the Hough transform (HT) and Gabor filtering, are examined and improved to be suitable for invariant recognition. Primarily the shape of the objects is inspected, as it is supposed that the shape includes enough information for successful recognition. For that reason, the scope of the thesis is limited to two-dimensional gray-level images, and the methods presented are primarily applicable to man-made, geometric objects whose shape can be efficiently modelled.
The thesis is divided into five chapters. In Chapter 2, the role of invariance in object recognition is examined. Next, two feature extraction methods are presented and applied to object recognition, namely, the Hough transform in Chapter 3 and Gabor filtering in Chapter 4. Finally, the discussion and conclusions are presented in Chapter 5.
Summary of publications
In Publication I, a new method for locating line segments was proposed. The method is an extension of the Randomized Hough transform (RHT) which exploits connectivity for the estimation of the local structure of the image. The experiments indicated that the method was more tolerant to missing data than its predecessor. The author of this thesis participated in the development of the method, performed the programming and the experiments, and wrote the major part of the publication.
In Publication II, a system for inspecting two-dimensional planar objects was proposed. The inspection is based on a CAD model of the part. The system first estimates the position of the part, followed by a local measurement of each feature. In the experiments, it was found that calibration was the factor most limiting the accuracy of the system. The author developed the system, performed the experiments, and wrote the major part of the publication.
In Publication III, two new methods for matching binary images were presented. The methods use the Hough transform to extract prominent line features, which are then used to find matching images. The author participated in the method development, the experiments, and the writing of the publication.
In Publication IV, the use of Gabor filters as direction-sensitive edge detectors was studied. A translation, rotation, and scaling invariant distance measure was proposed for the Gabor features. Experiments were performed on images of digits to study the representation power of the features. The author participated in the development of the method.
In Publication V, the Gabor feature was applied to the matching of binary line-drawing images. Rotation and scale invariance was demonstrated, together with very good tolerance against salt-and-pepper noise. The author of the thesis participated in the development, programming, experiments, and writing of the publication.
In Publication VI, the performance of the Hough transform was inspected in affordable parallel environments. A parallel HT algorithm was developed that is suitable both for shared memory multiprocessors and distributed memory workstation networks. Also, parallel implementations of the Randomized Hough transform were proposed. The approaches improved the performance notably compared to a sequential algorithm. However, they did not seem to be suitable for large-scale parallelisation. The author of the thesis developed the methods, participated in the programming and the experiments, and wrote the major part of the publication.
Invariance in Object Recognition
The fundamental problem of object recognition is to analyse an image and provide a high-level symbolic representation of its contents. Each separate object in the image should be localised, i.e., its location should be resolved, and classified, i.e., assigned to a specific category. In addition to the three-dimensional structure of an object, there are several other factors that affect the image of the object, for example, lighting, perspective projection, and sensor noise. Invariant object recognition aims to solve the problem despite these variations. In this work, the focus is on the recognition of objects based on their shape, because the shape is fairly independent of phenomena such as illumination, contrast, and colour.
In this chapter, invariance in object recognition is discussed. The first section deals with the role of invariance in general, concentrating on the definition, taxonomies, and advantages. Next, variations occurring in natural images are examined to define more precisely the types of invariance necessary for object recognition. In order to discuss invariance, the idea of a class is also considered. After that, two groups of methods for invariant object recognition are discussed. First, global invariants and invariant transforms are presented, forming the oldest approach to invariance in computer vision. Second, the use of local features and matching is considered. Finally, the role of invariance in human visual perception is briefly discussed.
2.1 The role of invariance
The term invariant refers to a representation of a signal that is constant under some transformation. For example, the measurement T of a signal x is invariant under a transformation g when [68]

    T{x} = T{g[x]}.   (2.1)

Using group theory, this definition can be extended [29, 129]. Assume that there is a group G which acts upon the set X of possible signals. Let T denote the action of measurement. Then, if

    T{g[x]} = T{x} h(g)   ∀ g ∈ G, x ∈ X,   (2.2)

T is invariant under the action of the group. The group G can be, for example, the group of translations; h(g) is a function of g alone. If h(g) = 1, T is a scalar invariant [29], as in (2.1). For example, the Fourier transform is invariant under the group of translations, while its amplitude is scalar invariant. Invariants can be categorised as strong and weak, according to whether the degree of the action g can be measured from the invariant representation. The representation is called strong if it contains a component that explicitly encodes the degree of the transformation [16]. Thus, according to this definition, scalar invariants are weak invariants.
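The scalar-invariance condition (2.1) can be checked numerically. The following minimal sketch (Python with NumPy; the random signal and the shift amount are arbitrary choices) uses a circular shift as the discrete translation g and the Fourier amplitude spectrum as the measurement T:

```python
import numpy as np

# Sketch of definition (2.1): T is a scalar invariant under g when
# T{x} = T{g[x]}.  Here g is a circular shift (a discrete translation)
# and T is the Fourier amplitude spectrum of a 1-D signal.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
g_x = np.roll(x, 5)                   # g[x]: translate by 5 samples

T_x = np.abs(np.fft.fft(x))           # T{x}
T_gx = np.abs(np.fft.fft(g_x))        # T{g[x]}

print(np.allclose(T_x, T_gx))         # prints True
```

The phase of the spectrum, in contrast, changes with the shift, which is why the full Fourier transform is an invariant in the sense of (2.2) rather than a scalar invariant.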
In computer vision and object recognition, invariants provide a method to measure images while keeping the measurement unaffected by known deformations or variations. For example, in image databases, geometric invariants can be used for indexing the database, allowing the retrieval of an image despite geometric transformations. It has been argued that object recognition is the search for invariants [127]. However, in this work, the emphasis is not on the construction of invariant features, but on methods that allow object recognition in the presence of variations. This distinction is important because invariant recognition is possible without explicit invariants.
There are several taxonomies for invariant recognition. According to Wood [129], there are two broad approaches: (1) distinct feature extraction and classification, and (2) parameterised invariants. In the first approach, each feature is itself invariant, and the classification can be performed based on the features without considering the symmetries in the features. With parameterised invariants, the parameters of the invariant are adopted to perform the classification. Invariants can also be classified as global and local. Global invariants use the knowledge of a shape as a whole, while local ones are based on local properties such as curvature. Most global invariants belong to the class of distinct feature extraction and classification. When considering the invariant recognition of shape, the methods can also be classified as either region- or boundary-based [57]. Region-based methods use the whole region under consideration for determining the feature, while boundary-based methods consider only the outer boundary. However, this taxonomy applies specifically to the global invariants of binary images, and for that reason many methods cannot be classified using it.
2.2 Variation in natural images
In this thesis, it is assumed that all objects representing a class are rigid and visually similar to a very high degree. Thus, the intra-class geometric variations are very small compared to the inter-class variations. In practice, this means that only rigid objects are considered. In addition, the classes are assumed to be disjoint, that is, each object can belong to a single class, and there is no class hierarchy similar to the human visual system. Based on these restrictions, the variations are now examined.
The process of imaging an object has a number of sources of variation in the resulting image. Wechsler [126] states that the variability is due to perspective and changing orientation, and gives the following list of variations: pattern translation, pattern distortion, perspective distortion, variation in background, partial occlusion, change in pattern size, change in orientation, and change in lighting. The variations can be categorised according to their cause. First, the three-dimensional position of the imaging equipment with respect to the object. Second, the change in the environment, primarily the lighting, but also encompassing changes in the background. Third, noise in the imager and in the environment. Fourth, the occlusion of the target object by other objects. Traditionally, the study of invariants has focused on the geometric variations, but it is important to note that while geometric variations are important when describing the perspective projection of image formation, they are only one class of image variation.
Geometric variations can be further categorised based on the allowed transforms. The simplest variation is the translation of the object in the image. Translation together with rotation forms the group of Euclidean transforms. Including isotropic scaling gives the group of similarity transforms. The group of similarity transforms can be used to describe the projection of a planar object which is orthogonal to the optical axis of the imager. Similarity transforms are extended by allowing non-isotropic scaling and skewing, forming the group of affine transforms. The most general group of geometric transforms is the group of projective transforms, which represents perspective projections.
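The nesting of these groups can be illustrated with homogeneous transformation matrices; the sketch below (the parameter values are arbitrary illustrations) builds a similarity transform by adding isotropic scaling to a Euclidean one, and an affine transform by allowing any invertible linear part:

```python
import numpy as np

# Homogeneous 3x3 matrices for the transform groups discussed above.
def euclidean(theta, tx, ty):
    # rotation by theta plus translation (tx, ty)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0, 0, 1.0]])

def similarity(scale, theta, tx, ty):
    # isotropic scaling extends the Euclidean group
    T = euclidean(theta, tx, ty)
    T[:2, :2] *= scale
    return T

def affine(a11, a12, a21, a22, tx, ty):
    # non-isotropic scaling and skew: any invertible 2x2 linear part
    return np.array([[a11, a12, tx], [a21, a22, ty], [0, 0, 1.0]])

p = np.array([1.0, 0.0, 1.0])       # a point in homogeneous coordinates
q = similarity(2.0, np.pi / 2, 3.0, 0.0) @ p
print(np.allclose(q[:2], [3.0, 2.0]))     # prints True
```

Each group contains the previous one as the special case of restricted parameters, which is the hierarchy described in the paragraph above.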
Changes in the environment and the effect of noise are sometimes neglected in the study of invariant recognition, although they definitely have an effect on applications. The issue of nonuniform illumination is usually considered separately from other invariances, but recently it has been proposed that geometric and illumination invariants should be treated together [1]. In the methods presented in this thesis, the objects are, however, assumed to be segmented from the background, as stated in the introduction.
In complex environments, invariant recognition in 3-D is also highly dependent on an efficient world model. Without a model that can explain the overlap of objects, the problem of occlusion is very likely impossible to solve. Also, the world model can give valuable cues for segmentation, especially in multi-object cases.
The robustness of an object recognition system is an important consideration. To be efficient in applications, invariant recognition has to be stable, general, and computationally efficient. Stability means that small changes in the environment should not have a remarkable effect on the result, because in natural scenes several types of variations are likely to occur. In addition to invariant recognition, the system should be able to characterise the variability, because the variability itself carries important information [126].
2.3 Global features and invariant transforms
The two most famous invariant transforms in image processing are based on moments and the Fourier-Mellin transform. Moment invariants, presented by Hu in 1962 [39], are the first general invariants. The geometric moments of order (p+q) of f(x, y) can be defined as

    m_pq = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x^p y^q f(x, y) dx dy,   p, q = 0, 1, 2, ...   (2.3)

By shifting the image centroid to the origin, the central moments μ_pq can be defined as

    μ_pq = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x - x̄)^p (y - ȳ)^q f(x, y) dx dy   (2.4)

where the origin is (x̄, ȳ) = (m_10/m_00, m_01/m_00). Central moments can be normalised to obtain scale invariance as

    η_pq = μ_pq / μ_00^γ,   γ = (p + q + 2)/2.   (2.5)
Hu presented that certain combinations of the normalised central moments are invariant under the class of similarity transforms. Hu's moment invariants up to the third order are

    φ1 = η_20 + η_02
    φ2 = (η_20 - η_02)² + 4η_11²
    φ3 = (η_30 - 3η_12)² + (η_03 - 3η_21)²
    φ4 = (η_30 + η_12)² + (η_03 + η_21)²
    φ5 = (η_30 - 3η_12)(η_30 + η_12)[(η_30 + η_12)² - 3(η_21 + η_03)²]
         + (η_03 - 3η_21)(η_03 + η_21)[(η_03 + η_21)² - 3(η_12 + η_30)²]
    φ6 = (η_20 - η_02)[(η_30 + η_12)² - (η_03 + η_21)²]
         + 4η_11(η_30 + η_12)(η_03 + η_21).   (2.6)
Geometric moment invariants do not encode the degree of the transformation and are therefore scalar invariants. Affine invariant combinations of geometric moments were introduced later in [96]. A treatise on moment invariants can be found in [108], where several types of moments are analysed in addition to the geometric moments by Hu. Zernike and pseudo-Zernike moments were found to have the best overall performance [108] in terms of noise sensitivity, information redundancy, and image representation capability. Surveys of moment-based techniques for object recognition can be found in [95] and [5].
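As an illustration of equations (2.3)–(2.6), the following sketch evaluates the moments of a discrete image on the pixel grid and checks the first Hu invariant φ1 on an arbitrary binary test shape; a rotation by 90 degrees is used because it is exact on the grid:

```python
import numpy as np

def central_moment(img, p, q):
    # discrete version of mu_pq in (2.4): sum over the pixel grid
    y, x = np.mgrid[:img.shape[0], :img.shape[1]].astype(float)
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    return ((x - xc) ** p * (y - yc) ** q * img).sum()

def eta(img, p, q):
    # normalised central moment of (2.5)
    gamma = (p + q + 2) / 2
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** gamma

def phi1(img):
    # first Hu invariant of (2.6): eta_20 + eta_02
    return eta(img, 2, 0) + eta(img, 0, 2)

# An arbitrary asymmetric L-shaped binary image.
img = np.zeros((32, 32))
img[5:20, 8:12] = 1.0
img[5:9, 8:25] = 1.0
print(np.isclose(phi1(img), phi1(np.rot90(img))))   # prints True
```

Under a 90-degree rotation the moments η_20 and η_02 simply exchange roles, so their sum φ1 is unchanged; the normalisation by μ_00^γ makes the value independent of scale as well.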
The Fourier-Mellin transform was presented in the context of invariant recognition by Casasent and Psaltis in [18]. The Fourier transform can be defined as

    F{f(x, y)} = F(u, v) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) e^{-j2π(ux+vy)} dx dy   (2.7)

where j is the imaginary unit. It is translation invariant since

    F{f(x + x_0, y + y_0)} = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x + x_0, y + y_0) e^{-j2π(ux+vy)} dx dy
                           = e^{j2π(ux_0 + vy_0)} F{f(x, y)}.   (2.8)
Thus, the amplitude is invariant to translations, while the phase shift codes the degree of the translation. Mapping the amplitude into log-polar coordinates transforms rotation and scaling into translations [126]:

    θ = tan⁻¹(v/u),   t = log √(u² + v²).   (2.9)

Applying the Fourier transform again and rejecting the phase gives a signature that is invariant to translation, rotation, and isotropic scaling, that is, to the class of similarity transforms. This process is known as the Fourier-Mellin transform. Using a log-log mapping of the frequency space instead of the log-polar one,

    s = log u,   t = log v,   (2.10)

makes the result invariant to anisotropic scaling, but the rotation invariance is lost [123].
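The effect of the log-polar mapping (2.9) can be verified directly on coordinates: scaling the frequency plane by a factor a leaves the angle θ unchanged and shifts the log-radius t by log a, so isotropic scaling becomes a translation. A minimal sketch (the sample point and scale factor are arbitrary choices):

```python
import numpy as np

# The log-polar mapping of equation (2.9): (u, v) -> (theta, t) with
# theta = atan2(v, u) and t = log(sqrt(u^2 + v^2)).
def log_polar(u, v):
    return np.arctan2(v, u), np.log(np.hypot(u, v))

u, v, a = 3.0, 4.0, 2.5
theta1, t1 = log_polar(u, v)
theta2, t2 = log_polar(a * u, a * v)      # scale the frequency plane by a

# scaling leaves theta unchanged and shifts t by log(a)
print(np.isclose(theta2, theta1), np.isclose(t2 - t1, np.log(a)))
```

A rotation of the plane shifts θ by the rotation angle in the same way, which is what lets the second Fourier transform absorb both effects into its phase.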
Moment invariants and Fourier-Mellin signatures are both region-based methods of description. Several global contour-based invariants have been proposed, the most notable being perhaps the Fourier descriptors [133]. The list of points representing a closed object boundary can be presented as a complex function of one variable as

    u(n) = x(n) + j y(n),   n = 0, 1, ..., N - 1.   (2.11)

The discrete Fourier transform coefficients a(k),

    a(k) = Σ_{n=0}^{N-1} u(n) e^{-j2πkn/N},   k = 0, 1, ..., N - 1,   (2.12)

are called the Fourier descriptors. They allow recognition invariant to similarity transforms, since rotation causes only a phase shift, translation changes only the DC component a(0), and scaling scales the coefficients correspondingly. Fourier descriptors have been extended to full affine invariant recognition in [2]. Other boundary decompositions have also been used. For example, a wavelet decomposition of the boundary has been used to realise similarity transform invariant recognition [111] and full affine invariant recognition [57]. The main problem of the global boundary-based methods is that they are usually restricted to objects which have a single, clear outer boundary.
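The invariance properties listed above can be observed numerically. The sketch below (NumPy; the sampled boundary is an arbitrary closed curve) computes the descriptors of (2.12) with the FFT and checks their behaviour under translation and scaling:

```python
import numpy as np

# Fourier descriptors of equations (2.11)-(2.12) for a sampled closed
# boundary; np.fft.fft computes exactly the sum in (2.12).
def fourier_descriptors(x, y):
    u = x + 1j * y                 # u(n) = x(n) + j y(n)
    return np.fft.fft(u)           # a(k), k = 0 .. N-1

n = np.arange(64)                  # an arbitrary closed example curve
x = np.cos(2 * np.pi * n / 64) + 0.3 * np.cos(6 * np.pi * n / 64)
y = np.sin(2 * np.pi * n / 64)

a = fourier_descriptors(x, y)
a_shift = fourier_descriptors(x + 5.0, y - 2.0)   # translated boundary
a_scale = fourier_descriptors(2.0 * x, 2.0 * y)   # scaled boundary

# translation changes only the DC component a(0) ...
print(np.allclose(a[1:], a_shift[1:]))            # prints True
# ... while scaling multiplies every coefficient by the scale factor
print(np.allclose(a_scale, 2.0 * a))              # prints True
```

Dividing the coefficients by |a(1)| and discarding a(0) therefore yields a representation that is invariant to translation and scale; rotation and the choice of starting point affect only the phases.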
Neural networks can also be used to obtain invariance in recognition. There seem to be two general types of neural network approaches: (i) using invariant features as input to a neural network, and (ii) obtaining the invariance in the network by learning [90]. The former approach does not differ much from conventional invariant feature extraction, but the latter is interesting because the invariance is inherent in the network structure and weights. The Neocognitron model by Fukushima is one of the earliest successful attempts to perform distortion-tolerant recognition [30]. The invariances depend solely on the training inputs of the networks, and thus the use of learned invariances in general-purpose tasks is not very strongly justified. However, the tangent propagation algorithm, introduced by Simard [103], trains a neural network to be invariant to a known transformation of the input by minimising the gradient of the network in the direction of the transformation. Thus, the network can be made invariant to analytic transformations of the input.
Some problems are related particularly to global invariant features. First, global invariants do not tolerate occlusion, as they assume that the whole object or shape can be observed. Second, global invariants require segmentation if there are regions of the image that do not belong to the object to be recognised. This limitation, of course, can be overcome by a segmentation equivalent to object detection [47]. However, it is not clear whether object detection can be performed separately from recognition. Finally, the recognition of especially complex objects is often sensitive to noise and distortions, as a small change in the input can cause a large change in a global feature.
2.4 Local features and matching
The methods for invariant recognition that do not use global invariants will now be considered. The main body of the research is related to methods that try to achieve invariance through matching local features to a known model. The methods usually have two distinct parts, feature extraction and matching. These phases will be considered separately, as they do not usually depend on each other.
The first approach that will be considered, template matching, does not have distinct feature extraction, since the gray-level values are used directly in the matching. It is worth mentioning as it is one of the oldest techniques for object recognition [97]. In template matching, the cross-correlation between an image and a template is computed. The location of the template is then found as the location of the maximum correlation, and the goodness of the match is related to the normalised value of the correlation. Template matching does not itself produce invariance to geometric transforms other than translation. However, rotation invariant recognition has been realised by convolving an image with a radially symmetric filter to select possible candidates [20]. Also, if the position of the object is known, rotation and scale invariant recognition is possible in log-polar coordinates. In addition, template matching is sometimes used as a verification step after a hypothesis of the object pose has been established. While template matching can be made more efficient using Fourier and principal component transforms [116] or wavelet decomposition [114], it is nevertheless usually considered computationally too costly for invariant recognition, because several templates are often needed.
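The correlation-based search described above can be sketched as follows (a naive implementation for illustration only; the image and template are synthetic, and a practical system would use an FFT-based formulation for speed):

```python
import numpy as np

# Template matching by normalised cross-correlation: slide the template
# over the image and return the location of the maximum correlation.
def match_template(image, template):
    th, tw = template.shape
    t = template - template.mean()
    best, best_pos = -np.inf, None
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.sqrt((w * w).sum() * (t * t).sum())
            score = (w * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best

rng = np.random.default_rng(1)
image = rng.random((40, 40))
template = image[12:20, 25:33].copy()   # cut the template from the image
pos, score = match_template(image, template)
print(pos == (12, 25) and score > 0.999)    # prints True
```

The normalisation by the window and template energies bounds the score by 1 and makes the match quality comparable across image regions, which is the "normalised value of the correlation" mentioned above.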
Principal component analysis (PCA) has been applied in object recognition, for example in the eigenfaces approach [105], but PCA is fundamentally a global method, and thus does not offer robustness against changes in localised regions. Penev and Atick [89] have proposed Local Feature Analysis (LFA), which modifies principal component analysis by deriving local topographic receptive fields optimally matched to the second-order statistics of the input data.
Two basic approaches can be used for the matching of local features between an input and a model: first, pairwise matching of each feature in the input to a feature in the model, and second, optimisation of the transform parameters that bring the features into the best match. When the matching of local features is formulated as a graph matching problem, the features in the input and the model are encoded as vertices of a graph [11], and the matching corresponds to finding the subgraph isomorphism where an error function has its minimum. Finding a subgraph isomorphism is an NP-complete problem, thus there are no efficient techniques for the general case. However, in image analysis the graphs are usually topological, that is, they represent the two-dimensional topology of the image, which reduces the complexity of the matching [15]. The search space can also be pruned using various heuristics, for example, relative geometric constraints such as the distance and angle between graph vertices [11], matching of the attributes at graph vertices between the model and the input [42], matching of vertex attributes with a hash lookup [66, 67], or emphasising the topological coherence [101].
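The pruning by relative geometric constraints can be sketched with the simplest such constraint, pairwise distances, which Euclidean transforms preserve; the triangle, pose, and feature ordering below are arbitrary illustrations:

```python
import numpy as np
from itertools import permutations

# Pruning a correspondence search with a pairwise-distance constraint:
# an assignment of scene features to model features is kept only if all
# inter-feature distances agree, since Euclidean transforms are isometries.
model = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])

theta = 0.7                                  # an arbitrary unknown pose
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
scene = (model @ R.T + np.array([2.0, 1.0]))[[2, 0, 1]]  # unordered input

def pairwise(pts, order):
    # distances between all feature pairs, in a fixed pair order
    return [np.linalg.norm(pts[order[i]] - pts[order[j]])
            for i in range(3) for j in range(i)]

consistent = [p for p in permutations(range(3))
              if np.allclose(pairwise(scene, p),
                             pairwise(model, range(3)))]
print(consistent == [(1, 2, 0)])   # only the true correspondence survives
```

Because the three distances of this triangle are all different, the constraint eliminates five of the six candidate assignments before any transform parameters need to be estimated, which is exactly the purpose of such heuristics.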
A limited matching procedure can be used to find likely candidates for the transform parameters, as in [131], where only the graph vertices on a convex hull are used in the preliminary matching, as the points on the convex hull are invariant in affine transforms. Another formulation for geometric invariance is an optimisation of the transform parameters. For the optimisation of the parameters, least-squares [67] and evidence gathering [115] methods have been used. In Publication II, the translation and rotation of an object are solved through the minimisation of the least-squares fitting error between the model and the input. In evidence gathering, local matching evidences are accumulated to make a global pose hypothesis. Evidence gathering methods for object recognition include the Generalised Hough transform (GHT) [4], pose clustering [106], and evidence accumulation [49, 77, 88]. Evidence gathering will be considered further in Chapter 3, where the Hough transform is presented.
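The evidence-gathering principle can be sketched with the classical line Hough transform, where each point contributes local evidence to a global accumulator (the synthetic points and grid resolutions below are arbitrary; the method itself is treated in Chapter 3):

```python
import numpy as np

# Minimal evidence-gathering sketch: each edge point votes for every
# quantised line (rho, theta) passing through it, and peaks in the
# accumulator indicate lines.  The synthetic points lie on the vertical
# line x = 5, i.e. rho = 5 at theta = 0.
points = [(5, y) for y in range(20)]
thetas = np.deg2rad(np.arange(0, 180))           # 1-degree steps
acc = np.zeros((41, len(thetas)), dtype=int)     # rho in [-20, 20]

for x, y in points:
    rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
    for t, rho in enumerate(rhos):
        if -20 <= rho <= 20:
            acc[rho + 20, t] += 1

# the cell (rho = 5, theta = 0) collects a vote from every point
print(acc[5 + 20, 0] == len(points))             # prints True
```

No single point determines the line; the pose hypothesis emerges only from the accumulated votes, which is the common structure of all the evidence gathering methods listed above.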
Feature extraction is also necessary for an object recognition system that uses matching other than template matching. One of the simplest features to use is edge points. They are often used with Hough transform methods. Interest points, such as curvature maxima [48] and corners [131], can also be used. Lee et al. propose the use of line segments [72]. In Publication II, line segments are also used. More complex features, like Gabor filter responses [15, 88] and local Fourier-Mellin descriptors [34], have also been used. Lowe [77] proposes to use the maxima and minima of a difference-of-Gaussian function in a multi-scale grid to determine interest points, and then to use the local gradients of the neighbourhood as the feature. For the local features, two distinct properties are desired: distinctiveness against false positives, and robustness against variations [88]. It is clear that by increasing distinctiveness, robustness is lost, and vice versa. More complex features often provide more distinctiveness but less robustness and large computational costs. The most important requirement for a feature is that it should remain the same regardless of the variations. In addition, it should provide invariant local information to allow the pruning of the search space.
The problem of partial occlusion can be solved to some degree by a system that achieves invariance through local features and matching. However, a reliable world model is still necessary for 3-D inference and the handling of missing data. The problems that arise in the matching of local features are often related to computational complexity. For example, graph matching and the Hough transform are both computationally quite expensive because the search space is large. A second problem is the selection of the local features. An important question is which kinds of features are able to represent a wide variety of natural scenes.
In Publication II, a complete recognition system is presented that employs rotation invariant template matching for the determination of an initial estimate of a two-dimensional object position. Then, local feature extraction is performed based on the initial estimate to extract straight edges and circular shapes based on the CAD model of the object. Finally, a least-squares fitting is performed for the local features and, based on the pose estimate, the difference between the object and the model is determined for each line segment and circular shape.
2.5 Object recognition in the human visual system
Because invariant recognition is evidently an unsolved problem, it is natural to take a look at one known solution, the human visual system. It should be noted that the structure of the system is not fully known, and the theories presented here cannot capture the full power of shape recognition in human vision. The following theories are, however, based on observations in cognitive science and psychology.
[Figure 2.1: Four stages of visual processing [86]: the retinal image passes through image-based, surface-based, object-based, and category-based processing.]
At least two coincident structures are supposed to appear in the human visual system, the sequential visual perception process and parallel pathways. Visual processing can be divided into four stages: image-based, surface-based, object-based, and category-based [86]. As illustrated in Fig. 2.1, the stages are sequential but feedback may exist to a preceding stage. The image-based processing is supposed to extract two-dimensional primitives of the retinal image. The surface-based processing attempts to recover the properties of surfaces such as local surface orientation and depth. This stage is not fully three-dimensional but employs local cues about the three-dimensionality, and for that reason it is often called 2.5-dimensional. The object-based processing is then concerned with the fact that an object is a three-dimensional entity which can have a totally different two-dimensional shape from different viewpoints. Finally, the category-based stage is mainly concerned with recovering the functional properties of an object by placing it into known categories.
In computer vision, the term classification can be related to either the object- or category-based stages. The categories in computer vision are usually simple and disjoint, while in the human visual system there is essentially a hierarchy of classes. Thus, a single object can be categorised at several levels. It has been suggested that there are separate processes in the visual system for different levels. It is also important to note that the focus of research in the categorisation behaviour of the human visual system has been on the entry level categories, that is, those categories that can be determined most quickly, while computer vision has been traditionally related to subordinate level categorisation, that is, the categories which are more specific. For example, a typical computer vision application could be the classification of different types of bottles in a bottle recycling machine. However, for a person, the entry level class "bottle" would be the most natural one.
In addition to the sequential process framework, there is physiological evidence that there are parallel pathways in early visual processing. The proposal suggests that there are separate pathways for such features as form, colour, motion, and depth [75]. It has been shown that significant cross-talk exists among the pathways [119], but nevertheless, parallel processing seems to exist in the lower levels of visual processing.
The issue of object representation in the human visual system is under research, but two main approaches can be seen, structural descriptions and view-based representations [107, 86]. Structural descriptions describe objects as compositions of three-dimensional primitives, whereas view-based representations describe the appearance of the object from a single view, while three-dimensional recognition is incorporated by using multiple views.
This thesis concentrates on recognition based on the two-dimensional shape, and for that reason the discussion will now be limited to shape recognition. Palmer [86] approaches the question of shape recognition from two distinct points of view, shape equivalence and shape representation. Shape equivalence is the phenomenon that two shapes are perceived as having the same shape. This theory is mainly concerned with perceived shape and does not specify the processes that are used in determining the equivalence. An important finding is that while the perceived shape often does not depend on the orientation, in some cases it does [79]. Thus, the human visual system is not totally invariant to rotations even at the categorical level.
Shape representation considers how the shape of objects might be represented within the visual system and how such representations might be compared. Palmer presents the following as the four major theories of shape representation: a) templates, b) Fourier spectra, c) feature lists, and d) structural descriptions. It should be noted that none of these has been accepted generally as the definitive theory of shape representation in the human visual system, but they provide some insight into the issue of invariance.
Templates correspond closely to the template matching approach in computer vision. In biological vision, templates are seen as useful in providing simple low-level information about object parts but lacking the versatility required for the representation of complete shapes. This is in accord with the methods of local features and matching, where the local features can be extracted using a template matching type method.
The Fourier spectrum has been suggested as the shape representation [33, 50]. Early studies suggested the power spectrum as the representation. This corresponds closely to the use of the Fourier transform in computer vision to obtain translation invariance. However, later studies have revealed that the phase spectrum dominates the perception [91]. This result may indicate the importance of phase information in Fourier domain feature extraction in computer vision. However, the existence of localised frequency sensitive cells has been shown at the lower levels of processing in mammalian visual systems [40, 41].
The feature lists approach states that an object's perceived shape can be described by a set of features, such as the number of lines and their relative lengths. The features need to be invariant to variations, relating this approach to the use of global invariant features. However, the basic approach of a set of features has a flaw because the interdependencies between the features cannot be taken into account. To solve this problem, feature maps have been proposed that include additional constraints on the mutual positions of features [76, 117]. However, pure feature-based approaches are often considered inadequate, as describing the shape of an object often requires not only the components and global properties but also the spatial relations between the parts.
The need for relative information gives the motivation for structural descriptions. They can be compared to graphs in which vertices represent object parts and labelled arcs represent the relations [85]. The representation is strong as the graph structure allows the relations to be separate from the parts. The structural descriptions are closely related to the graph based approaches in computer vision. As also in computer vision, the weakness is that a set of sufficiently powerful primitives and relations needs to be identified. For example, it is very hard to find a set of simple primitives that could describe a human.
Hough Transform
In 1962, Paul Hough presented a method for detecting straight line tracks of charged particles in a bubble chamber [38]. In the 1970's the method gained popularity in the computer vision community and became known as the Hough transform (HT)¹ [27]. In 1981, Deans [25] showed that the mathematical roots of HT lie in the Radon transform, as the Hough and Radon transforms are essentially the same in certain circumstances [94]. There are numerous different variants of HT [44, 71, 52].
While Paul Hough presented the method as a means to detect straight lines, today the Hough transform is considered more generally as a tool for detecting parameterised shapes. HT processes binarised images. Usually, the binary image is produced using an edge detection method such as the Prewitt [93] or Canny [17] edge detectors. The Hough transform is based on the principle that each edge point in the image gives positive evidence for curves through that point. This evidence is accumulated for all points. If there are many points giving evidence for a particular curve, then that curve is likely to exist in the image.
In this chapter, the Hough transform algorithm for finding line segments is first presented with its benefits and pitfalls. Then, the problem domain is extended to cover more complex shapes and an application to image matching. Finally, improvements to the computational complexity are overviewed with emphasis on probabilistic and parallel methods.
3.1 Hough transform for line detection

The Hough transform is essentially a transform from an image space to a parameter space. The parameters of curves that are present in the image appear as maxima in the parameter space. For example, a line can be represented in the slope-intercept form as

y = ax + b. (3.1)
¹ The terms Hough transform and standard HT are used to refer to the same concept.
Figure 3.1: Projection of a point is a line in the parameter space. The line represents all the lines through the point in the image. The intersection of lines in the parameter space represents the line defined by the corresponding points.
In this equation, x and y denote the coordinates of a point, a is the slope of a line, and b is the intercept on the y-axis. By rearranging (3.1), it can be seen that each point (x, y) in the image defines a unique line in the parameter space (a, b) as

b = -xa + y. (3.2)
When several points lie on the same line in the image, the corresponding parameter space lines intersect. As illustrated in Fig. 3.1, the line can be detected as the intersection in the parameter space, called the accumulator. Each coordinate pair (a, b) denotes a specific line in the original image. To find the intersection in the parameter space, a procedure called accumulation is normally used. The parameter space is first discretised, that is, it is divided into a finite number of non-overlapping cells covering the whole parameter space. Then, accumulation is performed by transforming each image point into a parametric curve and incrementing counters associated with those accumulator cells which the transformed locus intersects. In this way, each point in the image gives partial evidence that a certain feature, with certain parameters, is present. Finally, local maxima are searched for in the accumulator. The locations of the local maxima specify the parameters of apparent lines. In addition, a verification may be performed where each local maximum is checked to assess whether it really represents the desired feature in the original image.
The accumulation can be specified more strictly as follows:

Let D denote the set of points in the image. Let A denote the accumulator and S represent the indices used to index the accumulator. Let T be the function that projects a point in the image to a curve, that is a set of indices, in the parameter space.

1. Initialise A(s) = 0, ∀ s ∈ S.
2. For all d in D
   2.1 For each s in T(d)
       2.1.1 Increment A(s) = A(s) + 1.
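The accumulation procedure above can be sketched in a few lines of code. The following is a minimal illustrative sketch, not the thesis implementation: the slope grid, the intercept range, and the test points are assumptions made for the example, using the slope-intercept projection b = -xa + y of (3.2).

```python
import numpy as np

def hough_accumulate(points, transform, shape):
    """Generic accumulation: every point votes for each accumulator
    cell that its transformed curve passes through."""
    A = np.zeros(shape, dtype=int)      # 1. initialise A(s) = 0
    for d in points:                    # 2. for all d in D
        for s in transform(d):          # 2.1 for each s in T(d)
            A[s] += 1                   # 2.1.1 increment A(s)
    return A

A_VALUES = np.linspace(-1.0, 1.0, 21)   # illustrative slope grid
B_MAX = 20                              # illustrative intercept range

def line_transform(point):
    """Project an image point to the discretised line b = -xa + y."""
    x, y = point
    cells = []
    for ia, a in enumerate(A_VALUES):
        b = int(round(-x * a + y))
        if 0 <= b < B_MAX:
            cells.append((ia, b))
    return cells

# Three points on the line y = 0.5x + 2 all vote for the cell of
# a = 0.5 (slope index 15), b = 2, so that cell reaches 3 votes.
A = hough_accumulate([(0, 2), (2, 3), (4, 4)], line_transform, (21, B_MAX))
```

Because of the coarse discretisation, neighbouring slope cells can collect the same number of votes; a finer grid or a peak-spreading analysis would separate them.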
The slope-intercept parameterisation of (3.1) has a problem in that the parameter space is not bounded, since for a vertical line both parameters approach infinity. This problem can be overcome by using two accumulators, one for lines with slope |a| ≤ 1 and the other for lines with |a| > 1. Another parameterisation, the normal parameterisation,

ρ = x cos θ + y sin θ (3.3)

was introduced by Duda and Hart [27]. As illustrated in Fig. 3.2, θ represents the angle between the normal of the line and the x-axis and is bounded to the interval [0, π), while ρ is the distance from the line to the origin and its range depends on the image size. Other bounded parameterisations also exist for lines; for example, the points of intersection with the edges of the image can be used [121]. However, the boundedness of the parameter space seems to be the most important restriction for the selection of the parametric representation.
Figure 3.2: (θ, ρ)-parametrisation of a line.
Because only positive evidence is considered, HT is very robust against missing data, i.e. image points. This robustness is considered to be one of the main advantages of the Hough transform. The main drawbacks of the HT are related to computational complexity in terms of both time and space. The computation time is related to the number of edge points in the image times the larger dimension of the accumulator. Thus, the discretisation of the parameter space affects both the time and space complexity. In addition, the discretisation can present difficulties in the implementation of the algorithm.
3.2 Extending the problem domain

3.2.1 More complex shapes
Kimme et al. were some of the first researchers to extend the problem domain from the detection of lines to more complex shapes by presenting a HT based method for circle detection in 1975 [58]. In addition to using more complex analytic shapes, they discovered that the edge direction information could be used to decrease the computational complexity of the transform. In 1981, Deans suggested a framework in which the Hough transform may be generalised to detect any analytically defined curves. This is accomplished by first defining HT as an integral transform as

H(ρ, θ) = ∫∫_{-∞}^{∞} I(x, y) δ(ρ - x cos θ - y sin θ) dx dy. (3.4)

Now, I(x, y) is the image with edge points having a value 1 and all others value 0. δ(·) is the Dirac delta function. The argument of the delta function evaluates to zero only on the line with parameters (ρ, θ). Deans proposed [25] that (3.4) can be generalised as
H(p, ξ) = ∫∫_{-∞}^{∞} I(x, y) δ(p - C(x, y; ξ)) dx dy (3.5)

where I(x, y) is a function defined on the (x, y)-plane representing the image. The argument of the delta function defines a family of curves parameterised by a scalar p and a vector ξ = (ξ_1, ..., ξ_{n-1}). With a two-dimensional parameter space (as in lines), a point (x, y) is projected into a curve parameterised by the scalar ξ. In more than 2 dimensions, the point is projected into an (n-1)-dimensional hypersurface in the n-dimensional parameter space. For each point (x, y), the accumulation is performed by calculating the values of p for each value of the vector ξ and incrementing the corresponding accumulator cells.
Now, it is evident that the space complexity of the accumulator is O(m^n), where m is the number of indices for each dimension of the accumulator. The time complexity is O(m^{n-1}), which is also exponential. Finally, finding the maxima involves searching in the n-dimensional accumulator. Altogether, the computation becomes unrealistic for large n. As already presented, additional information such as the edge direction can be used to constrain the computation. In terms of the methodology of Deans, this corresponds to using information besides the coordinates of a single pixel to lower the dimensionality of ξ. This can be formulated as
H(p, ξ) = ∫∫_{-∞}^{∞} I(x, y) δ(p - C(x; ξ)) dx dy (3.6)

where p is the vector of parameters that can be solved using x = (x, y, ...) and ξ, that is, an additional parameter can be solved for each additional element of x. Thus, the dimensionality of ξ is reduced, which reduces the computational complexity. In practice, the use of the edge direction is problematic because the estimation of the gradient is prone to errors due to noise.
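The dimensionality reduction that the edge direction provides can be illustrated with the known-radius circle case: instead of voting for a whole circle of candidate centres, an edge point with gradient direction θ votes only for the centres along its gradient line. The following is a simplified sketch with synthetic edge points and exact gradient directions, not the published algorithm of Kimme et al.

```python
import numpy as np

def circle_centre_votes(edge_points, radius, shape):
    """With a known radius, an edge point (x, y) with gradient
    direction theta votes only for the two centre candidates on its
    gradient line, reducing the vote count per point from O(m) to 2."""
    A = np.zeros(shape, dtype=int)
    for x, y, theta in edge_points:
        for sign in (1.0, -1.0):        # centre may lie on either side
            cx = int(round(x - sign * radius * np.cos(theta)))
            cy = int(round(y - sign * radius * np.sin(theta)))
            if 0 <= cx < shape[0] and 0 <= cy < shape[1]:
                A[cx, cy] += 1
    return A

# Synthetic circle of radius 5 centred at (10, 10), with the gradient
# pointing radially at each sampled edge point; the centre cell then
# collects one vote per edge point.
angles = np.linspace(0.0, 2 * np.pi, 12, endpoint=False)
edges = [(10 + 5 * np.cos(t), 10 + 5 * np.sin(t), t) for t in angles]
A = circle_centre_votes(edges, 5, (21, 21))
```

With noisy gradient estimates, the votes would scatter around the true centre, which is the practical difficulty noted above.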
The previous formulation of HT is valid for analytic curves. Nevertheless, already in 1975, Merlin and Farber presented a Hough transform type algorithm for detecting non-analytic shapes; the method, however, is computationally expensive on the edge image and is impractical for real image data. In 1981, Ballard presented the Generalised Hough transform (GHT), which allows the detection of arbitrary shapes [4], giving also the theory for extending the method for rotation and scale invariance. In GHT, the edge direction is used to constrain the search. However, the parameter space is still four-dimensional if the rotation and scale invariances are allowed. Ballard also comments that pairs of edge points could also be used to constrain the search, but the comment is left at that. GHT has been later improved by using local curvature and slope [78], higher level primitives [72], relative angles [110, 56], and edge point pairs [100]. A comparison of the variants can be found in [56].
3.2.2 Invariant image matching

The accumulator of the straight line Hough transform has also been used as a description of image content. In Publication III, a translation, rotation, and scaling invariant image matching method is presented which uses HT as a feature extraction method. An invariant feature vector can be constructed as follows: Let the (ρ, θ) accumulator be presented as a matrix A, where each row corresponds to a particular value of ρ, and each column to a value of θ. First, the matrix is thresholded as

A′_ij = A_ij if A_ij > t, and A′_ij = 0 if A_ij ≤ t, ∀ i = 1, ..., m; j = 1, ..., n (3.7)
where t is the threshold value. Next, the thresholded matrix is shrunk to a one-dimensional vector by summing the matrix across columns as

F′_j = Σ_{i=1}^{m} A′_ij, ∀ j = 1, ..., n. (3.8)
This vector is then normalised with respect to the mean as

F_j = F′_j - (1/n) Σ_{j=1}^{n} F′_j, ∀ j = 1, ..., n. (3.9)
Intuitively, the vector can be thought of as an orientation histogram of lines exceeding t in length. The vector is translation invariant because only angular information is used to build it. The normalisation makes the vector invariant also to scaling. Rotational invariance can be obtained by using a rotation invariant distance to measure the similarity between two feature vectors. If R and S are the feature vectors, their rotation invariant distance can be calculated as

d(R, S) = min_{k=1,...,n} Σ_{j=1}^{n} (R_j - S^{(k)}_j)² (3.10)
where S^{(k)} is a cyclic rotation of the vector S by k steps to the right. The problem of this approach is that it cannot be used for large and complex images because the orientations of lines do not carry a sufficient amount of information for the matching. The method can also be modified to include spatial information by incorporating the ρ-dimension of the accumulator.
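The construction of (3.7)-(3.10) can be sketched directly in code. The toy accumulator matrix below is an assumption made for the example; the point of the usage is that a rotation of the image only shifts the orientation histogram cyclically, so the distance between a vector and its shifted copy is zero.

```python
import numpy as np

def ht_feature(A, t):
    """Invariant feature of (3.7)-(3.9): threshold the (rho, theta)
    accumulator, sum over the rho rows, and subtract the mean."""
    Ath = np.where(A > t, A, 0)               # (3.7): suppress weak cells
    F = Ath.sum(axis=0).astype(float)         # (3.8): orientation histogram
    return F - F.mean()                       # (3.9): mean normalisation

def rotation_invariant_distance(R, S):
    """(3.10): minimum squared distance over all cyclic shifts of S."""
    return min(((R - np.roll(S, k)) ** 2).sum() for k in range(len(R)))

# Toy 3x3 accumulator (rows: rho, columns: theta), threshold t = 2.
A = np.array([[5, 0, 2],
              [1, 7, 0],
              [4, 3, 9]])
F = ht_feature(A, 2)
```

For matching, `rotation_invariant_distance(F, G)` would be evaluated between the query vector and each model vector, the smallest distance giving the best match.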
3.3 Computational improvements

As stated in the previous section, one of the main problems of HT is its computational complexity. Several lines of research have emerged to address this problem [71, 52]. The improvements may be based on deterministic versions of the original algorithm, probabilistic methods, or parallel hardware. Deterministic methods aim to decrease the computation time, for example, by recursive partitioning of the accumulator as in the Fast Hough transform [74] and the Adaptive Hough transform [43]. Other deterministic methods include using constraints on parameter calculation, as presented in the previous section, and multi-stage methods. However, the most active line of algorithmic research seems to be the probabilistic methods, which use random sampling for selecting a subset of the input data.
3.3.1 Probabilistic methods

Probabilistic Hough transforms are a family of methods that apply random selection in order to reduce the computational complexity. Already Ballard [4] commented that several points can be used to restrict the accumulation. The Randomized Hough transform (RHT) [130] is a method that is based on the selection of n points at random to map them into a single point in the accumulator (see Fig. 3.3). However, the use of all possible point sets would lead to a combinatorial explosion. For this reason, the random sampling and accumulation is continued only until an accumulator cell exceeds a predefined threshold value. Then, that cell presents a candidate feature that is verified in the image. If a feature is found, the points corresponding to it can be removed and the accumulation can be continued. This method corresponds to a Monte-Carlo approximation of the integral in (3.6). Thus, when a larger threshold is selected, it is more probable that the maximum found corresponds to a maximum of the standard Hough transform.
The RHT algorithm can be presented as follows:

Algorithm 2 Kernel of Randomized Hough transform.
Let t denote the detection threshold.
1. Create the set D of all edge points in a binary image.
2. Select two points d_i and d_j from D at random.
3. Calculate the line parameters (a, b) corresponding to d_i and d_j.
4. Accumulate A(a, b) = A(a, b) + 1.
5. If the threshold is not reached, A(a, b) < t, go to Step 2. Otherwise, the parameters a and b describe the parameters of the detected curve.
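Algorithm 2 can be sketched as follows. The sparse dictionary accumulator, the (a, b) rounding grid, and the iteration bound are illustrative assumptions, not details of the published method.

```python
import random

def rht_lines(points, threshold, a_bins=10, b_bins=10,
              max_iters=10000, seed=0):
    """Kernel of Algorithm 2: sample point pairs, vote for the (a, b)
    cell of the line through each pair, and stop once a cell reaches
    the threshold."""
    rng = random.Random(seed)
    acc = {}                                        # sparse accumulator
    for _ in range(max_iters):
        d_i, d_j = rng.sample(points, 2)            # Step 2
        if d_i[0] == d_j[0]:
            continue                                # vertical: a unbounded
        a = (d_j[1] - d_i[1]) / (d_j[0] - d_i[0])   # Step 3
        b = d_i[1] - a * d_i[0]
        cell = (round(a * a_bins) / a_bins,
                round(b * b_bins) / b_bins)
        acc[cell] = acc.get(cell, 0) + 1            # Step 4
        if acc[cell] >= threshold:                  # Step 5
            return cell                             # candidate line
    return None

# Points on y = 2x + 1: every sampled pair votes for the same cell,
# so the threshold is reached after `threshold` samples.
pts = [(x, 2 * x + 1) for x in range(10)]
line = rht_lines(pts, threshold=5)
```

A full implementation would then verify the candidate in the image and remove its supporting points, as described above.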
In the Probabilistic Hough transform by Kiryati et al. [59], a subset of points is sampled and a standard Hough transform is then performed. In the framework of (3.6), this corresponds to computing the integral only over a random subset of the xy-plane. Because the Hough transform tolerates missing data, this method performs well as long as enough points are sampled.

Figure 3.3: In the Randomized Hough transform for a line, a pair of image points is mapped to a single point in the accumulator.
A related family of methods combines a random seed point with all other points to perform the accumulation in a 1-dimensional accumulator. Thus, if the seed point is part of a line, it is found at this stage. Because the line must be through the seed point, only a 1-dimensional accumulator is needed. The Dynamic Generalised Hough transform [70] is an extension of this idea, where a connected seed point is first determined and then a procedure similar to RHT is followed. In the Connective Hough transform [132], seed points are used with a row-wise accumulation of a 1-dimensional accumulator.
Probabilistic methods can also be used in conjunction with local data, such as the gradient direction. The Randomized Generalised Hough transform [31] combines the efficiency of RHT with the ability of GHT to detect general shapes. As in GHT, the gradient angles are locally estimated from a gray level image. In the Window Randomized Hough transform [52, 51], a window is placed over a randomly selected point and a line is fitted to the points inside the window. If the fitting error is small, the accumulator cell corresponding to the line parameters is accumulated. The Connective Randomized Hough transform (CRHT) [51] is a similar method that also restricts the fitted points to be connected to the centre point. In addition, the search for connected points is performed as a sectored scan, which means limiting the search directions to the three adjacent compass directions each time to reject paths with loops and sharp bends.
In Publication I, an extension to CRHT is presented that loosens the requirement of a connected line by allowing gaps in the line. The algorithm for the Extended Connective Randomized Hough transform (ECRHT) can be presented as:

Algorithm 3 Kernel of Extended Connective Randomized Hough transform.
1. Create the set D of all edge points in a binary edge picture.
2. Select a point d_i randomly from the set D.
3. Create a w×w window centred at d_i, and locate points connected to d_i using the maximum gap criterion and the sectored scan.
4. If enough connected points are found, fit a curve to those points using the least square sum method and calculate the line parameters (a, b); Otherwise go to Step 2.
Figure 3.4: In the Extended Connective Randomized Hough transform, local fitting around a single image point is used to present evidence about a particular parameter combination.

5. If the fitting error is within a tolerance, accumulate the cell A(a, b) in the accumulator space; Otherwise go to Step 2.
6. If A(a, b) is equal to the threshold t, the parameters a and b describe the parameters of the detected curve; Otherwise continue to Step 2.
The parameters a and b are calculated from the least-squares approximation

a = (n Σ x_i y_i - Σ x_i Σ y_i) / (n Σ x_i² - (Σ x_i)²), (3.11)
b = (Σ y_i - a Σ x_i) / n (3.12)

where x_i and y_i denote the coordinates of pixel i and n denotes the number of pixels.
Using the fitting error e,

e = (a² Σ x_i² + Σ y_i² - 2a Σ x_i y_i + 2ab Σ x_i - 2b Σ y_i + nb²) / (n(a² + 1)), (3.13)
only reliable estimates of the local gradient are accumulated. The idea is that accumulating only good local estimates decreases the total number of accumulations compared for example to the Randomized Hough transform. This improves the performance especially with images with many lines. The local line segments do not need to be very long, as they do not yet qualify as lines but just present evidence about a line (see Fig. 3.4). Compared to CRHT, the main advantage of ECRHT is its tolerance to missing data. Compared to RHT, the advantage is the decreased amount of computation and storage because local information is used to verify accumulations. The main disadvantage is that if a large amount of data is missing, the algorithm will fail because sufficient local information will not be available.
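The local fit of Steps 4 and 5 can be checked with a short routine implementing (3.11)-(3.13); the example points are an assumption made for the sketch.

```python
import numpy as np

def fit_line(xs, ys):
    """Least-squares line parameters (3.11)-(3.12) and the fitting
    error (3.13) used to accept or reject a local estimate in ECRHT."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    n = len(xs)
    sx, sy = xs.sum(), ys.sum()
    sxx, sxy, syy = (xs * xs).sum(), (xs * ys).sum(), (ys * ys).sum()
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)        # (3.11)
    b = (sy - a * sx) / n                                # (3.12)
    e = (a * a * sxx + syy - 2 * a * sxy
         + 2 * a * b * sx - 2 * b * sy
         + n * b * b) / (n * (a * a + 1))                # (3.13)
    return a, b, e

# Points exactly on y = 3x - 2 give a = 3, b = -2 and zero error;
# a noisy window would give e > 0, and the cell A(a, b) would only be
# accumulated when e stays within the tolerance.
a, b, e = fit_line([0, 1, 2, 3], [-2, 1, 4, 7])
```

Note that the denominator n(a² + 1) in (3.13) normalises the squared residuals so that e approximates the mean squared perpendicular distance, making the tolerance independent of the line slope.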
3.3.2 Parallel processing

Parallel Hough transform algorithms have been studied widely for both dedicated and general purpose hardware. The algorithms can be categorised into two classes depending on the division of the data and computation: either (a) the source image, or (b) the accumulator is split between the processors. This is illustrated in Fig. 3.5. The differences between these approaches have been evaluated by Thazhuthaveetil et al. [109]. An approach to split both the image and the accumulator has also been proposed [9].

Figure 3.5: Division of the data and computation in parallel Hough transforms: Either the source image or the accumulator can be divided between processing elements.
Special hardware architectures include mesh-connected SIMD processors [98], discrete logic [23], transputers in a hypercube [22], reconfigurable processor arrays [55], a reconfigurable multi-ring network [10], and VLSI content addressable memory [84, 82]. These specialised hardware solutions can offer the highest performance at the time they are built, but the rapid advance of computer technology makes them grow old almost overnight. For this reason, more general parallel architectures are of great interest.
General purpose hardware architectures for the Hough transform include shared memory multiprocessors [21], distributed memory multiprocessors like the Cray T3D [22], IBM SP2 [60], and Fujitsu AP1000 [118], and general purpose digital signal processors [9, 23, 63]. An implementation of the Hough transform in a distributed workstation network is reported in Publication VI, where the slow communication environment is taken into account by minimising the communication by using division of the accumulator instead of the image. In addition to decreasing the computation time, this approach decreases the storage requirement as the accumulator is divided between the processing elements. In addition to the workstation environment, the implementation is examined in a multiprocessor PC, demonstrating that inexpensive general purpose hardware can be used to speed up the transform.
Improved performance has been obtained by using modified Hough transform techniques. The Multiresolution Hough Transform has been successfully implemented using a pyramid architecture [3]. Also the Fast Hough Transform [74] has been improved for use in a multiprocessor architecture [36]. A parallel implementation on a distributed memory system reported in [9] uses digital signal processors and overlapping shared memory.
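The accumulator-division strategy can be sketched with a thread pool standing in for the processing elements; each worker owns a slice of the θ-range and accumulates every image point into its own slice, so no communication is needed during voting. This is an illustrative sketch, not the implementation of Publication VI, and the discretisation parameters are assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def hough_theta_slice(points, thetas, rho_max):
    """One processing element: accumulate all points, but only into
    the owned theta-slice of the (theta, rho) accumulator."""
    A = np.zeros((len(thetas), 2 * rho_max), dtype=int)
    for x, y in points:
        for i, t in enumerate(thetas):
            rho = int(round(x * np.cos(t) + y * np.sin(t)))
            A[i, rho + rho_max] += 1      # shift rho to a valid index
    return A

def parallel_hough(points, n_theta, rho_max, n_workers):
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    slices = np.array_split(thetas, n_workers)
    with ThreadPoolExecutor(n_workers) as ex:
        parts = ex.map(lambda ts: hough_theta_slice(points, ts, rho_max),
                       slices)
        return np.vstack(list(parts))     # stack slices back in order

rng = np.random.default_rng(0)
pts = [tuple(map(int, p)) for p in rng.integers(0, 20, size=(15, 2))]
A = parallel_hough(pts, n_theta=36, rho_max=40, n_workers=4)
```

Because each slice is private, the result is identical to a sequential run, illustrating why accumulator division needs so little communication.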
In Publication VI, algorithms for parallel RHT are presented for a shared-memory multiprocessor. In RHT, frequent communication between processing elements (PE) is necessary. Therefore, the algorithm uses shared memory, which allows relatively easy and fast interaction between PEs compared to message passing parallel systems. The data structure for the list of points is an important design decision, since it should allow parallel creation and a parallel update mechanism when lines are found. The image is divided into n parts and each PE creates the list of points corresponding to that part. The list update is performed by each PE updating the corresponding list, regardless of the line found. The parallel algorithm for RHT can be presented as:
Algorithm 4 Parallel Randomized Hough Transform.
Let n denote the number of processing elements (PE). The identifier of a PE is denoted by id ∈ {0, 1, ..., n-1}. The same algorithm is executed in each PE.
1. Make a list of the points D_id in the image region where id·x_max/n ≤ x < (id+1)·x_max/n. x_max is the maximum value of the x-coordinate.
2. Synchronise.
3. If id = 0, then calculate the total number of points in the lists D_i, 0 ≤ i < n.
4. Synchronise.
5. Select two points d_i and d_j at random.
6. Accumulate with the line parameters corresponding to the points.
7. If the threshold t is reached, set the flag F_found.
8. If the flag F_found is not set, return to Step 5.
9. Synchronise.
10. Update the point list D_id by removing points corresponding to the line found.
11. If all lines are not found, return to Step 2.
There are two main parts in the algorithm, accumulation (Steps 5 to 8) and point list update (Steps 9 and 10). Synchronisation points are needed to separate these. At each synchronisation point, no PE can continue before all designated PEs have reached the synchronisation point.

Because the overhead of parallelisation varies depending on the variable number of points in the image and the number of processes, dynamic control of the number of processes is necessary for good performance. The overhead of the parallelisation is measured for each line sought and the number of processes is controlled to minimise the execution time.
Algorithm 5 Dynamic Parallel Randomized Hough transform.
Let n denote the total number of processing elements (PE) and p denote the number of active PEs. Initially p = n.
1. Make a list of points D_id in the image region where id·x_max/n ≤ x < (id+1)·x_max/n. x_max is the maximum value of the x-coordinate.
2. Synchronise.
3. If id = 0, calculate the total number of points in the lists D_i, 0 ≤ i < n.
4. Synchronise.
5. Select two points d_i and d_j at random.
6. Accumulate with the line parameters corresponding to the points.
7. If the threshold t is reached, set the flag F_found.
8. If the flag F_found is not set, return to Step 5.
9. Update point lists; each process updates n/p lists.
10. If the synchronisation time is greater than the effective accumulation time, set the flag F_reduce.
11. If F_reduce is set, terminate processes p/2 + 1, ..., p and set p = p/2.
12. Synchronise. If the flag F_reduce is set, change the number of processes arriving at the next barrier.
13. If all lines are not found, return to Step 3.
The parallel HT presents very good speedups compared to the sequential algorithm. Because little communication is needed, the parallel algorithm is scalable and can be applied efficiently also in widely parallel environments. However, the efficiency of sequential RHT is remarkable compared to HT, exceeding also that of the parallel HT in multiprocessor workstations. The Parallel RHT, on the other hand, requires a substantial amount of communication and therefore cannot be applied in a standard networked workstation environment. The parallelisation overhead is thus larger with RHT than with HT, and the scalability of RHT is not as good. By incorporating the dynamic allocation of work, the communication overhead can be constrained and the Parallel RHT outperforms the sequential algorithms as well as the parallel HT.
Gabor Filtering

Gabor analysis [32] is a method for combining time and frequency domain analyses. In computer vision, Gabor filtering has been used for a number of tasks, including edge and line detection [81, 19], texture classification [12], and compression [92, 73].
In this chapter, the use of Gabor filters in the context of feature extraction and object recognition is examined. First, the formulation of the Gabor elementary functions is covered in the one-dimensional case. Then, the formulation is extended for two-dimensional images with the focus on filter normalisation, parameter selection, and the advantages of Gabor filtering. Next, the applications of Gabor filtering in object recognition are examined, noting their invariance properties. Finally, a scale, rotation, and translation invariant Gabor feature is presented for recognising binary images.
4.1 Gabor elementary functions

4.1.1 One-dimensional time-frequency space

In 1946, Dennis Gabor presented a new method of analysing signals in which time and frequency analyses are combined [32]. The effective widths of a signal in the time and frequency domains, Δt and Δf, are defined based on the variance of the signal in the corresponding domains, which leads to the definition of minimal uncertainty as the product of the effective widths,

Δt Δf ≥ 1/2. (4.1)
Gabor then shows that the signal for which the product Δt Δf is minimal, turning the inequality into an equality, is

ψ(t) = e^{-α²(t-t₀)²} e^{j(2πf₀t+φ)} (4.2)

where the first factor is the Gaussian, the second the sinusoid, and j is the imaginary unit. The function represents a complex sinusoidal wave modulated by a Gaussian probability function. α, t₀, f₀, and φ are constants which denote the sharpness of the Gaussian, the centre of the Gaussian in the time domain, the frequency of the sinusoid, and its phase shift, respectively.

Figure 4.1: One-dimensional Gabor elementary function in time and frequency domains.

Applying the Fourier transform to (4.2) gives
Ψ(f) = (√π/α) e^{-(π/α)²(f-f₀)²} e^{j(-2πt₀(f-f₀)+φ)} (4.3)
which also is a complex sinusoid modulated by a Gaussian. The effect of the parameters is illustrated in Fig. 4.1, which shows the Gaussian envelope of the function in both the time and frequency domains. The real and imaginary parts are separately shown in Fig. 4.2. The function in (4.2) became later known as the Gabor elementary function (GEF) and can be used to analyse signals by decomposing them into a number of GEFs. Gabor motivated this decomposition by the minimal joint uncertainty of the GEF in time-frequency space. It can also be noticed that the decomposition includes the Fourier analysis and the time-domain analysis as special cases, namely,

α → 0 ⇒ ψ(t) → sinusoid ⇒ Fourier analysis
α → ∞ ⇒ ψ(t) → Dirac delta ⇒ time description. (4.4)

The Gabor decomposition is a predecessor of multiresolution and wavelet analyses. The problem of the decomposition is that the GEFs are not orthogonal, which causes computational problems when they are used as a wavelet basis.
Instead of decomposition, signals can be analysed by convolving them with GEFs. The filter response to an input ξ(t), that is the convolution, can be defined as

r_ψ(t) = ∫_{-∞}^{∞} ψ(t - τ) ξ(τ) dτ. (4.5)