Lappeenranta University of Technology
Ville Kyrki
LOCAL AND GLOBAL FEATURE EXTRACTION
FOR INVARIANT OBJECT RECOGNITION
Thesis for the degree of Doctor of Science (Technology) to be presented with due permission for public examination and criticism in the Auditorium of the Student Union House at Lappeenranta University of Technology, Lappeenranta, Finland on the 5th of December, 2002, at noon.
Acta Universitatis Lappeenrantaensis
133
Department of Information Technology
Lappeenranta University of Technology
Finland
Reviewers Professor John Illingworth
Centre for Vision, Speech, and Signal Processing
Department of Electronic and Electrical Engineering
University of Surrey
United Kingdom
Professor Jouko Lampinen
Laboratory of Computational Engineering
Department of Electrical and Communications Engineering
Helsinki University of Technology
Finland
Opponent Academy Professor Erkki Oja
Laboratory of Computer and Information Science
Helsinki University of Technology
Finland
ISBN 951-764-698-4
ISSN 1456-4491
Lappeenrannan teknillinen korkeakoulu
Digipaino 2002
Preface
The work presented in this thesis has been carried out at the Laboratory of Information Processing in the Department of Information Technology of the Lappeenranta University of Technology, Finland, during the years 1999–2002.
I want to express my gratitude to my supervisor, Professor Heikki Kälviäinen, for his supportive guidance as well as for providing me the facilities and financial support for my research.
I would like to warmly thank all the other co-authors of the publications, Professor Pasi Fränti, Mr Joni Kämäräinen, Mr Alexey Mednonogov, and Mr Jani Peusaari. In addition, I would like to express my thanks to Professor Jari Porras and Dr Jouni Ikonen for their contribution in the earlier phases of the research on the parallel Hough transform.
I thank the reviewers, Professor John Illingworth and Professor Jouko Lampinen, for their critical reading of the manuscript and valuable comments.
I also wish to thank the colleagues at the Laboratory of Information Processing at Lappeenranta University of Technology for a stimulating atmosphere, and CSC – Scientific Computing for providing parallel computing resources.
The financial support of the East Finland Graduate School on Computer Science and Engineering (ECSE), the KAUTE foundation (Kaupallisten ja teknillisten tieteiden tukisäätiö), the Jenny and Antti Wihuri fund, the Foundation of the Lappeenranta University of Technology, and the Foundation of the Advance in Technology (Tekniikan edistämissäätiö) is gratefully acknowledged.
Finally, I wish to thank my dear wife Anna for her endless encouragement.
Lappeenranta, October 2002
Ville Kyrki
Abstract
Ville Kyrki
LOCAL AND GLOBAL FEATURE EXTRACTION FOR INVARIANT OBJECT RECOGNITION
Lappeenranta, 2002
115 p.
Acta Universitatis Lappeenrantaensis 133
Diss. Lappeenranta University of Technology
ISBN 951-764-698-4
ISSN 1456-4491
Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present in natural environments is an obstacle for effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting, or occlusion.
In this study, invariant object recognition is considered from the viewpoint of feature extraction. The differences between local and global features are studied, with emphasis on Hough transform and Gabor filtering based feature extraction. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both the Hough transform and Gabor filtering. A modified Hough transform technique is also presented where the distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational costs of the Hough transform employing parallel processing and local information are introduced.
Keywords: feature extraction, invariant object recognition, Hough transform, Gabor filtering, computer vision, image analysis
UDC 004.93'1: 004.272.2: 004.932.2
a(k)      Fourier descriptor
g[x]      transformation belonging to group G acting on signal x
H(ρ, θ)   Hough transform accumulator
m_pq      geometric moment of order (p+q)
r(t)      response of a 1-D Gabor filter
r(x, y)   response of a 2-D Gabor filter
T{x}      measurement of signal x
F{f}      Fourier transform of f
η_pq      normalised central moment of order (p+q)
μ_pq      central moment of order (p+q)
φ         moment invariant
ψ(t)      1-D Gabor elementary function
ψ(x, y)   2-D Gabor elementary function
Ψ(f)      1-D Gabor elementary function in the Fourier domain
Ψ(u, v)   2-D Gabor elementary function in the Fourier domain
CRHT   Connective Randomized Hough transform
GEF    Gabor elementary function
GHT    Generalised Hough transform
HT     Hough transform
LFA    Local feature analysis
PCA    Principal component analysis
PE     processing element
RHT    Randomized Hough transform
SIMD   Single-instruction multiple-data
VLSI   Very large scale integration
I. Kyrki, V., Kälviäinen, H., Combination of Local and Global Line Extraction, Journal of Real-Time Imaging, Vol. 6, No. 2, April 2000, pages 79–91.
II. Kyrki, V., Kälviäinen, H., High Precision 2-D Geometrical Inspection, Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, September 3–7, 2000, Vol. 4, pages 779–782.
III. Fränti, P., Mednonogov, A., Kyrki, V., Kälviäinen, H., Content-Based Matching of Line-Drawing Images Using Hough Transform, International Journal on Document Analysis and Recognition, Vol. 3, No. 2, 2000, pages 117–124.
IV. Kyrki, V., Kämäräinen, J., Kälviäinen, H., Invariant Shape Recognition Using Global Gabor Features, Proceedings of the 12th Scandinavian Conference on Image Analysis SCIA 2001, Bergen, Norway, June 11–14, 2001, pages 671–678.
V. Kyrki, V., Kämäräinen, J.-K., Kälviäinen, H., Content-based Image Matching Using Gabor Filtering, Advanced Concepts for Intelligent Vision Systems, ACIVS 2001, Baden-Baden, Germany, July 30 – August 3, 2001, pages 45–49.
VI. Kyrki, V., Peusaari, J., Kälviäinen, H., Intermediate level feature extraction in novel parallel environments, accepted for publication in Machine Vision and Applications.
In this thesis, these publications are referred to as Publication I, Publication II, Publication III, Publication IV, Publication V, and Publication VI.
1 Introduction 1
2 Invariance in Object Recognition 5
2.1 The role of invariance . . . 5
2.2 Variation in natural images . . . 6
2.3 Global features and invariant transforms . . . 7
2.4 Local features and matching . . . 10
2.5 Object recognition in the human visual system . . . 11
3 Hough Transform 15
3.1 Hough transform for line detection . . . 15
3.2 Extending the problem domain . . . 18
3.2.1 More complex shapes . . . 18
3.2.2 Invariant image matching . . . 19
3.3 Computational improvements . . . 20
3.3.1 Probabilistic methods . . . 20
3.3.2 Parallel processing . . . 22
4 Gabor Filtering 27
4.1 Gabor elementary functions . . . 27
4.1.1 One-dimensional time-frequency space . . . 27
4.1.2 Two-dimensional spatial-frequency space . . . 29
4.2 Applications in object recognition . . . 32
5 Discussion 37
Bibliography 41
Publications 51
Introduction
Efforts have been made over many years to equip computers with vision. Object recognition is a classical problem in computer vision that aims to categorise objects into known object classes using a digital image. It can be utilised in a vast number of applications, including industrial, medical, and scientific ones.
Already in 1960, Selfridge and Neisser presented, in the context of character recognition, that formulating recognition as a multi-stage process is beneficial [99]. Today, most recognition systems are based on a sequential model that has at least two main parts: feature extraction and classification. The goal of feature extraction is to extract useful information from an input image, which is then utilised to perform the classification. That is, the digital image is processed in order to find some structure, e.g. edges, which is then used as the criterion on which the categorisation is based. The feature extraction and classification are quite often also composed of several sub-stages.
The human visual system is nowadays thought to be a similar multi-stage model, with several levels of preattentive feature analysis at the early stages of processing [107]. As in many other tasks, the skills of a computer and those of a human are largely complementary in object recognition. Humans can accurately determine the class of an object among hundreds of others almost regardless of the viewpoint, illumination, or partial occlusion. In contrast, in controlled environments computer vision can tirelessly find flaws in complex manufactured products that would not appear to a human observer. The main difference between the capabilities seems to result from the fact that the machine is almost unerring in a controlled environment where it is known what visual features to seek. On the other hand, in a natural environment the viewing conditions vary, which is a remarkable problem for computer vision. For example, the object may be viewed from a different pose, or the lighting may change. Thus, the problem of invariant object recognition is to recognise objects despite these variations.
Not much has changed since 1978, when Granlund [35] referred to feature extraction methods as a bag of tricks, since a method is often chosen based on experience, seemingly without explicit reasons. Nevertheless, the selected feature extraction method has a fundamental effect on the characteristics of a recognition system. Feature extraction can play different parts in invariant recognition, depending on whether the feature itself is invariant, such as the moment invariants [39] or the Fourier-Mellin coefficients [18], or whether the invariance is a characteristic of the system, as in labelled graph matching [65]. Regardless of the role of feature extraction, the following requirements and objectives can be stated: the system should be general enough to recognise a large variety of objects, invariant to natural variations, stable against distortions, and computationally efficient.
The aim of this thesis is to study feature extraction methods that can be applied to invariant object recognition. While, in general, invariant object recognition encompasses variations caused by the three-dimensional structure of the world, this thesis is limited to the study of two-dimensional variations. In addition, the methods presented assume that the object to be recognised has been segmented from the background, and that only a single object is considered at a time. Two well-known feature extraction methods, the Hough transform (HT) and Gabor filtering, are examined and improved to be suitable for invariant recognition. Primarily the shape of the objects is inspected, as it is supposed that the shape includes enough information for successful recognition. For that reason, the scope of the thesis is limited to two-dimensional gray-level images, and the methods presented are primarily applicable to man-made, geometric objects whose shape can be efficiently modelled.
The thesis is divided into five chapters. In Chapter 2, the role of invariance in object recognition is examined. Next, two feature extraction methods are presented and applied to object recognition, namely, the Hough transform in Chapter 3 and Gabor filtering in Chapter 4. Finally, the discussion and conclusions are presented in Chapter 5.
Summary of publications
In Publication I, a new method for locating line segments was proposed. The method is an extension of the Randomized Hough transform (RHT) which exploits connectivity for the estimation of the local structure of the image. The experiments indicated that the method was more tolerant to missing data than its predecessor. The author of this thesis participated in the development of the method, performed the programming and the experiments, and wrote the major part of the publication.
In Publication II, a system for inspecting two-dimensional planar objects was proposed. The inspection is based on a CAD model of the part. The system first estimates the position of the part, followed by a local measurement of each feature. In the experiments, it was found that calibration was the factor most limiting the accuracy of the system. The author developed the system, performed the experiments, and wrote the major part of the publication.
In Publication III, two new methods for matching binary images were presented. The methods use the Hough transform to extract prominent line features, which are then used to find matching images. The author participated in the method development, the experiments, and the writing of the publication.
In Publication IV, the use of Gabor filters as direction-sensitive edge detectors was studied. A translation, rotation, and scaling invariant distance measure was proposed for the Gabor features. Experiments were performed on images of digits to study the representation power of the features. The author participated in the development of the method.
In Publication V, the Gabor feature was applied to the matching of binary line-drawing images. Rotation and scale invariance was demonstrated, together with very good tolerance against salt-and-pepper noise. The author of the thesis participated in the development, programming, experiments, and writing of the publication.
In Publication VI, the performance of the Hough transform was inspected in affordable parallel environments. A parallel HT algorithm was developed that is suitable both for shared memory multiprocessors and distributed memory workstation networks. Also, parallel implementations of the Randomized Hough transform were proposed. The approaches improved the performance notably compared to a sequential algorithm. However, they did not seem to be suitable for large-scale parallelisation. The author of the thesis developed the methods, participated in the programming and the experiments, and wrote the major part of the publication.
Invariance in Object Recognition
The fundamental problem of object recognition is to analyse an image and provide a high-level symbolic representation of its contents. Each separate object in the image should be localised, i.e., its location should be resolved, and classified, i.e., assigned to a specific category. In addition to the three-dimensional structure of an object, there are several other factors that affect the image of the object, for example, lighting, perspective projection, and sensor noise. Invariant object recognition aims to solve the problem despite these variations. In this work, the focus is on the recognition of objects based on their shape, because the shape is fairly independent of phenomena such as illumination, contrast, and colour.
In this chapter, invariance in object recognition is discussed. The first section deals with the role of invariance in general, concentrating on the definition, taxonomies, and advantages. Next, variations occurring in natural images are examined to define more precisely the types of invariance necessary for object recognition. In order to discuss invariance, the idea of a class is also considered. After that, two groups of methods for invariant object recognition are discussed. First, global invariants and invariant transforms are presented, forming the oldest approach to invariance in computer vision. Second, the use of local features and matching is considered. Finally, the role of invariance in human visual perception is briefly discussed.
2.1 The role of invariance
The term invariant refers to a representation of a signal that is constant under some transformation. For example, the measurement T of a signal x is invariant under a transformation g when [68]

    T{x} = T{g[x]}.   (2.1)

Using group theory, this definition can be extended [29, 129]. Assume that there is a group G which acts upon the set X of possible signals. Let T denote the action of measurement. Then, if

    T{g[x]} = T{x} h(g)   ∀ g ∈ G, x ∈ X,   (2.2)

T is invariant under the action of the group. The group G can be, for example, the group of translations; h(g) is a function of g alone. If h(g) = 1, T is a scalar invariant [29], as in (2.1). For example, the Fourier transform is invariant under the group of translations, while its amplitude is scalar invariant. Invariants can be categorised as strong and weak, according to whether the degree of the action g can be measured from the invariant representation. The representation is called strong if it contains a component that explicitly encodes the degree of the transformation [16]. Thus, according to this definition, scalar invariants are weak invariants.
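The scalar-invariance condition (2.1) can be checked numerically. The following minimal sketch (Python with NumPy; the random signal and the shift amount are arbitrary choices) uses a circular shift as the discrete translation g and the Fourier amplitude spectrum as the measurement T:

```python
import numpy as np

# Sketch of definition (2.1): T is a scalar invariant under g when
# T{x} = T{g[x]}.  Here g is a circular shift (a discrete translation)
# and T is the Fourier amplitude spectrum of a 1-D signal.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
g_x = np.roll(x, 5)                   # g[x]: translate by 5 samples

T_x = np.abs(np.fft.fft(x))           # T{x}
T_gx = np.abs(np.fft.fft(g_x))        # T{g[x]}

print(np.allclose(T_x, T_gx))         # prints True
```

The phase of the spectrum, in contrast, changes with the shift, which is why the full Fourier transform is an invariant in the sense of (2.2) rather than a scalar invariant.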
In computer vision and object recognition, invariants provide a method to measure images while keeping the measurement unaffected by known deformations or variations. For example, in image databases, geometric invariants can be used for indexing the database, allowing the retrieval of an image despite geometric transformations. It has been argued that object recognition is the search for invariants [127]. However, in this work, the emphasis is not on the construction of invariant features, but on methods that allow object recognition in the presence of variations. This distinction is important because invariant recognition is possible without explicit invariants.
There are several taxonomies for invariant recognition. According to Wood [129], there are two broad approaches: (1) distinct feature extraction and classification, and (2) parameterised invariants. In the first approach, each feature is itself invariant, and the classification can be performed based on the features without considering the symmetries in the features. With parameterised invariants, the parameters of the invariant are adopted to perform the classification. Invariants can also be classified as global and local. Global invariants use the knowledge of a shape as a whole, while local ones are based on local properties such as curvature. Most global invariants belong to the class of distinct feature extraction and classification. When considering the invariant recognition of shape, the methods can also be classified as either region- or boundary-based [57]. Region-based methods use the whole region under consideration for determining the feature, while boundary-based methods consider only the outer boundary. However, this taxonomy applies specifically to the global invariants of binary images, and for that reason many methods cannot be classified using it.
2.2 Variation in natural images
In this thesis, it is assumed that all objects representing a class are rigid and visually similar to a very high degree. Thus, the intra-class geometric variations are very small compared to the inter-class variations. In practice, this means that only rigid objects are considered. In addition, the classes are assumed to be disjoint, that is, each object can belong to a single class, and there is no class hierarchy similar to the human visual system. Based on these restrictions, the variations are now examined.
The process of imaging an object has a number of sources of variation in the resulting image. Wechsler [126] states that the variability is due to perspective and changing orientation, and gives the following list of variations: pattern translation, pattern distortion, perspective distortion, variation in background, partial occlusion, change in pattern size, change in orientation, and change in lighting. The variations can be categorised according to their cause. First, the three-dimensional position of the imaging equipment with respect to the object. Second, the change in the environment, primarily the lighting, but also encompassing changes in the background. Third, noise in the imager and in the environment. Fourth, the occlusion of the target object by other objects. Traditionally, the study of invariants has focused on the geometric variations, but it is important to note that while geometric variations are important when describing the perspective projection of image formation, they are only one class of image variation.
Geometric variations can be further categorised based on the allowed transforms. The simplest variation is the translation of the object in the image. Translation together with rotation forms the group of Euclidean transforms. Including isotropic scaling gives the group of similarity transforms. The group of similarity transforms can be used to describe the projection of a planar object which is orthogonal to the optical axis of the imager. Similarity transforms are extended by allowing non-isotropic scaling and skewing, forming the group of affine transforms. The most general group of geometric transforms is the group of projective transforms, which represents perspective projections.
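The nesting of these groups can be illustrated with homogeneous transformation matrices; the sketch below (the parameter values are arbitrary illustrations) builds a similarity transform by adding isotropic scaling to a Euclidean one, and an affine transform by allowing any invertible linear part:

```python
import numpy as np

# Homogeneous 3x3 matrices for the transform groups discussed above.
def euclidean(theta, tx, ty):
    # rotation by theta plus translation (tx, ty)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0, 0, 1.0]])

def similarity(scale, theta, tx, ty):
    # isotropic scaling extends the Euclidean group
    T = euclidean(theta, tx, ty)
    T[:2, :2] *= scale
    return T

def affine(a11, a12, a21, a22, tx, ty):
    # non-isotropic scaling and skew: any invertible 2x2 linear part
    return np.array([[a11, a12, tx], [a21, a22, ty], [0, 0, 1.0]])

p = np.array([1.0, 0.0, 1.0])       # a point in homogeneous coordinates
q = similarity(2.0, np.pi / 2, 3.0, 0.0) @ p
print(np.allclose(q[:2], [3.0, 2.0]))     # prints True
```

Each group contains the previous one as the special case of restricted parameters, which is the hierarchy described in the paragraph above.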
Changes in the environment and the effect of noise are sometimes neglected in the study of invariant recognition, although they definitely have an effect on applications. The issue of nonuniform illumination is usually considered separately from other invariances, but recently it has been proposed that geometric and illumination invariants should be treated together [1]. In the methods presented in this thesis, the objects are, however, assumed to be segmented from the background, as stated in the introduction.
In complex environments, invariant recognition in 3-D is also highly dependent on an efficient world model. Without a model that can explain the overlap of objects, the problem of occlusion is very likely impossible to solve. Also, the world model can give valuable cues for segmentation, especially in multi-object cases.
The robustness of an object recognition system is an important consideration. To be efficient in applications, invariant recognition has to be stable, general, and computationally efficient. Stability means that small changes in the environment should not have a remarkable effect on the result, because in natural scenes several types of variations are likely to occur. In addition to invariant recognition, the system should be able to characterise the variability, because the variability itself carries important information [126].
2.3 Global features and invariant transforms
The two most famous invariant transforms in image processing are based on moments and the Fourier-Mellin transform. Moment invariants, presented by Hu in 1962 [39], are the first general invariants. The geometric moments of order (p+q) of f(x, y) can be defined as

    m_pq = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x^p y^q f(x, y) dx dy,   p, q = 0, 1, 2, ...   (2.3)

By shifting the image centroid to the origin, the central moments μ_pq can be defined as

    μ_pq = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x - x̄)^p (y - ȳ)^q f(x, y) dx dy   (2.4)

where the origin is (x̄, ȳ) = (m_10/m_00, m_01/m_00). Central moments can be normalised to obtain scale invariance as

    η_pq = μ_pq / μ_00^γ,   γ = (p + q + 2)/2.   (2.5)
Hu presented that certain combinations of the normalised central moments are invariant under the class of similarity transforms. Hu's moment invariants up to the third order are

    φ1 = η_20 + η_02
    φ2 = (η_20 - η_02)² + 4η_11²
    φ3 = (η_30 - 3η_12)² + (η_03 - 3η_21)²
    φ4 = (η_30 + η_12)² + (η_03 + η_21)²
    φ5 = (η_30 - 3η_12)(η_30 + η_12)[(η_30 + η_12)² - 3(η_21 + η_03)²]
         + (η_03 - 3η_21)(η_03 + η_21)[(η_03 + η_21)² - 3(η_12 + η_30)²]
    φ6 = (η_20 - η_02)[(η_30 + η_12)² - (η_03 + η_21)²]
         + 4η_11(η_30 + η_12)(η_03 + η_21).   (2.6)
Geometric moment invariants do not encode the degree of the transformation and are therefore scalar invariants. Affine invariant combinations of geometric moments were introduced later in [96]. A treatise on moment invariants can be found in [108], where several types of moments are analysed in addition to the geometric moments by Hu. Zernike and pseudo-Zernike moments were found to have the best overall performance [108] in terms of noise sensitivity, information redundancy, and image representation capability. Surveys of moment-based techniques for object recognition can be found in [95] and [5].
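As an illustration of equations (2.3)–(2.6), the following sketch evaluates the moments of a discrete image on the pixel grid and checks the first Hu invariant φ1 on an arbitrary binary test shape; a rotation by 90 degrees is used because it is exact on the grid:

```python
import numpy as np

def central_moment(img, p, q):
    # discrete version of mu_pq in (2.4): sum over the pixel grid
    y, x = np.mgrid[:img.shape[0], :img.shape[1]].astype(float)
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    return ((x - xc) ** p * (y - yc) ** q * img).sum()

def eta(img, p, q):
    # normalised central moment of (2.5)
    gamma = (p + q + 2) / 2
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** gamma

def phi1(img):
    # first Hu invariant of (2.6): eta_20 + eta_02
    return eta(img, 2, 0) + eta(img, 0, 2)

# An arbitrary asymmetric L-shaped binary image.
img = np.zeros((32, 32))
img[5:20, 8:12] = 1.0
img[5:9, 8:25] = 1.0
print(np.isclose(phi1(img), phi1(np.rot90(img))))   # prints True
```

Under a 90-degree rotation the moments η_20 and η_02 simply exchange roles, so their sum φ1 is unchanged; the normalisation by μ_00^γ makes the value independent of scale as well.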
The Fourier-Mellin transform was presented in the context of invariant recognition by Casasent and Psaltis in [18]. The Fourier transform can be defined as

    F{f(x, y)} = F(u, v) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) e^{-j2π(ux+vy)} dx dy   (2.7)

where j is the imaginary unit. It is translation invariant since

    F{f(x + x_0, y + y_0)} = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x + x_0, y + y_0) e^{-j2π(ux+vy)} dx dy
                           = e^{j2π(ux_0 + vy_0)} F{f(x, y)}.   (2.8)
Thus, the amplitude is invariant to translations, while the phase shift codes the degree of the translation. Mapping the amplitude into log-polar coordinates transforms rotation and scaling into translations [126]:

    θ = tan⁻¹(v/u),   t = log √(u² + v²).   (2.9)

Applying the Fourier transform again and rejecting the phase gives a signature that is invariant to translation, rotation, and isotropic scaling, that is, to the class of similarity transforms. This process is known as the Fourier-Mellin transform. Using a log-log mapping of the frequency space instead of the log-polar one,

    s = log u,   t = log v,   (2.10)

makes the result invariant to anisotropic scaling, but the rotation invariance is lost [123].
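The effect of the log-polar mapping (2.9) can be verified directly on coordinates: scaling the frequency plane by a factor a leaves the angle θ unchanged and shifts the log-radius t by log a, so isotropic scaling becomes a translation. A minimal sketch (the sample point and scale factor are arbitrary choices):

```python
import numpy as np

# The log-polar mapping of equation (2.9): (u, v) -> (theta, t) with
# theta = atan2(v, u) and t = log(sqrt(u^2 + v^2)).
def log_polar(u, v):
    return np.arctan2(v, u), np.log(np.hypot(u, v))

u, v, a = 3.0, 4.0, 2.5
theta1, t1 = log_polar(u, v)
theta2, t2 = log_polar(a * u, a * v)      # scale the frequency plane by a

# scaling leaves theta unchanged and shifts t by log(a)
print(np.isclose(theta2, theta1), np.isclose(t2 - t1, np.log(a)))
```

A rotation of the plane shifts θ by the rotation angle in the same way, which is what lets the second Fourier transform absorb both effects into its phase.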
Moment invariants and Fourier-Mellin signatures are both region-based methods of description. Several global contour-based invariants have been proposed, the most notable being perhaps the Fourier descriptors [133]. The list of points representing a closed object boundary can be presented as a complex function of one variable as

    u(n) = x(n) + j y(n),   n = 0, 1, ..., N - 1.   (2.11)

The discrete Fourier transform coefficients a(k),

    a(k) = Σ_{n=0}^{N-1} u(n) e^{-j2πkn/N},   k = 0, 1, ..., N - 1,   (2.12)

are called the Fourier descriptors. They allow recognition invariant to similarity transforms, since rotation causes only a phase shift, translation changes only the DC component a(0), and scaling scales the coefficients correspondingly. Fourier descriptors have been extended to full affine invariant recognition in [2]. Other boundary decompositions have also been used. For example, a wavelet decomposition of the boundary has been used to realise similarity transform invariant recognition [111] and full affine invariant recognition [57]. The main problem of the global boundary-based methods is that they are usually restricted to objects which have a single, clear outer boundary.
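The invariance properties listed above can be observed numerically. The sketch below (NumPy; the sampled boundary is an arbitrary closed curve) computes the descriptors of (2.12) with the FFT and checks their behaviour under translation and scaling:

```python
import numpy as np

# Fourier descriptors of equations (2.11)-(2.12) for a sampled closed
# boundary; np.fft.fft computes exactly the sum in (2.12).
def fourier_descriptors(x, y):
    u = x + 1j * y                 # u(n) = x(n) + j y(n)
    return np.fft.fft(u)           # a(k), k = 0 .. N-1

n = np.arange(64)                  # an arbitrary closed example curve
x = np.cos(2 * np.pi * n / 64) + 0.3 * np.cos(6 * np.pi * n / 64)
y = np.sin(2 * np.pi * n / 64)

a = fourier_descriptors(x, y)
a_shift = fourier_descriptors(x + 5.0, y - 2.0)   # translated boundary
a_scale = fourier_descriptors(2.0 * x, 2.0 * y)   # scaled boundary

# translation changes only the DC component a(0) ...
print(np.allclose(a[1:], a_shift[1:]))            # prints True
# ... while scaling multiplies every coefficient by the scale factor
print(np.allclose(a_scale, 2.0 * a))              # prints True
```

Dividing the coefficients by |a(1)| and discarding a(0) therefore yields a representation that is invariant to translation and scale; rotation and the choice of starting point affect only the phases.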
Neural networks can also be used to obtain invariance in recognition. There seem to be two general types of neural network approaches: (i) using invariant features as input to a neural network, and (ii) obtaining the invariance in the network by learning [90]. The former approach does not differ much from conventional invariant feature extraction, but the latter is interesting because the invariance is inherent in the network structure and weights. The Neocognitron model by Fukushima is one of the earliest successful attempts to perform distortion-tolerant recognition [30]. The invariances depend solely on the training inputs of the networks, and thus the use of learned invariances in general-purpose tasks is not very strongly justified. However, the tangent propagation algorithm, introduced by Simard [103], trains a neural network to be invariant to a known transformation of the input by minimising the gradient of the network in the direction of the transformation. Thus, the network can be made invariant to analytic transformations of the input.
Some problems are related particularly to global invariant features. First, global invariants do not tolerate occlusion, as they assume that the whole object or shape can be observed. Second, global invariants require segmentation if there are regions of the image that do not belong to the object to be recognised. This limitation, of course, can be overcome by a segmentation equivalent to object detection [47]. However, it is not clear whether object detection can be performed separately from recognition. Finally, the recognition of especially complex objects is often sensitive to noise and distortions, as a small change in the input can cause a large change in a global feature.
2.4 Local features and matching
The methods for invariant recognition that do not use global invariants will now be considered. The main body of the research is related to methods that try to achieve invariance through matching local features to a known model. The methods usually have two distinct parts, feature extraction and matching. These phases will be considered separately, as they do not usually depend on each other.
The first approach that will be considered, template matching, does not have distinct feature extraction, since the gray-level values are used directly in the matching. It is worth mentioning as it is one of the oldest techniques for object recognition [97]. In template matching, the cross-correlation between an image and a template is computed. The location of the template is then found as the location of the maximum correlation, and the goodness of the match is related to the normalised value of the correlation. Template matching does not itself produce invariance to geometric transforms other than translation. However, rotation invariant recognition has been realised by convolving an image with a radially symmetric filter to select possible candidates [20]. Also, if the position of the object is known, rotation and scale invariant recognition is possible in log-polar coordinates. In addition, template matching is sometimes used as a verification step after a hypothesis of the object pose has been established. While template matching can be made more efficient using Fourier and principal component transforms [116] or wavelet decomposition [114], it is nevertheless usually considered computationally too costly for invariant recognition, because several templates are often needed.
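The correlation-based search described above can be sketched as follows (a naive implementation for illustration only; the image and template are synthetic, and a practical system would use an FFT-based formulation for speed):

```python
import numpy as np

# Template matching by normalised cross-correlation: slide the template
# over the image and return the location of the maximum correlation.
def match_template(image, template):
    th, tw = template.shape
    t = template - template.mean()
    best, best_pos = -np.inf, None
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.sqrt((w * w).sum() * (t * t).sum())
            score = (w * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best

rng = np.random.default_rng(1)
image = rng.random((40, 40))
template = image[12:20, 25:33].copy()   # cut the template from the image
pos, score = match_template(image, template)
print(pos == (12, 25) and score > 0.999)    # prints True
```

The normalisation by the window and template energies bounds the score by 1 and makes the match quality comparable across image regions, which is the "normalised value of the correlation" mentioned above.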
Principal component analysis (PCA) has been applied in object recognition, for example in the eigenfaces approach [105], but PCA is fundamentally a global method, and thus does not offer robustness against changes in localised regions. Penev and Atick [89] have proposed Local Feature Analysis (LFA), which modifies principal component analysis by deriving local topographic receptive fields optimally matched to the second-order statistics of the input data.
Two basic approaches can be used for the matching of local features between an input and a model: first, pairwise matching of each feature in the input to a feature in the model, and second, optimisation of the transform parameters that bring the features into the best match. When the matching of local features is formulated as a graph matching problem, the features in the input and the model are encoded as vertices of a graph [11], and the matching corresponds to finding the subgraph isomorphism where an error function has its minimum. Finding a subgraph isomorphism is an NP-complete problem, thus there are no efficient techniques for the general case. However, in image analysis the graphs are usually topological, that is, they represent the two-dimensional topology of the image, which reduces the complexity of the matching [15]. The search space can also be pruned using various heuristics, for example, relative geometric constraints such as the distance and angle between graph vertices [11], matching of the attributes at graph vertices between the model and the input [42], matching of vertex attributes with a hash lookup [66, 67], or emphasising the topological coherence [101].
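The pruning by relative geometric constraints can be sketched with the simplest such constraint, pairwise distances, which Euclidean transforms preserve; the triangle, pose, and feature ordering below are arbitrary illustrations:

```python
import numpy as np
from itertools import permutations

# Pruning a correspondence search with a pairwise-distance constraint:
# an assignment of scene features to model features is kept only if all
# inter-feature distances agree, since Euclidean transforms are isometries.
model = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])

theta = 0.7                                  # an arbitrary unknown pose
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
scene = (model @ R.T + np.array([2.0, 1.0]))[[2, 0, 1]]  # unordered input

def pairwise(pts, order):
    # distances between all feature pairs, in a fixed pair order
    return [np.linalg.norm(pts[order[i]] - pts[order[j]])
            for i in range(3) for j in range(i)]

consistent = [p for p in permutations(range(3))
              if np.allclose(pairwise(scene, p),
                             pairwise(model, range(3)))]
print(consistent == [(1, 2, 0)])   # only the true correspondence survives
```

Because the three distances of this triangle are all different, the constraint eliminates five of the six candidate assignments before any transform parameters need to be estimated, which is exactly the purpose of such heuristics.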
A limited matching procedure can be used to find likely candidates for the transform parameters, as in [131], where only the graph vertices on a convex hull are used in the preliminary matching, as the points on the convex hull are invariant in affine transforms. Another formulation for geometric invariance is an optimisation of the transform parameters. For the optimisation of the parameters, least-squares [67] and evidence gathering [115] methods have been used. In Publication II, the translation and rotation of an object are solved through the minimisation of the least-squares fitting error between the model and the input. In evidence gathering, local matching evidences are accumulated to make a global pose hypothesis. Evidence gathering methods for object recognition include the Generalised Hough transform (GHT) [4], pose clustering [106], and evidence accumulation [49, 77, 88]. Evidence gathering will be considered further in Chapter 3, where the Hough transform is presented.
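The evidence-gathering principle can be sketched with the classical line Hough transform, where each point contributes local evidence to a global accumulator (the synthetic points and grid resolutions below are arbitrary; the method itself is treated in Chapter 3):

```python
import numpy as np

# Minimal evidence-gathering sketch: each edge point votes for every
# quantised line (rho, theta) passing through it, and peaks in the
# accumulator indicate lines.  The synthetic points lie on the vertical
# line x = 5, i.e. rho = 5 at theta = 0.
points = [(5, y) for y in range(20)]
thetas = np.deg2rad(np.arange(0, 180))           # 1-degree steps
acc = np.zeros((41, len(thetas)), dtype=int)     # rho in [-20, 20]

for x, y in points:
    rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
    for t, rho in enumerate(rhos):
        if -20 <= rho <= 20:
            acc[rho + 20, t] += 1

# the cell (rho = 5, theta = 0) collects a vote from every point
print(acc[5 + 20, 0] == len(points))             # prints True
```

No single point determines the line; the pose hypothesis emerges only from the accumulated votes, which is the common structure of all the evidence gathering methods listed above.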
Feature extraction is also necessary for an object recognition system that uses matching other than template matching. One of the simplest features to use is edge points. They are often used with Hough transform methods. Interest points, such as curvature maxima [48] and corners [131], can also be used. Lee et al. propose the use of line segments [72]. In Publication II, line segments are also used. More complex features, like Gabor filter responses [15, 88] and local Fourier-Mellin descriptors [34], have also been used. Lowe [77] proposes to use the maxima and minima of a difference-of-Gaussian function in a multi-scale grid to determine interest points, and then to use the local gradients of the neighbourhood as the feature. For the local features, two distinct properties are desired: distinctiveness against false positives, and robustness against variations [88]. It is clear that by increasing distinctiveness, robustness is lost, and vice versa. More complex features often provide more distinctiveness but less robustness and large computational costs. The most important requirement for a feature is that it should remain the same regardless of the variations. In addition, it should provide invariant local information to allow the pruning of the search space.
The problem of partial occlusion can be solved to some degree by a system that achieves invariance through local features and matching. However, a reliable world model is still necessary for 3-D inference and the handling of missing data. The problems that arise in the matching of local features are often related to computational complexity. For example, graph matching and the Hough transform are both computationally quite expensive because the search space is large. A second problem is the selection of the local features. An important question is which kinds of features are able to represent a wide variety of natural scenes.
In Publication II, a complete recognition system is presented that employs rotation invariant template matching for the determination of an initial estimate of a two-dimensional object position. Then, local feature extraction is performed based on the initial estimate to extract straight edges and circular shapes based on the CAD model of the object. Finally, a least-squares fitting is performed for the local features and, based on the pose estimate, the difference between the object and the model is determined for each line segment and circular shape.
2.5 Object recognition in the human visual system
Because invariant recognition is evidently an unsolved problem, it is natural to take a look at one known solution, the human visual system. It should be noted that the structure of the system is not fully known, and the theories presented here cannot capture the full power of shape recognition in human vision. The following theories are, however, based on observations in cognitive science and psychology.
[Figure 2.1: Four stages of visual processing [86]: the retinal image passes through image-based, surface-based, object-based, and category-based processing.]
At least two coincident structures are supposed to appear in the human visual system, the sequential visual perception process and parallel pathways. Visual processing can be divided into four stages: image-based, surface-based, object-based, and category-based [86]. As illustrated in Fig. 2.1, the stages are sequential but feedback may exist to a preceding stage. The image-based processing is supposed to extract two-dimensional primitives of the retinal image. The surface-based processing attempts to recover the properties of surfaces such as local surface orientation and depth. This stage is not fully three-dimensional but employs local cues about the three-dimensionality, and for that reason it is often called 2.5-dimensional. The object-based processing is then concerned with the fact that an object is a three-dimensional entity which can have a totally different two-dimensional shape from different viewpoints. Finally, the category-based stage is mainly concerned with recovering the functional properties of an object by placing it into known categories.
In computer vision, the term classification can be related to either the object- or category-based stages. The categories in computer vision are usually simple and disjoint, while in the human visual system there is essentially a hierarchy of classes. Thus, a single object can be categorised at several levels. It has been suggested that there are separate processes in the visual system for different levels. It is also important to note that the focus of research in the categorisation behaviour of the human visual system has been on the entry level categories, that is, those categories that can be determined most quickly, while computer vision has been traditionally related to subordinate level categorisation, that is, the categories which are more specific. For example, a typical computer vision application could be the classification of different types of bottles in a bottle recycling machine. However, for a person, the entry level class "bottle" would be the most natural one.
In addition to the sequential process framework, there is physiological evidence that there are parallel pathways in early visual processing. The proposal suggests that there are separate pathways for such features as form, colour, motion, and depth [75]. It has been shown that significant cross-talk exists among the pathways [119], but nevertheless, parallel processing seems to exist in the lower levels of visual processing.
The issue of object representation in the human visual system is under research, but two main approaches can be seen, structural descriptions and view-based representations [107, 86]. Structural descriptions describe objects as compositions of three-dimensional primitives, whereas view-based representations describe the appearance of the object from a single view, while three-dimensional recognition is incorporated by using multiple views.
This thesis concentrates on recognition based on the two-dimensional shape, and for that reason the discussion will now be limited to shape recognition. Palmer [86] approaches the question of shape recognition from two distinct points of view, shape equivalence and shape representation. Shape equivalence is the phenomenon that two shapes are perceived as having the same shape. This theory is mainly concerned with perceived shape and does not specify the processes that are used in determining the equivalence. An important finding is that while the perceived shape often does not depend on the orientation, in some cases it does [79]. Thus, the human visual system is not totally invariant to rotations even at the categorical level.
Shape representation considers how the shape of objects might be represented within the visual system and how such representations might be compared. Palmer presents the following as the four major theories of shape representation: a) templates, b) Fourier spectra, c) feature lists, and d) structural descriptions. It should be noted that none of these has been accepted generally as the definitive theory of shape representation in the human visual system, but they provide some insight into the issue of invariance.
Templates correspond closely to the template matching approach in computer vision. In biological vision, templates are seen as useful in providing simple low-level information about object parts but lacking the versatility required for the representation of complete shapes. This is in accord with the methods of local features and matching, where the local features can be extracted using a template matching type method.
The Fourier spectrum has been suggested as the shape representation [33, 50]. Early studies suggested the power spectrum as the representation. This corresponds closely to the use of the Fourier transform in computer vision to obtain translation invariance. However, later studies have revealed that the phase spectrum dominates the perception [91]. This result may indicate the importance of phase information in Fourier domain feature extraction in computer vision. However, the existence of localised frequency sensitive cells has been shown at the lower levels of processing in mammalian visual systems [40, 41].
The feature lists approach states that an object's perceived shape can be described by a set of features, such as the number of lines and their relative lengths. The features need to be invariant to variations, relating this approach to the use of global invariant features. However, the basic approach of a set of features has a flaw because the interdependencies between the features cannot be taken into account. To solve this problem, feature maps have been proposed that include additional constraints on the mutual positions of features [76, 117]. However, pure feature-based approaches are often considered inadequate, as describing the shape of an object often requires not only the components and global properties but also the spatial relations between the parts.
The need for relative information gives the motivation for structural descriptions. They can be compared to graphs in which vertices represent object parts and labelled arcs represent the relations [85]. The representation is strong as the graph structure allows the relations to be separate from the parts. The structural descriptions are closely related to the graph based approaches in computer vision. As also in computer vision, the weakness is that a set of sufficiently powerful primitives and relations needs to be identified. For example, it is very hard to find a set of simple primitives that could describe a human.
Hough Transform
In 1962, Paul Hough presented a method for detecting straight line tracks of charged particles in a bubble chamber [38]. In the 1970's the method gained popularity in the computer vision community and became known as the Hough transform (HT)¹ [27]. In 1981, Deans [25] showed that the mathematical roots of HT lie in the Radon transform, as the Hough and Radon transforms are essentially the same in certain circumstances [94]. There are numerous different variants of HT [44, 71, 52].
While Paul Hough presented the method as a means to detect straight lines, today the Hough transform is considered more generally as a tool for detecting parameterised shapes. HT processes binarised images. Usually, the binary image is produced using an edge detection method such as the Prewitt [93] or Canny [17] edge detectors. The Hough transform is based on the principle that each edge point in the image gives positive evidence for curves through that point. This evidence is accumulated for all points. If there are many points giving evidence for a particular curve, then that curve is likely to exist in the image.
In this chapter, the Hough transform algorithm for finding line segments is first presented with its benefits and pitfalls. Then, the problem domain is extended to cover more complex shapes and an application to image matching. Finally, improvements to the computational complexity are overviewed with emphasis on probabilistic and parallel methods.
3.1 Hough transform for line detection

The Hough transform is essentially a transform from an image space to a parameter space. The parameters of curves that are present in the image appear as maxima in the parameter space. For example, a line can be represented in the slope-intercept form as

y = ax + b. (3.1)
¹ The terms Hough transform and standard HT are used to refer to the same concept.
Figure 3.1: Projection of a point is a line in the parameter space. The line represents all the lines through the point in the image. The intersection of lines in the parameter space represents the line defined by the corresponding points.
In this equation, x and y denote the coordinates of a point, a is the slope of a line, and b is the intercept on the y-axis. By rearranging (3.1), it can be seen that each point (x, y) in the image defines a unique line in the parameter space (a, b) as

b = -xa + y. (3.2)
When several points lie on the same line in the image, the corresponding parameter space lines intersect. As illustrated in Fig. 3.1, the line can be detected as the intersection in the parameter space, called the accumulator. Each coordinate pair (a, b) denotes a specific line in the original image. To find the intersection in the parameter space, a procedure called accumulation is normally used. The parameter space is first discretised, that is, it is divided into a finite number of non-overlapping cells covering the whole parameter space. Then, accumulation is performed by transforming each image point into a parametric curve and incrementing counters associated with those accumulator cells which the transformed locus intersects. In this way, each point in the image gives partial evidence that a certain feature, with certain parameters, is present. Finally, local maxima are searched for in the accumulator. The locations of the local maxima specify the parameters of apparent lines. In addition, a verification may be performed where each local maximum is checked to assess whether it really represents the desired feature in the original image.
The accumulation can be specified more strictly as follows:

Let D denote the set of points in the image. Let A denote the accumulator and S represent the indices used to index the accumulator. Let T be the function that projects a point in the image to a curve, that is a set of indices, in the parameter space.

1. Initialise A(s) = 0, ∀ s ∈ S.
2. For all d in D
   2.1 For each s in T(d)
       2.1.1 Increment A(s) = A(s) + 1.
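The accumulation procedure above can be sketched in a few lines of code. The following is a minimal illustrative sketch, not the thesis implementation: the slope grid, the intercept range, and the test points are assumptions made for the example, using the slope-intercept projection b = -xa + y of (3.2).

```python
import numpy as np

def hough_accumulate(points, transform, shape):
    """Generic accumulation: every point votes for each accumulator
    cell that its transformed curve passes through."""
    A = np.zeros(shape, dtype=int)      # 1. initialise A(s) = 0
    for d in points:                    # 2. for all d in D
        for s in transform(d):          # 2.1 for each s in T(d)
            A[s] += 1                   # 2.1.1 increment A(s)
    return A

A_VALUES = np.linspace(-1.0, 1.0, 21)   # illustrative slope grid
B_MAX = 20                              # illustrative intercept range

def line_transform(point):
    """Project an image point to the discretised line b = -xa + y."""
    x, y = point
    cells = []
    for ia, a in enumerate(A_VALUES):
        b = int(round(-x * a + y))
        if 0 <= b < B_MAX:
            cells.append((ia, b))
    return cells

# Three points on the line y = 0.5x + 2 all vote for the cell of
# a = 0.5 (slope index 15), b = 2, so that cell reaches 3 votes.
A = hough_accumulate([(0, 2), (2, 3), (4, 4)], line_transform, (21, B_MAX))
```

Because of the coarse discretisation, neighbouring slope cells can collect the same number of votes; a finer grid or a peak-spreading analysis would separate them.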
The slope-intercept parameterisation of (3.1) has a problem in that the parameter space is not bounded, since for a vertical line both parameters approach infinity. This problem can be overcome by using two accumulators, one for lines with slope |a| ≤ 1 and the other for lines with |a| > 1. Another parameterisation, the normal parameterisation,

ρ = x cos θ + y sin θ (3.3)

was introduced by Duda and Hart [27]. As illustrated in Fig. 3.2, θ represents the angle between the normal of the line and the x-axis and is bounded to the interval [0, π), while ρ is the distance from the line to the origin and its range depends on the image size. Other bounded parameterisations also exist for lines; for example, the points of intersection with the edges of the image can be used [121]. However, the boundedness of the parameter space seems to be the most important restriction for the selection of the parametric representation.
Figure 3.2: (θ, ρ)-parametrisation of a line.
Because only positive evidence is considered, HT is very robust against missing data, i.e. image points. This robustness is considered to be one of the main advantages of the Hough transform. The main drawbacks of the HT are related to computational complexity in terms of both time and space. The computation time is related to the number of edge points in the image times the larger dimension of the accumulator. Thus, the discretisation of the parameter space affects both the time and space complexity. In addition, the discretisation can present difficulties in the implementation of the algorithm.
3.2 Extending the problem domain

3.2.1 More complex shapes
Kimme et al. were some of the first researchers to extend the problem domain from the detection of lines to more complex shapes by presenting a HT based method for circle detection in 1975 [58]. In addition to using more complex analytic shapes, they discovered that the edge direction information could be used to decrease the computational complexity of the transform. In 1981, Deans suggested a framework in which the Hough transform may be generalised to detect any analytically defined curves. This is accomplished by first defining HT as an integral transform as

H(ρ, θ) = ∫∫_{-∞}^{∞} I(x, y) δ(ρ - x cos θ - y sin θ) dx dy. (3.4)

Now, I(x, y) is the image with edge points having a value 1 and all others value 0. δ(·) is the Dirac delta function. The argument of the delta function evaluates to zero only on the line with parameters (ρ, θ). Deans proposed [25] that (3.4) can be generalised as
H(p, ξ) = ∫∫_{-∞}^{∞} I(x, y) δ(p - C(x, y; ξ)) dx dy (3.5)

where I(x, y) is a function defined on the (x, y)-plane representing the image. The argument of the delta function defines a family of curves parameterised by a scalar p and a vector ξ = (ξ_1, ..., ξ_{n-1}). With a two-dimensional parameter space (as in lines), a point (x, y) is projected into a curve parameterised by the scalar ξ. In more than 2 dimensions, the point is projected into an (n-1)-dimensional hypersurface in the n-dimensional parameter space. For each point (x, y), the accumulation is performed by calculating the values of p for each value of the vector ξ and incrementing the corresponding accumulator cells.
Now, it is evident that the space complexity of the accumulator is O(m^n), where m is the number of indices for each dimension of the accumulator. The time complexity is O(m^{n-1}), which is also exponential. Finally, finding the maxima involves searching in the n-dimensional accumulator. Altogether, the computation becomes unrealistic for large n. As already presented, additional information such as the edge direction can be used to constrain the computation. In terms of the methodology of Deans, this corresponds to using information besides the coordinates of a single pixel to lower the dimensionality of ξ. This can be formulated as
H(p, ξ) = ∫∫_{-∞}^{∞} I(x, y) δ(p - C(x; ξ)) dx dy (3.6)

where p is the vector of parameters that can be solved using x = (x, y, ...) and ξ, that is, an additional parameter can be solved for each additional element of x. Thus, the dimensionality of ξ is reduced, which reduces the computational complexity. In practice, the use of the edge direction is problematic because the estimation of the gradient is prone to errors due to noise.
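The dimensionality reduction that the edge direction provides can be illustrated with the known-radius circle case: instead of voting for a whole circle of candidate centres, an edge point with gradient direction θ votes only for the centres along its gradient line. The following is a simplified sketch with synthetic edge points and exact gradient directions, not the published algorithm of Kimme et al.

```python
import numpy as np

def circle_centre_votes(edge_points, radius, shape):
    """With a known radius, an edge point (x, y) with gradient
    direction theta votes only for the two centre candidates on its
    gradient line, reducing the vote count per point from O(m) to 2."""
    A = np.zeros(shape, dtype=int)
    for x, y, theta in edge_points:
        for sign in (1.0, -1.0):        # centre may lie on either side
            cx = int(round(x - sign * radius * np.cos(theta)))
            cy = int(round(y - sign * radius * np.sin(theta)))
            if 0 <= cx < shape[0] and 0 <= cy < shape[1]:
                A[cx, cy] += 1
    return A

# Synthetic circle of radius 5 centred at (10, 10), with the gradient
# pointing radially at each sampled edge point; the centre cell then
# collects one vote per edge point.
angles = np.linspace(0.0, 2 * np.pi, 12, endpoint=False)
edges = [(10 + 5 * np.cos(t), 10 + 5 * np.sin(t), t) for t in angles]
A = circle_centre_votes(edges, 5, (21, 21))
```

With noisy gradient estimates, the votes would scatter around the true centre, which is the practical difficulty noted above.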
The previous formulation of HT is valid for analytic curves. Nevertheless, already in 1975, Merlin and Farber presented a Hough transform type algorithm for detecting non-analytic shapes; the method, however, is computationally expensive on the edge image and is impractical for real image data. In 1981, Ballard presented the Generalised Hough transform (GHT), which allows the detection of arbitrary shapes [4], giving also the theory for extending the method for rotation and scale invariance. In GHT, the edge direction is used to constrain the search. However, the parameter space is still four-dimensional if the rotation and scale invariances are allowed. Ballard also comments that pairs of edge points could also be used to constrain the search, but the comment is left at that. GHT has been later improved by using local curvature and slope [78], higher level primitives [72], relative angles [110, 56], and edge point pairs [100]. A comparison of the variants can be found in [56].
3.2.2 Invariant image matching

The accumulator of the straight line Hough transform has also been used as a description of image content. In Publication III, a translation, rotation, and scaling invariant image matching method is presented which uses HT as a feature extraction method. An invariant feature vector can be constructed as follows: Let the (ρ, θ) accumulator be presented as a matrix A, where each row corresponds to a particular value of ρ, and each column to a value of θ. First, the matrix is thresholded as

A′_ij = A_ij if A_ij > t, and A′_ij = 0 if A_ij ≤ t, ∀ i = 1, ..., m; j = 1, ..., n (3.7)
where t is the threshold value. Next, the thresholded matrix is shrunk to a one-dimensional vector by summing the matrix across columns as

F′_j = Σ_{i=1}^{m} A′_ij, ∀ j = 1, ..., n. (3.8)
This vector is then normalised with respect to the mean as

F_j = F′_j - (1/n) Σ_{j=1}^{n} F′_j, ∀ j = 1, ..., n. (3.9)
Intuitively, the vector can be thought of as an orientation histogram of lines exceeding t in length. The vector is translation invariant because only angular information is used to build it. The normalisation makes the vector invariant also to scaling. Rotational invariance can be obtained by using a rotation invariant distance to measure the similarity between two feature vectors. If R and S are the feature vectors, their rotation invariant distance can be calculated as

d(R, S) = min_{k=1,...,n} Σ_{j=1}^{n} (R_j - S^{(k)}_j)² (3.10)
where S^{(k)} is a cyclic rotation of the vector S by k steps to the right. The problem of this approach is that it cannot be used for large and complex images because the orientations of lines do not carry a sufficient amount of information for the matching. The method can also be modified to include spatial information by incorporating the ρ-dimension of the accumulator.
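The construction of (3.7)-(3.10) can be sketched directly in code. The toy accumulator matrix below is an assumption made for the example; the point of the usage is that a rotation of the image only shifts the orientation histogram cyclically, so the distance between a vector and its shifted copy is zero.

```python
import numpy as np

def ht_feature(A, t):
    """Invariant feature of (3.7)-(3.9): threshold the (rho, theta)
    accumulator, sum over the rho rows, and subtract the mean."""
    Ath = np.where(A > t, A, 0)               # (3.7): suppress weak cells
    F = Ath.sum(axis=0).astype(float)         # (3.8): orientation histogram
    return F - F.mean()                       # (3.9): mean normalisation

def rotation_invariant_distance(R, S):
    """(3.10): minimum squared distance over all cyclic shifts of S."""
    return min(((R - np.roll(S, k)) ** 2).sum() for k in range(len(R)))

# Toy 3x3 accumulator (rows: rho, columns: theta), threshold t = 2.
A = np.array([[5, 0, 2],
              [1, 7, 0],
              [4, 3, 9]])
F = ht_feature(A, 2)
```

For matching, `rotation_invariant_distance(F, G)` would be evaluated between the query vector and each model vector, the smallest distance giving the best match.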
3.3 Computational improvements

As stated in the previous section, one of the main problems of HT is its computational complexity. Several lines of research have emerged to address this problem [71, 52]. The improvements may be based on deterministic versions of the original algorithm, probabilistic methods, or parallel hardware. Deterministic methods aim to decrease the computation time, for example, by recursive partitioning of the accumulator as in the Fast Hough transform [74] and the Adaptive Hough transform [43]. Other deterministic methods include using constraints on parameter calculation, as presented in the previous section, and multi-stage methods. However, the most active line of algorithmic research seems to be the probabilistic methods, which use random sampling for selecting a subset of the input data.
3.3.1 Probabilistic methods

Probabilistic Hough transforms are a family of methods that apply random selection in order to reduce the computational complexity. Already Ballard [4] commented that several points can be used to restrict the accumulation. The Randomized Hough transform (RHT) [130] is a method that is based on the selection of n points at random to map them into a single point in the accumulator (see Fig. 3.3). However, the use of all possible point sets would lead to a combinatorial explosion. For this reason, the random sampling and accumulation is continued only until an accumulator cell exceeds a predefined threshold value. Then, that cell presents a candidate feature that is verified in the image. If a feature is found, the points corresponding to it can be removed and the accumulation can be continued. This method corresponds to a Monte-Carlo approximation of the integral in (3.6). Thus, when a larger threshold is selected, it is more probable that the maximum found corresponds to a maximum of the standard Hough transform.
The RHT algorithm can be presented as follows:

Algorithm 2 Kernel of Randomized Hough transform.
Let t denote the detection threshold.
1. Create the set D of all edge points in a binary image.
2. Select two points d_i and d_j from D at random.
3. Calculate the line parameters (a, b) corresponding to d_i and d_j.
4. Accumulate A(a, b) = A(a, b) + 1.
5. If the threshold is not reached, A(a, b) < t, go to Step 2. Otherwise, the parameters a and b describe the parameters of the detected curve.
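Algorithm 2 can be sketched as follows. The sparse dictionary accumulator, the (a, b) rounding grid, and the iteration bound are illustrative assumptions, not details of the published method.

```python
import random

def rht_lines(points, threshold, a_bins=10, b_bins=10,
              max_iters=10000, seed=0):
    """Kernel of Algorithm 2: sample point pairs, vote for the (a, b)
    cell of the line through each pair, and stop once a cell reaches
    the threshold."""
    rng = random.Random(seed)
    acc = {}                                        # sparse accumulator
    for _ in range(max_iters):
        d_i, d_j = rng.sample(points, 2)            # Step 2
        if d_i[0] == d_j[0]:
            continue                                # vertical: a unbounded
        a = (d_j[1] - d_i[1]) / (d_j[0] - d_i[0])   # Step 3
        b = d_i[1] - a * d_i[0]
        cell = (round(a * a_bins) / a_bins,
                round(b * b_bins) / b_bins)
        acc[cell] = acc.get(cell, 0) + 1            # Step 4
        if acc[cell] >= threshold:                  # Step 5
            return cell                             # candidate line
    return None

# Points on y = 2x + 1: every sampled pair votes for the same cell,
# so the threshold is reached after `threshold` samples.
pts = [(x, 2 * x + 1) for x in range(10)]
line = rht_lines(pts, threshold=5)
```

A full implementation would then verify the candidate in the image and remove its supporting points, as described above.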
In the Probabilistic Hough transform by Kiryati et al. [59], a subset of points is sampled and a standard Hough transform is then performed. In the framework of (3.6), this corresponds to computing the integral only over a random subset of the xy-plane. Because the Hough transform tolerates missing data, this method performs well as long as enough points are sampled.

Figure 3.3: In the Randomized Hough transform for a line, a pair of image points is mapped to a single point in the accumulator.
A related family of methods combines a random seed point with all other points to perform the accumulation in a 1-dimensional accumulator. Thus, if the seed point is part of a line, it is found at this stage. Because the line must be through the seed point, only a 1-dimensional accumulator is needed. The Dynamic Generalised Hough transform [70] is an extension of this idea, where a connected seed point is first determined and then a procedure similar to RHT is followed. In the Connective Hough transform [132], seed points are used with a row-wise accumulation of a 1-dimensional accumulator.
Probabilistic methods can also be used in conjunction with local data, such as the gradient direction. The Randomized Generalised Hough transform [31] combines the efficiency of RHT with the ability of GHT to detect general shapes. As in GHT, the gradient angles are locally estimated from a gray level image. In the Window Randomized Hough transform [52, 51], a window is placed over a randomly selected point and a line is fitted to the points inside the window. If the fitting error is small, the accumulator cell corresponding to the line parameters is accumulated. The Connective Randomized Hough transform (CRHT) [51] is a similar method that also restricts the fitted points to be connected to the centre point. In addition, the search for connected points is performed as a sectored scan, which means limiting the search directions to the three adjacent compass directions each time to reject paths with loops and sharp bends.
In Publication I, an extension to CRHT is presented that loosens the requirement of a connected line by allowing gaps in the line. The algorithm for the Extended Connective Randomized Hough transform (ECRHT) can be presented as:

Algorithm 3 Kernel of Extended Connective Randomized Hough transform.
1. Create the set D of all edge points in a binary edge picture.
2. Select a point d_i randomly from the set D.
3. Create a w×w window centred at d_i, and locate points connected to d_i using the maximum gap criterion and the sectored scan.
4. If enough connected points are found, fit a curve to those points using the least square sum method and calculate the line parameters (a, b); Otherwise go to Step 2.
Figure 3.4: In the Extended Connective Randomized Hough transform, local fitting around a single image point is used to present evidence about a particular parameter combination.

5. If the fitting error is within a tolerance, accumulate the cell A(a, b) in the accumulator space; Otherwise go to Step 2.
6. If A(a, b) is equal to the threshold t, the parameters a and b describe the parameters of the detected curve; Otherwise continue to Step 2.
The parameters a and b are calculated from the least-squares approximation

a = (n Σ x_i y_i - Σ x_i Σ y_i) / (n Σ x_i² - (Σ x_i)²), (3.11)
b = (Σ y_i - a Σ x_i) / n (3.12)

where x_i and y_i denote the coordinates of pixel i and n denotes the number of pixels.
Using the fitting error e,

e = (a² Σ x_i² + Σ y_i² - 2a Σ x_i y_i + 2ab Σ x_i - 2b Σ y_i + nb²) / (n(a² + 1)), (3.13)
only reliable estimates of the local gradient are accumulated. The idea is that accumulating only good local estimates decreases the total number of accumulations compared for example to the Randomized Hough transform. This improves the performance especially with images with many lines. The local line segments do not need to be very long, as they do not yet qualify as lines but just present evidence about a line (see Fig. 3.4). Compared to CRHT, the main advantage of ECRHT is its tolerance to missing data. Compared to RHT, the advantage is the decreased amount of computation and storage because local information is used to verify accumulations. The main disadvantage is that if a large amount of data is missing, the algorithm will fail because sufficient local information will not be available.
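The local fit of Steps 4 and 5 can be checked with a short routine implementing (3.11)-(3.13); the example points are an assumption made for the sketch.

```python
import numpy as np

def fit_line(xs, ys):
    """Least-squares line parameters (3.11)-(3.12) and the fitting
    error (3.13) used to accept or reject a local estimate in ECRHT."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    n = len(xs)
    sx, sy = xs.sum(), ys.sum()
    sxx, sxy, syy = (xs * xs).sum(), (xs * ys).sum(), (ys * ys).sum()
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)        # (3.11)
    b = (sy - a * sx) / n                                # (3.12)
    e = (a * a * sxx + syy - 2 * a * sxy
         + 2 * a * b * sx - 2 * b * sy
         + n * b * b) / (n * (a * a + 1))                # (3.13)
    return a, b, e

# Points exactly on y = 3x - 2 give a = 3, b = -2 and zero error;
# a noisy window would give e > 0, and the cell A(a, b) would only be
# accumulated when e stays within the tolerance.
a, b, e = fit_line([0, 1, 2, 3], [-2, 1, 4, 7])
```

Note that the denominator n(a² + 1) in (3.13) normalises the squared residuals so that e approximates the mean squared perpendicular distance, making the tolerance independent of the line slope.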
3.3.2 Parallel processing

Parallel Hough transform algorithms have been studied widely for both dedicated and general purpose hardware. The algorithms can be categorised into two classes depending on the division of the data and computation: either (a) the source image, or (b) the accumulator is split between the processors. This is illustrated in Fig. 3.5. The differences between these approaches have been evaluated by Thazhuthaveetil et al. [109]. An approach to split both the image and the accumulator has also been proposed [9].

Figure 3.5: Division of the data and computation in parallel Hough transforms: Either the source image or the accumulator can be divided between processing elements.
Special hardware architectures include mesh-connected SIMD processors [98], discrete logic [23], transputers in a hypercube [22], reconfigurable processor arrays [55], a reconfigurable multi-ring network [10], and VLSI content addressable memory [84, 82]. These specialised hardware solutions can offer the highest performance at the time they are built, but the rapid advance of computer technology makes them grow old almost overnight. For this reason, more general parallel architectures are of great interest.
General purpose hardware architectures for the Hough transform include shared memory multiprocessors [21], distributed memory multiprocessors like the Cray T3D [22], IBM SP2 [60], and Fujitsu AP1000 [118], and general purpose digital signal processors [9, 23, 63]. An implementation of the Hough transform in a distributed workstation network is reported in Publication VI, where the slow communication environment is taken into account by minimising the communication by using division of the accumulator instead of the image. In addition to decreasing the computation time, this approach decreases the storage requirement as the accumulator is divided between the processing elements. In addition to the workstation environment, the implementation is examined in a multiprocessor PC, demonstrating that inexpensive general purpose hardware can be used to speed up the transform.
Improved performance has been obtained by using modified Hough transform techniques. The Multiresolution Hough Transform has been successfully implemented using a pyramid architecture [3]. Also the Fast Hough Transform [74] has been improved for use in a multiprocessor architecture [36]. A parallel implementation on a distributed memory system reported in [9] uses digital signal processors and overlapping shared memory.
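The accumulator-division strategy can be sketched with a thread pool standing in for the processing elements; each worker owns a slice of the θ-range and accumulates every image point into its own slice, so no communication is needed during voting. This is an illustrative sketch, not the implementation of Publication VI, and the discretisation parameters are assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def hough_theta_slice(points, thetas, rho_max):
    """One processing element: accumulate all points, but only into
    the owned theta-slice of the (theta, rho) accumulator."""
    A = np.zeros((len(thetas), 2 * rho_max), dtype=int)
    for x, y in points:
        for i, t in enumerate(thetas):
            rho = int(round(x * np.cos(t) + y * np.sin(t)))
            A[i, rho + rho_max] += 1      # shift rho to a valid index
    return A

def parallel_hough(points, n_theta, rho_max, n_workers):
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    slices = np.array_split(thetas, n_workers)
    with ThreadPoolExecutor(n_workers) as ex:
        parts = ex.map(lambda ts: hough_theta_slice(points, ts, rho_max),
                       slices)
        return np.vstack(list(parts))     # stack slices back in order

rng = np.random.default_rng(0)
pts = [tuple(map(int, p)) for p in rng.integers(0, 20, size=(15, 2))]
A = parallel_hough(pts, n_theta=36, rho_max=40, n_workers=4)
```

Because each slice is private, the result is identical to a sequential run, illustrating why accumulator division needs so little communication.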
In Publication VI, algorithms for parallel RHT are presented for a shared-memory multiprocessor. In RHT, frequent communication between processing elements (PE) is necessary. Therefore, the algorithm uses shared memory, which allows relatively easy and fast interaction between PEs compared to message passing parallel systems. The data structure for the list of points is an important design decision, since it should allow parallel creation and a parallel update mechanism when lines are found. The image is divided into n parts and each PE creates the list of points corresponding to that part. The list update is performed by each PE updating the corresponding list, regardless of the line found. The parallel algorithm for RHT can be presented as:
Algorithm 4 Parallel Randomized Hough Transform.
Let n denote the number of processing elements (PE). The identifier of a PE is denoted by id ∈ {0, 1, ..., n-1}. The same algorithm is executed in each PE.
1. Make a list of the points D_id in the image region where id·x_max/n ≤ x < (id+1)·x_max/n. x_max is the maximum value of the x-coordinate.
2. Synchronise.
3. If id = 0, then calculate the total number of points in the lists D_i, 0 ≤ i < n.
4. Synchronise.
5. Select two points d_i and d_j at random.
6. Accumulate with the line parameters corresponding to the points.
7. If the threshold t is reached, set the flag F_found.
8. If the flag F_found is not set, return to Step 5.
9. Synchronise.
10. Update the point list D_id by removing points corresponding to the line found.
11. If all lines are not found, return to Step 2.
There are two main parts in the algorithm, accumulation (Steps 5 to 8) and point list update (Steps 9 and 10). Synchronisation points are needed to separate these. At each synchronisation point, no PE can continue before all designated PEs have reached the synchronisation point.

Because the overhead of parallelisation varies depending on the variable number of points in the image and the number of processes, dynamic control of the number of processes is necessary for good performance. The overhead of the parallelisation is measured for each line sought and the number of processes is controlled to minimise the execution time.
Algorithm 5 Dynamic Parallel Randomized Hough transform.
Let n denote the total number of processing elements (PE) and p denote the number of active PEs. Initially p = n.
1. Make a list of points D_id in the image region where id·x_max/n ≤ x < (id+1)·x_max/n. x_max is the maximum value of the x-coordinate.
2. Synchronise.
3. If id = 0, calculate the total number of points in the lists D_i, 0 ≤ i < n.
4. Synchronise.
5. Select two points d_i and d_j at random.
6. Accumulate with the line parameters corresponding to the points.
7. If the threshold t is reached, set the flag F_found.
8. If the flag F_found is not set, return to Step 5.
9. Update point lists; each process updates n/p lists.
10. If the synchronisation time is greater than the effective accumulation time, set the flag F_reduce.
11. If F_reduce is set, terminate processes p/2 + 1, ..., p and set p = p/2.
12. Synchronise. If the flag F_reduce is set, change the number of processes arriving at the next barrier.
13. If all lines are not found, return to Step 3.
The parallel HT presents very good speedups compared to the sequential algorithm. Because little communication is needed, the parallel algorithm is scalable and can be applied efficiently also in widely parallel environments. However, the efficiency of sequential RHT is remarkable compared to HT, exceeding also that of the parallel HT in multiprocessor workstations. The Parallel RHT, on the other hand, requires a substantial amount of communication and therefore cannot be applied in a standard networked workstation environment. The parallelisation overhead is thus larger with RHT than with HT, and the scalability of RHT is not as good. By incorporating the dynamic allocation of work, the communication overhead can be constrained and the Parallel RHT outperforms the sequential algorithms as well as the parallel HT.
Gabor Filtering

Gabor analysis [32] is a method for combining time and frequency domain analyses. In computer vision, Gabor filtering has been used for a number of tasks, including edge and line detection [81, 19], texture classification [12], and compression [92, 73].
In this chapter, the use of Gabor filters in the context of feature extraction and object recognition is examined. First, the formulation of the Gabor elementary functions is covered in the one-dimensional case. Then, the formulation is extended for two-dimensional images with the focus on filter normalisation, parameter selection, and the advantages of Gabor filtering. Next, the applications of Gabor filtering in object recognition are examined, noting their invariance properties. Finally, a scale, rotation, and translation invariant Gabor feature is presented for recognising binary images.
4.1 Gabor elementary functions

4.1.1 One-dimensional time-frequency space

In 1946, Dennis Gabor presented a new method of analysing signals in which time and frequency analyses are combined [32]. The effective widths of a signal in the time and frequency domains, Δt and Δf, are defined based on the variance of the signal in the corresponding domains, which leads to the definition of minimal uncertainty as the product of the effective widths,

Δt Δf ≥ 1/2. (4.1)
Gabor then shows that the signal for which the product Δt Δf is minimal, turning the inequality into an equality, is

ψ(t) = e^{-α²(t-t₀)²} e^{j(2πf₀t+φ)} (4.2)

where the first factor is the Gaussian, the second the sinusoid, and j is the imaginary unit. The function represents a complex sinusoidal wave modulated by a Gaussian probability function. α, t₀, f₀, and φ are constants which denote the sharpness of the Gaussian, the centre of the Gaussian in the time domain, the frequency of the sinusoid, and its phase shift, respectively.

Figure 4.1: One-dimensional Gabor elementary function in time and frequency domains.

Applying the Fourier transform to (4.2) gives
Ψ(f) = (√π/α) e^{-(π/α)²(f-f₀)²} e^{j(-2πt₀(f-f₀)+φ)} (4.3)
which also is a complex sinusoid modulated by a Gaussian. The effect of the parameters is illustrated in Fig. 4.1, which shows the Gaussian envelope of the function in both the time and frequency domains. The real and imaginary parts are separately shown in Fig. 4.2. The function in (4.2) became later known as the Gabor elementary function (GEF) and can be used to analyse signals by decomposing them into a number of GEFs. Gabor motivated this decomposition by the minimal joint uncertainty of the GEF in time-frequency space. It can also be noticed that the decomposition includes the Fourier analysis and the time-domain analysis as special cases, namely,

α → 0 ⇒ ψ(t) → sinusoid ⇒ Fourier analysis
α → ∞ ⇒ ψ(t) → Dirac delta ⇒ time description. (4.4)

The Gabor decomposition is a predecessor of multiresolution and wavelet analyses. The problem of the decomposition is that the GEFs are not orthogonal, which causes computational problems when they are used as a wavelet basis.
Instead of decomposition, signals can be analysed by convolving them with GEFs. The filter response to an input ξ(t), that is the convolution, can be defined as

r_ψ(t) = ∫_{-∞}^{∞} ψ(t - τ) ξ(τ) dτ. (4.5)