Applications in object recognition

Gabor filters have been applied to several tasks in object recognition, including segmentation, representing local features, and extracting size information. Some applications depend on a separate segmentation step before recognition can be performed. For that reason, the use of Gabor filters in segmentation will be considered first before moving to their use in the representation of objects, which will form the most significant part of this section.

The power of inspecting locally prominent frequencies has made Gabor filters popular in texture analysis. This has been utilised by Jain et al. for the segmentation of objects in complex backgrounds [47]. A multi-channel decomposition of an image is first performed, followed by a selection of channels based on minimising the reconstruction error. The responses are subjected to a sigmoid nonlinearity, averaged, and then clustered to form the segments. A bottom-up merging has also been used to segment images based on [...]

Figure 4.6: Absolute filter responses in different frequencies and orientations (frequencies f₀ = 1/4, 1/8, 1/16, 1/32; orientations θ = 0°, 45°, 90°, 135°).
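As a rough illustration of a filter bank that produces absolute responses of the kind shown in Figure 4.6, the following Python sketch builds complex Gabor kernels with a circular Gaussian envelope and evaluates them for a set of frequencies and orientations. The function names, the envelope width parameter `sigma_cycles`, the kernel normalisation, and the use of circular FFT convolution are all assumptions made for this sketch; they are not the formulation used in the thesis.

```python
import numpy as np

def gabor_kernel(f0, theta, sigma_cycles=1.0):
    """Complex Gabor kernel with a circular Gaussian envelope.

    f0 is the centre frequency in cycles per pixel and theta the
    orientation in radians. sigma_cycles (an assumption of this sketch)
    sets the envelope standard deviation in wavelengths.
    """
    sigma = sigma_cycles / f0
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the wave
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    kernel = envelope * np.exp(2j * np.pi * f0 * xr)
    return kernel / np.abs(kernel).sum()

def filter_response(image, kernel):
    """Complex response by circular convolution, same shape as image."""
    H, W = image.shape
    kh, kw = kernel.shape
    if kh > H or kw > W:
        raise ValueError("kernel is larger than the image")
    padded = np.zeros((H, W), dtype=complex)
    padded[:kh, :kw] = kernel
    # Move the kernel centre to index (0, 0) before the FFT.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded))

def bank_magnitudes(image, freqs, thetas):
    """Absolute responses, shape (len(freqs), len(thetas), H, W)."""
    return np.array([[np.abs(filter_response(image, gabor_kernel(f, t)))
                      for t in thetas] for f in freqs])
```

For a grating whose intensity varies along x with a period of eight pixels, the filter with f0 = 1/8 and theta = 0 responds strongly, while the filter at theta = π/2 gives a response close to zero.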

In addition to segmentation, Gabor filters have been applied as features to represent objects. These approaches can be divided into four categories depending on the type of the information used: a) response at a single point is used; b) responses on a predefined grid of points are used; c) responses on a deformable grid are used; d) global features of responses over an area are used.

Gabor filter response on a single point has been used to realise affine invariant recognition of image patches [6, 123, 7]. Full affine invariance was established by a log-log mapping of the frequency domain together with filters in several orientations. However, the image needs to be segmented in order to find the centres of the image patches. Weber and Casasent performed distortion invariant recognition by an entirely different concept using a linear combination of several separately optimised filters [124, 125]. The relative filter positions were also optimised in addition to other filter parameters. While the distortion tolerance of the system was good, there was no invariance to geometric transformations. The author has used Gabor filters to perform distortion tolerant rotation invariant recognition of electronic components [54, 53]. The recognition is based on a robust estimation of the dimensions of an object in different orientations around a single point. For a single sampling point, it is possible to realise invariances using analysis of the filter response.

The Gabor responses on a rigid predefined grid have been used to recognise handwritten numerals [37]. Filter parameters were optimised by hand with a 1-NN classification of the response vectors. Walter and Arnrich propose to use a grid of nine real (even) Gabor filters with a backpropagating multi-layer perceptron to position a robot manipulator [122]. Jain et al. use even symmetric (real) Gabor filters to extract the local ridge structures of fingerprints [46]. The fingerprint is inspected using a circular grid around a detected reference point. For each cell of the grid, the deviation of the filter responses for each filter is used to construct the feature vector. Altogether, none of the systems employing rigid sampling of responses achieve rotation or scale invariances.

Invariance to small scale geometric deformations has been achieved through the use of a deformable graph to represent the Gabor filter responses [15, 14, 65]. A single node of a graph represents the filter responses of a bank of filters, often called a Gabor jet [15]. The model graph of an object can be constructed either using sampling on a regular grid [15, 101] or sampling in interest locations [112]. The model graph is then compared with an image to find the locations for each node in the image that minimise both the geometric deformation of the graph and the difference of filter responses at these locations. This approach is invariant to small changes in the locations of the graph nodes which may result, for example, from geometric transformations. However, full invariance to rotation or scale is not achieved, because the filter responses are compared directly and the responses themselves are invariant only to small changes in scale and rotation. This restriction is alleviated by storing several model graphs for objects of different sizes. The approach has been applied to face recognition [15, 14, 65, 128] and classification of hand postures [112, 113]. Krüger and others present a variant that utilises Gabor features to extract local line segments which are then fed to a learning process that outputs the model graph [61, 62]. Park and Yang combine the use of local Gabor features with Randomized Hough transform type evidence accumulation to estimate the parameters of a similarity transform (translation, rotation, and scaling) [88]. Local Gabor features similar to Gabor jets are used to find [...] matching of local elements is performed in a rotation invariant manner which should guarantee full rotational invariance while the invariance in scale remains small.
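The node matching described above, where a location is sought that balances the similarity of filter responses against the geometric deformation, can be sketched for a single graph node as follows. This is a minimal sketch under stated assumptions: the jet similarity is taken to be the normalised dot product of response magnitudes, and the deformation cost is a quadratic penalty on the displacement from the expected position. The names `jet_similarity`, `match_node`, `radius`, and `penalty` are hypothetical and do not come from the cited works.

```python
import numpy as np

def jet_similarity(jet_a, jet_b):
    """Normalised dot product of the response magnitudes of two jets."""
    a = np.abs(np.asarray(jet_a))
    b = np.abs(np.asarray(jet_b))
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def match_node(model_jet, image_jets, expected, radius=3, penalty=0.01):
    """Find the best position for one node near its expected position.

    image_jets has shape (H, W, n_filters) and holds the complex filter
    responses at every pixel. The score is the jet similarity minus a
    quadratic penalty (an assumption of this sketch) on the displacement.
    """
    H, W, _ = image_jets.shape
    r0, c0 = expected
    best_score, best_pos = -np.inf, expected
    for r in range(max(0, r0 - radius), min(H, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(W, c0 + radius + 1)):
            displacement = (r - r0) ** 2 + (c - c0) ** 2
            score = jet_similarity(model_jet, image_jets[r, c]) - penalty * displacement
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

Because only magnitudes are compared at a fixed filter index, a rotated or rescaled object shifts energy between the filters of the jet, which illustrates why such a direct comparison tolerates only small changes in rotation and scale.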

Gabor filter responses can be combined over an area to achieve a global descriptor for that area. Lampinen and Oja [69] employ clustering of the responses of a Gabor jet to assign each pixel to a single cluster. Then, the histogram of clusters over a Gaussian window is used as an input to a subspace classifier. Douville [26] uses the responses summed over the image. Shioyama et al. [102] compute a histogram of responses over a segmented image region as a description of that segment. They demonstrate the performance by detecting cars in real traffic scenes. The global systems presented are only scale and translation tolerant and do not achieve full invariance.

In Publication IV, invariant recognition properties of Gabor filters are examined in the context of directional edge detection in binary images. A method to select filter parameters is presented which assures that the circular Gaussian is able to capture the required number of angles. Translation invariance is achieved by summing the filter responses over the image. That is, a feature vector G is constructed from the individual responses r(x, y; θ) as [...] where n is the number of filters in different orientations. Because the filter responses represent the edges in different orientations, scale invariance can be realised by normalising the feature magnitudes. Let G 0 [...] A rotation invariant distance measure for the features can be presented by first defining the rotation of the feature vector as [...] where k = 0, ..., n − 1 is the rotation index. Then the rotation invariant squared Euclidean distance of two feature vectors can be defined as the minimum over all rotations, that is, d(G, H) = min [...] where G and H are the feature vectors. In Publication IV the invariances are demonstrated with an experiment with digits. A preliminary version of Publication IV was published as [64].
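The displayed equations of the passage above are not reproduced in this text, so the following Python sketch shows only one reading that is consistent with the prose. It assumes that each feature component is the sum of the response magnitudes of one orientation over the image, that the normalisation divides by the sum of the components, and that a rotation of the feature vector is a circular shift of its components by k positions. All three choices, and the function names, are assumptions of the sketch rather than the definitions of Publication IV.

```python
import numpy as np

def global_feature(responses):
    """Sum response magnitudes over the image, one value per orientation.

    responses has shape (n, H, W): one complex response image for each of
    the n filter orientations. Summing over the image discards position,
    which gives translation invariance.
    """
    return np.abs(responses).sum(axis=(1, 2))

def normalise(feature):
    """Divide by the total magnitude (assumed form of the normalisation)."""
    total = feature.sum()
    return feature / total if total > 0 else feature

def rotation_invariant_distance(g, h):
    """Minimum squared Euclidean distance over all circular shifts of h."""
    n = len(g)
    return min(float(np.sum((g - np.roll(h, k)) ** 2)) for k in range(n))
```

Under these assumptions, a feature vector and a circularly shifted, rescaled copy of it have distance zero after normalisation, which is the behaviour the text attributes to the rotation invariant distance d(G, H).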

In Publication V, the recognition method is applied to matching of binary symbols. The method is inspected using the larger data set presented in more detail in Publication III. Both the Gabor filtering and Hough transform methods are based on the construction of a histogram of edges in different orientations. The noise tolerance of the Gabor filtering based image matching is found to be superior to the Hough transform generated [...]

Discussion

Invariant object recognition has been recognised as one of the most central problems in computer vision. Feature extraction is an important part of a vision system which converts digital image data into higher level features. In this thesis, new feature extraction methods have been presented and analysed. Methods based on Hough transform and Gabor filtering have been improved. The publications will now be reviewed based on the objectives of an object recognition system, and then discussed one by one concentrating on the contributions and restrictions.

In Chapter 1, the objectives of an object recognition system were defined as: The system should be general enough to recognise a large variety of objects, invariant to natural variations, stable against distortions, and computationally efficient. These can be used to survey the contents of this thesis. Both the line segments detected by the Hough transform and the Gabor features can be thought of as general shape primitives because they are not limited to a single application and are able to represent many objects, particularly those which are man-made and have geometric shapes. However, it seems likely that these low level features cannot be used directly but must be used in the generation of a higher level description.

It is doubtful if global shape type features can be used for the recognition of complex objects. The shapes of such objects can be complex and often cannot be described by a single contour. Also, the global features can often be spatially unstable, that is, they are not equally sensitive over the whole shape, and the objects need to be segmented from the background in order to use global features. While local features can overcome these problems, at least partially, the question is how local and global information should be combined. The idea of combining local and global information can be seen both in the Hough and Gabor approaches. The Hough transform is inherently an evidence gathering process that accumulates local evidence to examine the global configuration. In Publication I a robust local estimation technique is presented to estimate the parameters of a line segment inside a local window to be used in the evidence gathering. The inspection system presented in Publication II utilises information on local line segments and centroids of fitted circles and compares it to a known global model. Likewise, Gabor filtering provides a way to robustly extract local directional edges and lines, which can then be combined to a global description as in Publication IV and Publication V.
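The evidence gathering character of the Hough transform mentioned above can be made concrete with a minimal sketch of the standard transform for straight lines. Each edge point votes for every line that could pass through it, in the normal parameterisation ρ = x cos θ + y sin θ, and a global line appears as a peak in the accumulator. The function names, the bin sizes, and the accumulator layout are assumptions of this sketch, which is not the implementation used in the publications.

```python
import numpy as np

def hough_lines(edge_points, shape, n_theta=180, rho_step=1.0):
    """Accumulate votes for lines rho = x*cos(theta) + y*sin(theta).

    edge_points is an iterable of (x, y) pixel coordinates and shape is
    the image shape (H, W). Returns the accumulator, the theta values,
    and the offset rho_max used to make the rho index non-negative.
    """
    H, W = shape
    thetas = np.arange(n_theta) * np.pi / n_theta
    rho_max = float(np.ceil(np.hypot(H, W)))
    n_rho = int(2 * rho_max / rho_step) + 1
    acc = np.zeros((n_rho, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in edge_points:
        # One vote per orientation: the local evidence of a single point.
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        rows = np.round((rho + rho_max) / rho_step).astype(int)
        acc[rows, cols] += 1
    return acc, thetas, rho_max

def strongest_line(acc, thetas, rho_max, rho_step=1.0):
    """Return (theta, rho, votes) of the accumulator maximum."""
    row, col = np.unravel_index(np.argmax(acc), acc.shape)
    return thetas[col], row * rho_step - rho_max, int(acc[row, col])
```

Because a peak is built from independent votes, removing some of the points lowers the peak but does not move it, which is one way to see the tolerance to missing data discussed later in this chapter.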

Translation, rotation, and scale invariant recognition using global Gabor features is presented in Publication IV and Publication V. In Publication II the rotation invariant recognition was realised using local features with known spatial relationships. The Hough transform based translation, rotation, and scale invariant features presented in Publication III were based on global and local line orientations.

The existence of truly invariant features is questionable even in the human visual system, as it has been found that humans recognise certain views of an object more quickly than others [87]. In addition, the human visual system can be primed to certain poses of an object. For example, alphabetic characters are recognised more quickly in their normal position than upside down. An important note is that a human being can still recognise these characters regardless of the pose even though the recognition time changes. While this evidence is contradictory to the existence of global invariants, research in invariant recognition is certainly warranted and the evidence may, in addition, provide ideas for further approaches to computer vision. It is not very likely that the global features will provide enough information for discriminating images in large data sets. It seems that responses of Gabor filters are powerful as descriptors of local parts but the local parts have to be combined in a global description.

The stability against distortions is one of the most important qualities of a computer vision system which operates in natural surroundings. In Publication I the noise tolerance of the connective randomised Hough transform was improved with a new algorithm for the local estimation. The experiments in Publication V verify the noise tolerance of the Gabor filtering based features. Comparing the Hough and Gabor approaches it seems that the local features extracted by Gabor filtering are more stable against such distortions as noise and small displacements of the target. However, the evidence gathering nature of the Hough transform makes it more tolerant against missing data. Thus, it can only be concluded that while the stability of features is desirable, the optimally stable feature extraction method depends on the application.

The Hough transform is known to be computationally demanding. In Publication I a variant was proposed that reduces the computational burden by employing local information. While the method for local estimation is more complex than in the preceding method, CRHT, the total computation time is decreased in some cases due to the better estimates of the local structure. While separate analyses of parts of the algorithm can reveal important information, they cannot be combined directly to form a truth about the whole.

The standard Hough transform is normally impractically burdensome. In Publication VI it was shown that with modern parallel environments computation can be sped up considerably. The focus was on the Randomized Hough transform, which has been found to outperform the standard Hough transform. A proposed parallel algorithm was found to be superior both to sequential randomised and parallel standard Hough transforms in the chosen environments. From the research the conclusion can be drawn that a low-scale parallelisation of known algorithms is often possible in an efficient way. On the other hand, in highly parallel environments the algorithm design should be strongly related to [...]

Publication I presents a new method to extract local information to improve the robustness of evidence accumulation in CRHT. The underlying idea of using the local information has been presented by other authors, but the novel contribution of the paper is the robust local estimation method. A restriction of the technique is that it only helps in the detection problem, that is, it is helpful when a random subset of evidences, i.e., edge points, is missing, causing short gaps in local line segments. In addition, if only short gaps are present, the method can lower execution time because of the better estimates of line parameters. However, the method is not applicable for long gaps in the lines because the execution time grows intolerably.

Publication II presents a system for inspecting two-dimensional sheet metal parts. The contribution of the publication is in combining existing methods into a real-world application, and in the analysis of the measurement accuracy. A drawback of the system is that it is based on two-dimensional models of the parts. Thus, the calibration is estimated using a two-dimensional similarity transform, the part and the image plane of the camera must lie parallel, and the thickness of the parts cannot be taken into account.

Publication III presents new global features based on Hough transform for matching images. Originally, it was desired that these features could be used for discriminating between different types of documents such as electrical and architectural drawings and maps because in these technical drawings, the orientations of lines often depend on the type of the drawing. However, it was discovered that the feature vector did not have strong correlation to the drawing type. Nevertheless, the features were found capable of discriminating line drawing symbols. A major restriction of the method as a global feature is that successful segmentation must be performed before recognition. For recognising the drawing type, it seems that a more effective scheme could be constructed by identifying typical primitives for each type, and performing the recognition based on these.

Publication IV presents a global, Gabor filtering based feature for recognition. The publication has two primary contributions, the selection of filter parameters and the global feature. The method of parameter selection is based on a formulation of a Gabor filter that forces the filter envelopes to be circular Gaussians. In addition, the relative frequency bandwidth is selected. Therefore, the spatial size of a filter could be controlled only using the frequency. However, as stated in Chapter 4, the frequency and orientation bandwidths can be controlled independently using two separate parameters, which would seem to be a more reasonable way to solve the problem of parameter selection. The histogram-like scaling and rotation invariant global feature presented is, to the author's knowledge, original, but it is closely related to the energy on a certain band, which has previously been used and is also lighter to compute. The necessity of segmentation applies also to this feature as the result of its global nature, and presents one of the major limitations for the application of the feature.

Publication V presents the application of the global Gabor features to symbol recognition and compares the results to those in Publication III. The noise tolerance of the Gabor feature is found superior to Hough transform based features. This is an important attribute of using Gabor type features, as the noise tolerance is often very good because the use of a limited range of the frequency spectrum suppresses noise on other frequencies. It would be beneficial to compare the presented features to other global features such as [...]

Publication VI presents parallel Hough transform algorithms. The contributions of the publication are the parallel algorithm for the Randomized Hough transform and the assessment of suitability of the parallel environment to the Hough transform. While the results show that the use of networked PCs can decrease the computation time of HT, it is not very likely that the environment would be useful in a real application because in the case of standard network hardware, the network latencies are always considerable and thus real time performance is hard to obtain if even possible. However, the use of multiprocessor PCs seems to provide a cost efficient platform for image processing and computer vision. A problem with the parallel RHT algorithm is that while good performance is obtained, the algorithm does not scale up very well to a large number of processors because of the random unpredictable nature of the algorithm. For optimal performance in multiprocessor environments it could be beneficial to study other efficient HT variants, such as the Adaptive HT, that do not have random behaviour and would be more suitable to a parallel environment.
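The random behaviour referred to above stems from the way the Randomized Hough transform gathers evidence: instead of letting every point vote for all lines through it, it repeatedly draws a pair of points, solves the line through them, and adds a single vote to the corresponding accumulator cell. The following sequential sketch illustrates this sampling step only. It is simplified in that it runs a fixed number of iterations and returns the most voted cell, and its function names, bin sizes, and sparse accumulator are assumptions of the sketch, not the parallel algorithm of Publication VI.

```python
import numpy as np
from collections import Counter

def line_through(p, q):
    """Normal parameters (theta, rho) of the line through p and q.

    theta is the direction of the line normal, mapped to [0, pi), and
    rho = x*cos(theta) + y*sin(theta) for any point (x, y) on the line.
    """
    (x1, y1), (x2, y2) = p, q
    theta = np.arctan2(x2 - x1, -(y2 - y1)) % np.pi
    rho = x1 * np.cos(theta) + y1 * np.sin(theta)
    return theta, rho

def randomized_hough_line(points, n_iter=2000, theta_step=np.pi / 180,
                          rho_step=1.0, seed=0):
    """Return (theta, rho, votes) of the most voted cell.

    The wrap-around between theta near pi and theta near 0 is ignored
    in this sketch.
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    acc = Counter()  # sparse accumulator: only visited cells are stored
    for _ in range(n_iter):
        i, j = rng.choice(len(pts), size=2, replace=False)
        theta, rho = line_through(pts[i], pts[j])
        cell = (int(round(theta / theta_step)), int(round(rho / rho_step)))
        acc[cell] += 1
    (t_idx, r_idx), votes = acc.most_common(1)[0]
    return t_idx * theta_step, r_idx * rho_step, votes
```

Since the sequence of cells that receive votes depends on the random draws, the work cannot be divided into predictable, equally sized parts in advance, which gives some intuition for the scaling difficulty described in the text.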

Some directions for future research can already be seen. First, incorporating the higher level structure to the local features seems to be necessary, since global invariant features do not seem to be effective in discriminating large and complex data sets. The spatial relationships of the local features should be included in the representation. The current graph based approaches suffer from the fact that while the features or the model matching can be invariant, the non-invariant knowledge about local features, such as the orientation, is not included in the matching. Another area of research is the application of learning. Research on artificial neural networks has produced the means to incorporate knowledge through training by examples. However, the downside of neural networks is that their acquired knowledge is often very hard to interpret after the learning. In contrast, traditional artificial intelligence techniques, such as decision trees, can be interpreted, but they often do not tolerate uncertainty and also cannot describe complex relationships. Therefore, future research is clearly necessary in the use of learning, both inside the feature extraction and classification stages and in the interaction between the stages. In 1960, Selfridge and Neisser [99] wrote: "No current program can generate test features of its own." While this is no longer true as general low-level features are known, knowledge about the interaction between the stages of a vision system is still mostly nonexistent, and it is not clear what kind of general higher level descriptions should be [...]

In document Local and Global Feature Extraction for Invariant Object Recognition (pages 42-60)
