Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 355–362, ISSN 1896-7094, © 2006 PIPS

Parallel Program Execution Anomalies
Jan Kwiatkowski, Marcin Pawlik, Dariusz Konieczny

The second frequently used metric is speedup, which captures the relative benefit of solving a given problem using a parallel system. There are different speedup definitions. Generally, speedup is defined as the ratio of the time needed to solve the problem on a single processor to the time required to solve the same problem on a parallel system with p processors. Depending on the way in which the sequential time is measured, we can distinguish absolute, real and relative speedups. In this paper the relative speedup is used, with the sequential time defined as the time of executing a parallel program on one of the processors of the parallel computer. Theoretically, speedup cannot exceed the number of processors used during program execution; however, different speedup anomalies can be observed. A speedup anomaly occurs when the program does not execute in the way predicted by the performance model. Specifically, the speedup can be larger than the number of available processors: the so-called superlinear speedup. There are three different reasons for this phenomenon:

– algorithm dependent anomaly, caused by the internal algorithm structure, in other words the way the problem is parallelized; for example, in some parallel search algorithms, if a search tree contains solutions at different depths, then after distribution of this tree among different processors the solution can be found by exploring fewer nodes,

– hardware dependent anomaly, caused by specific hardware features, for example the interconnection network, the size of the cache, internal memory, etc. The "cache effect" is an example of this anomaly: when a program is executed on a large number of processors, the per-processor problem size is reduced and all needed data can be placed in the relatively faster cache memory, which causes a reduction of execution time and efficiency growth even over 1,

– execution environment dependent anomaly, caused by the execution environment, mainly operating system features such as the scheduling system; for example, when processors with different performance are used during parallel program execution.

The performance metrics presented above do not take into account the utilization of processors in the parallel system. The next metric, the efficiency of a parallel program, is defined as the ratio of speedup to the number of processors. In an ideal parallel system the efficiency is equal to one. In practice the efficiency of parallel systems is between zero and one; however, when execution anomalies occur, the efficiency can be greater than one. The next useful metric is the scalability of the parallel system [1], [2]. In general it is a measure of its capability to increase speedup in proportion to the number of processors. There are two ways in which scalability analysis can be carried out: with fixed and with scaled problem size. The fixed problem size scalability analysis answers the question: how fast can problem A be solved on computer X [1]. In this case different (mainly hardware dependent) execution time anomalies can occur. In their presence the second approach, scaled problem size analysis, can be used. In this case it is checked whether the efficiency can be kept at the same level when the problem size and the number of processors are concurrently increased. One can say that a system is scalable when the efficiency stays the same as the number of processors and the size of the problem increase. In this paper the influence of these two approaches on existing speedup anomalies is shown. The last metric used in the paper is the computational throughput, defined as the amount of data processed by a single processor in a unit of time. During the throughput analysis, only the part of the data distributed between the processors, with a size directly dependent on their number, can be taken into consideration, provided the remaining data represents a constant time factor.
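As a concrete illustration, the speedup and efficiency definitions above can be written down directly. The timing values in the sketch below are hypothetical and serve only to show an efficiency greater than one (a superlinear case):

```python
def speedup(t_seq, t_par):
    """Relative speedup: sequential run time over parallel run time."""
    return t_seq / t_par

def efficiency(t_seq, t_par, p):
    """Efficiency: speedup divided by the number of processors p."""
    return speedup(t_seq, t_par) / p

# Hypothetical timings: 100 s on one processor, 11 s on 8 processors.
t1, t8, p = 100.0, 11.0, 8
s = speedup(t1, t8)          # about 9.09
e = efficiency(t1, t8, p)    # about 1.14, i.e. greater than one
print(f"speedup={s:.2f} efficiency={e:.2f} superlinear={s > p}")
```

An efficiency above one signals one of the anomalies listed above rather than a flaw in the metric itself.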

3 Parallelization of the SOM Algorithm

The Kohonen Self-Organizing Maps (SOM) [3] are commonly employed to process large input data, but their effective working abilities can be achieved only after a time-consuming process of learning. The hardware requirements for using even moderate-size networks can easily exceed available resources. Parallel algorithms offer a solution to this problem. They divide the process of computations between many independently working small computers or utilize the whole computational power of larger, multiprocessor machines. While offering a possibility to reduce the time needed to create the resulting network, to work on bigger data samples or to prepare more detailed results, they preserve important properties of sequential SOM implementations.

In the original sequential algorithm the winner search and the modification of the winner's neighbours' weights are performed on-line in every learning step. The learning parameters update is run only once after each presentation of all the learning set vectors (at every epoch). The parallel algorithm variants are generally based on division of either the learning set (learning set parallelization) or the network (network parallelization) between the processors [7]. The speedup of parallel implementations utilizing the on-line algorithm is limited by the frequent communication. To lower the communication overhead, the off-line algorithm (with the weights update performed once at the end of each epoch) can be employed.
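The off-line variant described above can be sketched in code. The following is a minimal illustration under simplifying assumptions (pure Python, a 1-D neighbourhood instead of the 2-D grid, hypothetical sizes and parameters), not the implementation evaluated in the paper:

```python
import random

def offline_epoch(weights, data, radius, lr):
    """One epoch of an off-line SOM variant: winners for the whole learning
    set are found first; weights are modified once, at the end of the epoch."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Phase 1: winner search for every learning-set vector (no updates yet).
    winners = [min(range(len(weights)), key=lambda j: dist2(weights[j], x))
               for x in data]

    # Phase 2: modify each winner and its neighbourhood (1-D index distance
    # here for brevity; a real SOM uses the 2-D grid distance).
    for x, w in zip(data, winners):
        for j in range(max(0, w - radius), min(len(weights), w + radius + 1)):
            weights[j] = [wj + lr * (xi - wj) for wj, xi in zip(weights[j], x)]
    return winners

random.seed(0)
weights = [[random.random() for _ in range(3)] for _ in range(16)]  # 16 neurons
data = [[random.random() for _ in range(3)] for _ in range(32)]     # 32 vectors
winners = offline_epoch(weights, data, radius=2, lr=0.1)
```

Deferring the update to phase 2 is what makes the variant parallelizable with a single communication round per epoch.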

4 Theoretical Performance Model

To analyze the performance of the algorithm, its performance model should be constructed. The paragraphs below present the performance model of the single off-line algorithm chosen for further evaluation: the network off-line parallelization [4].

In the algorithm the winner search procedure is divided between p processors by instructing them to find local winners in a p times smaller part of the network (the mini-network), approximately reducing the search time by a factor of p. The winner position is determined and remembered in every step, and the information transfer followed by the neighbours' weights modification is performed at the end of every epoch.

The execution time of the parallel algorithm depends on the communication time and the time of computations performed on each of the utilized processors. In the network off-line parallelization algorithm, the computation time on a single processor consists of searching for the winning neuron and modifying the weights of the winner and its surroundings. The time of the winner search operations is linearly dependent on the number of neurons. The number of modifications depends on the epoch index. The radius of the surroundings changes in accordance with the formula r(e) = Sa/(Sb + e), where Sa, Sb are a priori selected constants used to control the algorithm behavior and e is the index of the current epoch. It can be assumed that for a typical algorithm execution every neuron has an equal probability of being the winner. The average number of neurons in the winner's surroundings s(e, n) depends on the radius of the surroundings r(e) and the network size n. The precise number of neurons in the winner's surroundings depends on the position of the winner in the network and the network size, but under the assumption of equal winner probability among the neurons, s(e, n) has an upper bound of (2r(e) + 1)^2. For a number of learning set elements equal to N, it can be proved that the number of neurons modified during E epochs is described by the equation S(E, n) = N · Σ_{e=0}^{E−1} s(e, n). The shape of the S(E, n) function and the experimental results for the execution parameters utilized for experimental evaluation (Sa = 16, Sb = 2, E = 16, N = 1152) are presented in fig. 1.
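Under the assumption of an integer (truncated) radius, the bound above can be evaluated for the parameters used in the experimental evaluation (Sa = 16, Sb = 2, E = 16, N = 1152); a minimal sketch:

```python
def r(e, Sa=16, Sb=2):
    """Neighbourhood radius in epoch e: r(e) = Sa / (Sb + e)."""
    return Sa // (Sb + e)   # integer (truncated) radius is an assumption

def s_bound(e):
    """Upper bound on neurons in the winner's surroundings: (2r(e)+1)^2."""
    return (2 * r(e) + 1) ** 2

def S(E=16, N=1152):
    """Total modifications over E epochs: S(E, n) = N * sum_e s(e, n)."""
    return N * sum(s_bound(e) for e in range(E))

# The per-epoch bound shrinks as the radius decays with the epoch index.
print([s_bound(e) for e in range(16)])
print(S())
```

The rapid decay of r(e) means the bulk of the modification work falls into the first few epochs, which is the shape fig. 1 reports.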

Fig. 1. Number of modifications in network off-line parallelization

The run time of the sequential SOM algorithm T_seq depends on the number of neurons n, the time of the elementary search operation τ_c, the number of learning set elements N and the number of epochs E. If the time of modifying a single neuron's weights is τ_sm and the number of neurons in the winner's surroundings is s(e), the time t_mod during which the weights of the winner and its surrounding neurons are modified is given by the equation t_mod = τ_sm · Σ_{e=0}^{E−1} s(e). The resulting equation describing the execution time reads T_seq = n·τ_c·N·E + t_mod·N.

In the parallel version of the algorithm, the network off-line parallelization, the communication phase is present after each epoch. Before starting communication, each processor computes and stores the vector of winners found for each of the learning set elements. In the communication phase this vector is distributed between processors using a reduction operation, resulting in determining one global winner for every learning set element. After this phase each processor modifies the weights of the winner and its surroundings if they are in its mini-network. The parallel run time T_p for this algorithm is given by the equation T_p = (E·n·τ_c·N/p + E·t_mod·N/p) + E·t_r(N, p), where t_r(N, p) is the time for communication with reduction for N winners and p processors. The shape of the curve representing the algorithm's parallel run time is presented in fig. 2.

Fig. 2. Network off-line algorithm execution time

The number of operations needed for the winner search is constant with respect to the number of processors. With the growth of the number of processors, the number of winners in the mini-network of a single processor decreases. When the communication overhead is small (i.e. for a small number of processors) this decrease results in a decrease of the overall execution time. After the execution time reaches its minimum, the rising communication overhead starts to result in execution time growth.
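This U-shaped behaviour can be reproduced from the T_p equation above. The parameter values and the reduction-time model t_r(N, p) = τ_r·N·log2(p) in the sketch below are hypothetical, chosen only to produce a minimum at an intermediate processor count:

```python
import math

# Hypothetical parameter values (tau_c, t_mod, tau_r are NOT taken from the
# paper); n, N, E follow the 48x48 experimental setup.
n, N, E = 2304, 1152, 16
tau_c, t_mod, tau_r = 1e-7, 1e-4, 4e-5

def t_par(p):
    """T_p = (E*n*tau_c*N/p + E*t_mod*N/p) + E*t_r(N, p),
    with t_r(N, p) modelled as tau_r*N*log2(p) -- an assumption."""
    t_r = tau_r * N * math.log2(p) if p > 1 else 0.0
    return (E * n * tau_c * N / p + E * t_mod * N / p) + E * t_r

times = {p: t_par(p) for p in (1, 2, 4, 8, 16, 32)}
best = min(times, key=times.get)
print({p: round(t, 2) for p, t in times.items()}, "minimum at p =", best)
```

With these values the computation term shrinks as 1/p while the reduction term grows with log2(p), so the total time falls, bottoms out, and rises again, which is the qualitative shape of fig. 2.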

5 Experimental Evaluation of the Parallel SOM Algorithm

To confirm the correctness of the theoretical analysis presented in section 4, a series of experiments was performed. The tests were executed on the Cumulus cluster [5], consisting of 36 homogeneous, space-shared IBM PC nodes (Sempron 1.7 GHz, 512 MB RAM). The results presented in the paper are based on the values from series of at least five test runs. If not stated differently, the averaged values are presented.

In the experiments based on the fixed problem size technique, for large and moderate SOM network sizes (48x48 and above) the characteristics of application efficiency closely matched the theoretical predictions. For smaller network sizes superlinear speedup appeared (fig. 3). The parallel execution efficiency largely exceeding 1 suggests that for numbers of nodes equal to and larger than 8, the utilization of the processor cache speeds up the memory access operations. In the tested algorithm, the network off-line parallelization, with a growing number of utilized processors the network size allocated to one processor gets proportionally smaller. In the algorithm implementation a single neuron is represented by a vector of 617 8-byte-long fields. The CPUs utilized have 128 kB L1 and 256 kB L2 cache sizes. An analysis of the network size values gathered in table 1 shows that for the 24x24 network the allocated memory gets close to the cache size when the number of processors reaches 8. For the 48x48 network this happens at 32 processors (hence the final efficiency rise). When 96x96 and larger networks are used, even for the largest tested number of processors the network size is too big to cause cache-utilization-related efficiency growth.


Fig. 3. Efficiency of the parallel SOM algorithm

Table 1. Network size in kB per processor

  No. of proc.         1      8     32
  Net. size 48x48  11106   1388    347
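The Table 1 values follow from the neuron representation described above, assuming the 617 fields of 8 bytes each given in the text; a quick sanity check:

```python
NEURON_BYTES = 617 * 8          # 617 eight-byte fields per neuron

def net_kb_per_proc(side, p):
    """Memory (kB) of the mini-network held by one of p processors
    for a side x side network."""
    neurons = side * side
    return neurons * NEURON_BYTES / p / 1024

# 48x48 network, as in Table 1:
row = [round(net_kb_per_proc(48, p)) for p in (1, 8, 32)]
print(row)   # [11106, 1388, 347]
```

Note that a 24x24 network split over 8 processors gives the same 347 kB per-processor footprint, matching the text's observation that both configurations approach the 256 kB L2 cache at those processor counts.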


Fig. 4. Execution efficiency in relation to the number of processors

Fig. 5. Execution efficiency in relation to the data size allocated on a single processor

Fig. 6. Throughput in the cache area


Fig. 7. Throughput at the cache size limits and memory area

In the first data size area, this hardware dependent anomaly is caused by better cache space utilization, resulting in a higher cache hit ratio and a lower overall memory access time.

In the second data size area, where the data size starts exceeding the cache size (the upper three curves in fig. 7), the throughput gets lower as the data size grows. This behavior is natural, since with the data growth the amount of data that must be accessed from memory rises, resulting in a growing number of cache misses. The upper three curves in figure 7 present the results of a single test run. In this area the software environment behavior can change the program's cache utilization efficiency, resulting in execution environment dependent anomalies. When the results are averaged over a series of experiments, these anomalies are indistinguishable.

The third data size area, representing data sizes much larger than the cache size, shows much more consistent throughput behavior (the lower six curves in fig. 7). When the detailed behavior is analyzed (fig. 8), it can be seen that, as in the cache area, data size growth results in throughput growth. Although the source of this effect is the same as in the cache area, its significance is much smaller here, because for larger data sizes the influence of cache utilization is also much smaller. In the memory utilization area, as shown in the analysis presented in section 4, the larger the number of processors, the smaller the sum of communication and modification time. This algorithm dependent anomaly constitutes the source of the initial throughput growth when a larger number of processors is utilized.

6 Conclusions and Future Work

In the paper three types of execution anomalies (hardware, algorithm and execution environment dependent) were identified in the parallel SOM algorithm. Utilizing the scaled problem size technique, it was possible to describe the dependence between the input data size and the unexpected algorithm behavior. This technique also proved to be a valuable tool that can be utilized to select the correct execution parameters to avoid the hardware dependent anomalies. Space limitations permitted presenting the discussion based on only a single algorithm. In future works the utilization of the scaled problem size technique for the analysis of a broader class of algorithms, specifically ones with nonlinear complexity, together with the application of this technique to granularity based evaluation [6] and performance prediction methods, will be presented.

Acknowledgements. This research was partially supported by the European Community Framework Programme 6 project DeDiSys, contract No. 004152.

References

1. Foster I.: Designing and Building Parallel Programs, Addison-Wesley Pub., 1995 (also available at http://www.mcs.anl.gov/dbpp/text/book.html).


Fig. 8. Throughput in the memory area (details)

2. Grama A., Gupta A., Kumar V.: Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures, IEEE Parallel & Distributed Technology, August 1993, pp. 12–21.

3. Kohonen T.: The Self-Organizing Map, Proceedings of IEEE, 1985, vol. 73, pp. 1551–1558.

4. Kwiatkowski J., Pawlik M., Konieczny D., Markowska-Kaczmar U.: Performance Evaluation of Different Kohonen Network Parallelization Techniques, to be published in Proceedings of Parelec'06 Intl. Conf.

5. Kwiatkowski J., Pawlik M., Wyrzykowski R., Karczewski K.: Cumulus - Dynamic Cluster Available under CLUSTERIX, Proc. of Cracow Grid Workshop 2005, Cracow 2006, pp. 82–87.

6. Kwiatkowski J.: Evaluation of Parallel Programs by Measurement of its Granularity, Proceedings of PPAM'01 International Conference, Naleczow, Poland, September 9–12, 2001, LNCS, Springer-Verlag Berlin Heidelberg 2002, pp. 145–153.

7. Lawrence R., Almasi G. S., Rushmeier H. E.: A scalable parallel algorithm for self organizing maps with applications to sparse data mining problems, Data Mining and Knowledge Discovery, 1999, vol. III, pp. 171–195.

8. Sahni S., Thanvantri V.: Performance Metrics: Keeping the Focus on Runtime, IEEE Parallel & Distributed Technology, Spring 1996, pp. 43–56.
