The number of pages within the document is: 12
The access date was:
2019-02-19 14:46:31.890471
The content is as follows:
In Proceedings of the 37th International Symposium on Microarchitecture, December, 2004

Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy

Eric Tune, Rakesh Kumar, Dean M. Tullsen, Brad Calder
Computer Science and Engineering Department
University of California at San Diego
{etune,rakumar,tullsen,calder}@cs.ucsd.edu

Abstract

A simultaneous multithreading (SMT) processor can issue instructions from several threads every cycle, allowing it to effectively hide various instruction latencies; this effect increases with the number of simultaneous contexts supported. However, each added context on an SMT processor incurs a cost in complexity, which may lead to an increase in pipeline length or a decrease in the maximum clock rate. This paper presents new designs for multithreaded processors which combine a conservative SMT implementation with a coarse-grained multithreading capability. By presenting more virtual contexts to the operating system and user than are supported in the core pipeline, the new designs can take advantage of the memory parallelism present in workloads with many threads, while avoiding the performance penalties inherent in a many-context SMT processor design. A design with 4 virtual contexts, but which is based on a 2-context SMT processor core, gains an additional 26% throughput when 4 threads are run together.
1 Introduction

The ratio between main memory access time and core clock rates continues to grow. As a result, a processor pipeline may be idle during much of a program's execution. A multithreading processor can maintain a high throughput despite large relative memory latencies by executing instructions from several programs.

Many models of multithreading have been proposed. They can be categorized by how close together in time instructions from different threads may be executed, which affects how the state for different threads must be managed. Simultaneous Multithreading [31, 30, 12, 33] (SMT) is the least restrictive model, in that instructions from multiple threads can execute in the same cycle. This flexibility allows an SMT processor to hide stalls in one thread by executing instructions from other threads. However, the flexibility of SMT comes at a cost. The register file and rename tables must be enlarged to accommodate the architectural registers of the additional threads. This in turn can increase the clock cycle time and/or the depth of the pipeline.

Coarse-grained multithreading (CGMT) [1, 21, 26] is a more restrictive model where the processor can only execute instructions from one thread at a time, but where it can switch to a new thread after a short delay. This makes CGMT suited for hiding longer delays. Soon, general-purpose microprocessors will be experiencing delays to main memory of 500 or more cycles. This means that a context switch in response to a memory access can take tens of cycles and still provide a considerable performance benefit. Previous CGMT designs relied on a larger register file to allow fast context switches, which would likely slow down current pipeline designs and interfere with register renaming. Instead, we describe a new implementation of CGMT which does not affect the size or design of the register file or renaming table.
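The arithmetic behind the claim that a slow context switch still pays off can be sketched as follows. This is a simplified back-of-the-envelope model, not the paper's simulator: it assumes the stalled thread would otherwise leave the pipeline idle for the whole miss, that another thread is ready to run, and that the switch cost is paid once. The 500-cycle latency comes from the text; the 30-cycle switch cost is an illustrative stand-in for "tens of cycles", and the function names are hypothetical.

```python
def switch_benefit(miss_latency: int, switch_cost: int) -> int:
    """Cycles of otherwise-idle time recovered by switching threads on a stall.

    Simplifying assumptions (not from the paper): the pipeline would sit idle
    for the full stall, another thread is ready, and the switch cost is paid
    once before useful work resumes.
    """
    return max(0, miss_latency - switch_cost)


def hidden_fraction(miss_latency: int, switch_cost: int) -> float:
    """Fraction of the stall that is covered by work from another thread."""
    return switch_benefit(miss_latency, switch_cost) / miss_latency


if __name__ == "__main__":
    # Main-memory miss: 500 cycles (from the text), 30-cycle switch (assumed).
    print(switch_benefit(500, 30), f"{hidden_fraction(500, 30):.0%}")
    # A short stall, e.g. 20 cycles (assumed), is not worth a 30-cycle switch.
    print(switch_benefit(20, 30), f"{hidden_fraction(20, 30):.0%}")
```

Under these assumptions, most of a 500-cycle miss can be overlapped even when the switch takes tens of cycles, while a stall shorter than the switch cost yields no benefit, which is consistent with CGMT being suited to longer delays.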
We find that CGMT alone, triggered only by main-memory accesses, provides unimpressive increases in performance because it cannot hide the effect of shorter stalls in a single thread. However, CGMT and SMT complement each other very well. A design which combines both types of multithreading provides a balance between support for hiding long and short stalls, and a balance between high throughput and high single-thread performance. We call this combination of techniques Balanced Multithreading (BMT).

This combination of multithreading models can be compared to a cache hierarchy, which results in a multithreading hierarchy. The lowest level of multithreading (SMT) is small (fewer contexts), fast, expensive, and closely tied to the processor cycle time. The next level of multithreading (CGMT) is slower, potentially larger (fewer limits to the number of contexts that can be supported), cheaper, and has no impact on processor cycle time or pipeline depth.

In our design, the operating system sees more virtual contexts than are supported in the core pipeline. These virtual contexts are controlled by a mechanism to quickly switch between threads on long latency load misses. The method we