Here is your PDF: paper.dvi; Keywords: In Proceedings of the 37th International Symposium on Microarchitecture, December 2004; Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy; Eric Tune, Rakesh Kumar, Dean M. Tullsen, Brad Calder; Computer Science and Engineering Department, University of California at San Diego; etune, rakumar, tullsen, calder

The number of pages within the document is: 12


The access date was:
2019-02-19 14:46:31.890471

The content is as follows:
In Proceedings of the 37th International Symposium on Microarchitecture, December, 2004

Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy

Eric Tune, Rakesh Kumar, Dean M. Tullsen, Brad Calder
Computer Science and Engineering Department
University of California at San Diego
{etune, rakumar, tullsen, calder}@cs.ucsd.edu

Abstract

A simultaneous multithreading (SMT) processor can issue instructions from several threads every cycle, allowing it to effectively hide various instruction latencies; this effect increases with the number of simultaneous contexts supported. However, each added context on an SMT processor incurs a cost in complexity, which may lead to an increase in pipeline length or a decrease in the maximum clock rate. This paper presents new designs for multithreaded processors which combine a conservative SMT implementation with a coarse-grained multithreading capability. By presenting more virtual contexts to the operating system and user than are supported in the core pipeline, the new designs can take advantage of the memory parallelism present in workloads with many threads, while avoiding the performance penalties inherent in a many-context SMT processor design. A design with 4 virtual contexts, but which is based on a 2-context SMT processor core, gains an additional 26% throughput when 4 threads are run together.
1 Introduction

The ratio between main memory access time and core clock rates continues to grow. As a result, a processor pipeline may be idle during much of a program's execution. A multithreading processor can maintain a high throughput despite large relative memory latencies by executing instructions from several programs.

Many models of multithreading have been proposed. They can be categorized by how close together in time instructions from different threads may be executed, which affects how the state for different threads must be managed. Simultaneous Multithreading [31, 30, 12, 33] (SMT) is the least restrictive model, in that instructions from multiple threads can execute in the same cycle. This flexibility allows an SMT processor to hide stalls in one thread by executing instructions from other threads. However, the flexibility of SMT comes at a cost. The register file and rename tables must be enlarged to accommodate the architectural registers of the additional threads. This in turn can increase the clock cycle time and/or the depth of the pipeline.

Coarse-grained multithreading (CGMT) [1, 21, 26] is a more restrictive model where the processor can only execute instructions from one thread at a time, but where it can switch to a new thread after a short delay. This makes CGMT suited for hiding longer delays. Soon, general-purpose microprocessors will be experiencing delays to main memory of 500 or more cycles. This means that a context switch in response to a memory access can take tens of cycles and still provide a considerable performance benefit. Previous CGMT designs relied on a larger register file to allow fast context switches, which would likely slow down current pipeline designs and interfere with register renaming. Instead, we describe a new implementation of CGMT which does not affect the size or design of the register file or renaming table.
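The claim that a switch costing tens of cycles still pays off against a 500-cycle memory delay can be checked with a back-of-the-envelope model. The sketch below is an idealized textbook-style approximation, not the model or simulator used in the paper: it assumes every thread runs for a fixed number of cycles between main-memory misses, that all misses take the same time, and that the switch cost is no larger than the miss latency. The function name `pipeline_utilization` and the 200-cycle run length and 30-cycle switch cost are hypothetical values chosen for illustration; only the 500-cycle latency comes from the text.

```python
def pipeline_utilization(threads, run_cycles, miss_latency, switch_cost):
    """Fraction of cycles spent on useful work in an idealized CGMT core.

    Assumed model (an illustration, not taken from the paper): each thread
    executes for `run_cycles`, then misses to main memory for `miss_latency`
    cycles, and the core spends `switch_cost` cycles swapping in the next
    thread. Only one thread executes at a time.
    """
    if threads < 1 or run_cycles <= 0:
        raise ValueError("need at least one thread and a positive run length")
    if not 0 <= switch_cost <= miss_latency:
        raise ValueError("model assumes 0 <= switch_cost <= miss_latency")

    # One round of the schedule cannot be shorter than a single thread's
    # run plus its miss, nor shorter than every thread taking one turn
    # (run plus switch) back to back.
    round_length = max(run_cycles + miss_latency,
                       threads * (run_cycles + switch_cost))
    return threads * run_cycles / round_length


if __name__ == "__main__":
    RUN = 200      # hypothetical cycles of work between misses
    MISS = 500     # main-memory delay mentioned in the text
    for switch in (0, 30):
        for n in (1, 2, 4):
            u = pipeline_utilization(n, RUN, MISS, switch)
            print(f"switch={switch:2d} threads={n} utilization={u:.2f}")
```

Under these assumed numbers, adding a second thread doubles utilization (from roughly 0.29 to roughly 0.57) whether the switch costs 0 or 30 cycles, because the switch is hidden underneath the outstanding miss. The switch cost only becomes visible once enough threads are available to keep the pipeline saturated.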
We find that CGMT alone, triggered only by main-memory accesses, provides unimpressive increases in performance because it cannot hide the effect of shorter stalls in a single thread. However, CGMT and SMT complement each other very well. A design which combines both types of multithreading provides a balance between support for hiding long and short stalls, and a balance between high throughput and high single-thread performance. We call this combination of techniques Balanced Multithreading (BMT).

This combination of multithreading models can be compared to a cache hierarchy, which results in a multithreading hierarchy. The lowest level of multithreading (SMT) is small (fewer contexts), fast, expensive, and closely tied to the processor cycle time. The next level of multithreading (CGMT) is slower, potentially larger (fewer limits to the number of contexts that can be supported), cheaper, and has no impact on processor cycle time or pipeline depth.

In our design, the operating system sees more virtual contexts than are supported in the core pipeline. These virtual contexts are controlled by a mechanism to quickly switch between threads on long-latency load misses. The method we
