File-Access Patterns of Data-Intensive Workflow Applications and their Implications to Distributed Filesystems
Takeshi Shibata
University of Tokyo
Department of Information and Communication Engineering
Graduate School of Information Science and Technology
shibata@logos.ic.i.u-tokyo.ac.jp
SungJun Choi
University of Tokyo
Department of Information and Communication Engineering
Graduate School of Information Science and Technology
demyan@logos.ic.i.u-tokyo.ac.jp
Kenjiro Taura
University of Tokyo
Department of Information and Communication Engineering
Graduate School of Information Science and Technology
tau@logos.ic.i.u-tokyo.ac.jp
ABSTRACT
This paper studies five real-world data intensive workflow applications in the fields of natural language processing, astronomy image analysis, and web data analysis. Data intensive workflows are increasingly becoming important applications for cluster and Grid environments. They open new challenges to various components of workflow execution environments including job dispatchers, schedulers, filesystems, and staging tools. The keys to achieving high performance are efficient data sharing among executing hosts and locality-aware scheduling that reduces the amount of data transfer. While much work has been done on scheduling workflows, many of them use synthetic or random workload. As such, their impacts on real workloads are largely unknown. Understanding characteristics of real-world workflow applications is a required step to promote research in this area. To this end, we analyse real-world workflow applications focusing on their file access patterns and summarize their implications to schedulers and filesystem/staging designs.
Keywords
workflow applications, distributed filesystems
1. INTRODUCTION
Workflow facilitates integration of individually developed executables, making parallel processing readily accessible to domain experts. Thus it has become an important discipline in various fields including natural science, engineering, and information processing. Many systems for executing workflows have been developed [1, 2, 3]. More recently, programming paradigms and systems specifically designed for large data processing such as MapReduce [4], Hadoop, and Dryad [5] made it popular to utilize parallel processing without an involved effort of parallel programming. An obvious common goal of these systems is efficient execution of workflows. To this end, there have been efforts on various components of workflow engines including scheduling algorithms [6, 7, 8, 9, 10, 11, 12], data transfers [13, 14], and fast dispatchers [15]. There are efforts focusing on scheduling with data transfer costs taken into account [16, 17, 18, 19]. A good survey on scheduling algorithms is in [20].
Despite their importance, practical evaluations of workflow systems have been rare and remain difficult, mainly due to the lack of commonly accessible benchmarks. Even with a real application, translating the result on a particular platform into a generally acceptable observation on workflow systems is difficult because the performance of a workflow depends on so many components of the environment, such as nodes, networks, filesystems, and so on. This is particularly so because workflows typically consist of many sequential executables, each of which may have unknown sensitivities to its environment. Most existing studies on scheduling algorithms have therefore been based on simulation with synthetic workloads such as randomly generated task graphs. Bharathi et al. [21] is one of the few studies that systematically characterize several real workflow applications. The present paper shares the same spirit as theirs, but pays special attention to the IO (file access) behaviors of applications.
A single workflow generally consists of a set of tasks, each of which may communicate with (i.e., depend on or be depended upon by) another task in the workflow. Since a task is typically a sequential (single node) application, data transfer among tasks is generally handled by the workflow system. Data may be implicitly transferred via a shared filesystem or explicitly moved by a staging subsystem. In either case, the data generally go through secondary storage to ensure a certain degree of fault tolerance, namely that a workflow