Here is your pdf: File-access patterns of data-intensive workflow applications and their implications to distributed filesystems

The length of the document below is: 10 page(s) long

The self-declared author(s) is/are:
cse.buffalo.edu

The subject is as follows:
Subject: Original authors did not specify.

The original URL is: LINK

The access date was:
Access date: 2019-05-31 19:12:37.519539



The content is as follows:

File-Access Patterns of Data-Intensive Workflow Applications and their Implications to Distributed Filesystems

Takeshi Shibata
University of Tokyo
Department of Information and Communication Engineering
Graduate School of Information Science and Technology
shibata@logos.ic.i.u-tokyo.ac.jp

SungJun Choi
University of Tokyo
Department of Information and Communication Engineering
Graduate School of Information Science and Technology
demyan@logos.ic.i.u-tokyo.ac.jp

Kenjiro Taura
University of Tokyo
Department of Information and Communication Engineering
Graduate School of Information Science and Technology
tau@logos.ic.i.u-tokyo.ac.jp

ABSTRACT

This paper studies five real-world data-intensive workflow applications in the fields of natural language processing, astronomy image analysis, and web data analysis. Data-intensive workflows are increasingly becoming important applications for cluster and Grid environments. They open new challenges to various components of workflow execution environments, including job dispatchers, schedulers, filesystems, and staging tools. The keys to achieving high performance are efficient data sharing among executing hosts and locality-aware scheduling that reduces the amount of data transfer. While much work has been done on scheduling workflows, much of it uses synthetic or random workloads. As such, its impact on real workloads is largely unknown. Understanding the characteristics of real-world workflow applications is a required step to promote research in this area. To this end, we analyse real-world workflow applications, focusing on their access patterns, and summarize their implications for scheduler and filesystem/staging designs.

Keywords

workflow applications, distributed filesystems
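The locality-aware scheduling mentioned in the abstract can be illustrated with a minimal sketch. The sketch makes two simplifying assumptions that do not come from this paper:

- each task's input files and their sizes are known in advance;
- every file is stored on exactly one host.

`Task`, `bytes_to_transfer`, and `pick_host` are hypothetical names. The scheduler places a task on the host that would have to fetch the fewest input bytes from elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record: a name plus its input files and their sizes."""
    name: str
    inputs: dict[str, int]  # input file name -> size in bytes


def bytes_to_transfer(task: Task, host: str, location: dict[str, str]) -> int:
    """Bytes of input the task would have to fetch if it ran on `host`.

    `location` maps each file to the single host assumed to store it.
    """
    return sum(size for f, size in task.inputs.items() if location.get(f) != host)


def pick_host(task: Task, hosts: list[str], location: dict[str, str]) -> str:
    """Locality-aware placement: the host that minimizes remote input bytes.

    min() returns the first minimal host, so ties follow the order of `hosts`.
    """
    return min(hosts, key=lambda h: bytes_to_transfer(task, h, location))


# Usage: two of the three inputs (350 of 450 bytes) already live on node2.
location = {"a.txt": "node1", "b.txt": "node2", "c.txt": "node2"}
join = Task("join", {"a.txt": 100, "b.txt": 300, "c.txt": 50})
print(pick_host(join, ["node1", "node2", "node3"], location))  # node2
```

A practical scheduler would typically also weigh host load and compute speed. The point here is only that placement follows the data.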

1. INTRODUCTION

Workflow facilitates the integration of individually developed executables, making parallel processing readily accessible to domain experts. Thus it has become an important discipline in various fields, including natural science, engineering, and information processing. Many systems for executing workflows have been developed [1, 2, 3]. More recently, programming paradigms and systems specifically designed for large data processing, such as MapReduce [4], Hadoop¹, and Dryad [5]², have made it popular to utilize parallel processing without significant parallel programming effort. An obvious common goal of these systems is efficient execution of workflows. To this end, there have been efforts on various components of workflow engines, including scheduling algorithms [6, 7, 8, 9, 10, 11, 12], data transfers [13, 14], and fast dispatchers [15]. Other efforts focus on scheduling with data transfer costs taken into account [16, 17, 18, 19]. A good survey of scheduling algorithms is given in [20].
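Scheduling with data transfer costs taken into account can be sketched as a greedy rule. For each candidate host, it estimates waiting time plus transfer time plus compute time and picks the host with the earliest finish. This is a generic illustration, not the method of any of [16, 17, 18, 19].

The following are all assumptions made for this sketch:

- a uniform bandwidth for every remote fetch;
- a known per-task `work` and per-host `speed`;
- the names `Cluster`, `finish_time`, and `place`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster state used only for this sketch."""
    location: dict[str, str]    # file name -> host that stores it
    speed: dict[str, float]     # host -> work units processed per second
    ready_at: dict[str, float]  # host -> time (s) at which the host becomes free
    bandwidth: float            # bytes/s for any remote fetch (uniform, a simplification)


def finish_time(inputs: dict[str, int], work: float, host: str, c: Cluster) -> float:
    """Estimated finish = wait for host + fetch non-local inputs + compute."""
    remote_bytes = sum(size for f, size in inputs.items() if c.location.get(f) != host)
    return c.ready_at[host] + remote_bytes / c.bandwidth + work / c.speed[host]


def place(inputs: dict[str, int], work: float, c: Cluster) -> tuple[str, float]:
    """Greedy choice of the host with the earliest estimated finish time."""
    host = min(c.speed, key=lambda h: finish_time(inputs, work, h, c))
    return host, finish_time(inputs, work, host, c)


c = Cluster(
    location={"in.dat": "slow"},
    speed={"slow": 1.0, "fast": 4.0},
    ready_at={"slow": 0.0, "fast": 0.0},
    bandwidth=100.0,
)
print(place({"in.dat": 1_000}, 100.0, c))   # ('fast', 35.0): moving 1,000 bytes pays off
print(place({"in.dat": 10_000}, 100.0, c))  # ('slow', 100.0): the transfer would dominate
```

With the 1,000-byte input, fetching it to the four-times-faster host finishes at 35 s versus 100 s locally. With a 10,000-byte input, the 100 s transfer outweighs the speedup and the task stays with its data.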

Despite their importance, practical evaluations of workflow systems have been rare and remain difficult, mainly due to the lack of commonly accessible benchmarks. Even with a real application, translating the result on a particular platform into a generally acceptable observation on workflow systems is difficult. The performance of a workflow depends on many components of the environment, such as nodes, networks, systems, and so on. This is particularly so because workflows typically consist of many sequential executables, each of which may have unknown sensitivities to its environment. Most existing studies on scheduling algorithms have therefore been based on simulation with synthetic workloads, such as randomly generated task graphs. Bharathi et al. [21] is one of the few studies that systematically characterize several real workflow applications. The present paper shares the same spirit as theirs, but pays special attention to the IO (file access) behaviors of applications.
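For concreteness, here is one simple way to generate a random task graph of the kind mentioned above. The layered construction, the function name `random_layered_dag`, and its parameters `n_layers`, `width`, and `edge_prob` are assumptions for illustration, not taken from any cited study. The result records only precedence edges. It carries no file names, sizes, or file sharing between tasks, which is the IO information this paper focuses on.

```python
from __future__ import annotations

import random


def random_layered_dag(n_layers: int, width: int, edge_prob: float,
                       seed: int = 0) -> dict[int, list[int]]:
    """Return successor lists of a random layered task graph (hypothetical generator).

    Tasks are numbered layer * width + j. Each task in layer k >= 1 gets an
    edge from every task in layer k - 1 with probability `edge_prob`, and at
    least one such edge, so edges always point from lower to higher numbers
    (the graph is acyclic) and only the first layer has no predecessors.
    """
    rng = random.Random(seed)  # seeded for reproducible workloads
    succ: dict[int, list[int]] = {v: [] for v in range(n_layers * width)}
    for layer in range(1, n_layers):
        prev = range((layer - 1) * width, layer * width)
        for j in range(width):
            v = layer * width + j
            parents = [u for u in prev if rng.random() < edge_prob]
            if not parents:
                parents = [rng.choice(prev)]  # keep every task reachable from layer 0
            for u in parents:
                succ[u].append(v)
    return succ


graph = random_layered_dag(n_layers=4, width=3, edge_prob=0.3, seed=42)
print(sum(len(vs) for vs in graph.values()), "edges among", len(graph), "tasks")
```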

A single workflow generally consists of a set of tasks, each of which may communicate with (i.e., depend on or be depended upon by) another task in the workflow. Since a task is typically a sequential (single-node) application, data transfer among tasks is generally handled by the workflow system. Data may be implicitly transferred via a shared filesystem or explicitly moved by a staging subsystem. In either case, the data generally go through secondary storage to ensure a certain degree of fault tolerance: that a workflow

¹ http://hadoop.apache.org/

²
