ReStore: Reusing Results of MapReduce Jobs

Iman Elghandour (Alexandria University and Université Libre de Bruxelles)

11:30, 24th July 2018 (Trinity Term 2018)
051

Performing data analytics on huge data sets has become crucial for many enterprises. It has been facilitated by the introduction of the MapReduce programming and execution model and by other, more recent distributed computing platforms. MapReduce users often have analysis tasks that are too complex to express as individual MapReduce jobs, so they use high-level query languages such as Pig or Hive to express them. The compilers of these languages translate queries into workflows of MapReduce jobs. In my talk, I will present ReStore, a system that manages the storage and reuse of intermediate results between MapReduce jobs in such workflows. At the end of the talk, I will briefly discuss related ideas for optimizing queries executed on other distributed computing platforms.

Seminar Series

Data, Knowledge and Action Seminar

Coordinators

Michael Benedikt
