ReStore: Reusing Results of MapReduce Jobs
- 11:30 24th July 2018 (Trinity Term 2018) 051
Performing data analytics on huge data sets has become crucial for many enterprises. It has been facilitated by the introduction of the MapReduce programming and execution model and by other, more recent distributed computing platforms. MapReduce users often have analysis tasks that are too complex to express as individual MapReduce jobs, so they use high-level query languages such as Pig or Hive to express them. The compilers of these languages translate queries into workflows of MapReduce jobs. In my talk, I will present ReStore, a system that manages the storage and reuse of intermediate results between MapReduce jobs in such workflows. At the end of the talk, I will briefly discuss related ideas for optimizing queries executed on other distributed computing platforms.