Complex Object Querying and Data Science
Supervisor
Suitable for
Abstract
"We will look at query languages for transforming nested collections
(collections that may themselves contain collections). Such languages can be
useful for preparing large-scale feature data for machine learning algorithms.
We have a basic implementation of such a language that runs on top of the
big-data framework Spark. The goal of the project is to extend the language
with iteration. One goal will be to look at how to adapt processing techniques
for nested data to support iteration. Another, closer to applications, is to
use iteration to support additional steps of a data science pipeline, such as
sampling."
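
As a rough illustration of the two ideas in the abstract, the sketch below uses plain Python rather than the project's language or Spark. The data, the field names, and the functions `to_features` and `sample_until` are hypothetical and chosen only for this example. The first function is a query over a nested collection that produces flat feature rows. The second is a sampling step whose stopping condition depends on the rows drawn so far, which is the kind of computation that needs iteration.

```python
import random

# Hypothetical nested collection: each customer record contains an
# inner collection of orders.
customers = [
    {"name": "ann", "orders": [{"item": "pen", "price": 2.0},
                               {"item": "ink", "price": 5.0}]},
    {"name": "bob", "orders": []},
    {"name": "cy", "orders": [{"item": "pad", "price": 3.0}]},
]


def to_features(collection):
    """Nested-to-flat query: one feature row per customer,
    aggregating over that customer's inner collection of orders."""
    return [
        {
            "name": c["name"],
            "n_orders": len(c["orders"]),
            "total": sum(o["price"] for o in c["orders"]),
        }
        for c in collection
    ]


def sample_until(rows, target_total, seed=0):
    """Iterative sampling: draw rows without replacement until the
    running sum of 'total' reaches target_total or no rows remain."""
    rng = random.Random(seed)
    remaining = list(rows)
    picked = []
    running = 0.0
    while remaining and running < target_total:
        row = remaining.pop(rng.randrange(len(remaining)))
        picked.append(row)
        running += row["total"]
    return picked


features = to_features(customers)
sample = sample_until(features, target_total=3.0)
```

The number of draws in `sample_until` is not known in advance, so it cannot be written as a single pass over the collection. This is only meant to show why an iteration construct is useful for pipeline steps such as sampling, not how the project would implement it.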