VADA: Value Added Data Systems -- Principles and Architecture
Data is everywhere, generated by increasing numbers of applications, devices and users, with few or no guarantees
on its format, semantics, and quality. The economic potential of data-driven innovation is enormous: the Centre for Economics
and Business Research estimated that it could reach as much as £40B in 2017. To realise this potential and to provide meaningful
data analyses, data scientists must first spend a significant portion of their time (estimated at 50% to 80%) on "data wrangling",
the process of collecting, reorganising, and cleaning data.
This heavy toll is due to what are referred to as the four V's of big data: Volume (the scale of the data), Velocity (the
speed of change), Variety (the different forms of data), and Veracity (the uncertainty of data). There is an urgent need to
provide data scientists with a new generation of tools that will unlock the potential of data assets and significantly reduce
the data wrangling component. As many traditional tools are no longer applicable in a four V's environment, a radical paradigm
shift is required. The proposal aims to achieve this paradigm shift by adding value to data: handling data management tasks in
an environment that is fully aware of data and user contexts, and closely integrating key data management tasks in a way not
yet attempted, but desperately needed by many innovative companies in today's data-driven economy.
The VADA research programme will define principles and solutions for Value Added Data Systems, which support users in
discovering, extracting, integrating, accessing and interpreting the data relevant to their questions. In so doing, such
systems use the user context (e.g., requirements in terms of the trade-off between completeness and correctness) and the data
context (e.g., the data's availability, cost, provenance and quality). The user context characterises not only what data is
relevant, but also the properties it must exhibit to be fit for purpose. Adding value to data then involves the best-effort
provision of data to users, along with comprehensive information on the quality and origin of the data provided. Users can
provide feedback on the results obtained, enabling changes to all data management tasks and thus a continuous improvement in
the user experience.
Establishing the principles behind Value Added Data Systems requires a revolutionary approach to data management, informed by
interlinked research in data extraction, data integration, data quality, provenance, query answering, and reasoning. This will
enable each of these areas to benefit from synergies with the others. Existing research has developed focused results within
individual sub-disciplines; VADA develops these specialisms in ways that both transform the techniques within the
sub-disciplines and enable the development of architectures that bring them together to add value to data.
The commercial importance of the research area has been widely recognised. The VADA programme brings together university
researchers with commercial partners who are in desperate need of a new generation of data management tools. The partners will
contribute to the programme by funding research staff and students, providing substantial staff time for research
collaborations, supporting internships, hosting visitors, contributing challenging real-life case studies, sharing experiences,
and participating in technical meetings. They include both developers of data management technologies (LogicBlox, Microsoft,
Neo) and data user organisations in healthcare (The Christie), e-commerce (LambdaTek, PricePanda), finance (AllianceBernstein),
social networks (Facebook), security (Horus), smart cities (FutureEverything), and telecommunications (Huawei).
EPSRC funding link: http://gow.epsrc.ac.uk/NGBOViewGrant.aspx?GrantRef=EP/M025268/1
Selected Publications
- Complexity Results for Preference Aggregation over (m)CP-nets: Pareto and Majority Voting
Thomas Lukasiewicz and Enrico Malizia
In Artificial Intelligence. Vol. 272. Pages 101–142. July, 2019.
- A Tutorial on Query Answering and Reasoning over Probabilistic Knowledge Bases
İsmail İlkan Ceylan and Thomas Lukasiewicz
In Claudia d'Amato and Martin Theobald, editors, Reasoning Web. Learning, Uncertainty, Streaming, and Scalability: 14th International Summer School 2018, Esch-sur-Alzette, Luxembourg, September 22–26, 2018, Tutorial Lectures. Vol. 11078 of Lecture Notes in Computer Science. Pages 35–77. Springer. August, 2018.
- Learning Structured Video Descriptions: Automated Video Knowledge Extraction for Video Understanding Tasks
Daniel Vasile and Thomas Lukasiewicz
In Hervé Panetto, Christophe Debruyne, Henderik A. Proper, Claudio Agostino Ardagna, Dumitru Roman and Robert Meersman, editors, On the Move to Meaningful Internet Systems. OTM 2018 Conferences: Confederated International Conferences: CoopIS, C&TC, and ODBASE 2018, Valletta, Malta, October 23–24, 2018. Vol. 11230 of Lecture Notes in Computer Science. Pages 315–332. Springer. October, 2018.