Data Integration over NoSQL Stores Using Access Path Based Mappings

Robin Hecht, Olivier Curé, Chan Le Duc, Myriam Lamolle

Due to the large amount of data generated by user interactions on the Web, some companies are innovating in data management by designing their own systems. Many of these systems are referred to as NoSQL databases, standing for 'Not only SQL'. Their popularity is such that non-web-oriented application domains, e.g. finance and science, are considering their use. This wide adoption will give rise to new needs, and we consider integrating data from different NoSQL systems to be one of them. In this paper, we adapt a framework designed for the integration of relational data to a broader context where both NoSQL and relational databases can be integrated. One important extension is the efficient answering of queries expressed over these data sources. Typically, queries are expressed over a target schema and translated into the query language of each data source. Because NoSQL databases are highly denormalized, a given query may have several possible translations with varying performance costs. Thus, a data integration system targeting NoSQL databases needs to generate an optimized translation for a given query. Our contributions are (i) an access path based mapping solution that takes advantage of the design choices of each data source, (ii) the integration of preferences to handle conflicts between sources, and (iii) a query language that bridges the gap between the SQL query expressed by the user and the query languages of the data sources. We also present a prototype implementation in which the target schema is represented as a set of relations and which enables the integration of two of the most popular NoSQL database models, namely document stores and column family stores.
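
The observation that one query can have several translations with different costs can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `AccessPath` class, the `choose_access_path` function, the cost figures, and the store and container names are all hypothetical, chosen only to show how a mapping that records access paths might let a translator prefer a key lookup over a full scan.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class AccessPath:
    """One way to reach a target relation's data in a source (hypothetical model)."""
    source: str                      # e.g. a document store or a column family store
    container: str                   # collection or column family name
    key_attributes: FrozenSet[str]   # attributes that permit a direct key lookup
    lookup_cost: int                 # assumed relative cost of a key lookup
    scan_cost: int                   # assumed relative cost of a full scan

    def cost(self, filter_attributes: FrozenSet[str]) -> int:
        # A key lookup is possible only if the query constrains every key attribute.
        if self.key_attributes and self.key_attributes <= filter_attributes:
            return self.lookup_cost
        return self.scan_cost


def choose_access_path(
    paths: List[AccessPath], filter_attributes: FrozenSet[str]
) -> Optional[AccessPath]:
    """Return the cheapest path for the given filters; ties keep list order."""
    if not paths:
        return None
    return min(paths, key=lambda p: p.cost(filter_attributes))


# Two hypothetical denormalized layouts of the same target relation.
paths = [
    AccessPath("document_store", "drugs", frozenset({"drug_id"}), 1, 100),
    AccessPath("column_family_store", "drugs_by_name", frozenset({"name"}), 1, 100),
]

by_name = choose_access_path(paths, frozenset({"name"}))
by_id = choose_access_path(paths, frozenset({"drug_id"}))
no_filter = choose_access_path(paths, frozenset())
```

Under these assumed costs, a query filtering on `name` is routed to the container keyed by name, a query filtering on `drug_id` to the one keyed by identifier, and an unfiltered query falls back to a scan of the first listed path.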

Conference: 22nd International Conference on Database and Expert Systems Applications (DEXA 2011)
Year: 2011
Location: Toulouse, France

