BEYOND RELATIONAL: «NEURAL» DBMS? Roberto Reale @ Italian Association for Machine Learning 10 Apr 2019
Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Commun. ACM, 13(6), 377-387.
Kraska, T., Beutel, A., Chi, E. H., Dean, J. and Polyzotis, N. (2017). The Case for Learned Index Structures. arXiv preprint arXiv:1712.01208.
RELATIONAL MODEL Can be expressed in first-order predicate logic Data is represented as tuples, grouped into relations Abstraction from physical storage model
INDEX STRUCTURES Needed for efficient data access B-Trees, Hash maps, Bloom filters, ...
They need tuning Being general-purpose data structures, they do not take advantage of patterns in the data
ENTER MACHINE LEARNING Replacing core components of a data management system with learned models
Traditional indexes are already models For efficiency, it is common not to index every single key of the sorted records, but only the key of every n-th record
Using other types of models as indexes can provide benefits
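Such a sparse index is already a coarse model of key positions. As a minimal sketch (function names are hypothetical, not from the paper): keep every n-th key of a sorted array, binary-search that smaller structure, then scan at most n records.

```python
import bisect

def build_sparse_index(keys, n=4):
    # Index only every n-th key of the already-sorted array.
    return keys[::n]

def lookup(keys, sparse, key, n=4):
    # Find the last indexed key <= key, then scan at most n records.
    j = bisect.bisect_right(sparse, key) - 1
    if j < 0:
        return None  # key is smaller than every indexed key
    start = j * n
    for i in range(start, min(start + n, len(keys))):
        if keys[i] == key:
            return i
    return None
```

The trade-off is the classic one: a larger n shrinks the index but lengthens the final scan.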
INDEXES ARE CDF MODELS An index is a model that takes a key as an input and predicts the position of the record
A model that predicts the position of a key inside a sorted array approximates the cumulative distribution function: the predicted position is p ≈ F(Key) × N, where F(Key) is the estimated CDF of the data, i.e. the likelihood of observing a key smaller than or equal to the lookup key
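As a minimal illustration (a sketch, not the paper's implementation), a least-squares linear model fitted to (key, position) pairs approximates N × F(key) over a sorted array:

```python
def fit_linear_cdf(keys):
    # Least-squares fit of position ~ a*key + b over a sorted array.
    # The fitted line approximates N * F(key), the scaled empirical CDF.
    n = len(keys)
    xs, ys = keys, range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def predict_position(model, key, n):
    # Clamp the prediction into the valid position range [0, n-1].
    a, b = model
    return min(max(int(round(a * key + b)), 0), n - 1)
```

In practice the model's min/max prediction errors bound a local search around the predicted position, which recovers the exact record.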
ISSUES... Decision trees, in general, are really good at overfitting the data with only a few operations
A single neural net requires significantly more space and CPU time for the “last mile” B-Trees are extremely cache- and operation-efficient
THE LEARNING INDEX FRAMEWORK (LIF) Given a trained TensorFlow model, LIF automatically extracts all weights from the model and generates efficient index structures in C++ Designed for small models, with no unnecessary overhead
THE RECURSIVE MODEL INDEX Challenge: accuracy for the last-mile search We build a hierarchy of models Each model takes the key as input and, based on it, picks the model at the next stage
THE RECURSIVE MODEL INDEX, 2 We iteratively train each stage with loss Lℓ We separate model size and complexity from execution cost We effectively divide the key space into smaller sub-ranges, making it easier to achieve the required “last mile” accuracy
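A hedged two-stage sketch of the idea (simple linear models at both stages; function names and the fixed fanout are illustrative assumptions, not the paper's code). The root model routes each key to one of `fanout` leaf models, and each leaf is trained only on the keys routed to it:

```python
def fit_linear(xs, ys):
    # Ordinary least squares for y ~ a*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs) or 1.0  # guard degenerate buckets
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def build_rmi(keys, fanout=4):
    # Stage 1 routes a key to one of `fanout` stage-2 models;
    # each stage-2 model is trained only on its own sub-range.
    n = len(keys)
    root = fit_linear(keys, range(n))
    buckets = [[] for _ in range(fanout)]
    a, b = root
    for pos, key in enumerate(keys):
        m = min(max(int((a * key + b) * fanout / n), 0), fanout - 1)
        buckets[m].append((key, pos))
    leaves = [fit_linear([k for k, _ in bkt], [p for _, p in bkt]) if bkt else root
              for bkt in buckets]
    return root, leaves, fanout, n

def rmi_predict(index, key):
    root, leaves, fanout, n = index
    a, b = root
    m = min(max(int((a * key + b) * fanout / n), 0), fanout - 1)
    a2, b2 = leaves[m]
    return min(max(int(round(a2 * key + b2)), 0), n - 1)
```

The point of the hierarchy is visible even in this toy: each leaf only has to be accurate on its own narrow sub-range, not on the whole key space.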
HYBRID MODELS Top layer: a rectified linear unit (ReLU) neural net At the bottom: thousands of simple, inexpensive linear regression models Traditional B-Trees at the bottom if the data is particularly hard to learn
DOES THIS STUFF WORK? Simple NNs can be efficiently trained using stochastic gradient descent
A closed-form solution exists for linear multivariate models The results are promising, but “learned indexes” might not be the best choice in every use case A new way to think about indexing
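For illustration, the closed form mentioned above is the normal-equations solution w = (XᵀX)⁻¹Xᵀy. A sketch using NumPy (assumed available; in practice `numpy.linalg.lstsq` would be the numerically safer choice):

```python
import numpy as np

def fit_closed_form(X, y):
    # Ordinary least squares via the normal equations:
    # solve (X^T X) w = X^T y instead of forming an explicit inverse.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
    return w  # last entry is the intercept
```

No gradient descent is needed here, which is why the bottom-layer linear models of a hybrid index are so cheap to (re)train.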
ROBERTO@REALE.ME