DDIA - Data Models and Query Languages

  1. Overview

  2. Relational Model vs Document Model

  3. Query Languages for Data

  4. Graph-like Data Models

1 Overview

Data models are probably the most important part of devloping software for their profound effect on not only how the software is written, but also on how we thinkg about the problem that we are solving. Each layer of the applications is built by one data model on top of another, and each layer hides the complexity of the layers below it by providing a clean data model (abstraction), which allows different groups of people to work together effectively.d

This article contains

  • Some general-purpose data modesl for data storage and querying. For example, the relational model, the document model, and graph-based data models.

  • Different query languages and their use cases

2 Relational Model vs Document Model

Model Type Relational Model Document Model
Description Data is organized into relations (e.g., tables in SQL), where each relation is an unordered collection of tuples (e.g., rows in SQL). NoSQL (not only SQL) databases provide a broad variety of non-relational datastores (polyglot persistence) alongside the usage of relational databases.
Driving Forces Transaction processing and batch processing for business data such as banking transactions, customer invoicing, reporting and so on. 1. Greater scalability for large datasets and high write throughput.
2. Preference for open-source software over commercial database products.
3. Specialized query operations that are currently not well supported in relational databases.
4. The restrictiveness of relational schemas and the desire for a more dynamic and expressive data model.
Examples SQL MongoDB, RethinkDB, CouchDB, Espresso
Advantages 1. Hide the implementation detail of the internal representation of the data in the database behind a cleaner interface.
2. Generalize well from its original use case on business data processing.
3. Better support for joins, many-to-one and many-to-many relationships.
1. Schema flexibility: schema-on-read (the structure of data is implicit, and only interpreted when the data is read) is similar to dynamic (runtime) type checking in programming language. When an application wants to change the format of its data, document databases just need to start writing new documents with the new fields, and have code in the application to handle the case when old documents are read.
2. Better performance due to locality: there is a performance advantage of storage locality when application needs to access the entire document. The advantage often applies when large parts of the document are needed at the same time. So it is generally recommended to keep documents small and avoid writes that increase the size of a document.3. Closer to data structures used by the application
Disadvantages 1. Impedance mismatch: if the data is stored in relational tables, a translation layer is required between the objects in the application code and the database model of tables, rows, and columns. 1. Support for joins is often weak, although joins are not needed for one-to-many structures.
2. Cannot refer directly to a nested item within a document, for example, one needs to say “the second item in the list of positions for user 251”.
3. Data has a tendency of becoming more interconnected as features are added to applications (e.g., add a recommendation feature for each use (document)).

Convergence of relational and document databases

  1. The relational and document databases are becoming more similar over time so applications can use the combination of features that best fits their needs: Most relational databases have supported XML and allow for local modifications and index/query inside the XML documents. More document databases support relational-like joins in its query languages.

  2. When representing many-to-one and many-to-many relationships, relational databases and document databases are no fundamental different: the related item is referenced by a unique identifier (foreign key in relational data model, document reference in document model).

    3 Query Languages for Data

SQL is a declarative lauguage, comared with the imperative queries, it has the following advantages:

  • It is typically more concise and easier to work with than an imperative API

  • It has the implemenation details of the database engine, which makes it possible to introduce performance improvements without requiring any changes of queries.

  • The fact that SQL is more limited in functionality gives the database more room for automatic optimizations.

  • Declarative lanauges have a better chance of getting faster in parallel execution because they only specify the patterns of the results, not the algorithm that is used to determine the results.

4 Graph-like Data Models

It becomes natural to use graph for data modeling for the many-to-many relationships. Examples include the scocial graphs, the web graphs or the railroad networks. The graph databases are targeting use cases where anything is potentially related to everything.

Two kinds of objects for a graph are:

  • Vertices: nodes or entities.

  • Edges: relations or arcs.

The graphs can be used for not only the homogeneous data, but also for data with completely different ypes of objects.

  • Graphs can express things that would be difficult for traditional relational data, for example, different kinds of regional structures in different countries, or varying granularity of data (e.g., current residences is a city, but place of birth is a state).

  • Graphs are flexible for evolvability: when new features are added to the application, a graph can easily extended to accomondate changes to the data structures.