Non-relational data and NoSQL - Azure Architecture Center (2023)

A non-relational database is a database that does not use the tabular schema of rows and columns found in most traditional database systems. Instead, non-relational databases use a storage model that is optimized for the specific requirements of the type of data being stored. For example, data may be stored as simple key/value pairs, as JSON documents, or as a graph consisting of edges and vertices.

What all of these data stores have in common is that they don't use a relational model. Also, they tend to be more specific in the type of data they support and how data can be queried. For example, time series data stores are optimized for queries over time-based sequences of data, whereas graph data stores are optimized for exploring weighted relationships between entities. Neither format would generalize well to the task of managing transactional data.

The term NoSQL refers to data stores that do not use SQL for queries. Instead, the data stores use other programming languages and constructs to query the data. In practice, "NoSQL" means "non-relational database," even though many of these databases do support SQL-compatible queries. However, the underlying query execution strategy is usually very different from the way a traditional RDBMS would execute the same SQL query.

The following sections describe the major categories of non-relational or NoSQL databases.

Document data stores

A document data store manages a set of named string fields and object data values in an entity that's referred to as a document. These data stores typically store data in the form of JSON documents. Each field value could be a scalar item, such as a number, or a compound element, such as a list or a parent-child collection. The data in the fields of a document can be encoded in various ways, including XML, YAML, JSON, BSON, or even stored as plain text. The fields within documents are exposed to the storage management system, enabling an application to query and filter data by using the values in these fields.

Typically, a document contains the entire data for an entity. What items constitute an entity are application-specific. For example, an entity could contain the details of a customer, an order, or a combination of both. A single document might contain information that would be spread across several relational tables in a relational database management system (RDBMS). A document store does not require that all documents have the same structure. This free-form approach provides a great deal of flexibility. For example, applications can store different data in documents in response to a change in business requirements.

The application can retrieve documents by using the document key. The key is a unique identifier for the document, which is often hashed, to help distribute data evenly. Some document databases create the document key automatically. Others enable you to specify an attribute of the document to use as the key. The application can also query documents based on the value of one or more fields. Some document databases support indexing to facilitate fast lookup of documents based on one or more indexed fields.

Many document databases support in-place updates, enabling an application to modify the values of specific fields in a document without rewriting the entire document. Read and write operations over multiple fields in a single document are typically atomic.
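The behavior described above (lookup by key, queries on field values, and in-place field updates) can be sketched with a toy in-memory store. This is a minimal illustration, not how any particular document database is implemented; the `DocumentStore` class, its method names, and the sample documents are all hypothetical.

```python
import copy
import uuid


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}  # document key -> document (a dict)

    def insert(self, doc, key=None):
        # Generate a key automatically unless the caller supplies one.
        key = key or uuid.uuid4().hex
        self._docs[key] = copy.deepcopy(doc)
        return key

    def get(self, key):
        # Retrieve a whole document by its key.
        return copy.deepcopy(self._docs[key])

    def find(self, **criteria):
        # Query by field values. This sketch scans every document;
        # a real store would typically consult an index instead.
        return [
            key for key, doc in self._docs.items()
            if all(doc.get(name) == value for name, value in criteria.items())
        ]

    def update_fields(self, key, **changes):
        # In-place update: only the named fields are modified.
        self._docs[key].update(changes)


store = DocumentStore()
# Documents in the same store do not need to share a structure.
store.insert({"type": "customer", "name": "Mark Hanson", "city": "Seattle"}, key="c1")
store.insert({"type": "order", "customer": "c1", "items": [{"sku": "A1", "qty": 2}]}, key="o1")

store.update_fields("c1", city="Redmond")
print(store.find(type="customer"))  # ['c1']
print(store.get("c1")["city"])      # Redmond
```

The two sample documents deliberately have different fields, which mirrors the point that a document store does not require a shared structure.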

Relevant Azure service:

  • Azure Cosmos DB

Columnar data stores

A columnar or column-family data store organizes data into columns and rows. In its simplest form, a column-family data store can appear very similar to a relational database, at least conceptually. The real power of a column-family database lies in its denormalized approach to structuring sparse data, which stems from the column-oriented approach to storing data.

You can think of a column-family data store as holding tabular data with rows and columns, but the columns are divided into groups known as column families. Each column family holds a set of columns that are logically related and are typically retrieved or manipulated as a unit. Other data that is accessed separately can be stored in separate column families. Within a column family, new columns can be added dynamically, and rows can be sparse (that is, a row doesn't need to have a value for every column).

The following diagram shows an example with two column families, Identity and Contact Info. The data for a single entity has the same row key in each column family. This structure, where the rows for any given object in a column family can vary dynamically, is an important benefit of the column-family approach, making this form of data store highly suited for storing data with varying schemas.

[Diagram: column-family data with two column families, Identity and Contact Info]

Unlike a key/value store or a document database, most column-family databases physically store data in key order, rather than by computing a hash. The row key is considered the primary index and enables key-based access via a specific key or a range of keys. Some implementations allow you to create secondary indexes over specific columns in a column family. Secondary indexes let you retrieve data by column value, rather than by row key.

On disk, all of the columns within a column family are stored together in the same file, with a specific number of rows in each file. With large data sets, this approach creates a performance benefit by reducing the amount of data that needs to be read from disk when only a few columns are queried together at a time.

Read and write operations for a row are typically atomic within a single column family, although some implementations provide atomicity across the entire row, spanning multiple column families.
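As a rough sketch of the ideas in this section, the following toy store keeps row keys in sorted order, groups columns into families, allows sparse rows, and supports range scans over row keys. The `ColumnFamilyStore` class, the family names, and the sample rows are assumptions made for illustration; real column-family databases use far more elaborate on-disk layouts.

```python
import bisect


class ColumnFamilyStore:
    """Toy column-family store (hypothetical, for illustration only)."""

    def __init__(self, families):
        # family name -> row key -> {column name: value}
        self._families = {name: {} for name in families}
        self._row_keys = []  # all row keys, kept in sorted order

    def put(self, row_key, family, **columns):
        # Keep the row-key list sorted so that range scans are cheap.
        i = bisect.bisect_left(self._row_keys, row_key)
        if i == len(self._row_keys) or self._row_keys[i] != row_key:
            self._row_keys.insert(i, row_key)
        # Columns are added dynamically; rows may be sparse.
        self._families[family].setdefault(row_key, {}).update(columns)

    def get(self, row_key, family):
        # Read one row's columns from a single column family.
        return dict(self._families[family].get(row_key, {}))

    def scan(self, family, start, stop):
        # Range scan over row keys in the half-open interval [start, stop).
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        rows = self._families[family]
        return [(k, dict(rows[k])) for k in self._row_keys[lo:hi] if k in rows]


store = ColumnFamilyStore(["identity", "contact"])
store.put("001", "identity", first_name="Mu Bae", last_name="Min")
store.put("001", "contact", phone="555-0100", email="someone@example.com")
store.put("002", "identity", first_name="Francisco", last_name="Vila Nova", suffix="Jr.")
store.put("003", "identity", first_name="Lena", last_name="Adamcyz", title="Dr.")
store.put("003", "contact", phone="555-0120")

# Row 002 has no contact columns, so only row 001 is returned.
print(store.scan("contact", "001", "003"))
```

Note that each row carries a different set of columns within the same family, and that the same row key ties together an entity's data across families.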

Relevant Azure service:

  • Azure Cosmos DB for Apache Cassandra
  • HBase in HDInsight

Key/value data stores

A key/value store is essentially a large hash table. You associate each data value with a unique key, and the key/value store uses this key to store the data by using an appropriate hashing function. The hashing function is selected to provide an even distribution of hashed keys across the data storage.
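One simple way to picture how a hash function spreads keys across storage is hash-modulo placement. The sketch below is an assumption for illustration only: the `node_for_key` function, the choice of SHA-256, and the node count of 4 are not taken from any specific product.

```python
import hashlib
from collections import Counter


def node_for_key(key: str, node_count: int) -> int:
    """Map a key to a node index by hashing it (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % node_count


keys = [f"user:{i}" for i in range(10_000)]
distribution = Counter(node_for_key(k, 4) for k in keys)

# Each of the 4 nodes should receive roughly a quarter of the keys.
print(sorted(distribution.items()))
```

Plain modulo placement moves most keys when the node count changes, which is why some systems use schemes such as consistent hashing instead.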

Most key/value stores only support simple query, insert, and delete operations. To modify a value (either partially or completely), an application must overwrite the existing data for the entire value. In most implementations, reading or writing a single value is an atomic operation. If the value is large, writing may take some time.

An application can store arbitrary data as a set of values, although some key/value stores impose limits on the maximum size of values. The stored values are opaque to the storage system software. Any schema information must be provided and interpreted by the application. Essentially, values are blobs and the key/value store simply retrieves or stores the value by key.

Key/value stores are highly optimized for applications performing simple lookups using the value of the key, or by a range of keys, but are less suitable for systems that need to query data across different tables of keys/values, such as joining data across multiple tables.

Key/value stores are also not optimized for scenarios where querying or filtering by non-key values is important, rather than performing lookups based only on keys. For example, with a relational database, you can find a record by using a WHERE clause to filter the non-key columns, but key/value stores usually do not have this type of lookup capability for values, or if they do, it requires a slow scan of all values.
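The difference between a key lookup and a filter on values can be sketched as follows. Here a plain dictionary stands in for the store, and values are kept as opaque bytes that only the application knows how to decode. The function names and sample records are hypothetical.

```python
import json

kv = {}  # key -> opaque bytes; the "store" never inspects the value


def put(key, record):
    # The application, not the store, decides how values are encoded.
    kv[key] = json.dumps(record).encode("utf-8")


def get(key):
    # Key lookup: a single hash-table access.
    return json.loads(kv[key])


def scan_where(predicate):
    # Filtering on a non-key field: every value must be read and decoded.
    return [key for key, blob in kv.items() if predicate(json.loads(blob))]


put("cust:1", {"name": "Ana", "city": "Lisbon"})
put("cust:2", {"name": "Ben", "city": "Oslo"})
put("cust:3", {"name": "Caz", "city": "Lisbon"})

print(get("cust:2")["name"])                        # Ben
print(scan_where(lambda r: r["city"] == "Lisbon"))  # ['cust:1', 'cust:3']
```

The cost of `scan_where` grows with the total number of stored values, whereas `get` does not, which is the trade-off the paragraph above describes.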

A single key/value store can be extremely scalable, as the data store can easily distribute data across multiple nodes on separate machines.

Relevant Azure services:

  • Azure Cosmos DB for Table
  • Azure Cache for Redis
  • Azure Table Storage

Graph data stores

A graph data store manages two types of information, nodes and edges. Nodes represent entities, and edges specify the relationships between these entities. Both nodes and edges can have properties that provide information about that node or edge, similar to columns in a table. Edges can also have a direction indicating the nature of the relationship.

The purpose of a graph data store is to allow an application to efficiently perform queries that traverse the network of nodes and edges, and to analyze the relationships between entities. The following diagram shows an organization's personnel data structured as a graph. The entities are employees and departments, and the edges indicate reporting relationships and the department in which employees work. In this graph, the arrows on the edges show the direction of the relationships.

[Diagram: an organization's personnel data structured as a graph]

This structure makes it straightforward to perform queries such as "Find all employees who report directly or indirectly to Sarah" or "Who works in the same department as John?" For large graphs with lots of entities and relationships, you can perform complex analyses quickly. Many graph databases provide a query language that you can use to traverse a network of relationships efficiently.
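The two example queries can be sketched over a small in-memory edge list. This is only an illustration of graph traversal in general, not the query language of any graph database. Apart from Sarah and John, the names, departments, and edge labels (`reports_to`, `works_in`) are made up for the example.

```python
from collections import defaultdict, deque

# Directed, labeled edges: (source, label, target). Sample data is hypothetical.
edges = [
    ("John", "reports_to", "Sarah"),
    ("Alice", "reports_to", "Sarah"),
    ("Bob", "reports_to", "John"),
    ("Carol", "reports_to", "Bob"),
    ("Sarah", "works_in", "Head Office"),
    ("John", "works_in", "Sales"),
    ("Bob", "works_in", "Sales"),
    ("Carol", "works_in", "Sales"),
    ("Alice", "works_in", "Marketing"),
]

outgoing = defaultdict(list)  # (source, label) -> targets
incoming = defaultdict(list)  # (target, label) -> sources
for source, label, target in edges:
    outgoing[(source, label)].append(target)
    incoming[(target, label)].append(source)


def reports_under(manager):
    """Everyone who reports directly or indirectly to `manager` (breadth-first)."""
    seen, order, queue = set(), [], deque([manager])
    while queue:
        current = queue.popleft()
        for employee in incoming[(current, "reports_to")]:
            if employee not in seen:  # guard against cycles in the data
                seen.add(employee)
                order.append(employee)
                queue.append(employee)
    return order


def colleagues_of(employee):
    """Everyone who works in the same department(s) as `employee`."""
    result = set()
    for department in outgoing[(employee, "works_in")]:
        result.update(incoming[(department, "works_in")])
    result.discard(employee)
    return sorted(result)


print(reports_under("Sarah"))  # ['John', 'Alice', 'Bob', 'Carol']
print(colleagues_of("John"))   # ['Bob', 'Carol']
```

Following edges against their direction, as `reports_under` does, is the kind of traversal that a dedicated graph store is designed to make efficient on much larger graphs.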

Relevant Azure service:

  • Azure Cosmos DB Graph API

Time series data stores

Time series data is a set of values organized by time, and a time series data store is optimized for this type of data. Time series data stores must support a very high number of writes, as they typically collect large amounts of data in real time from a large number of sources. Time series data stores are optimized for storing telemetry data. Scenarios include IoT sensors or application/system counters. Updates are rare, and deletes are often done as bulk operations.

Although the records written to a time series database are generally small, there are often a large number of records, and total data size can grow rapidly. Time series data stores also handle out-of-order and late-arriving data, automatic indexing of data points, and optimizations for queries described in terms of windows of time. This last feature enables queries to run across millions of data points and multiple data streams quickly, in order to support time series visualizations, which is a common way that time series data is consumed.
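To make the ideas of late-arriving data and time-window queries concrete, here is a small sketch that keeps points ordered by timestamp and averages them per window. The `TimeSeries` class, the integer timestamps (taken to be seconds), and the sample readings are assumptions for illustration.

```python
import bisect
from collections import defaultdict


class TimeSeries:
    """Toy time series that keeps points ordered by timestamp (illustrative)."""

    def __init__(self):
        self._points = []  # sorted list of (timestamp, value)

    def append(self, timestamp, value):
        # insort keeps the list ordered even when a point arrives late.
        bisect.insort(self._points, (timestamp, value))

    def window_average(self, start, stop, width):
        # Select points with start <= timestamp < stop, then bucket by window.
        lo = bisect.bisect_left(self._points, (start,))
        hi = bisect.bisect_left(self._points, (stop,))
        buckets = defaultdict(list)
        for timestamp, value in self._points[lo:hi]:
            bucket_start = start + ((timestamp - start) // width) * width
            buckets[bucket_start].append(value)
        return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}


series = TimeSeries()
# The readings at t=60 and t=45 arrive after the reading at t=90.
for ts, temperature in [(0, 20.0), (30, 21.0), (90, 23.0), (60, 22.0), (45, 25.0)]:
    series.append(ts, temperature)

print(series.window_average(0, 120, 60))  # {0: 22.0, 60: 22.5}
```

Because the points are stored in time order, selecting a window is a pair of binary searches rather than a scan over all data.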

For more information, see Time series solutions.

Relevant Azure services:

  • Azure Time Series Insights
  • OpenTSDB with HBase on HDInsight

Object data stores

Object data stores are optimized for storing and retrieving large binary objects or blobs such as images, text files, video and audio streams, large application data objects and documents, and virtual machine disk images. An object consists of the stored data, some metadata, and a unique ID for accessing the object. Object stores are designed to support files that are individually very large, as well as to provide large amounts of total storage to manage all files.
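The three parts of an object (data, metadata, and a unique ID) can be sketched as a small data structure. The `StoredObject` and `ObjectStore` names, the automatically computed metadata fields, and the `read_range` helper are hypothetical and do not correspond to any particular service's API.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    object_id: str  # unique ID used to access the object
    data: bytes     # the stored blob
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory object store (hypothetical, for illustration only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        object_id = uuid.uuid4().hex
        # Combine system-computed metadata with caller-supplied metadata.
        metadata = {
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
            **metadata,
        }
        self._objects[object_id] = StoredObject(object_id, data, metadata)
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def read_range(self, object_id: str, offset: int, length: int) -> bytes:
        # Ranged read: return only part of a (possibly very large) object.
        return self._objects[object_id].data[offset:offset + length]


store = ObjectStore()
object_id = store.put(b"hello object storage", content_type="text/plain")

print(store.get(object_id).metadata["size"])  # 20
print(store.read_range(object_id, 6, 6))      # b'object'
```

Ranged reads are one way that several readers could each work on a different part of the same large object.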

Some object data stores replicate a given blob across multiple server nodes, which enables fast parallel reads. This process, in turn, enables the scale-out querying of data contained in large files, because multiple processes, typically running on different servers, can each query the large data file simultaneously.

One special case of object data stores is the network file share. Using file shares enables files to be accessed across a network using standard networking protocols like Server Message Block (SMB). Given appropriate security and concurrent access control mechanisms, sharing data in this way can enable distributed services to provide highly scalable data access for basic, low-level operations such as simple read and write requests.

Relevant Azure services:

  • Azure Blob Storage
  • Azure Data Lake Store
  • Azure File Storage

External index data stores

External index data stores provide the ability to search for information held in other data stores and services. An external index acts as a secondary index for any data store, and can be used to index massive volumes of data and provide near real-time access to these indexes.

For example, you might have text files stored in a file system. Finding a file by its file path is quick, but searching based on the contents of the file would require a scan of all of the files, which is slow. An external index lets you create secondary search indexes and then quickly find the path to the files that match your criteria. Another example application of an external index is with key/value stores that only index by the key. You can build a secondary index based on the values in the data, and quickly look up the key that uniquely identifies each matched item.
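The file-search example can be sketched as a small inverted index that maps each term to the paths of the files containing it. This is a simplified illustration: the file paths and contents are invented, the tokenizer is a basic regular expression, and the files are represented as an in-memory dictionary rather than being read from disk.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files):
    """Build an inverted index: term -> set of file paths containing it."""
    index = defaultdict(set)
    for path, text in files.items():
        for term in tokenize(text):
            index[term].add(path)
    return index


def search(index, query):
    """Return the paths of files that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


# Hypothetical file contents keyed by path.
files = {
    "/docs/a.txt": "Graph stores traverse relationships",
    "/docs/b.txt": "Key/value stores hash the key",
    "/docs/c.txt": "Time series stores handle late data",
}
index = build_index(files)

print(search(index, "stores key"))  # ['/docs/b.txt']
print(search(index, "stores"))      # ['/docs/a.txt', '/docs/b.txt', '/docs/c.txt']
```

The index is built once, ahead of time, so a content search becomes a few dictionary lookups instead of a scan of every file.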

The indexes are created by running an indexing process. This can be performed using a pull model, triggered by the data store, or using a push model, initiated by application code. Indexes can be multidimensional and may support free-text searches across large volumes of text data.

External index data stores are often used to support full text and web-based search. In these cases, searching can be exact or fuzzy. A fuzzy search finds documents that match a set of terms and calculates how closely they match. Some external indexes also support linguistic analysis that can return matches based on synonyms, genre expansions (for example, matching "dogs" to "pets"), and stemming (for example, searching for "run" also matches "ran" and "running").

Relevant Azure service:

  • Azure Search

Typical requirements

Non-relational data stores often use a different storage architecture from that used by relational databases. Specifically, they tend toward having no fixed schema. Also, they tend not to support transactions, or else restrict the scope of transactions, and they generally don't include secondary indexes for scalability reasons.

The following tables compare the requirements for each of the non-relational data stores:

| Requirement | Document data | Column-family data | Key/value data | Graph data |
|---|---|---|---|---|
| Normalization | Denormalized | Denormalized | Denormalized | Normalized |
| Schema | Schema on read | Column families defined on write, column schema on read | Schema on read | Schema on read |
| Consistency (across concurrent transactions) | Tunable consistency, document-level guarantees | Column-family–level guarantees | Key-level guarantees | Graph-level guarantees |
| Atomicity (transaction scope) | Collection | Table | Table | Graph |
| Locking strategy | Optimistic (lock free) | Pessimistic (row locks) | Optimistic (ETag) | |
| Access pattern | Random access | Aggregates on tall/wide data | Random access | Random access |
| Indexing | Primary and secondary indexes | Primary and secondary indexes | Primary index only | Primary and secondary indexes |
| Data shape | Document | Tabular with column families containing columns | Key and value | Graph containing edges and vertices |
| Sparse | Yes | Yes | Yes | No |
| Wide (lots of columns/attributes) | Yes | Yes | No | No |
| Datum size | Small (KBs) to medium (low MBs) | Medium (MBs) to large (low GBs) | Small (KBs) | Small (KBs) |
| Overall maximum scale | Very large (PBs) | Very large (PBs) | Very large (PBs) | Large (TBs) |

| Requirement | Time series data | Object data | External index data |
|---|---|---|---|
| Normalization | Normalized | Denormalized | Denormalized |
| Schema | Schema on read | Schema on read | Schema on write |
| Consistency (across concurrent transactions) | N/A | N/A | N/A |
| Atomicity (transaction scope) | N/A | Object | N/A |
| Locking strategy | N/A | Pessimistic (blob locks) | N/A |
| Access pattern | Random access and aggregation | Sequential access | Random access |
| Indexing | Primary and secondary indexes | Primary index only | N/A |
| Data shape | Tabular | Blob and metadata | Document |
| Sparse | No | N/A | No |
| Wide (lots of columns/attributes) | No | Yes | Yes |
| Datum size | Small (KBs) | Large (GBs) to very large (TBs) | Small (KBs) |
| Overall maximum scale | Large (low TBs) | Very large (PBs) | Large (low TBs) |

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal author:

Next steps

  • Relational vs. NoSQL data
  • Understand distributed NoSQL databases
  • Microsoft Azure Data Fundamentals: Explore non-relational data in Azure
  • Implement a non-relational data model
  • Databases architecture design
  • Understand data store models
  • Scalable order processing
  • Near real-time lakehouse data processing

FAQs

What is non-relational database in Azure? ›

Non-relational data is a common way for applications to store and query data without the overhead of a relational schema. In Microsoft Azure, you can use Azure Storage and Azure Cosmos DB to build highly scalable, secure data stores for non-relational data.

Which is the NoSQL database of Azure? ›

Azure Cosmos DB for NoSQL.

What are the 2 types of non-relational database? ›

Popular non-relational types include the document store, the column-oriented database, the key-value store, and the graph database. Often combinations of these types are used for a single application.

What are the four 4 different types of NoSQL databases? ›

Here are the four main types of NoSQL databases:
  • Document databases.
  • Key-value stores.
  • Column-oriented databases.
  • Graph databases.

What is a non relational NoSQL database? ›

Non-relational databases are sometimes referred to as “NoSQL,” which stands for Not Only SQL. The main difference between these is how they store their information. A non-relational database stores data in a non-tabular form, and tends to be more flexible than the traditional, SQL-based, relational database structures.

What are the 3 NoSQL database properties? ›

NoSQL databases have the following properties: They have higher scalability. They use distributed computing. They are cost effective.

Which is an example of a NoSQL database? ›

Document-oriented NoSQL database solutions include MongoDB, CouchDB, Riak, Amazon SimpleDB, and Lotus Notes.

Which database is most used NoSQL? ›

MongoDB, considered the most popular NoSQL database, is a document-oriented open-source database.

What is the architecture of NoSQL? ›

The NoSQL database approach is characterized by a move away from the complexity of SQL based servers. The logic of validation, access control, mapping querieable indexed data, correlating related data, conflict resolution, maintaining integrity constraints, and triggered procedures is moved out of the database layer.

What architecture does NoSQL follow? ›

NoSQL systems generally follow the BASE model. Unlike ACID, BASE is not a strict set of properties but a set of guidelines: basically available (BA), soft state (S), and eventual consistency (E). Unlike in a relational database, data in many NoSQL stores is kept in key-value form.

What kind of a NoSQL store is Azure table storage? ›

Azure Table storage is a service that stores non-relational structured data (also known as structured NoSQL data) in the cloud, providing a key/attribute store with a schemaless design. Because Table storage is schemaless, it's easy to adapt your data as the needs of your application evolve.

What is the best non relational database? ›

Best NoSQL database #1: MarkLogic

MarkLogic is a multi-model NoSQL database designed to handle complex data integration use cases such as large data sets with multiple different models or in a fast-changing business environment. The database has been designed to have a single platform for data needs.

What is difference between relational database and non relational database? ›

A relational database is structured, meaning the data is organized in tables. Many times, the data within these tables have relationships with one another, or dependencies. A non relational database is document-oriented, meaning, all information gets stored in more of a laundry list order.

What is NoSQL used for? ›

NoSQL databases store data in documents rather than relational tables. Accordingly, we classify them as “not only SQL” and subdivide them by a variety of flexible data models. Types of NoSQL databases include pure document databases, key-value stores, wide-column databases, and graph databases.

Can SQL be used in non relational database? ›

Yes, SQL. It's in NoSQL databases now. Nonrelational databases are turning to the most successful and well-known database language to put it to work on nonrelational data (like JSON).

What is the difference between SQL database and NoSQL database? ›

SQL databases are vertically scalable, while NoSQL databases are horizontally scalable. SQL databases are table-based, while NoSQL databases are document, key-value, graph, or wide-column stores. SQL databases are better for multi-row transactions, while NoSQL is better for unstructured data like documents or JSON.

Why MongoDB is non relational database? ›

Non-relational database for JSON-like documents

MongoDB is a non-relational document database that provides support for JSON-like storage. The MongoDB database has a flexible data model that enables you to store unstructured data, and it provides full indexing support, and replication with rich and intuitive APIs.

What is the difference between a NoSQL and a relational database management system? ›

Relational databases store data according to specific schemas. By contrast, NoSQL systems allow data to be stored using any structure required, but provide a way to update that data when the structure changes.

How many data models are there in NoSQL? ›

In general, there are four different types of data models in NoSQL.

What are two characteristics of a NoSQL database? ›

Here are the 5 key features to look for in a NoSQL database:
  • Support for Multiple Data Models. ...
  • Easily Scalable via Peer-to-Peer Architecture. ...
  • Flexibility: Versatile Data Handling. ...
  • Distribution Capabilities. ...
  • Zero Downtime.

Which data model is NoSQL? ›

NoSQL databases (aka "not only SQL") are non-tabular databases and store data differently than relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph.

What are some examples of NoSQL architecture? ›

MongoDB, CouchDB, Couchbase, Cassandra, HBase, Redis, Riak, and Neo4j are popular examples of NoSQL databases. MongoDB, CouchDB, Couchbase, Amazon SimpleDB, Riak, and Lotus Notes are document-oriented NoSQL databases.

What does NoSQL data look like? ›

Data model.

With NoSQL database systems, data is not modeled as tables with fixed rows and columns, as with a SQL DBMS. Instead, depending on the NoSQL database, data can be modeled as JSON documents, graphs with nodes and edges, or key-value pairs.

What language does NoSQL use? ›

Many NoSQL vendors are still using a variation of SQL. Cosmos DB, Cassandra CQL, Elasticsearch SQL, Cockroach Labs. Even with Mongodb query language, you will find that it is based on the select-join-project construct, which is the foundation of relational algebra that is used in SQL.

Why is NoSQL better than relational database? ›

Relational Database has a fixed schema. NoSQL Database is only eventually consistent. NoSQL databases don't support transactions (support only simple transactions). Relational Database supports transactions (also complex transactions with joins).

Where NoSQL should not be used? ›

If you're not working with a large volume of data or many data types, NoSQL would be overkill. You are constantly adding new features, functions, data types. It's difficult to predict how the application will grow over time. Changing a data model in SQL is clunky and requires code changes.

What is the fastest NoSQL database? ›

1. MongoDB. MongoDB is an excellent database for storing documents in JSON objects. Large companies like Uber and eBay use their services.

What is unmanaged database? ›

Unmanaged database deployments place the burden of support entirely on the developer or infrastructure teams. Often installed as part of a software stack (such as LAMP or LEMP), databases require maintenance, upgrades, and monitoring to ensure reliability and security at the heart of your application.

What is the difference between relational and non relational database? ›

A relational database is the database management system in which data is stored in distinct tables from where they can be accessed or reassembled in different ways under user-defined relational tables, whereas a Non-Relational Database is the database architecture that is not built around tables.

What are the different databases in Azure? ›

Azure offers a choice of fully managed relational, NoSQL, and in-memory databases, spanning proprietary and open-source engines, to fit the needs of modern app developers. Infrastructure management—including scalability, availability, and security—is automated, saving you time and money.

What is an example of a non-relational database? ›

Non-relational databases, also called NoSQL databases, the most popular being MongoDB, DocumentDB, Cassandra, Couchbase, HBase, Redis, and Neo4j. These databases are usually grouped into four categories: key-value stores, graph stores, column stores, and document stores (see Types of NoSQL databases).

How data is stored in non-relational database? ›

NoSQL database technology stores information in JSON documents instead of columns and rows used by relational databases.

What are the issues with non-relational database? ›

Security is often weaker than in relational databases, which is a major concern. For example, older versions of MongoDB and Cassandra were criticized for lacking encryption of data files, for weak authentication, and for very simple authorization without RBAC support.

What are 3 common characteristics of NoSQL databases? ›

The three main features of NoSQL databases are scale-out, replication, and flexible data structure.

Article information

Author: Virgilio Hermann JD

Last Updated: 02/04/2023
