Microsoft’s new DocumentDB builds on PostgreSQL

Thursday January 30, 2025. 10:00 AM, from InfoWorld
Microsoft’s recent launch of a standalone version of the MongoDB compatibility layer for its global-scale Azure Cosmos DB brought back an old name. Back in 2014, when the company unveiled a public version of the Project Florence database engine that powers much of Azure, it called the service DocumentDB. That original name worked well for some of the database’s personalities, but its support for much more than JSON documents soon led to a new, now more familiar name. Cosmos DB has continued to evolve, with its document database capabilities offering a familiar set of MongoDB-compatible APIs.

A recent set of updates introduced the vCore variant of Azure Cosmos DB, which moves from the multi-tenant, cross-region, transparently scalable resource unit-based Cosmos DB to an alternative architecture that behaves more like traditional Azure services, with defined host virtual machines and a more predictable pricing model. The vCore-based MongoDB APIs are the same as those used with the cloud-scale resource unit version, but the underlying technologies are quite different, and moving from one version to the other requires a complete migration of your data.

Last week Microsoft revealed the differences in the two implementations when it unveiled an open-source release of the vCore Cosmos DB engine. Built on the familiar PostgreSQL platform, the new public project adds NoSQL features with the MongoDB APIs. As it focuses purely on storing JSON content, Microsoft decided to bring back the original DocumentDB name.

The new DocumentDB comes with a permissive MIT license and is intended to provide a standard NoSQL environment for your data to reduce the complexity associated with migrating from one platform to another. Choosing to work with PostgreSQL is part of that, as it has long been a popular platform for developers, one that’s had something of a recent renaissance.

A modern NoSQL database with PostgreSQL roots

By open sourcing a tool that’s already widely used in Azure, Microsoft is giving developers the ability to run something that’s already proven to work well. Most of the features we expect to find in a modern NoSQL store are already there, from basic CRUD (create, read, update, delete) operations to more complex vector search tools and the indexes needed to support them. This ensures you will be able to build on and extend a database that can support most scenarios.

DocumentDB sits on top of the existing PostgreSQL platform, which manages storage, indexing, and other key low-level operations. The result is that DocumentDB is implemented using two components: one to add support for BSON (Binary JavaScript Object Notation) data types and one to support the DocumentDB APIs, adding CRUD operations, queries, and index management.

BSON is the fundamental data type used in MongoDB, and implementations are available for most common languages. If you’re going to build a common NoSQL store based on MongoDB APIs, then BSON will be the way you represent your standard NoSQL data structures, such as key-value pairs and arrays. It’s easy to build JSON documents, but holding them as typed, length-prefixed BSON allows you to store and search content more effectively.
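To make the format concrete, here is a minimal sketch of how key-value pairs and arrays are laid out in BSON, written against the public BSON specification rather than taken from DocumentDB’s code. The function names (`encode_document`, `_element`, `_cstring`) are hypothetical, and only a handful of value types are covered.

```python
import struct


def _cstring(key: str) -> bytes:
    """Encode a key as a BSON cstring: UTF-8 bytes followed by a NUL byte."""
    raw = key.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("BSON keys cannot contain NUL bytes")
    return raw + b"\x00"


def _element(key: str, value) -> bytes:
    """Encode one key-value pair as: type byte, key cstring, then the value."""
    name = _cstring(key)
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return b"\x08" + name + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + name + struct.pack("<i", value)  # int32
        return b"\x12" + name + struct.pack("<q", value)      # int64
    if isinstance(value, float):
        return b"\x01" + name + struct.pack("<d", value)      # 64-bit double
    if isinstance(value, str):
        raw = value.encode("utf-8") + b"\x00"
        # Strings carry a length prefix that counts the trailing NUL.
        return b"\x02" + name + struct.pack("<i", len(raw)) + raw
    if value is None:
        return b"\x0a" + name
    if isinstance(value, dict):
        return b"\x03" + name + encode_document(value)
    if isinstance(value, (list, tuple)):
        # An array is stored as a document whose keys are "0", "1", "2", ...
        as_doc = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + name + encode_document(as_doc)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    """Encode a dict as: int32 total length, the elements, a closing NUL."""
    body = b"".join(_element(key, value) for key, value in doc.items())
    # Total length = 4-byte length prefix + body + 1-byte terminator.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"
```

Calling `encode_document({"hello": "world"})` yields the 22-byte document `b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"`. The explicit type tags and length prefixes are what let a database engine skip over fields it does not need, which plain JSON text cannot offer.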

You can think of DocumentDB as a stack. At the bottom is PostgreSQL itself, then the DocumentDB extension that gives the database the ability to work with BSON data. Once installed it lets you parse BSON data and then use the PostgreSQL engine to build indexes, not only using the database engine’s standard tools but also other extensions. The result is the ability to deliver complex indexes that support all kinds of queries.

One useful feature is the ability to use PostgreSQL’s vector index capabilities to build your BSON data into a retrieval-augmented generation (RAG) application or use nearest-neighbor searches to build recommendation engines or identify fraud patterns. There’s a lot of utility in a NoSQL database with many different indexing options; it gives you the necessary foundations for many different application types—all working on the same data set.
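As a rough illustration of what a nearest-neighbor query computes, the sketch below ranks documents by cosine similarity between a query vector and an embedding stored in each document. It is a brute-force scan in plain Python, not DocumentDB or PostgreSQL code; the `embedding` field name and the `nearest` helper are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest(query, documents, k=2):
    """Return the _id values of the k documents most similar to the query."""
    scored = [
        (cosine_similarity(query, doc["embedding"]), doc["_id"])
        for doc in documents
    ]
    # Highest similarity first; sorting on the score only keeps ties stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


documents = [
    {"_id": "a", "embedding": [1.0, 0.0]},
    {"_id": "b", "embedding": [0.0, 1.0]},
    {"_id": "c", "embedding": [0.9, 0.1]},
]
closest = nearest([1.0, 0.0], documents, k=2)  # ["a", "c"]
```

Every query here touches every document. A vector index exists to avoid that full scan, which is what makes this kind of search practical across a large collection.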

Getting started with DocumentDB

This first public release of DocumentDB inherits code already running in Azure, so the code, hosted on GitHub, is ready to build and use. The instructions in the project wiki are focused on using VS Code and Docker to build on top of WSL 2, though you can use any Linux via VS Code’s remote engine. You build the container, then make, install, and launch the binaries. The DocumentDB container already holds PostgreSQL, so once setup is complete, you can connect to its shell and start experimenting with BSON support.

From the shell, you can embed API calls in select statements. This allows you to experiment with operations before adding them to calls from your code. The shell lets you build collections, add items, and experiment with CRUD operations. Other operations apply filters and support queries, as well as building indexes across one or more fields in a collection. You can find a lengthy list of documented API functions in the project wiki, grouped into common sets of operations.
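The shape of those select statements can be sketched as follows. The Python helpers are hypothetical and only build the SQL text you would paste into the shell. The `documentdb_api.create_collection` and `documentdb_api.insert_one` names follow the project’s published examples as best recalled here, so treat them and their argument order as assumptions to check against the wiki.

```python
import json


def sql_literal(text: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes.

    Assumes PostgreSQL's default standard_conforming_strings setting, so
    backslashes inside the JSON text are passed through unchanged.
    """
    return "'" + text.replace("'", "''") + "'"


def create_collection_sql(database: str, collection: str) -> str:
    """Build a select statement that creates a collection (assumed API name)."""
    return (
        "SELECT documentdb_api.create_collection("
        f"{sql_literal(database)}, {sql_literal(collection)});"
    )


def insert_one_sql(database: str, collection: str, document: dict) -> str:
    """Build a select statement that inserts one document (assumed API name)."""
    payload = json.dumps(document, separators=(",", ":"))
    return (
        "SELECT documentdb_api.insert_one("
        f"{sql_literal(database)}, {sql_literal(collection)}, "
        f"{sql_literal(payload)});"
    )


statements = [
    create_collection_sql("documentdb", "patient"),
    insert_one_sql("documentdb", "patient", {"patient_id": "P001", "age": 30}),
]
```

Building statements as strings is convenient for experimenting in the shell. In application code, passing the document as a bound parameter through your PostgreSQL driver is the safer choice.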

For now, the GitHub wiki is the main source of documentation for DocumentDB. It’s a little on the thin side and could do with more examples. However, DocumentDB is currently intended for developers who want an alternative to MongoDB, one that’s available with an open source license rather than a source-available license. As there’s no SDK yet, you’ll need to build your own calls to the API. These are based on MongoDB, so porting applications shouldn’t be too complex.

Why this? Why now?

The reasoning behind the DocumentDB project seems to be the big ambition to deliver a standard NoSQL API and engine, much like that developed for SQL. Microsoft has a lot of experience working in standards bodies, especially building and delivering the essential tests needed to make sure that any implementation of the resulting standard meets the necessary requirements.

We’ve seen Microsoft deliver extensive test suites for protocols and languages, and we can expect this level of tooling to be a key component of any future NoSQL standard. We need common APIs and engine features to help with application and data portability. A common standard will allow NoSQL stores to compete on performance and other business-essential features such as scalability and resilience.

DocumentDB’s layered approach to delivering basic functionality is perhaps the most important part of what Microsoft is doing here. The blog post announcing DocumentDB talks about “a protocol translation layer” on top of the BSON extension, bridging APIs to the document store in a way that makes it possible to have a single store that looks like MongoDB to one set of clients, Aerospike to another, or CouchDB, Couchbase, and more.

A reference for a NoSQL standard

For DocumentDB to be the foundation of a NoSQL standard, it has to be vendor-neutral. Because it allows you to switch protocols on top of the same underlying store, you can use the APIs you’re familiar with, no matter their source. Query engine designers can focus on their area of expertise, while the PostgreSQL team can continue to deliver the resilient, high-performance database necessary for modern applications.

One example of this is the open source FerretDB NoSQL database. Its latest release, FerretDB 2.0, is built on DocumentDB, and the FerretDB team claims up to 20x better performance as a result. The team can continue to work on its own features, taking advantage of the open source DocumentDB to provide the core BSON support necessary for a MongoDB-compatible NoSQL database. FerretDB will continue to use its own Apache 2.0 license in parallel with Microsoft’s MIT license.

Another interesting point shows how much Microsoft has changed in the past decade or so: The first product shipping on the standalone DocumentDB is coming from FerretDB, an open source company that’s not Microsoft.

DocumentDB is a project to keep an eye on, especially when Microsoft starts the process of using it as a reference implementation for a new NoSQL standard. With community support, hopefully we’ll then see a rapid rollout of the MongoDB API features that are currently missing—adding them into both the middleware layer to map them to PostgreSQL operations and the API implementation.
https://www.infoworld.com/article/3812630/microsoft-unveils-documentdb-a-standalone-nosql-database-b...
