The Guide to MongoDB Field Level Encryption

Original article was published by Stefan Pfaffel on Artificial Intelligence on Medium

Client Application Setup

Runtime dependencies

To enable client-side encryption, we need to install the required C libraries on our server or in our container first. The required C libraries are libbson and libmongocrypt.

We must also add the respective wrapper library as a dependency to our application. The NodeJS wrapper’s npm package is called mongodb-client-encryption; the Java wrapper is called mongodb-crypt and is available on Maven Central.

Schema objects

Having created the encryption keys, we can proceed with our client application setup. We want to store data encrypted and enforce that specific fields cannot be stored unencrypted. This way, we prevent certain information from being stored in plain text and accessible to anyone who has direct access to the database.

JSON Schema is the recommended means of performing schema validation.

Requirements like encryption for particular fields can be added to MongoDB collections via schema definitions. A schema describes the structure and characteristics of a MongoDB document and can, therefore, define the following:

  • required and optional properties
  • property names and their type
  • min and max values
  • regular expressions that values must match
  • a set of predefined values in case of an enumeration
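
To make these keywords concrete, here is a minimal sketch of a schema for a hypothetical product document. The field names sku, price, and status are invented for illustration and do not appear in our application; the keywords themselves (bsonType, required, pattern, minimum, maximum, enum) are part of MongoDB's $jsonSchema support.

```javascript
// Hypothetical schema for illustration only; the field names are assumptions.
const productSchema = {
  bsonType: 'object',
  // Properties listed here are required; all others are optional.
  required: ['sku', 'price'],
  properties: {
    // Property name and type, plus a regular expression the value must match.
    sku: { bsonType: 'string', pattern: '^[A-Z]{3}-[0-9]{4}$' },
    // Min and max values for a numeric property.
    price: { bsonType: 'number', minimum: 0, maximum: 10000 },
    // A set of predefined values (enumeration).
    status: { enum: ['draft', 'active', 'retired'] },
  },
};

// Shape of the validator document that MongoDB expects for a collection.
const validator = { $jsonSchema: productSchema };
```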

MongoDB recommends the usage of JSON Schema to describe documents. A JSON Schema is a JSON object that outlines requirements that will be used for schema validation.

According to the JSON Schema documentation,

“JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.

  • Describes your existing data format(s).
  • Provides clear human- and machine-readable documentation.
  • Validates data which is useful for:
      ◦ Automated testing.
      ◦ Ensuring quality of client submitted data.”

Our user object contains a unique random id, a name, and an email address. We want all these properties to be mandatory. Additionally, we enforce that:

  • id, name, and email address are set
  • id is a valid UUID
  • name and email address are stored encrypted

The user object with id, name, and email properties

The JSON file below shows how a JSON Schema object that contains our requirements looks.

  • Lines 2-4: Metadata for this document: the title of the document and the type we’re going to define.
  • Lines 4-8: The keyword required defines an array of non-optional properties.
  • Lines 9-29: The keyword properties defines an object of known properties.
  • Lines 10-14: The property id must be a string and match the given regular expression.
  • Lines 15-21: The property name must be a string encrypted with the deterministic algorithm.
  • Lines 22-28: The property email must be a string encrypted with the non-deterministic algorithm.
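
As a rough sketch, a schema that meets these requirements could be built as follows. This is not the original listing, so its line numbers differ from the ones referenced above. The function name buildUserSchema, the constant UUID_PATTERN, and the exact regular expression are assumptions; keyId is assumed to be the UUID of a data key created earlier. The algorithm names are the ones MongoDB defines for client-side field level encryption.

```javascript
// Assumed pattern for a lowercase, hyphenated UUID (versions 1 to 5).
const UUID_PATTERN =
  '^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$';

// keyId is assumed to be the UUID (BSON Binary) of an existing data key.
function buildUserSchema(keyId) {
  return {
    title: 'user',
    bsonType: 'object',
    // All three properties are mandatory.
    required: ['id', 'name', 'email'],
    properties: {
      // id must be a string that looks like a UUID.
      id: { bsonType: 'string', pattern: UUID_PATTERN },
      // name must be stored encrypted, deterministically, so it stays queryable.
      name: {
        encrypt: {
          bsonType: 'string',
          algorithm: 'AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic',
          keyId: [keyId],
        },
      },
      // email must be stored encrypted with the randomized algorithm.
      email: {
        encrypt: {
          bsonType: 'string',
          algorithm: 'AEAD_AES_256_CBC_HMAC_SHA_512-Random',
          keyId: [keyId],
        },
      },
    },
  };
}
```

Building the schema in a function keeps the key id out of the schema file, which is one possible design; a static JSON file with the key id filled in works just as well.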

The deterministic algorithm ensures that the same value always encrypts to the same output. This is necessary to look up encrypted data because it allows us to reconstruct the encrypted value and therefore use it in database queries.

In contrast, the randomized (non-deterministic) algorithm ensures that encrypting equal values produces different outputs. Because the output changes with every encryption, someone with access to the stored data cannot tell whether two ciphertexts hold the same value, which makes the plaintext harder to infer than with the deterministic algorithm. Security-wise, that’s a plus. The drawback is that we cannot query data encrypted with the random algorithm. Nevertheless, we can still query documents by other criteria.

Data encrypted with one of these algorithms can, in any case, be decrypted by the application that has access to the master key. So, regarding the algorithm, we mainly have to decide if we want to use the encrypted data as a key for MongoDB queries. If yes, we have to use the deterministic algorithm. If not, we can use the randomized algorithm, which provides better data security.
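
The difference between the two modes can be illustrated with Node's built-in crypto module. The sketch below is a simplification for illustration only: it uses plain AES-256-CBC without authentication, not the AEAD_AES_256_CBC_HMAC_SHA_512 scheme MongoDB implements, and all function and variable names are assumptions. The deterministic variant derives the IV from the plaintext, the randomized variant draws a fresh IV for every call.

```javascript
const { createCipheriv, createDecipheriv, createHmac, randomBytes } = require('crypto');

// Illustration keys; a real application would load these from a key store.
const encryptionKey = randomBytes(32);
const ivKey = randomBytes(32);

function encryptWithIv(plaintext, iv) {
  const cipher = createCipheriv('aes-256-cbc', encryptionKey, iv);
  // Prepend the IV so that decrypt() can recover it.
  return Buffer.concat([iv, cipher.update(plaintext, 'utf8'), cipher.final()]);
}

// Deterministic: the IV is derived from the plaintext, so equal inputs
// always produce equal ciphertexts and can be used in equality queries.
function encryptDeterministic(plaintext) {
  const iv = createHmac('sha256', ivKey).update(plaintext, 'utf8').digest().subarray(0, 16);
  return encryptWithIv(plaintext, iv);
}

// Randomized: a fresh IV per call, so equal inputs produce different ciphertexts.
function encryptRandom(plaintext) {
  return encryptWithIv(plaintext, randomBytes(16));
}

// Both variants are decrypted the same way by whoever holds the key.
function decrypt(ciphertext) {
  const decipher = createDecipheriv('aes-256-cbc', encryptionKey, ciphertext.subarray(0, 16));
  return Buffer.concat([decipher.update(ciphertext.subarray(16)), decipher.final()]).toString('utf8');
}
```

Calling encryptDeterministic twice with the same name yields byte-identical buffers, which is what makes a lookup by encrypted value possible; two calls to encryptRandom with the same email do not match, yet both decrypt to the original value.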

Schema validation

The JSON Schema shown above has to be attached to the MongoDB collection to enable schema validation. In our case, we add the schema after the application has started and before the first query is executed. Because our application does not run in an elastic environment and we do not expect traffic spikes, this is not a performance problem. Applications running in a high-traffic environment with dynamic scaling should apply schema updates from a dedicated application or container instead, so that newly started instances do not each send the same update to the database.

Anyway, here’s a class we use to create, cache, and retrieve connections to MongoDB instances. This class is also responsible for creating collections and enabling JSON Schema validation.

Lines 41-56: Create the collection if it cannot be found in the current set of collections.

Lines 58-64: Look up the schema object from the schemas folder and update the validator of the current collection accordingly. Set the validation level to strict so that all inserts and all updates, including updates to existing documents, are validated against the updated schema.
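
The two steps described above could look roughly like the following sketch. It is not the class from the original listing: the function name ensureCollectionWithSchema and its parameters are assumptions, and db is expected to be a Db instance from the Node.js mongodb driver. The driver calls used (listCollections, createCollection, command) and the collMod command exist as shown.

```javascript
// Creates the collection if it is missing, then attaches the JSON Schema
// as a validator. `db` is assumed to be a connected Db instance.
async function ensureCollectionWithSchema(db, collectionName, schema) {
  const existing = await db.listCollections({ name: collectionName }).toArray();
  if (existing.length === 0) {
    await db.createCollection(collectionName);
  }

  // collMod updates the validator of an existing collection.
  await db.command({
    collMod: collectionName,
    validator: { $jsonSchema: schema },
    // strict: validate all inserts and all updates.
    validationLevel: 'strict',
  });
}
```

Passing db in as an argument keeps the function independent of how connections are created and cached, which also makes it easy to exercise with a stub in tests.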

Inserts of new documents now fail because we still have to add the actual field encryption. The error message shown on the client side is very generic, but in our case, it’s directly related to the updated encryption requirement.

Our test now exits with the following message:

1) returns a user that was previously added
0 passing (201ms)
1 failing
1) Users
returns a user that was previously added:

MongoError: Document failed validation

Implementation of client-side encryption

The final task: let’s update the mongodb-connection class, shown in the snippet above, to handle encryption and decryption transparently. To do so, we need to pass the master key and the key vault namespace (the collection that stores the encryption keys) to the ClientEncryption constructor, along with an active MongoClient.

this._clientEncryption = new ClientEncryption(this._client, {
  keyVaultNamespace: 'encryption.__keys',
  kmsProviders: {
    local: {
      // Master key, passed in as a base64-encoded string
      key: Buffer.from(encryptionKey, 'base64')
    }
  }
});

ClientEncryption is a class from the mongodb-client-encryption package. Instances of ClientEncryption have encrypt and decrypt methods; each returns a promise that resolves with the encrypted or decrypted data, respectively.

The last thing we need to implement is the actual encryption. This is the easiest step so far, so let’s jump straight to the result.

Lines 14-21: Before name and email are stored, they are encrypted. The encrypted model is assigned a unique UUID after encryption.

Lines 23-31: To look up a user by name, we encrypt the name first, because the name is stored encrypted in our database.
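
A minimal sketch of these two operations is shown below. It is not the original listing: createUserStore, addUser, and findUserByName are assumed names, collection is expected to be a driver Collection, clientEncryption a ClientEncryption instance, and keyId the UUID of the data key. The encrypt options (keyId, algorithm) and the algorithm names are the ones the library defines.

```javascript
const { randomUUID } = require('crypto');

const DETERMINISTIC = 'AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic';
const RANDOM = 'AEAD_AES_256_CBC_HMAC_SHA_512-Random';

// All three dependencies are injected; their names are assumptions.
function createUserStore({ collection, clientEncryption, keyId }) {
  async function addUser({ name, email }) {
    // Encrypt the sensitive fields before anything is written.
    const encryptedName = await clientEncryption.encrypt(name, { keyId, algorithm: DETERMINISTIC });
    const encryptedEmail = await clientEncryption.encrypt(email, { keyId, algorithm: RANDOM });
    // The id is assigned after encryption and stored in plain text.
    const doc = { id: randomUUID(), name: encryptedName, email: encryptedEmail };
    await collection.insertOne(doc);
    return doc.id;
  }

  async function findUserByName(name) {
    // The stored name is ciphertext, so the query value must be encrypted
    // with the same key and the deterministic algorithm.
    const encryptedName = await clientEncryption.encrypt(name, { keyId, algorithm: DETERMINISTIC });
    const doc = await collection.findOne({ name: encryptedName });
    if (!doc) return null;
    return {
      id: doc.id,
      name: await clientEncryption.decrypt(doc.name),
      email: await clientEncryption.decrypt(doc.email),
    };
  }

  return { addUser, findUserByName };
}
```

Because email uses the randomized algorithm, an equivalent findUserByEmail would not work; lookups have to go through name or id.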

The final test results:

√ returns a user that was previously added (262ms)
1 passing (337ms)