Schema-based multitenancy in NestJS with TypeORM

For a project I had to set up multitenancy with NestJS and TypeORM, which turned out to be quite an interesting challenge. In this post I'll describe the approach I went with, go through my findings, and give you a fully working setup which you can use and adapt for your own project!

The code I used for this is largely based on an amazing post by Thomas Vanderstraeten: https://thomasvds.com/schema-based-multitenancy-with-nest-js-type-orm-and-postgres-sql. This post mainly aims to improve upon that one by modernizing it and fixing some real-world problems I ran into. If you want to read more in-depth about the basics of this setup, I would highly recommend reading Thomas’ post.

Want to dive straight into the code? That can be found here: https://github.com/LucaScorpion/nestjs-typeorm-schema-multitenancy

Why not just andWhere({ tenantId })?

Before ending up at one schema per tenant, I tried a simpler approach first: add a tenant entity, and create relations to the tenant from all relevant entities. While this does work just fine, I found it far too easy to overlook a “where” clause.

With this approach you’d have to make sure that every single query checks the tenant id, which becomes extra annoying when you’re dealing with entities that are not directly related to a tenant, but only via another entity. For example: you could have a “tenant” entity which has “user” entities, each of which has “cat” entities. When querying the cats for a tenant, you have to check that the owning user belongs to the tenant you’re querying, and this becomes increasingly annoying the further removed an entity is from the tenant. Alternatively you could add the “tenant” relation to every entity directly, and while that does simplify the queries, it also clutters up your schema, which - to me - doesn’t feel very nice.
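
To make that concrete, here’s a sketch of what a tenant-safe query looks like with this approach (hypothetical Cat entity and repository, using TypeORM’s query builder):

import { Repository } from 'typeorm';
import { Cat } from './cats/cat.entity'; // hypothetical path

// Hypothetical illustration of the relation-based approach: every query
// has to apply the tenant filter, even when the tenant is only
// reachable via a join (cat -> user -> tenant).
async function findCatsForTenant(
  catRepository: Repository<Cat>,
  tenantId: number,
): Promise<Cat[]> {
  return catRepository
    .createQueryBuilder('cat')
    .innerJoin('cat.user', 'user')
    // Forget this one line and you silently leak data across tenants.
    .where('user.tenantId = :tenantId', { tenantId })
    .getMany();
}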

After playing around with it for a couple of hours, I found that I simply wasn’t comfortable with this approach. Preventing any cross-tenant data leakage (or changes) is paramount, so I started looking for another solution.

The basis

After some online searching I quickly came across Thomas Vanderstraeten’s post. This was an amazing start, and I was quickly able to adapt my existing code to this approach. I won’t repeat the entire post here, but it provides a good starting point for the application structure, TypeORM configuration, and tenant schema connection management.

Unfortunately, it didn’t take long before I ran into my first issue: deprecated code.

Fixing deprecations

Thomas’ sample project uses TypeORM 0.2.41. At the time of writing, the latest version is 0.3.20, and somewhere between those versions the Connection and ConnectionManager classes got deprecated. This was an unfortunate setback, but let’s see what we can do!

The Connection class can simply be replaced with a DataSource class, which is functionally the same. This deprecation seemed to have been mostly about improving naming. Nice and easy!
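
For illustration, a minimal sketch of the rename (Connection.connect() from 0.2 becomes DataSource.initialize() in 0.3):

import { DataSource, DataSourceOptions } from 'typeorm';

// TypeORM 0.2:
//   const connection = new Connection(options);
//   await connection.connect();
// TypeORM 0.3, functionally the same with clearer naming:
export async function createDataSource(
  options: DataSourceOptions,
): Promise<DataSource> {
  const dataSource = new DataSource(options);
  await dataSource.initialize();
  return dataSource;
}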

However, the ConnectionManager was slightly more confusing. The deprecation comment doesn’t suggest any alternatives, and as far as I could see there isn’t a DataSourceManager with similar functionality. After looking at the code for a bit, I did see one problem: memory leaks. The ConnectionManager simply keeps a Map of connections, but never removes values from it. Apart from that, there’s not much to it, so I decided to write my own improved DataSourceManager:

import { LRUCache } from 'lru-cache';
import { DataSource } from 'typeorm';
import { MAX_TENANT_DATA_SOURCES } from '../orm.config';

class DataSourceManager {
  private readonly cache = new LRUCache<string, DataSource, unknown>({
    max: MAX_TENANT_DATA_SOURCES,

    // Called when an entry is evicted from the cache. Note that
    // lru-cache does not await this, so the data source's connections
    // are closed in the background.
    dispose: async (source: DataSource) => {
      await source.destroy();
    },
  });

  public async getOrCreate(
    name: string,
    createFn: () => Promise<DataSource>,
  ): Promise<DataSource> {
    const existing = this.cache.get(name);
    if (existing) {
      return existing;
    }

    const created = await createFn();
    this.cache.set(name, created);
    return created;
  }
}

export const dataSourceManager = new DataSourceManager();

src/tenancy/data-source.manager.ts

The approach for this manager is slightly different from the original. Instead of keeping a list or map of connections, it functions more like a cache. For the storage I used lru-cache, which works perfectly for this case. You can specify a maximum number of items to keep (coming from MAX_TENANT_DATA_SOURCES in this code), and a dispose function which can destroy data sources (i.e. close connections) when an item is removed from the cache.

With this I could now make a getTenantDataSource function, to easily get a data source based on a tenant id:

import { DataSource } from 'typeorm';
import { TENANT_SCHEMA_PREFIX, tenantedOrmConfigFactory } from '../orm.config';
import { dataSourceManager } from './data-source.manager';

export async function getTenantDataSource(
  tenantId: number,
): Promise<DataSource> {
  const schema = tenantSchemaName(tenantId);

  return dataSourceManager.getOrCreate(schema, async () => {
    const source = new DataSource(tenantedOrmConfigFactory(schema));
    await source.initialize();
    return source;
  });
}

export function tenantSchemaName(tenantId: number): string {
  return `${TENANT_SCHEMA_PREFIX}${tenantId}`;
}

src/tenancy/tenant.data-source.ts

All this function really has to do is derive the schema name from the tenant id, and get (or create) a data source. The tenantedOrmConfigFactory function here simply returns the TypeORM configuration for the given schema; we’ll look at that configuration in the next section.

With this, the deprecations were out of the way, and I could continue my testing!

Using users across tenants

In the original example code there are no relations between any public and tenanted entities. This makes sense if your application is tenanted in such a way that each tenant has its own users, and those users are fully disconnected from any other tenant. However, in my application, users are a more global entity. One user can even be part of multiple tenants. This is where I had to make some adjustments to the TypeORM configuration. In the original setup, the tenanted config only loads the entities from the “tenanted” folder, while in my case the “User” entity is part of the “public” folder. Fixing this was fairly trivial:

import * as path from 'node:path';
import { PostgresConnectionOptions } from 'typeorm/driver/postgres/PostgresConnectionOptions';

export const PUBLIC_SCHEMA = 'public';
export const TENANT_SCHEMA_PREFIX = 'tenant_';

export const MAX_TENANT_DATA_SOURCES = 1000;

export function publicOrmConfigFactory(): PostgresConnectionOptions {
  return {
    type: 'postgres',
    host: 'localhost',
    database: 'postgres',
    username: 'postgres',
    password: 'postgres',
    schema: PUBLIC_SCHEMA,
    synchronize: true,
    entities: [path.join(__dirname, 'public/**/*.entity.{ts,js}')],
    migrations: [path.join(__dirname, 'migrations/public/*.{ts,js}')],
  };
}

export function tenantedOrmConfigFactory(
  schema: string,
): PostgresConnectionOptions {
  return {
    ...publicOrmConfigFactory(),
    schema,
    entities: [path.join(__dirname, '**/*.entity.{ts,js}')],
    migrations: [path.join(__dirname, 'migrations/tenanted/*.{ts,js}')],
  };
}

src/orm.config.ts

The main difference is that the tenanted entities are now loaded from **/*.entity.{ts,js}, instead of tenanted/**/*.entity.{ts,js}. This does mean you will need to explicitly specify the schema on public entities, like so:

import { Entity, PrimaryGeneratedColumn } from 'typeorm';
import { PUBLIC_SCHEMA } from '../../orm.config';

@Entity({ schema: PUBLIC_SCHEMA })
export class Tenant {
  @PrimaryGeneratedColumn()
  public id: number;
}

src/public/tenants/tenant.entity.ts

In the config file you can also see some other constants, like the public schema name, tenanted schema prefix, and max tenant data sources as used in the DataSourceManager (don’t worry too much about this value just yet). I also chose to change the config into factory functions, which means you could inject the ConfigService to load configuration in a more NestJS-idiomatic way.
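
As a sketch of that idea, a ConfigService-based variant of the public factory could look something like this (the DATABASE_* variable names are my own assumptions, not part of the sample repo):

import { ConfigService } from '@nestjs/config';
import * as path from 'node:path';
import { PostgresConnectionOptions } from 'typeorm/driver/postgres/PostgresConnectionOptions';
import { PUBLIC_SCHEMA } from './orm.config';

// Hypothetical variant of publicOrmConfigFactory that reads the
// connection details from the NestJS ConfigService.
export function publicOrmConfigFactory(
  config: ConfigService,
): PostgresConnectionOptions {
  return {
    type: 'postgres',
    host: config.get('DATABASE_HOST', 'localhost'),
    database: config.get('DATABASE_NAME', 'postgres'),
    username: config.get('DATABASE_USER', 'postgres'),
    password: config.get('DATABASE_PASSWORD', 'postgres'),
    schema: PUBLIC_SCHEMA,
    synchronize: true,
    entities: [path.join(__dirname, 'public/**/*.entity.{ts,js}')],
    migrations: [path.join(__dirname, 'migrations/public/*.{ts,js}')],
  };
}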

With this set up, you can now have a tenanted entity that belongs to a public entity. For example, a “Cat” which belongs to a “User”:

import { Entity, ManyToOne, PrimaryGeneratedColumn } from 'typeorm';
// Adjust this import to wherever your public User entity lives.
import { User } from '../public/users/user.entity';

@Entity()
export class Cat {
  @PrimaryGeneratedColumn()
  public id: number;

  // A relation from a tenanted entity to a public (cross-schema) one.
  @ManyToOne(() => User)
  public user: User;
}
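
To tie it together, a hedged usage sketch: grab the tenant’s data source and query its cats together with their (public-schema) users. TypeORM knows each entity’s schema from its metadata, so the cross-schema relation resolves without further configuration:

// Hypothetical usage, combining getTenantDataSource from earlier with
// the Cat entity above.
const dataSource = await getTenantDataSource(tenantId);
const cats = await dataSource.getRepository(Cat).find({
  relations: { user: true },
});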

Finding the limits

With all this done, everything seemed to work really well! The test suite I’d built up was passing consistently, and manual testing was looking promising too. But then…

(Screenshot: a terminal showing the PostgreSQL error message “sorry, too many clients already”.)

After running a bunch of tests - all of which create a new tenant - suddenly the application failed to boot. I had run into the PostgreSQL connection limit. The cause was this code, which on boot synchronizes all tenant schemas:

import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
import { EntityManager } from 'typeorm';
import { TENANT_SCHEMA_PREFIX } from './orm.config';
import { getTenantDataSource } from './tenancy/tenant.data-source';

(async () => {
  const app = await NestFactory.create(AppModule);
  const manager = app.get(EntityManager);
  await syncTenantSchemas(manager);
  await app.listen(3000);
})();

async function syncTenantSchemas(manager: EntityManager) {
  const schemas = await manager.query<{ schema_name: string }[]>(`
      select schema_name
      from information_schema.schemata
      where schema_name like '${TENANT_SCHEMA_PREFIX}%'
  `);

  for (const row of schemas) {
    const tenantId = parseInt(
      row.schema_name.replace(TENANT_SCHEMA_PREFIX, ''),
      10,
    );
    // getTenantDataSource initializes the data source; synchronize()
    // then brings the tenant schema in line with the entity definitions.
    const dataSource = await getTenantDataSource(tenantId);
    await dataSource.synchronize();
  }
}

src/main.ts

Small side note: synchronizing schemas like this is not recommended for production; you should use migrations instead! But for development purposes it’s pretty handy.

At this point I had well over a hundred “test” schemas, each of which had to be synchronized on boot. Remember the MAX_TENANT_DATA_SOURCES = 1000 from earlier, which is used in the DataSourceManager? That value defines how many active data sources we keep in the cache. Clearly 1000 is too many, but how many is enough? That question turned out to be a little more complex than I thought.

I turned to the PostgreSQL configuration docs, where I found that the maximum number of connections (max_connections) is typically 100. However, PostgreSQL also reserves 3 of those for superusers (superuser_reserved_connections), meaning the effective maximum is 97 connections.
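
If you’d rather check than assume, you can ask PostgreSQL directly. A small sketch, using the same EntityManager as in the main.ts above:

// Ask PostgreSQL for its actual limits instead of assuming the defaults.
// Note that the values come back as strings.
const [{ max_connections }] = await manager.query('show max_connections');
const [{ superuser_reserved_connections }] = await manager.query(
  'show superuser_reserved_connections',
);
console.log(max_connections, superuser_reserved_connections); // e.g. '100' and '3'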

Alright, so: we have 97 connections, and we have to share those between one data source for the public entities, and an unknown number of data sources for the tenanted entities. This means we should reserve a fixed number of connections for our public data source, and we can assign the rest to our tenanted data sources. That “rest” number is also the maximum number of items that should be in our data source manager cache. Let’s put this into code:

const MAX_CONNECTIONS = 90;
const MAX_PUBLIC_CONNECTIONS = 10;
export const MAX_TENANT_DATA_SOURCES = MAX_CONNECTIONS - MAX_PUBLIC_CONNECTIONS;

export function publicOrmConfigFactory(): PostgresConnectionOptions {
  return {
    // ...
    extra: {
      max: MAX_PUBLIC_CONNECTIONS,
    },
  };
}

export function tenantedOrmConfigFactory(
  schema: string,
): PostgresConnectionOptions {
  return {
    ...publicOrmConfigFactory(),
    // ...
    extra: {
      max: 1,
    },
  };
}

src/orm.config.ts

We specify MAX_CONNECTIONS, which is the maximum number of connections PostgreSQL will accept (with some margin, just in case). We also specify MAX_PUBLIC_CONNECTIONS, the number of connections for the public data source. The difference is assigned to MAX_TENANT_DATA_SOURCES, which can be used by the data source manager. Here I’ve set each tenanted data source to use a single connection, which simplifies the setup a bit, since it ensures one tenanted data source equals one connection.

To limit how many connections TypeORM will open, we have to pass options to the underlying driver using the extra property. Note that these values are driver-specific: max is the pool size option of the node-postgres (pg) driver used for PostgreSQL.

Disclaimer: These numbers are likely to be far from optimal, and the best performance will differ heavily based on your actual use case. So be sure to actively monitor and tweak these as necessary!

A slightly more complex setup with more knobs to tweak could look like this:

const MAX_CONNECTIONS = 90;
const MAX_PUBLIC_CONNECTIONS = Math.max(1, Math.floor(MAX_CONNECTIONS * 0.1));
const MAX_TENANTED_CONNECTIONS = MAX_CONNECTIONS - MAX_PUBLIC_CONNECTIONS;
const MAX_CONNECTIONS_PER_TENANT = 2;
export const MAX_TENANT_DATA_SOURCES = Math.floor(
  MAX_TENANTED_CONNECTIONS / MAX_CONNECTIONS_PER_TENANT,
);

Here we also start by specifying the max number of PostgreSQL connections, then take a percentage of that for the public data source (10% in this case). We then calculate how many connections are left for tenanted data sources, and based on the max number of connections per tenanted data source, how many of those we can have. With the numbers above that works out to 9 public connections, 81 tenanted connections, and Math.floor(81 / 2) = 40 tenant data sources.

With these limits in place, the application booted again, but the real problem was still not quite solved. Namely: any number of requests can come in from different tenants, each of which would initialize a tenanted data source. If that number is greater than the number of items the DataSourceManager is configured to keep, some of those sources will be disconnected while still in use, which will cause queries (and thus requests) to fail! To prevent this we need to limit the number of requests we process at any one time. This could potentially be done by a load balancer, but I decided to build it into the code, so we can take advantage of the configuration.

Building this ended up being fairly straightforward. Using p-queue I set up a middleware, which adds incoming requests to the queue, with the concurrency being based on the MAX_TENANT_DATA_SOURCES:

import { Injectable, NestMiddleware } from '@nestjs/common';
import PQueue from 'p-queue';
import { MAX_TENANT_DATA_SOURCES } from './orm.config';
import { NextFunction, Request, Response } from 'express';

@Injectable()
export class LimitMiddleware implements NestMiddleware {
  private readonly queue = new PQueue({
    concurrency: MAX_TENANT_DATA_SOURCES,
  });

  public async use(
    _: Request,
    res: Response,
    next: NextFunction,
  ): Promise<void> {
    await this.queue.add(
      () =>
        new Promise<void>((resolve) => {
          // Free the queue slot when the response has been fully sent,
          // or when the connection closes without finishing (e.g. on a
          // client disconnect), so that slots cannot leak.
          res.addListener('finish', () => resolve());
          res.addListener('close', () => resolve());
          next();
        }),
    );
  }
}

src/limit.middleware.ts

One annoying thing about this package is that it’s only published as an ES module, not as CommonJS, which doesn’t work with NestJS out of the box. There are a couple of ways around this, but for simplicity I’ve chosen to use Node 22 and set the --experimental-require-module flag.

Now, for every request that comes in, we add a promise wrapping the next() function to the queue. This next() call is what triggers the rest of the request handling. It doesn’t wait for the request to actually be done though, so for that we add a listener for the “finish” event on the response object. When that fires, the request is done, meaning the queue item can be resolved.

A downside of this approach is that it accounts for the worst-case scenario: N requests for N different tenants. If all of those requests came in for the same tenant, they could all be processed at the same time, since that would only use a single tenanted data source. This could potentially be optimised, but that would come at the cost of a lot of complexity, and - if done incorrectly - at the cost of fairness: the guarantee that every request will eventually be processed.

It’s working!

With those last steps out of the way, everything works! To verify, I created a couple of scripts to create tenants and tenanted entities, and to retrieve all entities from all tenants concurrently. The request limiter seems to be doing its job nicely, and all requests resolve without any errors.
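
For reference, a minimal sketch of what such a concurrency check could look like (a hypothetical /cats route and x-tenant-id header; adapt to however your application identifies tenants):

// Fire one request per tenant concurrently, and fail loudly on any error.
const tenantIds = Array.from({ length: 200 }, (_, i) => i + 1);

await Promise.all(
  tenantIds.map(async (tenantId) => {
    const res = await fetch('http://localhost:3000/cats', {
      headers: { 'x-tenant-id': String(tenantId) },
    });
    if (!res.ok) {
      throw new Error(`Tenant ${tenantId} failed with status ${res.status}`);
    }
  }),
);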

A note on TypeORM support

Before ending up with this setup, I of course checked whether TypeORM has a supported - or at least standard - way of setting up schema-based multitenancy. I came across a handful of issues on GitHub (e.g. this one and this one) asking for something similar or proposing approaches to it, but the takeaway seems to be that it’s simply not a built-in feature, and not something they plan on adding explicit support for. While I’d love to see this become a real feature, I can also understand that the majority of people using TypeORM wouldn’t actually use it, so it makes sense that they don’t want to spend resources on it.

Something that would also be very interesting is to see if and how other ORMs like Prisma or MikroORM could be used for schema-based multitenancy. But since I have a lot of experience with TypeORM, I decided to stick with that for now. Maybe in a future project though.

Alternative approaches

There are probably various other approaches you could take, but listed here are the main (seemingly viable) alternatives I came across.

PgBouncer

As an alternative to request limiting, I played around with PgBouncer, a connection pooler for PostgreSQL which - in theory - should be able to achieve similar results to a request limiter. I experimented with this using the bitnami/pgbouncer Docker image, and while it did seem to work functionally, it added significant delays to almost all requests. I haven’t spent too much time on this, but my working theory is that it keeps idle connections open for too long, which results in other tenants’ connections being delayed. If properly configured it might work as well as (or better than) the approach described in this post, but that would require more investigation.

Row security policies

Row security policies (row-level security) are a PostgreSQL feature that allows you to restrict operations on rows on a per-user basis. I haven’t looked into this very much, but from a brief look it seems like it could provide some of the same benefits as schema separation. The main downside of this approach would be that you have to connect as different users, which means you’ll run into the same connection management issues as with schema separation.

In conclusion

All things considered, I really like this approach to multitenancy! The setup required quite a bit of investigation and debugging, but now that I have it set up and know how it works, it works really well. Some things remain to be seen though, especially how well it performs under a real-world load, and how the long-term maintenance will go, but I’m optimistic!

If you want to use this setup for yourself, or if you think something can be improved, the code can be found on GitHub: https://github.com/LucaScorpion/nestjs-typeorm-schema-multitenancy