How to Handle Multiple Webhooks At Scale 📈

In recent years there has been a shift away from APIs that you have to poll constantly for data from a third-party service, towards webhooks, where the third-party service instead sends a POST request to a URL of your choice whenever a change you subscribed to occurs (e.g. a new Github issue being created, a new comment…).

That’s what we are doing at 🌟Gitstart🌟 as well, and here’s how we are building our infrastructure to run webhooks from multiple sources, at scale.

For simplicity, we will use Github webhooks as an example 👀

Most code examples are in NodeJS & Typescript. We are using Hasura as our backend GraphQL engine to scale our PostgreSQL database, allowing our webhooks solution to be truly scalable!

A key point about Hasura is that it lets us use subscriptions, where a subscription is “essentially a query where we receive an update whenever the value of any field changes upstream”.

0. How to Webhooks…?

As we mentioned earlier, webhooks basically allow you to subscribe to events you’re interested in from third-party APIs or services.

And here’s a quick overview on “How to Webhooks”:

Why do we save webhook events to the database? Can’t we just process the webhook events directly? We chose to save events to a webhook_events table before processing them because, if any errors occur while processing events, we can safely retry without losing any events. It also helps balance the load on our internal systems when a spike of webhook_events would otherwise compromise the performance of other workflows.

An alternative and even more scalable way would be to add an extra step before saving webhook_events into the database by utilising a message queue. 📩

1. Designing the Infrastructure for Webhooks

Database Infrastructure

- id             // unique id for Github or other sources
- source
- id
- eventType
- eventSubType
- isIgnored
- mergedAt
- data
- webhookId
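As a rough illustration, the columns listed above could be modelled in Typescript along these lines. This is only a sketch: the field types, the nullability of eventSubType and mergedAt, and the isWebhookEvent helper are assumptions made for the example, not our actual schema definition.

```typescript
// Hypothetical row shape for the webhook_events table.
// Column names come from the list above; every type here is an assumption.
interface WebhookEvent {
  id: string; // id of the row
  source: string; // e.g. "github"
  eventType: string; // e.g. "pull_request"
  eventSubType: string | null; // assumed nullable: not every event has a sub type
  isIgnored: boolean;
  mergedAt: string | null; // assumed to be an ISO timestamp, null until set
  data: unknown; // raw payload sent by the source
  webhookId: string;
}

// Hypothetical runtime check, handy when rows come back as untyped JSON.
function isWebhookEvent(value: unknown): value is WebhookEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.source === "string" &&
    typeof v.eventType === "string" &&
    (typeof v.eventSubType === "string" || v.eventSubType === null) &&
    typeof v.isIgnored === "boolean" &&
    (typeof v.mergedAt === "string" || v.mergedAt === null) &&
    "data" in v &&
    typeof v.webhookId === "string"
  );
}

// Made-up example row, for illustration only.
const exampleEvent: WebhookEvent = {
  id: "1",
  source: "github",
  eventType: "pull_request",
  eventSubType: "opened",
  isIgnored: false,
  mergedAt: null,
  data: { action: "opened" },
  webhookId: "example-webhook-id",
};
```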

ThirdParty App Webhooks Infrastructure

2. Dealing with Webhooks, 1 at a time

Say you are dealing with webhooks from Github, Jira, Zapier, Zoom, etc. For each webhook_event that comes through, you first need to divert it into the right bucket so that custom actions can be applied to each bucket in parallel. Before we store the incoming webhook_event into our webhook_events table, for security reasons, you should verify that it was sent with the right secret token.

With that in place, you can insert the following into the webhook_events table (Payload of a push event on Github can be found here):
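As a rough sketch of what such a row might contain for Github, the example below builds it from the request headers and the parsed JSON body. Github sends the event name in the X-GitHub-Event header and a unique delivery id in X-GitHub-Delivery. Everything else is an assumption made for illustration: the toGithubRow helper, the NewWebhookEventRow type, using the delivery id as webhookId, and using the payload's action field as eventSubType.

```typescript
// Hypothetical shape of a row about to be inserted into webhook_events.
interface NewWebhookEventRow {
  source: string;
  webhookId: string;
  eventType: string;
  eventSubType: string | null;
  isIgnored: boolean;
  data: Record<string, unknown>;
}

// Builds a row from an incoming Github request.
// Header keys are lowercase because Node's http module lowercases them.
function toGithubRow(
  headers: Record<string, string | undefined>,
  body: Record<string, unknown>
): NewWebhookEventRow {
  const eventType = headers["x-github-event"];
  const deliveryId = headers["x-github-delivery"];
  if (!eventType || !deliveryId) {
    throw new Error("Missing Github webhook headers");
  }
  // Some events (e.g. push) carry no "action" field, so the sub type may be null.
  const action = body["action"];
  return {
    source: "github",
    webhookId: deliveryId, // assumption: the delivery id is stored as webhookId
    eventType,
    eventSubType: typeof action === "string" ? action : null,
    isIgnored: false,
    data: body,
  };
}
```

The resulting object can then be passed to whichever insert mutation or query you use for the webhook_events table.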

Why do we do this step? By doing this, we are creating a backlog of webhook_events, all of which are secure events that use the right secret token. This is useful if your event processing pipeline was down, or if there was an error processing your realtime webhook_events: all the webhook_events are still stored in the database, and you can pick them up again once you have fixed the processing part.
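For Github, the secret token check mentioned above can be sketched as follows. Github signs each delivery with an HMAC-SHA256 of the raw request body, keyed with your webhook secret, and sends it in the X-Hub-Signature-256 header as sha256=<hex digest>. The helper names signGithubPayload and isValidGithubSignature are made up for this example, and other sources (Jira, Zoom…) use their own schemes.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the value Github would put in X-Hub-Signature-256 for this body.
function signGithubPayload(rawBody: string, secret: string): string {
  const digest = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  return "sha256=" + digest;
}

// Returns true only if the header matches the signature we compute ourselves.
// rawBody must be the exact bytes received, not a re-serialised JSON object.
function isValidGithubSignature(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(signGithubPayload(rawBody, secret), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on different lengths, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Only events that pass this check would then be inserted into the webhook_events table; anything else can be rejected before it touches the database.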

3. How to... processThoseWebhookEvents()

Now that all the webhook events are stored in our webhook_events table, before we start processing those webhook events, we need to first query (/listen to) the database for all the events, filter through and choose the webhook events we are going to process, and then write the respective processing code.

Why do we do this step? Oftentimes our webhooks will receive more events than we need, or at different stages of development, we would like to first ignore some less important events, etc.

const SOURCE_PROCCESSOR: {
  [key: string]: {
    [key: string]: (data: WebhookEvent) => Promise<ProcessorResponseType>;
  };
} = {
  github,
};

Note: each source points to its own file where we specify all the webhook subevents that we will process, or ignore.

const consumer = subscribe_or_query_to_your_webhook_events_table();
// filter done here

Here is some pseudocode to help illustrate our design 🎨. As the code shows, we process different sources in parallel, which speeds up overall processing.
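To make the parallel part a bit more concrete, here is a minimal sketch of how a batch of events could be dispatched, assuming a source → eventType → processor map like the one above. The processBatch function, the simplified types, and the choice to handle events from the same source one after another are all assumptions for this example rather than our exact implementation.

```typescript
// Simplified stand-ins for the real types (assumptions for this sketch).
type ProcessorResponseType = { isIgnored: boolean };
interface WebhookEvent {
  id: string;
  source: string;
  eventType: string;
  data: unknown;
}
type Processor = (event: WebhookEvent) => Promise<ProcessorResponseType>;
type SourceProcessors = { [source: string]: { [eventType: string]: Processor } };

// Groups events by source, then processes the sources in parallel.
// Events from the same source are handled sequentially, in arrival order.
async function processBatch(
  events: WebhookEvent[],
  processors: SourceProcessors
): Promise<Map<string, ProcessorResponseType>> {
  const bySource = new Map<string, WebhookEvent[]>();
  for (const event of events) {
    const bucket = bySource.get(event.source) ?? [];
    bucket.push(event);
    bySource.set(event.source, bucket);
  }

  const results = new Map<string, ProcessorResponseType>();
  await Promise.all(
    Array.from(bySource.entries()).map(async ([source, bucket]) => {
      for (const event of bucket) {
        const handler = processors[source]?.[event.eventType];
        // Events without a registered processor are marked as ignored.
        results.set(event.id, handler ? await handler(event) : { isIgnored: true });
      }
    })
  );
  return results;
}
```

The returned map can be used to write isIgnored back to each row once the batch is done.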

4. Structure all eventTypes you will be working with in a single file for the source (eg: Github)

export default {
  // processPR refers to the file where we actually, finally, design how to process events related to the PR!
  pull_request: processPR,
  issues: processIssue,
  // ignored events
  team: async () => ({ isIgnored: true }),
};

Why do we do this step? You can easily manage all the eventTypes you’re dealing with in a single file, link the right processing files, control what type of events we want to fetch from the database, and also note which eventTypes to skip.

5. Finally we can process the event!

Say we are in the processPR file. Under this eventType there are a few different eventSubTypes, so you can first have a handler that filters each webhook_event and only works on the specified eventSubTypes (eg. created, closed etc).
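A handler like that could look roughly like the sketch below. The allow-list uses opened and closed, which are two of the action values Github sends with pull_request events. The savePullRequest helper and the in-memory savedPullRequests array are stand-ins for whatever you do with the event (writing to a table, triggering a hook…), and are assumptions for this example.

```typescript
// Simplified stand-ins for the real types (assumptions for this sketch).
type ProcessorResponseType = { isIgnored: boolean };
interface WebhookEvent {
  eventType: string;
  eventSubType: string | null;
  data: unknown;
}

// Sub types we want to act on; everything else is skipped.
const HANDLED_PR_SUBTYPES = new Set<string>(["opened", "closed"]);

// Hypothetical storage step: a real version would write to your own table.
const savedPullRequests: unknown[] = [];
async function savePullRequest(event: WebhookEvent): Promise<void> {
  savedPullRequests.push(event.data);
}

async function processPR(event: WebhookEvent): Promise<ProcessorResponseType> {
  // Filter first: only the specified eventSubTypes are processed.
  if (event.eventSubType === null || !HANDLED_PR_SUBTYPES.has(event.eventSubType)) {
    return { isIgnored: true };
  }
  await savePullRequest(event);
  return { isIgnored: false };
}
```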

As mentioned earlier, you can decide how you want to process each event. A common one is to store the data of that event into the respective table you have created in your database for further manipulation. Or you can trigger an internal hook, or an action!

6. Final Notes

Credits to Arslan & Hamza @ Gitstart for the webhooks infrastructure design & implementation 💡

Thanks to team members @ Gitstart for taking their time to review this article! 🥰

If you’re interested in learning more about how Gitstart can help accelerate your company’s tech team efforts autonomously, check us out at gitstart.com
