How to Handle Multiple Webhooks At Scale
In recent years there has been a trend away from APIs where you constantly have to request data from a third-party service, towards webhooks, where the third-party service instead sends a POST request to a URL of your choice whenever a change you subscribed to occurs (e.g. a new GitHub issue being created, a new comment…).
That's what we are doing at Gitstart as well, and here's how we are building our infrastructure to run webhooks from multiple sources, at scale.
For simplicity, we will be using GitHub webhooks as an example throughout.
Most code examples are in Node.js & TypeScript. We are using Hasura as our backend GraphQL engine to scale our PostgreSQL database, allowing our webhooks solution to be truly scalable!
A key point about Hasura is that it allows us to utilise subscriptions, which are "essentially a query where we receive an update whenever the value of any field changes upstream".
0. How to Webhooks
As we mentioned earlier, webhooks basically allow you to subscribe to events you're interested in from third-party APIs or services.
And here's a quick overview on "How to Webhooks":
- First you create a webhook on your third-party service and point it to an endpoint (e.g. https://your.url/subscription/webhooks/[some_id]).
- Now that the webhook is initialised on the service (Jira / GitHub / Zapier / Zoom… you name it!), we first need to list out all the different webhooks we are going to work with,
- then we will deal with the webhook events coming in to our URL (a GitHub issue being created, a new GitHub branch being created, etc.). Documentation of the webhook event payloads is often available for the services that support webhooks (here's GitHub's!).
- You can choose to save all those events to your database temporarily, then address them separately (which is how we are doing it at Gitstart).
- Lastly, you need to decide how you would like to process each event: either upsert each entry into the respective database table according to its action, or trigger a hook or an action…
Why do we save webhook events to the database? Can't we just process the webhook events directly? We chose to save events to a webhook_events table before processing them because, if any errors occur while processing events, we can safely retry without losing any of them. It's also a good way of balancing the load on our internal systems in case a spike of webhook_events compromises the performance of other workflows.
An alternative and even more scalable way would be to add an extra step before saving webhook_events into the database, by utilising a message queue.
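As a rough illustration of that queue idea (BullMQ is just one possible choice here, and saveToWebhookEventsTable is a hypothetical helper):

import { Queue, Worker } from "bullmq";
// Buffer raw webhook payloads in a queue, and let a worker drain them
// into the webhook_events table at its own pace.
const queue = new Queue("webhook-events");
export async function enqueueWebhook(source: string, payload: unknown) {
  await queue.add(source, payload); // respond 200 to the sender right away
}
new Worker("webhook-events", async (job) => {
  await saveToWebhookEventsTable(job.name, job.data); // hypothetical helper
});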
1. Designing the Infrastructure for Webhooks
Database Infrastructure
- A table for all Webhooks; example fields include, but are not limited to:
- id // unique id for GitHub or other sources
- source
- A table for all Webhook Events; example fields include, but are not limited to:
- id
- eventType
- eventSubType
- isIgnored
- mergedAt
- data
- webhookId
- (Optional): Check the types for the data you may want to capture from the third-party services. Look at @octokit/types for all pre-defined types for GitHub. A typed sketch of both tables follows this list.
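To make the shape concrete, here is a rough sketch of both tables as TypeScript types (the field names follow the lists above; the column types are our assumptions):

interface Webhook {
  id: string; // unique id for GitHub or other sources
  source: string; // e.g. "github", "jira", "zoom"
}
interface WebhookEvent {
  id: string;
  eventType: string; // e.g. "pull_request"
  eventSubType: string; // e.g. "created"
  isIgnored: boolean;
  mergedAt: string | null; // set once the event has been fully processed
  data: unknown; // raw payload from the third-party service
  webhookId: string; // references Webhook.id
}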
Third-Party App Webhooks Infrastructure
- Set up a webhook URL in the third-party app's settings, e.g. https://your.url/subscription/webhooks/[some_id]
- (Optional but highly recommended): Set up a webhook secret token (a sketch of registering a webhook programmatically follows)
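For GitHub, you can also register the webhook programmatically rather than through the settings UI. A sketch using @octokit/rest (the owner, repo, URL, and secret values are placeholders):

import { Octokit } from "@octokit/rest";
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
// Register a webhook pointing at our endpoint; GitHub will sign every
// delivery with the shared secret so we can verify it on our side.
await octokit.rest.repos.createWebhook({
  owner: "your-org", // placeholder
  repo: "your-repo", // placeholder
  config: {
    url: "https://your.url/subscription/webhooks/github",
    content_type: "json",
    secret: process.env.WEBHOOK_SECRET,
  },
  events: ["pull_request", "issues", "push"],
});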
2. Dealing with Webhooks, One at a Time
For each webhook_event that comes through (say you are dealing with webhooks from GitHub, Jira, Zapier, Zoom, etc.), you need to first divert them into different buckets to be able to apply custom actions to them in parallel. Before we store the incoming webhook_event in our webhook_events table, for security reasons, you should:
- Set up a GitHub webhook secret token here
- Make sure the headers of the incoming request include the secret signature (e.g. x-hub-signature), and verify it against your token (a sketch of this check follows this list)
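Here is a minimal sketch of that check, assuming an Express app and GitHub's sha256 signature header (x-hub-signature-256); the route path and env var names are placeholders:

import crypto from "crypto";
import express from "express";
const app = express();
// Keep the raw body around: the signature is computed over the exact bytes.
app.use(express.json({ verify: (req: any, _res, buf) => (req.rawBody = buf) }));
app.post("/subscription/webhooks/:id", (req: any, res) => {
  const expected =
    "sha256=" +
    crypto.createHmac("sha256", process.env.WEBHOOK_SECRET!).update(req.rawBody).digest("hex");
  const received = (req.headers["x-hub-signature-256"] as string) ?? "";
  const valid =
    received.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(received), Buffer.from(expected));
  if (!valid) return res.status(401).send("invalid signature");
  // The event is authentic: safe to store it in webhook_events (next step).
  res.sendStatus(200);
});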
With that in place, you can insert the following into the webhook_events table (the payload of a push event on GitHub can be found here):
- the unique webhookId for GitHub (as you are designing for multiple data sources, this is important)
- eventType (e.g. push)
- eventSubType (e.g. added)
- data (i.e. the part of the request payload that is actually useful)
Why do we do this step? By doing this, we are creating a backlog of webhook_events that are all secure events using the right secret token. This is useful in case your event processing pipeline goes down, or there is an error processing your realtime webhook_events: all the webhook_events are still stored in the database, and you can pick them up again once you have fixed the processing part.
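With Hasura, the insert itself can use the auto-generated insert_&lt;table&gt;_one mutation. A sketch (the endpoint URL and admin-secret setup are assumptions; WebhookEvent is the type sketched in section 1):

// Insert one row into webhook_events via Hasura's generated mutation.
async function saveWebhookEvent(event: Omit<WebhookEvent, "id">) {
  const res = await fetch("https://your-hasura-instance/v1/graphql", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET!, // assumption
    },
    body: JSON.stringify({
      query: `mutation SaveEvent($object: webhook_events_insert_input!) {
        insert_webhook_events_one(object: $object) { id }
      }`,
      variables: { object: event },
    }),
  });
  return res.json();
}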
3. How to... processThoseWebhookEvents()
Now that all the webhook events are stored in our webhook_events table, before we start processing them we need to first query (or listen to) the database for all the events, filter through and choose the webhook events we are going to process, and then write the respective processing code.
Why do we do this step? Oftentimes our webhooks will receive more events than we need, or at different stages of development we may want to ignore some of the less important events first, etc.
- At Gitstart we are using Hasura for our backend GraphQL, where we can subscribe to the table and have a stream (Observables) of webhook_events coming in, instead of querying it once.
- We need to set the source to specify the webhook sources we are dealing with. In our example we only have GitHub so far, but our infrastructure is extendable to other sources like GitLab, Bitbucket, Zoom, etc.
import github from "./github"; // see note below
const SOURCE_PROCESSOR: {
  [source: string]: { [eventType: string]: (data: WebhookEvent) => Promise<ProcessorResponseType> };
} = {
  github,
};
Note: each source points to its own file where we specify all the webhook subevents that we will process, or ignore.
- We will filter through the webhookEvents when we query (or subscribe to) the database, using the keys of SOURCE_PROCESSOR.
// Subscribe to (or query) the webhook_events table; filtering by
// source happens here, using Object.keys(SOURCE_PROCESSOR)
const consumer = subscribe_or_query_to_your_webhook_events_table();
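For reference, the Hasura subscription behind that consumer could look something like this (a sketch assuming a webhook relationship from webhook_events to the webhooks table, and the fields from section 1):

// Stream unprocessed events whose source we have a processor for.
const WEBHOOK_EVENTS_SUBSCRIPTION = `subscription WebhookEvents($sources: [String!]) {
  webhook_events(
    where: {
      webhook: { source: { _in: $sources } }
      mergedAt: { _is_null: true }
      isIgnored: { _eq: false }
    }
    order_by: { id: asc }
  ) {
    id
    eventType
    eventSubType
    data
    webhookId
  }
}`;
// e.g. run with variables: { sources: Object.keys(SOURCE_PROCESSOR) }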
- Of all those events, we will first call processWebhookFromVariousSources, then go to processWebhook (for each source); see the code snippet below.
- We will then check if each webhook_event has been processed already, which we can track with a helper function using the timeOfLastEventProcessed flag.
- Later we will processEvent individually, according to the different eventTypes and actions, under the GitHub file.
- Lastly, upsert back into the webhook_events table to show that you're done with this event (e.g. with fields like updatedAt, mergedAt, isIgnored…)
Here is some pseudocode to help illustrate our design. As you can see in the code, we are processing the different sources in parallel, which really speeds things up.
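A rough sketch of that design (alreadyProcessed and upsertWebhookEvent are hypothetical helpers, and we assume the subscription joins each event with its webhook's source):

type EventWithSource = WebhookEvent & { source: string };
async function processWebhookFromVariousSources(events: EventWithSource[]) {
  // Bucket the incoming events by source...
  const bySource = new Map<string, EventWithSource[]>();
  for (const event of events) {
    if (!bySource.has(event.source)) bySource.set(event.source, []);
    bySource.get(event.source)!.push(event);
  }
  // ...and process each source's bucket in parallel.
  await Promise.all(
    [...bySource.entries()].map(([source, bucket]) => processWebhook(source, bucket))
  );
}
async function processWebhook(source: string, events: EventWithSource[]) {
  for (const event of events) {
    // Skip anything handled already (tracked via timeOfLastEventProcessed).
    if (alreadyProcessed(event)) continue;
    const processor = SOURCE_PROCESSOR[source]?.[event.eventType];
    // processEvent: run the per-eventType processor, or ignore unknown types.
    const result = processor ? await processor(event) : { isIgnored: true };
    // Upsert back so the event is marked as done.
    await upsertWebhookEvent(event.id, { ...result, mergedAt: new Date().toISOString() });
  }
}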
4. Structure all eventTypes you will be working with in a single file for the source (e.g. GitHub)
import processPR from "./processPR";
import processIssue from "./processIssue";
export default {
  // processPR refers to the file where we actually, finally, design how to process events related to the PR!
  pull_request: processPR,
  issues: processIssue,
  // ignored events
  team: async () => ({ isIgnored: true }),
};
Why do we do this step? You can easily manage all the eventTypes you're dealing with in a single file, link the right processing files, control what types of events we want to fetch from the database, and also make note of which eventTypes to skip.
5. Finally we can process the event!
Say we are in the processPR file: under this eventType there are a few different eventSubTypes, so you can first have a handler that filters through each webhook_event and only works on the specified eventSubTypes (e.g. created, closed, etc.).
As mentioned earlier, you can decide how you want to process each event. A common approach is to store the data of that event in the respective table you created in your database for further manipulation. Or you can trigger an internal hook, or an action!
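As an illustration, a processPR file could look roughly like this (the sub-type names follow the example above, and upsertPullRequestRow is a hypothetical helper):

// Only the listed eventSubTypes are handled; everything else is ignored.
const PR_SUBTYPE_HANDLERS: {
  [subType: string]: (event: WebhookEvent) => Promise<ProcessorResponseType>;
} = {
  created: (event) => upsertPullRequestRow(event.data), // hypothetical helper
  closed: (event) => upsertPullRequestRow(event.data),
};
export default async function processPR(event: WebhookEvent): Promise<ProcessorResponseType> {
  const handler = PR_SUBTYPE_HANDLERS[event.eventSubType];
  if (!handler) return { isIgnored: true };
  return handler(event);
}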
6. Final Notes
- During development, you may want to expose your localhost as the URL endpoint with something like serveo.net or localtunnel.
Credits to Arslan & Hamza @ Gitstart for the webhooks infrastructure design & implementation
Thanks to team members @ Gitstart for taking the time to review this article!
If you're interested in learning more about how Gitstart can help accelerate your company's tech team efforts autonomously, check us out at gitstart.com