What are Webhooks? How do I use one?

Photo by Ketut Subiyanto: https://www.pexels.com/photo/city-man-people-woman-4963437/

Payments - A Little Story

It’s a rainy evening in June, around three years ago. I’ve just gotten off my laptop after a coding sprint on a feature I was building for the guys at Unergia. My phone buzzes with a WhatsApp message. Fighting the urge to leave it until after a break, I pick up the phone and read: “Devesh, we will likely need to set up Razorpay on our website, could you look into it?” For some context, we were building a solar aggregation platform at that point: installers would put up proposals for the solar projects we had aggregated, and customers could choose the best one. That was the online-only part. Everything else, including all of the money collection, had been manual.

I had never worked with payments before, and building a feature that would be directly responsible for users’ money felt daunting. In hindsight, many of you reading this know that modern payment gateways like Razorpay or Stripe make integration simple and are very well documented. But that’s the beauty of hindsight: it tells you things you would have been better off knowing right at the beginning.

As the only developer there at the time, I needed a little research and a lot of convincing myself before I could say, “Okay, it might not be that hard, let’s get started.” And get started I did. I began integrating Razorpay’s standard checkout SDK into the new payment flow we needed. Their API is sufficiently documented, and my initial thought was that it would be pretty simple: the customer pays through the checkout, the SDK tells our frontend the payment succeeded, and our backend immediately reconciles the payment with Razorpay’s backend and confirms it to the customer.

If you’ve integrated payment services in your app, or have ever made a payment online, you will notice a fundamental issue with this flow: payments are not black or white. They can pass through multiple states. For example, a payment can be put in a pending state because one of the intermediaries is down: the clearinghouse, the receiving bank, or NPCI (the National Payments Corporation of India) itself.

We’ve all been in situations where we make a payment online and the money is deducted from our account, but we get confirmation of success much later, or we straight up get a refund and the transaction is cancelled. This is exactly the case I missed in my flow. You might say this is a common rookie mistake of jumping in without analyzing all possible flows, and you would be right. Given the time constraints, I had a hard time fully understanding everything that could go wrong.
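To make the problem concrete, here is a minimal sketch that contrasts a binary check, like the one my original flow implied, with one that accounts for intermediate states. It assumes the payment status strings Razorpay documents (`created`, `authorized`, `captured`, `refunded`, `failed`). The function names and the returned outcomes are hypothetical, and other gateways use different states.

```javascript
// Hypothetical helpers illustrating binary vs. state-aware payment handling.
// Status strings are assumed to follow Razorpay's payment states:
// created, authorized, captured, refunded, failed.

// Naive: anything that isn't "captured" is treated as a failure.
function naiveOutcome(status) {
  return status === "captured" ? "success" : "failure";
}

// State-aware: intermediate states are "pending", not failures.
function paymentOutcome(status) {
  switch (status) {
    case "captured":
      return "success";
    case "failed":
    case "refunded": // the transaction was cancelled, so treat it as not paid
      return "failure";
    case "created":
    case "authorized":
      // Still in progress: wait for a later status update instead of failing it.
      return "pending";
    default:
      throw new Error(`Unknown payment status: ${status}`);
  }
}

console.log(naiveOutcome("authorized"));   // → failure (wrong: still in progress)
console.log(paymentOutcome("authorized")); // → pending
```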

Nonetheless, I didn’t hit a wall with this issue during testing, because Razorpay’s sandbox environment (where you run tests as if you’re making actual payments, minus the money), like those of many payment gateways, has no “in-progress” status: payments either fail or pass. I also skipped the part of the documentation that tells everyone to use webhooks, or some other mechanism, to handle payments that get stuck because of any of the factors mentioned above.

Payments go live: after around 3-4 days of work and testing in the sandbox, I was ready to push the changes to production. With a deep breath, I got ready for the first customer payment the next day. The night passed in constant fear as I ticked off everything that could go wrong. I did have the “in-progress” payments in mind, but I assumed things wouldn’t go that wrong and the payment would most likely go through as expected. But I had forgotten the most important law there is, Murphy’s law:

Anything that can go wrong will go wrong.

The next day came, and I was on call for the first automated payment. Our backend would reconcile it with Razorpay’s backend the moment their SDK told us the payment was complete. How hard could it be, and what could go wrong? I eased up a little.

I got a call that afternoon: “Devesh, the customer didn’t get any payment confirmation, his money got deducted though, can you check Razorpay?” I hurried to the Razorpay dashboard and, to my horror, the first payment we had received was in the dreaded “In Progress” or “Pending” state. Our backend had assumed that once the SDK reported the payment as successful, Razorpay’s backend would immediately return a successful “capture” response. That was not the case.

We quickly captured the payment manually and sent the customer a success message by hand. It turned out the confirmation for that payment took about an extra minute to come through.

Enter Webhooks

Consider this scenario: you’re in a dine-in restaurant and you place an order. Unless you’re really hungry and desperate for food, you wouldn’t go to the counter every 5 minutes to ask whether your food is ready. Instead, someone brings your order to you, or at least tells you when it’s ready. You just place the order and carry on with whatever else you wanted to do in the meantime.

The same applies to the payment example above. It would have made much more sense for our app to learn when a payment’s status changed without asking Razorpay every 30-40 seconds, which would be both resource-consuming and impractical. This is where webhooks come in. They’re automated messages (typically HTTP POST requests to a URL you provide) that an external service sends you when an event you’re interested in occurs.
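The toy simulation below illustrates the difference between the two approaches. It is not how Razorpay actually delivers webhooks: an in-process `EventEmitter` stands in for the HTTP call, and `FakeGateway` and the `payment.updated` event name are hypothetical. Polling has to ask repeatedly whether anything changed. A webhook-style listener is called exactly once, when the change actually happens.

```javascript
const { EventEmitter } = require("events");

// A hypothetical gateway that stores payment statuses and announces changes.
class FakeGateway extends EventEmitter {
  constructor() {
    super();
    this.statuses = new Map();
  }
  getStatus(paymentId) {
    return this.statuses.get(paymentId);
  }
  setStatus(paymentId, status) {
    this.statuses.set(paymentId, status);
    // Push model: notify subscribers the moment the status changes.
    this.emit("payment.updated", { paymentId, status });
  }
}

const gateway = new FakeGateway();
gateway.setStatus("pay_1", "authorized");

// Pull model: count how many times we ask before the payment is captured.
let pollCount = 0;
function poll() {
  pollCount += 1;
  return gateway.getStatus("pay_1") === "captured";
}

// Push model: this listener runs only when a change is announced.
const received = [];
gateway.on("payment.updated", (update) => received.push(update));

poll(); poll(); poll();                 // three wasted checks while still "authorized"
gateway.setStatus("pay_1", "captured"); // the listener fires once here
poll();                                 // the fourth check finally sees it

console.log(pollCount);       // → 4
console.log(received.length); // → 1
```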

Webhooks aren’t just useful for payments. They help in many other scenarios too, such as listening to your users’ activities on other platforms. For example, when you submit a form on Slack, Slack sends a request with all the event information to an external server, which sends back an appropriate response for Slack to work with. Similarly, Calendly uses webhooks to tell external apps that a meeting has been scheduled through a link they shared, so the apps can send emails and so on.

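Services offer webhooks in multiple ways. A common one is a single endpoint that receives every event, with the event type in the request body, so your server decides what to do based on that field: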

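const express = require("express");
const app = express();
app.use(express.json()); // parse JSON request bodies

app.post("/webhookendpoint", (request, response) => {
    const { event } = request.body;
    if (event === "user_submitted_form") {
        // ... Do one thing
    } else if (event === "user_updated_profile") {
        // ... Do something else
    }
    // Acknowledge receipt so the service doesn't treat the delivery as failed
    response.sendStatus(200);
});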

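As you might guess, discovering webhooks (ironically, from the documentation itself) was a “Eureka!” moment for me. The payment flow no longer had to treat the SDK’s response as final. Now the customer pays through the checkout, Razorpay calls our webhook whenever the payment’s status changes, and only then does our backend mark the payment as successful or failed and notify the customer.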

Webhooks have been a great gift. Overall, I’d say they’ve been excellent at enabling communication between external systems and ours.

We later set up similar but much larger infrastructure for handling payments for all our clients at Solar Ladder, with a serverless cloud function listening for payment confirmations from Razorpay.

We even went a step further and disabled manual checks with Razorpay for payment successes. Instead, we set up real-time listeners between our frontend and our database. The webhook handler does the processing and updates the database, and users see their payment statuses update in real time.
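Here is a minimal sketch of the handler side of that setup. It assumes a Razorpay-style payload shaped like `{ event, payload: { payment: { entity: { id, order_id, status } } } }` and the event names `payment.authorized`, `payment.captured` and `payment.failed`. Check Razorpay’s webhook docs for the exact fields. `handlePaymentWebhook` is a hypothetical name, and an in-memory `Map` stands in for the database that the frontend listens to. In production, this logic would run inside the cloud function, after verifying that the request is genuine.

```javascript
// Hypothetical handler: an in-memory Map stands in for the payments database.
const payments = new Map();

// Assumed Razorpay event names mapped to the status we store.
const STATUS_BY_EVENT = {
  "payment.authorized": "pending",
  "payment.captured": "paid",
  "payment.failed": "failed",
};

function handlePaymentWebhook(body, db) {
  const newStatus = STATUS_BY_EVENT[body.event];
  if (!newStatus) {
    return { statusCode: 200, note: "ignored" }; // acknowledge events we don't use
  }
  const payment = body.payload.payment.entity;
  // Don't let a late "authorized" event overwrite a final status.
  const current = db.get(payment.id);
  if (current && current.status !== "pending" && newStatus === "pending") {
    return { statusCode: 200, note: "stale" };
  }
  // Write the new status; a real-time listener on the frontend would pick this up.
  db.set(payment.id, { orderId: payment.order_id, status: newStatus });
  return { statusCode: 200, note: "updated" };
}

// Usage with a trimmed-down sample payload.
const result = handlePaymentWebhook(
  {
    event: "payment.captured",
    payload: { payment: { entity: { id: "pay_123", order_id: "order_456", status: "captured" } } },
  },
  payments,
);
console.log(result.note);                    // → updated
console.log(payments.get("pay_123").status); // → paid
```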

Authentication

When you’re working with webhooks, you might run into a security question: “How do I make sure the request I receive actually comes from the service I expect?” There are a few approaches, but most rely on a secret key that only your backend and the service know. Either the service sends that key in the request’s headers, or it uses the key to sign the request body (typically with an HMAC) and sends the signature in a header. Your server then recomputes the signature with the same key and accepts the request only if the two match. Pretty simple communication overall.

The flow looks like this:

[Figure: the webhook authentication flow]
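Here is a minimal sketch of the signature-based variant, using Node’s built-in `crypto` module. It assumes the provider sends a hex-encoded HMAC-SHA256 of the raw request body in a header. Razorpay’s docs describe a scheme like this for its `X-Razorpay-Signature` header, but algorithms, encodings and header names differ between providers, so check yours. The secret and payload below are made up, and `sign` and `isValidSignature` are hypothetical names.

```javascript
const crypto = require("crypto");

// Shared secret configured in the provider's dashboard (made-up value).
const WEBHOOK_SECRET = "my_webhook_secret";

// What the provider does: sign the raw body with the shared secret.
function sign(rawBody, secret) {
  return crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
}

// What your server does: recompute the signature and compare in constant time.
function isValidSignature(rawBody, receivedSignature, secret) {
  const expected = Buffer.from(sign(rawBody, secret), "hex");
  const received = Buffer.from(receivedSignature, "hex");
  // timingSafeEqual throws if lengths differ, so check that first.
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}

const rawBody = JSON.stringify({ event: "payment.captured" });
const signature = sign(rawBody, WEBHOOK_SECRET);

console.log(isValidSignature(rawBody, signature, WEBHOOK_SECRET));       // → true
console.log(isValidSignature(rawBody, signature, "wrong_secret"));       // → false
console.log(isValidSignature(rawBody + " ", signature, WEBHOOK_SECRET)); // → false
```

The signature must be computed over the exact bytes that arrived. Re-serializing parsed JSON can change whitespace or key order and break the match, so keep the raw body around for verification.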

Idempotency

From MDN’s website:

An HTTP method is idempotent if an identical request can be made once or several times in a row with the same effect while leaving the server in the same state.
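The two hypothetical ways of recording the same payment event below show why this matters for webhook handlers. Setting a status is idempotent, because applying it twice leaves the same state. Incrementing an amount is not, because every repeat changes the state again. The `account` shape and function names are illustrative only.

```javascript
// Hypothetical account state touched by a "payment captured" event.
function makeAccount() {
  return { paymentStatus: "pending", amountReceived: 0 };
}

// Idempotent: repeating it leaves the account in the same state.
function markPaid(account) {
  account.paymentStatus = "paid";
}

// Not idempotent: every repeat adds the amount again.
function addPayment(account, amount) {
  account.amountReceived += amount;
}

const a = makeAccount();
markPaid(a);
markPaid(a); // a retried delivery
console.log(a.paymentStatus); // → paid

const b = makeAccount();
addPayment(b, 500);
addPayment(b, 500); // the same retried delivery
console.log(b.amountReceived); // → 1000, although only 500 was paid
```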

Sometimes, the services you subscribe to have timeouts. If your server doesn’t respond within 5 seconds, or outright returns a failure response (any status code outside 200-399), the service will do one of two things: mark the delivery as failed and give up, or retry the request later.

The idempotency issue arises with the second kind, the services that retry. Picture this: your server first takes a second to spin up. It could take even longer on a serverless function such as AWS Lambda or Google Cloud Functions, which have cold-start times. Processing the data then takes, say, 3.5 seconds, given how much of the application depends on payments, and sending a response back takes another second. You performed all the operations as expected and returned a response, but the external service will still mark the delivery as failed. The total response time of 1 + 3.5 + 1 = 5.5 seconds exceeds the 5-second limit.

You might have guessed what goes wrong next. The external service retries the request, and your server processes data it has already processed. Unless your server handles retried requests, your application’s data can become inconsistent (for example, the same payment recorded twice). To prevent this, the webhook handler has to be made idempotent.
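One common way to do this is to record the unique ID of every event you have processed and skip any delivery whose ID you have already seen. The sketch below uses hypothetical names. It assumes the provider attaches a unique ID to each event that stays the same across retries. Many providers do, but check the field or header name in your provider’s docs. It also uses an in-memory `Set`, whereas a serverless deployment would need a persistent store, since memory doesn’t survive between invocations.

```javascript
// IDs of events that have already been processed (in-memory for this sketch).
const processedEventIds = new Set();
const ledger = []; // stands in for the payment records we write

// Hypothetical handler: eventId is assumed to stay the same across retries.
function handleDelivery(eventId, payment) {
  if (processedEventIds.has(eventId)) {
    // Already handled: acknowledge again without repeating the side effects.
    return { statusCode: 200, duplicate: true };
  }
  ledger.push({ paymentId: payment.id, amount: payment.amount });
  processedEventIds.add(eventId);
  return { statusCode: 200, duplicate: false };
}

const payment = { id: "pay_123", amount: 500 };
handleDelivery("evt_1", payment);               // first delivery (our response was too slow)
const retry = handleDelivery("evt_1", payment); // the provider's retry

console.log(retry.duplicate); // → true
console.log(ledger.length);   // → 1
```

With a real database, the check and the write should happen atomically (for example, an insert that fails on a duplicate event ID). Otherwise, two concurrent deliveries could both pass the check.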

Two simple approaches to avoid data inconsistency from webhooks are: