As the WONDR platform has grown, so too has the requirement to constantly consider whether the features that were fit for purpose yesterday will be fit for purpose tomorrow.
When our messenger first debuted back in 2018, its purpose was to convey single messages between individuals. It operated a bit like e-mail: as a user you log in, navigate to your messages, check out what’s new, then leave.
Over time, we realised that our users expected something closer to an instant messenger. So we made incremental adaptations: we added push notifications, we polled for new messages, we updated our database’s character set to support all manner of emojis, and we built a sleek chat interface in our new mobile app.
But under the hood it remained the same old code, designed for a much slower, far less frequent form of communication. And, as our user numbers continued to grow, we found that the compute power required to keep the legacy messenger operational simply wouldn’t scale.
Enter the gopher
The legacy WONDR API consists primarily of a PHP monolith. While this application works quite well for the CRUD operations that make up most of the WONDR experience (creating, organising and viewing posts, for example), it sometimes falls short on high-traffic endpoints that require low latency.
Over time, we began to experience this with our messenger endpoints. With an average number of conversations, we were seeing some requests take up to seven seconds to load.
Noticing the trend towards conversations, we took the decision to rebuild our messenger from scratch as a Golang microservice. We intentionally kept things simple and avoided unnecessary overhead, which meant we might end up writing more code, but that code would be extremely efficient.
Our goal was to support around 100x the number of active users, while using considerably less compute power on both our servers and our database.
So how did we do it?
Eliminating the unnecessary
General purpose frameworks are great. They allow you to quickly scaffold ideas and get them in front of users. For a startup there is no doubt that they are a sensible choice. But they come with a trade-off: once real users hit the code, the solution tends not to scale well.
The problem is that a general purpose framework doesn’t know the exact purpose of your application, so it anticipates any and all common operations you might ask of it. It does this so you can rely on the magic it performs behind the scenes, and write the least possible code.
You can sometimes sculpt the framework to be more efficient, but there comes a point where you spend more time hunting down and fixing inefficiencies than coding new features.
This was the case with the legacy WONDR messenger. So when we decided to rebuild, it was vitally important that no unnecessary overhead or magic crept in.
Store and query as little as possible
Those seven-second load times were often due to unnecessary data retrieval. To display a message, we only really need to know the sender’s name, UUID and the path to their profile picture. But our legacy messenger was retrieving and returning each user’s full profile, once for every message that user had sent. And as that data was spread across multiple tables, hundreds or even thousands of magic database queries were being run to piece it all together.
A great timesaver when bootstrapping an MVP, but problematic at scale.
If you can imagine a conversation that contains only a hundred or so messages, you can appreciate how large that payload would become, and how many database queries would be run.
To avoid this issue in our new messenger, we opted for a simplified database structure. A chat (in our new messenger, named a channel) needs only a name, an optional list of UUIDs to describe the spaces and/or communities to which it belongs, and the dates on which it was created and last updated. Similarly, a message only requires the message text, dates, an optional list of linked assets (e.g. uploaded photos), the UUID of the channel to which it belongs and the UUID of the user who sent it.
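As a rough illustration, the two records described above could be modelled in Go along these lines. This is only a sketch: the type names, field names, the ID fields and the Edited helper are assumptions made for the example, not the actual WONDR schema.

```go
package main

import (
	"fmt"
	"time"
)

// Channel holds only what the text above lists for a chat.
// Field names are illustrative, not the real schema.
type Channel struct {
	ID        string   // assumed: a UUID, implied by messages referencing their channel
	Name      string   // display name of the channel
	ScopeIDs  []string // optional UUIDs of the spaces and/or communities it belongs to
	CreatedAt time.Time
	UpdatedAt time.Time
}

// Message identifies its sender by UUID only; no user details are stored.
type Message struct {
	ID        string   // assumed: a UUID for the message itself
	ChannelID string   // UUID of the channel the message belongs to
	SenderID  string   // UUID of the user who sent it
	Text      string   // the message text
	Assets    []string // optional references to linked assets, e.g. uploaded photos
	CreatedAt time.Time
	UpdatedAt time.Time
}

// Edited is a hypothetical helper: it treats a message as edited when it
// was updated after it was created.
func (m Message) Edited() bool {
	return m.UpdatedAt.After(m.CreatedAt)
}

func main() {
	now := time.Now()
	ch := Channel{ID: "channel-uuid", Name: "general", CreatedAt: now, UpdatedAt: now}
	msg := Message{
		ID:        "message-uuid",
		ChannelID: ch.ID,
		SenderID:  "user-uuid",
		Text:      "Hello!",
		CreatedAt: now,
		UpdatedAt: now,
	}
	fmt.Println(ch.Name, msg.SenderID, msg.Edited())
}
```

Keeping each record this small is what makes it practical to fetch a page of messages without joining across other tables.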
You may notice that we only identify users by UUID: nowhere in our new messenger do we store user details. This enables us to retrieve all relevant data with a single query, and more importantly enforces separation of concerns: the messenger API shouldn’t have to know anything about users, because the primary WONDR API already holds that information.
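One consequence of storing only UUIDs is that whoever consumes the messages has to resolve names and profile pictures elsewhere. How WONDR does that is not covered here; a plausible approach, assumed purely for illustration, is to collect each distinct sender UUID once and look it up a single time, rather than once per message. The UniqueSenders function below is a hypothetical helper showing that step.

```go
package main

import "fmt"

// UniqueSenders returns each sender UUID exactly once, in order of first
// appearance, so that every user's details need resolving only one time.
func UniqueSenders(senderIDs []string) []string {
	seen := make(map[string]struct{}, len(senderIDs))
	unique := make([]string, 0, len(senderIDs))
	for _, id := range senderIDs {
		if _, ok := seen[id]; ok {
			continue // already collected this sender
		}
		seen[id] = struct{}{}
		unique = append(unique, id)
	}
	return unique
}

func main() {
	// Five messages sent by three distinct users (placeholder IDs).
	senders := []string{"u-1", "u-2", "u-1", "u-3", "u-2"}
	fmt.Println(UniqueSenders(senders)) // prints [u-1 u-2 u-3]
}
```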
To further reduce the need for unnecessarily heavy queries, we opted to lazy-load messages on scrollback and deliver incremental updates (new and edited messages) to the UI, rather than polling for the entire channel contents every few seconds.
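To make the two ideas concrete, here is a minimal in-memory sketch of cursor-based scrollback and incremental updates. It is not the production code: the function names, the use of timestamps as cursors, and the sample data are all assumptions, and a real service would express the same filters as database queries rather than slice operations.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Message is trimmed to the fields this sketch needs.
type Message struct {
	ID        string
	Text      string
	CreatedAt time.Time
	UpdatedAt time.Time // assumed equal to CreatedAt until the message is edited
}

// PageBefore returns up to limit messages created strictly before cursor,
// newest first. msgs must be sorted by CreatedAt, oldest first.
func PageBefore(msgs []Message, cursor time.Time, limit int) []Message {
	if limit <= 0 {
		return nil
	}
	// Index of the first message created at or after the cursor.
	end := sort.Search(len(msgs), func(i int) bool {
		return !msgs[i].CreatedAt.Before(cursor)
	})
	start := end - limit
	if start < 0 {
		start = 0
	}
	page := make([]Message, 0, end-start)
	for i := end - 1; i >= start; i-- {
		page = append(page, msgs[i])
	}
	return page
}

// ChangedSince returns messages created or edited after since, so a client
// can ask only for what changed instead of re-fetching the whole channel.
func ChangedSince(msgs []Message, since time.Time) []Message {
	var changed []Message
	for _, m := range msgs {
		if m.UpdatedAt.After(since) {
			changed = append(changed, m)
		}
	}
	return changed
}

// sampleMessages builds five messages one minute apart; m-2 is edited later.
func sampleMessages() []Message {
	base := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	msgs := make([]Message, 5)
	for i := range msgs {
		at := base.Add(time.Duration(i) * time.Minute)
		msgs[i] = Message{ID: fmt.Sprintf("m-%d", i+1), Text: "hello", CreatedAt: at, UpdatedAt: at}
	}
	msgs[1].UpdatedAt = base.Add(10 * time.Minute)
	return msgs
}

func main() {
	msgs := sampleMessages()
	// Scrollback: the two messages immediately older than m-5.
	for _, m := range PageBefore(msgs, msgs[4].CreatedAt, 2) {
		fmt.Println("page:", m.ID)
	}
	// Incremental update: everything new or edited since m-5 arrived.
	for _, m := range ChangedSince(msgs, msgs[4].CreatedAt) {
		fmt.Println("changed:", m.ID)
	}
}
```

The client keeps the timestamp of the oldest message it has displayed as its scrollback cursor, and the time of its last sync as its update cursor, so each request touches only a small slice of the channel.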
The moment of truth
When we launched, the stats spoke for themselves. During alpha testing on a replica of production data, the seven-second latencies shrank to around 40ms. And most importantly, no data had been sacrificed. We were simply retrieving exactly what we needed, when we needed it, and only once.
When real users began to hit the new messenger microservice, the endpoint latencies did increase slightly, but we maintained the two-orders-of-magnitude improvement that we had achieved in our siloed tests.
There will come a point where we need to make even more aggressive optimisations to the microservice to cope with increased load and higher user numbers, but for now we can stop worrying about capacity issues and focus on adding new and improved features, like custom channels, message reactions and image uploads!