Splitting Light: Season 2 - Episode 21


All nighter

As we were moving forward, in mid-June 2018, we hit a point where we needed to be able to check the logs of the cluster as a whole. Until then, we had done it by manually connecting to the machines and opening the right files to look inside. This was no longer viable.

Scaleway’s monitoring team had built a metrics stack which we already used heavily, but their logging stack wasn’t ready yet. So, one day, we decided to build our own. I looked at the existing solutions and quickly went for Elasticsearch plus Fluentd. I spent the day working on the automation and getting the configuration right. As the day came to an end, I stayed at the office, determined to finish the task and finally have this “feature” that we had been waiting on for so long.

By the early night, I was ingesting my first logs. By the middle of the night, I was accurately extracting the different elements in the logs. By the early morning, we had a basic dashboard and logs streaming into a single location where we could query and display information.

Then came the knob tweaking. The most important element of the product was S3. If a user could not store or fetch data, it was useless to them. How you connect elements together matters. Had I decided to send logs over TCP, I could have gotten congestion, and this would have had a negative impact on the product. If log ingestion was not working correctly, it would break other elements. The systems would be tightly coupled.

But if the monitoring part did not work for whatever reason, did I really want the product to be impacted? Did I really want requests not going through if the logs were broken? Not in our context. For me, for us, we would rather have a functional product and lose logs than break the product and still have logs.

There is a simple way of decoupling those elements: send the logs over UDP. They were emitted from the source and, if they could not be handled on the receiving side, they were simply dropped. Lost. No extra software. No queues. No added complexity. Just using a specific transport protocol. This fixed the issue.
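To make the idea concrete, here is a minimal sketch in Python of that fire-and-forget behavior. It is not the Fluentd configuration we ran; the class name `UdpLogEmitter`, the JSON record layout, and the field names are all made up for illustration. The point is only that the sender hands the datagram to the kernel and moves on, whether or not anything is listening.

```python
import json
import socket
import time


class UdpLogEmitter:
    """Hypothetical fire-and-forget log sender (illustration only)."""

    def __init__(self, host: str, port: int) -> None:
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Non-blocking: if the local send buffer is full, fail fast
        # instead of making the caller wait.
        self._sock.setblocking(False)

    def emit(self, message: str, **fields: object) -> bool:
        """Send one log record. Returns True if the datagram was handed
        to the kernel, False if it was dropped locally. True does NOT
        mean the receiver got it: UDP gives no delivery guarantee."""
        record = {"ts": time.time(), "message": message, **fields}
        payload = json.dumps(record, default=str).encode("utf-8")
        try:
            self._sock.sendto(payload, self._addr)
            return True
        except OSError:
            # Full buffer, unreachable network, and so on: drop the log
            # rather than let logging problems reach the request path.
            return False

    def close(self) -> None:
        self._sock.close()
```

A request handler would call something like `emitter.emit("GET /bucket/key", status=200)` and carry on regardless of the return value. The trade-off is the one described above: a broken or overloaded log pipeline costs you log lines, not requests.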

I switched the knobs to the right settings, added the right log configuration elements, and that was it. We could now fix more issues because, simply, we could see them. We could also search for issues and see logs that were previously spread over several machines, now aggregated in a single location.

Then, we wanted to back up the logs, so I looked at an Elasticsearch feature for backing up to S3. At the time, the implementation was missing a few knobs to work with a non-AWS S3. I dove into the code, added the right logic to the backup plugin, and rebuilt the plugin. I plugged it in and we started backing up logs in our S3.

Very quickly though, we were getting too many logs. It was overwhelming the system.

(1) Photo by Quentin

To pair with:

  • Chella Ride - Dog Blood
  • Energiya-Buran: The Soviet Space Shuttle by Bart Hendrickx, Bert Vis

Vincent Auclair

Symbol Sled

Business, tech, and life by a nerd. New every Tuesday: Splitting Light: The Prism of Growth and Discovery.
