Splitting Light: Season 1 - Episode 3


one ring to connect them all

It was now time to qualify the last critical element in the hardware set. It needed to be done quickly, as we wanted to send the manufacturing order as soon as possible. I was tasked with that last qualification. The rest of the team was already working full time on the next generation of hardware. This seemingly simple task, expected to take only a few days, ended up being a three-month ordeal.

I started changing the software as the 50,000-page PDF chip documentation indicated. I added a few lines of code in my editor, executed a few commands and restarted the fifteen blades already present in the chassis. I watched the LEDs turn green one by one. I connected to one or two blades and checked the software version. Everything looked right.


I did the last step. I pushed the last blade in. The LEDs didn't turn green. I waited a bit. Still nothing. I went back to my laptop to connect to the blade. As the unexpected started to add up, I had this slow realization that it was not working. The last clue was that the previously plugged blades were no longer reachable. I was stumped.

I went backwards. I checked the version, then the code, then the documentation. Everything was configured as expected, yet it did not work. I reproduced the setup. Blade by blade, waiting for the green LEDs, checking the software version and running a few other checks. Blade after blade until the last one was in my hands. I rechecked everything and pushed it in. I watched the LED. It stayed red. This time, I noticed that the network activity LEDs were blinking like crazy everywhere else. This was the clue I needed to know where to dive in.

A check on the uplink network equipment revealed the issue. The pieces of the puzzle clicked together. By sliding in the last blade, I had closed the gap in the internal network, and traffic was now looping inside it at full speed. I understood the problem. Now I needed to find a solution.
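To make the loop concrete, here is a minimal toy model in Python. It is only a sketch under assumptions of mine, not the chassis software: the function name `count_forwards`, the flooding rule (a blade resends a broadcast frame on every link except the one it arrived on) and the cap `max_forwards` are all hypothetical. It counts how often a single frame gets forwarded when the blades form an open chain versus a closed ring.

```python
from collections import deque


def count_forwards(num_blades, ring_closed, max_forwards=1000):
    """Count how many times one broadcast frame is forwarded between blades.

    Toy model (assumption): every blade floods the frame to all of its
    neighbours except the one it received the frame from.
    """
    # Wire blade i to blade i + 1, forming an open chain.
    links = {i: set() for i in range(num_blades)}
    for i in range(num_blades - 1):
        links[i].add(i + 1)
        links[i + 1].add(i)

    # Inserting the last blade closes the gap between both ends.
    if ring_closed:
        links[num_blades - 1].add(0)
        links[0].add(num_blades - 1)

    # Each queue entry is (blade holding the frame, blade it came from).
    queue = deque([(0, None)])
    forwards = 0
    while queue and forwards < max_forwards:
        blade, came_from = queue.popleft()
        for neighbour in links[blade]:
            if neighbour != came_from:
                queue.append((neighbour, blade))
                forwards += 1
    return forwards


open_chain = count_forwards(16, ring_closed=False)
closed_ring = count_forwards(16, ring_closed=True)
print(open_chain, closed_ring)
```

In the open chain the frame dies out when it reaches the end of the line. In the closed ring nothing ever stops it, so the count only ends because of the artificial cap.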


Hours turned into days. I was poring over the PDFs; control+F was my best friend. Days turned into weeks. I pored over the vendor code, checking configuration and values. Weeks turned into a month. The pressure was building. None of my tests and fixes solved the issue. I was stuck.

Greg sent an email to our vendor support engineer to introduce me. I replied with my issue and the steps I had taken. We exchanged a few messages and I jumped on a solution he had suggested. After understanding how to write the code, I typed it and tried again. I got the same result. It wasn't working either.

We exchanged more, and he suggested a number of solutions. Each of his suggestions required more and more arcane code. Not only was the code arcane, but the places where I had to insert it were also becoming weirder and weirder. I felt I was going behind the matrix and seeing symbols that I did not fully understand. I didn’t understand them because it was a domain I had never worked in.

Question :

Have you worked on a problem where, no matter how much effort you put into it, nothing fixes it?

To pair with :

  • Session - Linkin Park
  • The Brothers Karamazov by Fyodor Dostoevsky

Vincent Auclair


Symbol Sled

Business, tech, and life by a nerd. New every Tuesday: Splitting Light: The Prism of Growth and Discovery.
