Splitting Light: Season 1 - Episode 7


Bright red boards

The bright red boards of the second-generation hardware started to come in. Why red? Simple. If a board was green, we could share photos; if it was red, sharing any pictures or details about it was forbidden. Simple enough.

I was busy performing the last handover bits of the first-generation hardware to the cloud team while Greg was doing the first power-up of the main board. I was eager to work on the board, but I did not (yet) have the skills to do the checks and the first power-up. He would probe different points on the board with a multimeter to make sure there were no short circuits or other defects.

Once he got the board powered up for the first time, I jumped in and started working towards the first boot. This time, I was completely by myself. The Internet was useless for getting help: the vendor's CPU for the board was under NDA. The only information I had came from our schematics, the vendor's documentation and the vendor's source code. Some of our tools could also help detect signal or code errors.

To make that first boot, I had to modify a version of Linux that had already been heavily modified by the vendor. I had to dive into the schematics to see how Greg had wired some of the chip pins, cross-reference those pins with the pin datasheet, and then edit the code accordingly to add devices or functionality, or to change settings. Back and forth between PDFs, vendor source code and the operating system code. I was building the software, testing the expected hardware values and following the chip setup sequence.

The chip in question was a specialized chip, an ASIC, that did networking in hardware. There was a simple power-up sequence to get the operating system running, but then you had to power up the network part, which required following a detailed multi-step sequence. It was a dance between our code, the multi-platform vendor code and the operating system code. One flag set wrongly, one value not written correctly, and it just didn't work. No errors, no crashes, it just didn't work.
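
To give a flavor of that kind of sequence, here is a minimal sketch in Python. Everything in it is hypothetical: the register names, the values and the FakeAsic class are invented for illustration and have nothing to do with the real chip, which was under NDA. It only shows the general idea of writing each step in order, reading the value back, and failing loudly instead of silently.

```python
class FakeAsic:
    """Stand-in for a chip: a dict of registers (all names are hypothetical)."""

    def __init__(self):
        self.regs = {"CORE_PWR": 0, "PLL_CTRL": 0, "NET_RESET": 1, "NET_ENABLE": 0}

    def write(self, name, value):
        # Simulated hardware rule: the network block ignores NET_ENABLE
        # while it is still held in reset. No error, the write is just dropped.
        if name == "NET_ENABLE" and self.regs["NET_RESET"] == 1:
            return
        self.regs[name] = value

    def read(self, name):
        return self.regs[name]


# Ordered bring-up steps as (register, value to write). Order matters.
SEQUENCE = [
    ("CORE_PWR", 1),
    ("PLL_CTRL", 1),
    ("NET_RESET", 0),
    ("NET_ENABLE", 1),
]


def bring_up(chip, sequence):
    """Write each step, read it back, and raise on the first mismatch."""
    for step, (name, value) in enumerate(sequence, start=1):
        chip.write(name, value)
        actual = chip.read(name)
        if actual != value:
            raise RuntimeError(
                f"step {step}: {name} expected {value}, read back {actual}"
            )
    return True
```

Calling bring_up(FakeAsic(), SEQUENCE) runs through all four steps. Skip the NET_RESET step and the simulated chip quietly drops the NET_ENABLE write; only the read-back turns that silence into an error that names the step.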

Every component was standard and documented. What made the difference was how you wired everything together and configured the components. Because of the development cycle of hardware, the most important thing at that time was to validate every electrical signal on the board. Every feature that was software-only and didn't require a dedicated track on the board could wait.

Any mistake in how the components were connected together had to be found. Or at least, we needed to find as many issues as we could. We would add small wires and solder some tracks and components differently to work around the issues so that we could continue testing, while Greg patched the design in the EDA tool at the same time. Each bug was labeled and only closed once we had received the next revision of the board a few months later and checked that the patch was correct.

Once we had tested as much as we could, we slowly started to write actual software features. I knew the basics of how networking worked, but I did not expect that my next focus would be a deep dive into it.

To pair with:

  • Sunburn (Timo Maas' Sunstroke Remix) - Muse, Timo Maas
  • Mockingbird by Walter Tevis

Vincent Auclair

