Business, tech, and life by a nerd. New every Tuesday: Splitting Light: The Prism of Growth and Discovery.
Splitting Light: Season 1 - Episode 7
Published about 1 year ago • 3 min read
Bright red boards
(Very) Rough electronic schematics
The bright red boards of the second generation hardware started to come in. Why red? Simple. If the board was green, we could share photos; if it was red, sharing any pictures or details about it was forbidden. Simple enough.
I was busy performing the last handover bits of the first generation hardware to the cloud team while Greg was doing the first power up of the main board. I was eager to work on the board but I did not have the skills (yet) to do the checks and first power up. He would test different positions on the board with a multimeter to make sure there were no short circuits or other defects.
Once he got the board powered up for the first time, I jumped in and started working towards the first boot. This time, I was completely by myself. The Internet was useless for getting help. The vendor's CPU for the board was under NDA. The only information that I had came from our schematics, the vendor's documentation and the vendor's source code. Some of our tools could also be helpful to detect signal or code errors.
To make that first boot, I had to modify a version of Linux that was already very heavily modified by the vendor. I had to dive into the schematics to know how Greg had wired some of the chip pins, then cross-reference the pins in the pin datasheet and then edit the code accordingly to add devices, add functionality or change settings. Back and forth between PDFs, vendor source code and the operating system code. I was building the software, testing the expected hardware values and following the chip setup sequence.
The chip in question was a specialized chip, an ASIC, that did hardware networking. There was a simple power-up sequence to get the operating system running, but then you had to power up the network part of it, which required following a detailed multi-step sequence. It was a dance between our code, the multi-platform vendor code and the operating system code. One flag set wrongly, one value not written correctly and it just didn't work. No errors, no crashes, it just didn't work.
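To make the "no errors, it just didn't work" failure mode concrete, here is a minimal sketch in Python. It is not the vendor's sequence: the register addresses, values and step names are invented for illustration, and the chip is a small simulation. The idea it shows is simply writing each step of a bring-up sequence, reading the value back, and reporting the first step that did not stick.

```python
# Hypothetical sketch: register addresses, values and step names are
# invented for illustration and do not come from any real chip.

class FakeChip:
    """Simulated register bank that silently drops writes to unknown addresses."""

    def __init__(self):
        self.regs = {0x00: 0, 0x04: 0, 0x08: 0}

    def write(self, addr, value):
        # A bad write raises nothing here; it is simply ignored.
        if addr in self.regs:
            self.regs[addr] = value & 0xFFFFFFFF

    def read(self, addr):
        return self.regs.get(addr, 0)


# Each step: (label, register address, value to write)
BRINGUP_SEQUENCE = [
    ("enable clock", 0x00, 0x1),
    ("release reset", 0x04, 0x1),
    ("enable ports", 0x08, 0xF),
]


def run_sequence(chip, sequence):
    """Write each step, read it back, and return the label of the
    first step whose value did not stick, or None if all steps held."""
    for label, addr, value in sequence:
        chip.write(addr, value)
        if chip.read(addr) != value:
            return label
    return None
```

With the sequence above, `run_sequence(FakeChip(), BRINGUP_SEQUENCE)` returns `None`. Change the last address to `0x0C` and it returns `"enable ports"` instead of failing silently. On real hardware some registers are write-only or self-clearing, so a read-back check like this would only apply to the registers where the datasheet says it is valid.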
Software sandwich between the chip and the features
Every component was standard and documented. What made the difference was how you wired everything together and configured the components. Because of the development cycle of hardware, the most important thing at that time was to validate every electrical signal on the board. Every feature that was software only and didn't require a dedicated track on the board could wait.
Any mistake in how the components were connected together had to be found, or at least we needed to find as many issues as we could. We would add small wires and solder some tracks and components differently to work around the issues so that we could continue testing, and at the same time Greg would patch the design in the EDA tool. Each bug was labeled and only closed a few months later, once we got the next revision of the board and had checked that the patch was correct.
Once we had tested as much as we could, we slowly started to write actual software features. I knew the basics of how networking worked, but I did not expect that my next focus would be a deep dive into it.
If you have missed it, you can read the previous episode here
To pair with:
Sunburn (Timo Maas' Sunstroke Remix) - Muse, Timo Maas