Business, tech, and life by a nerd. New every Tuesday: Splitting Light: The Prism of Growth and Discovery.
Splitting Light: Season 1 - Episode 6
Published about 1 year ago • 3 min read
Seemingly impossible to do
Before joining online.net’s lab, I had mostly worked as a backend software engineer. Because I had switched only a few months earlier, I could see the stark contrast between the development cycles of hardware and software.
The C1-based hardware had been in the works for several years already; I had joined just before the finish line. The next generation, the C2-based hardware, had been in the works for more than a year, and the only results so far were some EDA and CAD models. I remember something Greg said: “In software, when you want to test your code, you type a few commands and seconds later the software is running and you can test it. In hardware, to test, you type a few commands, wire €10,000, and a few months later UPS delivers the hardware and you can start testing.” Working efficiently under those constraints required a completely different workflow from software development. Sometimes that workflow meant blindly writing hundreds of lines of code, not knowing whether they would work until months later, when the prototypes finally landed on our desks.
Hardware pressure point, you have to pinch it
For the C2 hardware, we wanted to reduce the amount of custom software and protocols that the cloud team and the bare-metal team had to implement, which would in turn shorten the time to market. To accomplish this, Greg explained to me in a meeting room that I would have to write a TCP stack by hand. I looked at him, puzzled, and told him that this was the kind of thing only code wizards wrote, and that it was very complicated. He replied that it was just a state machine and that I would be fine.
Still skeptical about how simple the code would be, I got to work. Using a website that explained each state, I wrote the code in the vendor’s editor, state after state. The only check I could run was whether the syntax was correct.
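Greg’s point was that TCP, at its core, is the connection state machine described in RFC 793. As an illustration only (the original implementation was written in the vendor’s toolchain, and its language isn’t stated here), the Python sketch models those states as a transition table. The event labels ("open", "close", "SYN", "ACK", "FIN", "RST", "timeout") and the `TcpConnection` class are hypothetical names chosen for the example; sequence numbers, retransmissions, and data transfer are deliberately left out.

```python
from enum import Enum, auto


class State(Enum):
    CLOSED = auto()
    LISTEN = auto()
    SYN_RECEIVED = auto()
    ESTABLISHED = auto()
    CLOSE_WAIT = auto()
    LAST_ACK = auto()
    FIN_WAIT_1 = auto()
    FIN_WAIT_2 = auto()
    CLOSING = auto()
    TIME_WAIT = auto()


# (current state, event) -> (next state, segment to send or None).
# Hypothetical event labels: "open"/"close" come from the application,
# "SYN"/"ACK"/"FIN" are the flags of an incoming segment, and "timeout"
# is the 2*MSL timer expiring in TIME_WAIT.
TRANSITIONS = {
    (State.CLOSED, "open"): (State.LISTEN, None),          # passive open
    (State.LISTEN, "SYN"): (State.SYN_RECEIVED, "SYN+ACK"),
    (State.SYN_RECEIVED, "ACK"): (State.ESTABLISHED, None),
    # Passive close: the peer sends FIN first.
    (State.ESTABLISHED, "FIN"): (State.CLOSE_WAIT, "ACK"),
    (State.CLOSE_WAIT, "close"): (State.LAST_ACK, "FIN"),
    (State.LAST_ACK, "ACK"): (State.CLOSED, None),
    # Active close: we send FIN first.
    (State.ESTABLISHED, "close"): (State.FIN_WAIT_1, "FIN"),
    (State.FIN_WAIT_1, "ACK"): (State.FIN_WAIT_2, None),
    (State.FIN_WAIT_1, "FIN"): (State.CLOSING, "ACK"),     # simultaneous close
    (State.FIN_WAIT_2, "FIN"): (State.TIME_WAIT, "ACK"),
    (State.CLOSING, "ACK"): (State.TIME_WAIT, None),
    (State.TIME_WAIT, "timeout"): (State.CLOSED, None),
}


class TcpConnection:
    def __init__(self):
        self.state = State.CLOSED

    def handle(self, event):
        """Apply one event and return the segment to send, if any."""
        if event == "RST" and self.state not in (State.CLOSED, State.LISTEN):
            # Simplification: a reset aborts the connection in every state
            # past LISTEN (RFC 793 returns a passively opened SYN-RECEIVED
            # connection to LISTEN instead).
            self.state = State.CLOSED
            return None
        key = (self.state, event)
        if key not in TRANSITIONS:
            # Unexpected input for this state: ignore it here; a real stack
            # may answer with a RST or a duplicate ACK.
            return None
        self.state, reply = TRANSITIONS[key]
        return reply


if __name__ == "__main__":
    conn = TcpConnection()
    for event in ["open", "SYN", "ACK"]:
        print(event, "->", conn.handle(event), conn.state)
```

Keeping the transitions in a table means each additional state is one more entry of data rather than another branch, which mirrors how references typically describe TCP: one state at a time.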
After a few days, we got an evaluation board with a chip from the same family as the one we were going to use in the hardware. With that board, I could finally start testing. To test, I had to solder half a dozen very thin wires onto the board and hook them up to a digital oscilloscope, then analyze the data coming in and out with several tools.
Hardware GDB
It was a precision job. The data had to be correct down to the individual bit, which meant running my finger across the screen and checking each packet byte by byte. At last, I managed to establish a connection. Next, I managed to close one properly. Then I wrote the code to handle errors and unexpected data.
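Checking packets byte by byte largely comes down to knowing where each field sits in the fixed 20-byte TCP header defined in RFC 793. The sketch below decodes that header from raw bytes; `decode_tcp_header` and `FLAG_NAMES` are hypothetical names for illustration, and TCP options and checksum verification are not handled.

```python
import struct

# Flag bits in byte 13 of the TCP header.
FLAG_NAMES = [(0x20, "URG"), (0x10, "ACK"), (0x08, "PSH"),
              (0x04, "RST"), (0x02, "SYN"), (0x01, "FIN")]


def decode_tcp_header(segment: bytes) -> dict:
    """Decode the fixed part of a TCP header (network byte order)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # ! = big-endian; H = 16-bit, I = 32-bit, B = 8-bit unsigned fields.
    (src, dst, seq, ack, offset_byte, flags_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        # Data offset: upper 4 bits, counted in 32-bit words.
        "header_len": (offset_byte >> 4) * 4,
        "flags": [name for bit, name in FLAG_NAMES if flags_byte & bit],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }


if __name__ == "__main__":
    # A hand-built SYN segment: port 49152 -> 80, seq 1000, 5-word header.
    syn = struct.pack("!HHIIBBHHH", 49152, 80, 1000, 0, 5 << 4, 0x02,
                      65535, 0, 0)
    print(decode_tcp_header(syn))
```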
Greg then asked me to implement an HTTP server. I remembered the protocol from a university project where we had written the exact same software. I thought it through and wrote an extremely simple version. I tested it, fixed the issues, and eventually got a page served by the chip to open in my browser.
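An "extremely simple version" of an HTTP server can be little more than reading the request line and returning one fixed page. The sketch below assumes exactly that: `handle_request`, `_response`, and `PAGE` are hypothetical names, and it works on raw bytes so it can sit on top of any TCP stack. Answering with HTTP/1.0 and `Connection: close` lets the server close the TCP connection after every response, so it never has to track keep-alive state.

```python
PAGE = b"<html><body>Hello from the chip</body></html>"


def _response(status: bytes, body: bytes) -> bytes:
    """Assemble a complete HTTP/1.0 response: status line, headers, body."""
    headers = (
        b"HTTP/1.0 " + status + b"\r\n"
        + b"Content-Type: text/html\r\n"
        + b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        + b"Connection: close\r\n"
        + b"\r\n"  # blank line ends the headers
    )
    return headers + body


def handle_request(raw: bytes) -> bytes:
    """Return the response bytes for one raw HTTP request."""
    # Only the request line matters here, e.g. b"GET / HTTP/1.1".
    request_line = raw.split(b"\r\n", 1)[0]
    parts = request_line.split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return _response(b"400 Bad Request", b"")
    method, path, _version = parts
    if method != b"GET":
        return _response(b"405 Method Not Allowed", b"")
    if path != b"/":
        return _response(b"404 Not Found", b"")
    return _response(b"200 OK", PAGE)


if __name__ == "__main__":
    print(handle_request(b"GET / HTTP/1.1\r\nHost: chip\r\n\r\n").decode())
```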
Looking back, that software was not hard to write because it was complex; it was hard because it was difficult to test. With the right tools and methods, it became possible. The lesson I took from it: something can seem impossible, but that often depends on how you look at it.