Splitting light

Season 1 Episode 28

Juggling code


Three years into my tenure in the lab, we had launched many products, from the revolutionary C1, where we had 900 nodes per rack, to the cold storage product. We had two compute devices in production, one network device and one storage device. That did not count one project that had been terminated, or the SCADA devices, which I did not work on. There were also two upcoming devices: the C3 and another network device where we would punch above the 40 Gb/s link speed.

For both the C2 and the C3, we had multiple configurations of the individual nodes and two customers. The first was the cloud team, which provided resources billed by the hour and had been formed around the time I was recruited. The second was the “dedicated” team, which rented hardware billed by the month and was the oldest team in the company. Both teams worked independently and, most importantly, worked differently. Between them sat the network team, which handled the network equipment.

When we handed over a product, we had to keep supporting it and fix the bugs that the teams could not handle. This took time away from us. Yet we were still building more products, each more complex than the preceding one. We reached for more compute out of every watt and more compute out of every cm³.

It was a tough job. Jumping from product to product, team to team, handling Carbon14, learning new concepts and soldering boards… I had to juggle all this and learn to prioritise even better than I had before. It was a never-ending push forward.

This was possible because of how the work was organized. The sequenced dance of steps to design, test, code, manufacture and add features was complex but well mastered. The code bases were structured identically and files were named identically across devices. That made it easier to switch in and out of each device.

Another component was the gradual ramp-up in skill and knowledge. Debugging with an oscilloscope was hard and complicated to set up the first time, but after five or six times it was “just” a procedure I would apply. The same went for patching the Linux kernel or interacting with the ASIC’s memory.

Even with all this structure and the growing ease that came with skill, it was still hard. I liken this to a mental burden. Each part has its own constraints and specificities. Each time I had to jump into something, I had to load those parts into my brain and focus back on the details, on the code, on the wires in the PCB, because those were the details that mattered. They had to be loaded and unloaded again and again, and each switch cost concentration and mental energy. I did not fully understand the mechanism until a few years later, but I grasped it intuitively at the time. I would group similar tasks together to make the switching easier.

Of course, no matter how well you organize things, when there are issues in production, those take priority. Over time, the operators could use more and more diagnostic features and tools, but they needed help to understand how and when to use them. To do that, documentation was needed, and how it was done mattered.


To pair with :