Splitting Light: Season 1 - Episode 33

First power-up to network up

At that point we had three generations of compute, one of storage, and one of network. On my side, the third-generation compute wasn't much work. The screen code was quickly finalized, and small bits of code were written to handle the communication. The rest was mostly adding special-case code paths and configuration to make the network ASIC work. Over time we had built many abstractions that let us move faster.

Right around this time, we received the second-generation network device. My oh my, it was a beast. I held the red board in my hands; it was very heavy. The router had 48 ten-gigabit Ethernet ports as well as two ports able to handle up to 100 gigabits of traffic each. At the time, in 2017, it was close to bleeding edge. The weight came from the heatsinks: unlike the previous generation, many chips now ran fast enough and drew enough power that they needed cooling.

For the first time, I was handed a complicated board to bring up and told to make it work. I checked the circuit for shorts, then fired up the IDE for the chip that controlled the power-up sequence, copied the previous version of the code, and started adapting it to the board schematics.

Nothing happened after the first firmware flash. After fiddling a bit, I asked Greg, who immediately saw my mistake: I had not modified a specific memory register, and that could have caused the board to burn up. I fixed it right away and had the first power-up. With a few more modifications, I had my first boot. After checking everything that was not network-related, I pivoted to making the network features work.

I immediately hit a wall. The vendor code we used had been bleeding edge when we built the C2, our second-generation compute, but a few years later it was too old. The network ASIC on this board required the latest release. It was a new chip generation for the manufacturer, which required new versions of both the operating system and the vendor code. The chip was actually a test sample, not a production chip.

Unfortunately for us, we had extensively patched the vendor code over the years, fixing mistakes here and there. We had marked those fixes with a simple code comment, but we had not upstreamed them. This meant we had to port all of them onto the new code base. Maintaining one version was cumbersome; maintaining two different versions would be worse. We decided to upgrade all the previous generations once the version upgrade was done. It would cost us more up front, but the total cost over time would be lower. Once I had the new version running with the basic patches applied, I could start working on the network part.
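Porting a pile of locally marked fixes starts with an inventory. As a minimal sketch, assuming C sources and a hypothetical `LOCAL_FIX:` marker (the actual comment text isn't given here), a short Python script can list every marked fix with its file and line, so each one can be checked off as it is ported to the new vendor release or upstreamed.

```python
import os
import re

# Hypothetical marker format; captures the note, dropping a trailing "*/".
MARKER = re.compile(r"LOCAL_FIX:\s*(.*?)\s*(?:\*/)?\s*$")


def scan_text(path, text):
    """Return (path, line_number, note) for every marked fix in `text`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            hits.append((path, lineno, match.group(1).strip()))
    return hits


def find_local_fixes(root, exts=(".c", ".h")):
    """Walk a source tree and collect every marked fix in a stable order."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk visits subdirectories deterministically
        for name in sorted(filenames):
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    hits.extend(scan_text(path, f.read()))
    return hits
```

Running `find_local_fixes` on the old and new vendor trees and comparing the two lists shows which fixes still have to be carried over.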

Unlike the previous generation, where the CPU and the network chip were on the same die, this time there were two separate chips, and we had to handle the power-up sequence and configuration for both. After updating the kernel, the vendor code, and our own code, I got a first packet moving through the device. Then I had to make sure every existing software feature worked on the new device. Next came walking through every 10 gigabit port and, lastly, getting the 100 gigabit ports to link at 40 gigabit speed. Full 100 gigabit speed would have to wait a bit.

One issue we started running into was generating enough traffic to stress test the board and chip. We had to resort to the pseudo-random binary sequence (PRBS) generators built into the chip. After implementing the code, we were able to generate line-rate traffic and test the links. To validate them, we checked the packet counters: we watched for errors and packet checksum corruption, and we compared emitted and received packets to find discrepancies.
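The chip's generators are hardware, but the idea behind them fits in a few lines of software. Below is a minimal Python sketch of a PRBS7 generator (polynomial x^7 + x^6 + 1, a common choice; which polynomial the chip actually used isn't stated) built as a linear-feedback shift register, plus a self-synchronizing checker that predicts each bit from previously received bits and counts mismatches. The function names and defaults are illustrative assumptions, not the vendor API.

```python
from collections import deque
from itertools import islice


def prbs(order=7, taps=(7, 6), seed=0x7F):
    """Yield the bits of a Fibonacci LFSR pseudo-random binary sequence.

    The defaults give PRBS7 (x^7 + x^6 + 1), which repeats every 2**7 - 1 = 127 bits.
    """
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("an all-zero LFSR state never leaves zero")
    a, b = taps
    while True:
        # Feedback bit = XOR of the two tapped register bits (taps are 1-indexed).
        bit = ((state >> (a - 1)) ^ (state >> (b - 1))) & 1
        # Shift left and insert the new bit at the bottom of the register.
        state = ((state << 1) | bit) & mask
        yield bit


def check_prbs(rx_bits, order=7, taps=(7, 6)):
    """Count mismatches with a self-synchronizing PRBS checker.

    Each received bit is predicted from previously *received* bits, so the
    checker locks on after `order` bits without knowing the seed.
    """
    a, b = taps
    history = deque(maxlen=order)  # last `order` received bits, newest last
    errors = 0
    for bit in rx_bits:
        if len(history) == order:
            expected = history[-a] ^ history[-b]
            errors += int(expected != bit)
        history.append(bit)
    return errors


if __name__ == "__main__":
    tx = list(islice(prbs(), 1000))
    rx = tx.copy()
    rx[500] ^= 1  # simulate a single corrupted bit on the link
    print(check_prbs(tx), check_prbs(rx))  # prints "0 3"
```

Because the checker uses the received bits as its own history, it needs no alignment with the transmitter, but an isolated flipped bit is counted three times: once directly and once as it passes each feedback tap. The raw count therefore overstates the number of corrupted bits by up to a factor of three.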

Almost four years in, I was doing things I hadn't even known existed before I joined. I was working on devices and code that were, to me, very interesting, increasing the scope and depth of my knowledge and pushing the limits of what I could do. I could feel the next steps: working with EDA tools, designing my first circuits. I thought they were close.

To pair with:

  • Tightrope - Above & Beyond, Marty Longstaff
  • The Chronoliths by Robert Charles Wilson

Vincent Auclair

Symbol Sled

Business, tech, and life by a nerd. New every Tuesday: Splitting Light: The Prism of Growth and Discovery.