We build abstraction layers in software to hide underlying complexity or limits. Operating systems exist to hide the complexity of interfacing with different devices. A SaaS hides the complexity of running and maintaining the software in exchange for a fee. From there, several other advantages follow: on one side, the layers make economies of scale possible; on the other, they raise the number of features you can implement within a fixed time window.
For me, it’s a good sign of expertise and seniority when the abstractions are drawn in the right place for the context of the program, the team, or the maturity of the code base. Effort spent there can resolve a great many problems. However, sometimes, despite all the goodwill and expertise, some things pierce through multiple layers, sometimes coming all the way up from the chip manufacturing process itself. To understand that, you have to understand how a chip is manufactured.
A chip is essentially a surface with transistors layered on it. The smaller the transistor, the more you can put on a fixed-size surface. Sometimes you use advanced logic to build features; sometimes it’s a pattern you repeat over and over. Memory is one of the latter cases: to have more memory, you need more surface. But more surface becomes a manufacturing issue. The bigger the chip’s surface, the higher the rate of statistical defects, because of the process itself. And the bigger the surface, the fewer individual chips you can fit on a single wafer. Essentially, the bigger the chip’s surface area, the more memory it has, but also the more expensive it becomes. If you produce fewer chips per wafer and they have a higher defect rate, someone has to bear the cost of the defective chips.
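To make the cost curve concrete, here is a rough sketch of the economics described above. It uses the classic first-order Poisson yield approximation (yield ≈ e^(−defect density × die area)) and a crude dice-per-wafer estimate; the defect density and die areas are made-up illustrative numbers, not figures from the text.

```python
import math

def poisson_yield(defect_density_per_cm2: float, die_area_mm2: float) -> float:
    """First-order Poisson yield model: fraction of dice with zero defects."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-defect_density_per_cm2 * area_cm2)

def dice_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Crude upper bound: wafer area divided by die area (ignores edge loss)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area // die_area_mm2)

# Doubling the die area hurts twice: fewer dice fit on the wafer AND each
# die is more likely to contain a defect.
defect_density = 0.1  # defects per cm^2, assumed
for area in (100.0, 200.0):  # mm^2, illustrative die sizes
    good = dice_per_wafer(300, area) * poisson_yield(defect_density, area)
    print(f"{area:.0f} mm^2 -> ~{good:.0f} good dice per 300 mm wafer")
```

Running it shows the double penalty: the bigger die yields well under half as many good chips per wafer, so each good chip costs more than twice as much.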
Even if the error rate of the manufacturing process is extremely low at each stage and at the global level, you hit another barrier: the size of the surface area itself. As you manufacture at smaller and smaller feature sizes, the maximum printable chip area shrinks, and you can’t make it bigger. Beyond that limit, you literally have to align the transistors of two half-chips, which has its own set of issues. At the current feature size the limit is 676 mm²; at the next one it will be half that. Changing this limit requires at least a decade of investment and R&D from tool makers and chip foundries.
This impacted us on network chips (ASICs). These chips handled routing, filtering, and other “dynamic” features. Those features are implemented using a special kind of memory called ternary content-addressable memory (TCAM). The more routes and rules you want, the more TCAM you need. The more TCAM you need, the more surface area you need. The more surface area, the more expensive the chip. Sometimes it’s simply not possible to manufacture those chips at all. In this case, the memory has to sit on the chip, inside that critical area; to my knowledge it cannot be moved off-chip.
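For readers unfamiliar with TCAM, here is a tiny software model of what it does (the hardware matches all entries in parallel; this sketch just scans them in order). Each entry is a value plus a mask; mask bits set to 0 are the “don’t care” ternary bits, which is how one entry can cover a whole prefix or rule pattern. The 8-bit table below is purely illustrative.

```python
def tcam_lookup(entries, key):
    """Return the action of the first matching entry, or None.

    An entry (value, mask, action) matches when the key agrees with the
    value on every bit where the mask is 1; mask-0 bits are "don't care".
    Entry order encodes priority, as in real TCAMs.
    """
    for value, mask, action in entries:
        if key & mask == value & mask:
            return action
    return None

# Illustrative 8-bit "routing table", longest prefix first.
table = [
    (0b10110000, 0b11110000, "port-A"),  # matches 1011xxxx
    (0b10000000, 0b11000000, "port-B"),  # matches 10xxxxxx
    (0b00000000, 0b00000000, "drop"),    # wildcard: matches everything
]

print(tcam_lookup(table, 0b10111010))  # port-A
print(tcam_lookup(table, 0b10000001))  # port-B
print(tcam_lookup(table, 0b01010101))  # drop
```

Every rule you want to match on occupies physical entries like these, which is why the number of routes and filters translates directly into silicon area.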
This distant manufacturing constraint had an impact on what we could expose to the teams and ultimately to the clients. We had to allocate that memory very carefully. Our code logic could handle a great many things, but the underlying chips could not. We had to choose what we would be able to filter or route. How many rules we could have depended on the types of rules we had. Did we want to filter on headers up through L1, L2, L3, or L4? Did we want to enable IPv6? Did we enable jumbo Ethernet frames?
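The allocation trade-off can be sketched as back-of-envelope arithmetic. The entry widths and capacity below are invented for illustration (real numbers are ASIC-specific), but the shape of the problem is the one described above: wider keys such as IPv6 addresses or L4 tuples consume more physical TCAM per rule, so enabling one feature shrinks the budget left for the others.

```python
# Hypothetical TCAM budget; all numbers are made up for illustration.
TOTAL_SLICES = 4096  # assumed physical TCAM capacity of the chip

# feature -> (TCAM slices consumed per rule, number of rules wanted)
plan = {
    "ipv4_routes": (1, 2000),
    "ipv6_routes": (4, 300),   # IPv6 keys are wider: more slices per rule
    "l4_filters":  (2, 400),
}

used = sum(width * count for width, count in plan.values())
print(f"used {used} / {TOTAL_SLICES} slices, {TOTAL_SLICES - used} left")
assert used <= TOTAL_SLICES, "this feature mix does not fit the chip"
```

Bump any single count and the plan can stop fitting, which is exactly the kind of multi-way juggling described next.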
It was a multi-dimensional optimization problem. These dimensions needed to be adjusted to fit our customers and the services we provided to them, both for cloud and dedicated services. To split and optimize, we used a tool most devs find horrible. It worked for us. It was simple enough. I can hear the screams of devs just from saying its name. Excel.
If you have missed it, you can read the previous episode here
To pair with:
- Evergreen - Seba, Manos
- The Hunchback of Notre-Dame (Notre-Dame de Paris) by Victor Hugo