The C1 server on which I had been able to start the operating system was part of a bigger device. You could not use it on its own. Either you plugged it into a “test bench” used to test and develop the device, or you plugged it into another set of hardware. That hardware was what made it so interesting. The small, credit-card-sized C1 device was fitted on a board alongside 18 other devices like it. We called that board a blade because of the way it slid into an ACTA chassis. It was almost like Matryoshka dolls. The chassis had been chosen because the company belonged to Iliad. One of its sister companies was Free, a major European telecommunications operator, which also used this chassis for its custom hardware.
C1 blade and C1 chassis. Not a sword but for sure a box
All told, we fit close to a thousand physical servers in a single rack. You had three chassis, each with sixteen blades, each with nineteen servers: a grand total of 912 servers. In 2013, the maximum you could fit in a single 42U rack with publicly available hardware was about 405 physical servers using HP Moonshot 1500. Not only was this more than a twofold increase in the number of servers, it also used significantly less electricity. One full rack of C1 servers would use less electricity than one full HP Moonshot chassis. You could really sense the nascent soft power of ARM devices.
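For the curious, the arithmetic behind those figures can be checked in a few lines of Python. This is only a sketch: the constant and function names are mine and purely illustrative, and the numbers are the ones quoted above, including the approximate 405-server figure for the HP Moonshot rack.

```python
# Figures quoted in the text; all names here are illustrative only.
CHASSIS_PER_RACK = 3
BLADES_PER_CHASSIS = 16
SERVERS_PER_BLADE = 19
MOONSHOT_SERVERS_PER_RACK = 405  # approximate 2013 figure from the text


def servers_per_rack(chassis: int, blades: int, servers: int) -> int:
    """Multiply out the nesting: rack -> chassis -> blade -> server."""
    return chassis * blades * servers


c1_total = servers_per_rack(CHASSIS_PER_RACK, BLADES_PER_CHASSIS, SERVERS_PER_BLADE)
density_ratio = c1_total / MOONSHOT_SERVERS_PER_RACK

print(c1_total)                 # 912
print(round(density_ratio, 2))  # 2.25
```

So the "twofold" claim is, if anything, slightly modest: the ratio comes out at roughly 2.25.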
I remember being in the server room of the laboratory and seeing all these LEDs blink. Each pair of lights was a server. Not a very powerful one, but still an individual physical server. The amazing thing was that the hardware in that room alone surpassed the total number of servers I had seen in my life. I had been in data centers before, but this one chassis replaced a dozen racks. It was truly amazing.
My task at the time was to run the final qualification tests of the whole assembled system. Each part had been tested individually; now everything needed to be tested together. It involved a lot of manual work: seating the devices, sliding the blades in and out, checking internal values in the firmware, operating system or software, and checking that the LEDs blinked according to those states.
One of the memorable moments was when I was testing a feature to get the individual output of a C1 device. I slid the blade in, turned on the device and configured the chassis to forward the output. Nothing happened. No output for eighteen devices. I double-checked the internal states, then I turned to the firmware. I used an internal feature to check some data, and there was an unexpected value. Not knowing what to do, I asked Gregoire.
He fired up the electronic design automation (EDA) tool, zoomed straight in to a specific place on the board and followed a particular copper trace. The screen displayed different colors and lines as he moved through the design. He found the error: the bug was a trace in a middle layer of the PCB.
He took one of the blades, compared the locations to the screen, then told me that he was going to do a “hardware patch”. He took out a cutter, carefully chose the location and started scratching the PCB until he reached the trace and cut it. He handed the blade back to me, I plugged it in, and this time it worked. I was dumbfounded. This was magic to me. Little did I know that over the next months and years, this would become routine for me. In the meantime, it was such an eye-opener. I could feel I was going to really enjoy working in the lab.
Have you had a situation where a seemingly unrelated action fixes your problem? How do you connect the dots afterwards?