Business, tech, and life by a nerd. New every Tuesday: Splitting Light: The Prism of Growth and Discovery.
Splitting Light: Season 1 - Episode 21
Published 2 months ago • 3 min read
1+ meter track length
This next assignment would become the seed of a career pivot, even though I could not have known it at the time. Greg had interesting ideas that I can only fully appreciate now that I am much more experienced and battle-tested. He designed hardware in a modular way. There had been a project to reuse the C1 node and package it as a Raspberry Pi-style board, but it fizzled out because handling mass-market hardware is a very different world from handling datacenter hardware. From that project, on which I had done a bit of software qualification, a storage board was born. A new type of hard drive had just come out: SMR (shingled magnetic recording) drives. They offered higher capacity at 25% lower cost per GB. However, they had one flaw, or constraint depending on how you saw it: their random read/write performance was very bad.
What if you could hedge against this? Greg came up with a new design in which 56 3.5-inch drives plugged vertically into a large, roughly square PCB, with a C1 node slotted into one corner. A maze of tracks and small components routed and switched the SATA data lanes to the node.
The storage board
Once the voltage tests were done, I was handed the board and started qualifying the 56 slots. I wrote a bit of Python to help, but 90% of the process was manual. I would plug an SSD into a slot, power it up from the terminal, check that the drive linked up with the operating system, check for data errors while transferring data back and forth, then power down the drive and move on to the next slot.
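The data-error step of that routine can be sketched in Python. This is a minimal sketch under assumptions, not the script used on the board: it assumes the SSD in the slot is already visible to the operating system and that `path` points to a scratch file on it (`verify_roundtrip`, `size_bytes` and `chunk` are hypothetical names). It writes random data, flushes it to the device with `os.fsync`, reads it back and compares SHA-256 digests. The power-up and link-check steps are left out because they depend on the board's own tooling.

```python
import hashlib
import os


def verify_roundtrip(path: str, size_bytes: int = 4 * 1024 * 1024, chunk: int = 64 * 1024) -> bool:
    """Write random data to `path`, read it back and compare SHA-256 digests.

    Assumption: `path` is a scratch file on the drive under test. Pointing it
    at a raw block device would overwrite whatever is on that device.
    """
    written = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))  # random payload
            written.update(block)
            f.write(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # push the data to the device before reading back

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            read_back.update(block)

    # Any corruption on the way to or from the drive changes the digest
    return written.digest() == read_back.digest()
```

The read-back may be served from the operating system's page cache rather than the drive itself; a stricter check would drop the caches or open the file with `O_DIRECT` before reading.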
Right away, I found that half of the slots were not working. After checking the tracks and the schematics, we found the issue. For simplicity, some of the SerDes signal pairs on one of the SATA buses had been routed with their polarity inverted. This meant that the system on chip (SoC) expected a negative voltage on the wire where it was receiving a positive one, and vice versa on the other wire of the pair. The documentation said we could invert the lanes by configuring the SATA PHY, but the specific configuration was nowhere to be found in our documents. I emailed our chip support engineer for help.
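To illustrate why swapped wires break the link, here is a toy Python model of a differential pair, under simplifying assumptions: each bit is driven as +swing on P and -swing on N (or the reverse for a 0), and the receiver decides 1 when P is above N. The `invert` flag is a hypothetical stand-in for the SATA PHY's polarity-inversion setting; the real register configuration is not shown here, and the voltage values are illustrative.

```python
def drive_differential(bits, swing=0.25):
    """Toy transmitter: a 1 drives P high and N low, a 0 does the opposite."""
    p = [swing if b else -swing for b in bits]
    n = [-v for v in p]
    return p, n


def decode_differential(p_volts, n_volts, invert=False):
    """Toy receiver: a bit is 1 when P sits above N."""
    bits = [1 if p > n else 0 for p, n in zip(p_volts, n_volts)]
    if invert:
        # Hypothetical stand-in for the PHY's lane polarity inversion setting
        bits = [b ^ 1 for b in bits]
    return bits


tx = [1, 0, 1, 1, 0, 0, 1]
p, n = drive_differential(tx)
# Board routed with the pair swapped: the receiver's P input sees N and vice versa
print(decode_differential(n, p))               # [0, 1, 0, 0, 1, 1, 0]
print(decode_differential(n, p, invert=True))  # [1, 0, 1, 1, 0, 0, 1]
```

With the wires swapped, every bit arrives complemented; flipping polarity on the receiving side restores the original stream without touching the board, which is why a PHY configuration was enough to fix it.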
Differential signaling pair
Continuing with the working SATA bus, as I was powering up a new slot I heard a loud mechanical snap. A few seconds later I smelled burnt plastic and my device stopped responding. I immediately turned to the test bench: the differential circuit breaker had done its job and prevented a fire. We slowly disconnected everything and started inspecting the board.
We turned it over, looked everywhere, checked the schematics, but couldn't find anything. Eventually Greg found the issue. To understand it, you have to know how a SATA connector is seated. It is actually a sort of bridge: it is seated or soldered onto the board, but the area where the connector meets the pads on the board stays open and visible. In the SATA slot I had just powered on, underneath the connector, were several solder bubbles. They had caused the short circuit. We were trying out a less expensive manufacturer; they had not caught the defect, and neither had we until it burned a few components. I cleaned up the bubbles with desoldering braid, replaced the burnt components and continued the tests.
Our support engineer had replied with the register configuration, and after patching the operating system a bit we had the second SATA bus working. Continuing with the tests, I found that some of the slots were not reachable: the components that controlled them did not respond to my commands. The digital oscilloscope came to the rescue. My first probes at the output of the C1 node didn't show anything unusual. Greg suggested I probe the component directly. I did and, lo and behold, there was something unusual. One of the things the presenter of “Indistinguishable from magic” had said was that digital is analog. Over the length of the track, the resistance of the copper had weakened the signal enough to put it out of spec for the components. We increased the signal strength and it became properly digital (square) again.
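One simplified way to picture "digital is analog" is a first-order RC model: the driver's output resistance plus the track's resistance has to charge the track's capacitance, so a long track rounds off the edges. The sketch below is an assumption, not a measurement from the board: the component values, the 1 µs bit period and the 2.0 V input-high threshold (`v_ih`) are hypothetical, and raising the drive strength is modelled as lowering the series resistance. It checks whether an edge crosses the threshold within half a bit period.

```python
import math


def rc_edge_voltage(t_s, v_drive, r_ohm, c_farad):
    """First-order RC step response: receiver voltage t_s seconds after a rising edge."""
    tau = r_ohm * c_farad  # time constant of the track (hypothetical values)
    return v_drive * (1.0 - math.exp(-t_s / tau))


def meets_vih(bit_period_s, v_drive, r_ohm, c_farad, v_ih):
    """True if the edge climbs above the input-high threshold within half a bit period."""
    return rc_edge_voltage(bit_period_s / 2, v_drive, r_ohm, c_farad) >= v_ih


# Hypothetical numbers: 3.3 V logic, ~100 pF of long track, 1 µs bit period
weak = meets_vih(1e-6, 3.3, 10e3, 100e-12, 2.0)   # weak drive, 10 kΩ in series
strong = meets_vih(1e-6, 3.3, 1e3, 100e-12, 2.0)  # stronger drive, 1 kΩ in series
print(weak, strong)  # False True
```

With 10 kΩ and 100 pF the edge only reaches about 1.3 V after 0.5 µs, below the 2.0 V threshold; dropping the resistance to 1 kΩ brings it to about 3.28 V, the kind of "square again" effect described above.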
After testing every single slot, I wrote some Python code to make the board more manageable, along with documentation for the hardware. I was ready to hand it over. But that would not happen…
To pair with:
Megumi The Milkyway Above - Connan Mockasin
Magician: Apprentice by Raymond E. Feist