Homebrew floppy controller repair

A while back, I built a couple of DIY interface cards to help with restoration and testing of old XT/AT machines. One of those, the XT_IDE card, worked perfectly and is now permanently installed in the DTK 286. The other, a floppy controller, worked initially but then developed a mysterious issue that caused it to glitch and eventually stop working. At first I thought this might just be a bad controller IC, as it was a used vintage part with unknown history. I bought a pair of replacements from different sellers in China, but neither one solved the issue.

I went down a rabbit hole testing and probing the board, eventually finding a strange issue that, oddly enough, was the clue to the real problem (only I didn’t realize it at the time). The bottom of the board has two identical 74LS logic ICs in neighboring sockets that are hooked up to some of the data and interrupt lines of the ISA bus. When probing with a multimeter, I discovered that one of the chips wasn’t connected to ground. I checked the board and didn’t see any damage at the time, so I assumed one of the sockets had a bad contact. As a temporary fix, I added a bodge wire to link the ground pin on the affected chip to the working ground pin of the other chip. Sadly, this didn’t improve the situation at all.

I had thought for sure the bus interface was the issue, so when that didn’t pan out, I decided to tap the various signals between the floppy drive and the controller and look for anything weird on the oscilloscope. At the time, the controller would sort of work for a few minutes from a cold start, but would then degrade and eventually stop reading disks altogether. After several sessions without seeing anything definitive on the scope, I gave up for a bit. Nothing I’d tried had worked.

Recently, I sat back down at the workbench and decided to go over the board in detail to see if there was something simple I’d missed. I brought my reading glasses and used my craft light with its big magnifying glass to inspect the PCB closely (sadly, my eyes aren’t what they used to be). The first thing I noticed was the flux residue on the board. This hadn’t been a problem with previous projects, so I’d left it in place, but I decided to remove it just in case. I scrubbed all of the flux residue off the board with 90% IPA and a toothbrush, then blotted up the dissolved mixture with a paper towel.

Once the residue was removed, I found the cause of the original issue. The chip with the ground problem had a cold solder joint that had broken loose. That was an easy fix, but I didn’t believe it was the only issue, so I kept searching. The more I looked, the more problems I found, mostly bad solder joints, excessive solder, or spatter in some of the small gaps between connections that could have caused a short. I’m pretty sure a lot of this is due to the Ryobi soldering iron I use, which periodically enters a power-saving mode. It’s an annoying feature that only serves to cause poor soldering. I probably spent 30-45 minutes inspecting every connection and trace, touching up or cleaning anything that looked off. I’m not sure whether the culprit was just the floating ground on the 3-to-8 decoder/demultiplexer or something else I fixed, but when I hooked up a floppy drive to the controller it appeared that it might be working. After a few tweaks to the ROMs and controller configuration, I was able to confirm that it was indeed fixed. In the end, I probably could have saved myself the trouble if I’d been more patient and checked my work when building the board in the first place. Still, it was a good opportunity to practice some low-level troubleshooting.

Intel J1900 flaw causing early failure for embedded devices

This morning I woke up, made some coffee and went to the computer room to work on some projects, when I noticed that the shared folders from my NAS weren’t showing up on my desktop. That’s odd, I thought. I went over to the NAS and my fears were confirmed: not only was it offline, but it was also showing an error light. It would power on, but it wouldn’t start up.

Hoping that this was some simple fault like a bad stick of memory, I disconnected the unit, pulled out the drives and brought it over to my test bench. I confirmed the power supply was good and tried swapping the memory in and out, but to no effect. Ouch, this thing is really dead, I thought. Before consigning it to the e-waste bin, I decided to search around just to make sure, and stumbled on a thread from 2020 about the CPU in these devices having a flaw. Not only that, there was a possible fix (if perhaps only a temporary one)!

At the time the flaw was discovered, Intel posted an addendum to their CPU specification update for the J1900 and related CPUs. (Intel has since removed these docs from the public-facing side of their website and requires a CNDA account to access them. Thankfully, the Wayback Machine has an archive of them, linked above.) Unfortunately, the problem lies in the silicon of the CPU itself and is not repairable.

The fix, documented in the forum link above and in a similar Reddit thread a year later, is to connect a 100-200 ohm resistor that pulls the LPC clock signal to ground. Thankfully, on the NAS model I have, this signal is exposed on a pin header that also supplies a ground pin. I first hooked up my oscilloscope to the clock pin and verified that it was operating out of spec. I rummaged through my component collection and found a 180 ohm resistor that would work. I had some jumper wires with Dupont connectors left over from another project and used those to make a dongle that bridges these two pins. I powered up the unit and it started right up. Amazing!
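For anyone digging through their own parts bin, the resistor choice can be sanity-checked with a few lines of Python. This is only a rough sketch under stated assumptions: it lists the common E12-series values that fall inside the 100-200 ohm window mentioned in the threads, and estimates the worst-case DC current through each one if the clock line sat at a steady logic high. The 3.3 V figure is an assumption about the LPC signal level, not something measured on this NAS, and the function names are made up for illustration.

```python
# E12 preferred resistor values for a single decade.
E12 = [10, 12, 15, 18, 22, 27, 33, 39, 47, 56, 68, 82]


def e12_values_between(low_ohms, high_ohms):
    """Return E12 resistor values (in ohms) within [low_ohms, high_ohms]."""
    values = []
    for decade in (1, 10, 100, 1000):
        for base in E12:
            ohms = base * decade
            if low_ohms <= ohms <= high_ohms:
                values.append(ohms)
    return values


def pull_down_current_ma(ohms, high_volts=3.3):
    """Worst-case DC current (mA) through the pull-down via Ohm's law.

    high_volts=3.3 is an ASSUMED logic-high level for the clock line.
    """
    return high_volts / ohms * 1000


# Range suggested in the forum and Reddit threads.
candidates = e12_values_between(100, 200)

for ohms in candidates:
    print(f"{ohms} ohm -> about {pull_down_current_ma(ohms):.1f} mA at 3.3 V")
```

The 180 ohm part used here sits near the top of that window. Since a clock is only high part of the time, the average current would be lower than the steady-state number this prints.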

Sadly, the problem with the J1900 CPUs is only going to get worse over time. I may be able to keep the unit running for a while, possibly by swapping in different resistors as the circuit continues to degrade. However, the real solution is to start planning a migration from this device to something new.

If you have an embedded device powered by a J-series, N-series or similar CPU and it’s been operating 24×7 for several years, you’re likely on borrowed time. Get a good backup of your data and start planning your migration now.

VMware acquisition chaos continues

Broadcom has terminated all VMware partners and won’t say if they’re going to be let back in. This move has angered customers and partners alike.

It’s sad to see this company being so horribly mismanaged. Broadcom should slow down and carefully consider their next actions before they alienate their customers and risk losing much of the value they hoped to extract from the company.