We have recently been preparing for the deployment of a large number of new desktop computers. The plan was to follow Microsoft's approach using WDS and/or MDT, whose procedures generally involve setting up a reference computer, customizing it, and then capturing the image to deploy.
The process went very well during our early-stage tests on virtual machines. Things began to get complicated when two sample units of the physical machines were delivered to us for further preparation.
The model was the DELL OptiPlex 3240 AIO, an all-in-one computer. The problem was that, most of the time, running Sysprep to generalize the OS resulted in a blue screen of death with Bug Check WHEA_UNCORRECTABLE_ERROR, regardless of whether the firmware was set to UEFI mode or legacy BIOS mode.
Performing postmortem analysis on the minidump files, I found that the Bug Check code was 0x124 and the first parameter was always 0x4, which means an uncorrectable PCI Express error occurred. After running `!analyze -v`, WinDbg reported that the problem was probably caused by GenuineIntel, but I strongly doubted that the CPU was to blame.
Most documents, including MSDN and the Microsoft Community wiki, say that this Bug Check is almost always caused by physical hardware problems, overclocking in particular. Still, I could not convince myself. Both sample machines showed exactly the same symptom, and if it were a hardware issue, the quality would be disastrous. After all, it's DELL.
Finally, I googled ‘sysprep bsod 124’ and found a case on the Microsoft TechNet forum, titled ‘BSOD when sysprep Windows 10’, which was nearly identical to our situation: an all-in-one computer, the same Bug Check, the same error type; only the brand was HP. In the replies, someone suggested another kernel debugger extension command, `!errrec`, which displays the detailed WHEA error record at a given address, i.e. the second parameter of the Bug Check.
Later, I found that the command was already mentioned in the MSDN documentation for Bug Check 0x124. The solution could have come much earlier, had I been patient enough to follow the document!
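The two Bug Check arguments that mattered here can be interpreted mechanically. The sketch below is hypothetical (the names `WHEA_ERROR_TYPES` and `describe_0x124` are ours): it assumes the four arguments have already been read from the minidump, lists only a subset of Parameter 1 values as described in Microsoft's Bug Check 0x124 reference, and formats Parameter 2 as the record address to pass to `!errrec`.

```python
# Hypothetical helper: summarize Bug Check 0x124 (WHEA_UNCORRECTABLE_ERROR)
# from its first two arguments. Only a subset of Parameter 1 values is listed;
# check the full table in Microsoft's Bug Check 0x124 documentation.
WHEA_ERROR_TYPES = {
    0x0: "machine check exception",
    0x4: "uncorrectable PCI Express error",
}


def describe_0x124(arg1: int, arg2: int) -> str:
    """Return the error type and the !errrec command to run next."""
    kind = WHEA_ERROR_TYPES.get(arg1, "unknown error type (see the documentation)")
    # Parameter 2 is the address of the WHEA error record that !errrec expands.
    return f"{kind}; next: !errrec {arg2:#x}"


if __name__ == "__main__":
    # The address is made up for illustration.
    print(describe_0x124(0x4, 0xFFFFE0012345A028))
```

With Parameter 1 fixed at 0x4 in every dump, the only varying input is the record address, which is exactly what `!errrec` needs.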
In the output of `!errrec`, the related Device IDs were revealed:
- 10ec:8168, Realtek Gigabit Ethernet
- 8086:24f3, Intel Dual Band Wireless-AC 8260
Both of them are network adapters.
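The IDs above are PCI `vendor:device` pairs in hexadecimal: 0x10EC is Realtek's vendor ID and 0x8086 is Intel's. A minimal, hypothetical lookup is sketched below; its tables hold only the two devices from our dumps, whereas a real tool would consult a full database such as the PCI ID Repository's `pci.ids` file.

```python
# Hypothetical lookup for PCI "vendor:device" IDs as reported by !errrec.
# Only the two devices seen in our dumps are included.
PCI_VENDORS = {0x10EC: "Realtek", 0x8086: "Intel"}
PCI_DEVICES = {
    (0x10EC, 0x8168): "Gigabit Ethernet",
    (0x8086, 0x24F3): "Dual Band Wireless-AC 8260",
}


def identify(pci_id: str) -> str:
    """Map a 'vendor:device' hex pair, e.g. '10ec:8168', to a readable name."""
    vendor_hex, device_hex = pci_id.strip().split(":")
    vendor, device = int(vendor_hex, 16), int(device_hex, 16)
    vendor_name = PCI_VENDORS.get(vendor, f"vendor {vendor:04x}")
    device_name = PCI_DEVICES.get((vendor, device), f"unknown device {device:04x}")
    return f"{vendor_name} {device_name}"


if __name__ == "__main__":
    for pci_id in ("10ec:8168", "8086:24f3"):
        print(pci_id, "->", identify(pci_id))
```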
Since it's not practical to remove any hardware from inside an all-in-one computer, we decided to first try uninstalling all network adapters from Device Manager before running Sysprep, and it worked!
When the network devices were removed manually from Device Manager one by one, with Sysprep run immediately afterwards, we never observed a blue screen again. The method worked reliably throughout our subsequent generalization and capture processes.
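We did this by hand in Device Manager, but the same sequence could in principle be scripted. The sketch below is a hypothetical alternative, not what we ran: it assumes a Windows 10 build whose `pnputil` supports `/remove-device`, uses PowerShell's `Get-PnpDevice` to list present network adapters, keeps only PCI devices (the crash was a PCI Express error, and this skips software adapters such as WAN Miniports), and then runs Sysprep with `/generalize /oobe /shutdown`. The function names are ours, and it prints the plan by default instead of executing it.

```python
import subprocess

# Assumption: default Sysprep location and a pnputil with /remove-device.
SYSPREP = r"C:\Windows\System32\Sysprep\sysprep.exe"


def pci_only(instance_ids):
    """Keep PCI/PCIe device instance IDs; drop blanks and software adapters."""
    return [i.strip() for i in instance_ids if i.strip().upper().startswith("PCI\\")]


def list_pci_net_adapters():
    """Instance IDs of present PCI network adapters (Windows only)."""
    out = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "Get-PnpDevice -Class Net -PresentOnly | "
         "Select-Object -ExpandProperty InstanceId"],
        capture_output=True, text=True, check=True,
    ).stdout
    return pci_only(out.splitlines())


def build_plan(instance_ids):
    """One removal per adapter, then Sysprep immediately afterwards."""
    plan = [["pnputil", "/remove-device", iid] for iid in instance_ids]
    plan.append([SYSPREP, "/generalize", "/oobe", "/shutdown"])
    return plan


def run_plan(plan, dry_run=True):
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))  # review the plan before running it for real
        else:
            subprocess.run(cmd, check=True)
```

On the reference machine, `run_plan(build_plan(list_pci_net_adapters()))` would only print the commands; passing `dry_run=False` would execute them. Running Sysprep right after the removals mirrors the manual procedure, since Windows may detect and reinstall the adapters if given the chance.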
But why does such a serious problem happen in such a fundamental procedure? Apparently, the scenario was never tested by the manufacturers, whether DELL, Realtek, Intel, or others. Don't DELL's teams use Sysprep anymore? I'd guess both yes and no.
- Sysprep may still be in use. While searching, I saw some comments on Sysprep recommending that the reference computer be built offline, to prevent background tasks, especially Windows Update, from altering the system unprompted. Maybe DELL builds its images in a closed lab environment where our scenario never happens.
- Sysprep may be fading out. With DISM techniques prevailing, servicing an image offline may no longer be an open question but rather a best practice. When an image is altered offline, device drivers never sneak in, so our scenario never occurs.