A deep dive into Apple Touch Bar – Part I

Recent Apple MacBook Pro models are equipped with a long OLED touch display, officially named the “Touch Bar”, that replaces the traditional function (Fn) key row. Internally, this component is called the Dynamic Function Row (DFR), judging by the naming prefix of related system components. In Boot Camp Windows installations, the Touch Bar acts as a basic set of media and function keys without advanced graphical capabilities.

Since the Touch Bar is driven by either the Apple T1 or T2 chip, it is certain that some connection technology, such as USB, is used for host communication. However, the communication protocol remained unknown for a long time due to the cost and availability of the devices and the nature of their typical users.

Third-party developers have built applications that simulate the Touch Bar on the host (x86) system. A quick inspection of the source code reveals a library that generates a continuous stream containing the Touch Bar’s graphical content. This suggests that the content is pre-rendered on the x86 side and transferred to the ARM side (T1/T2). Since both T1 and T2 are connected to the USB bus, packet captures were conducted there.

Packet capture over the USB bus

During packet capture sessions, a pattern of periodic large packet transfers was observed. The size is not fully deterministic, but it is mostly larger than 54,000 bytes. More images with known patterns (single color, rainbow, etc.) were transferred to help analyze the binary structure. With sufficient samples, the following characteristics and structs were concluded:

  • A semi-static header is present for each frame update.
  • Raw pixels are transferred in BGR24 format.
  • A static padding is present at the end.

typedef struct _DFR_GENERIC_REQUEST_CONTENT {
	UINT32 RequestKey;
	UCHAR Reserved[8];
	UINT32 End;
} DFR_GENERIC_REQUEST_CONTENT, *PDFR_GENERIC_REQUEST_CONTENT;

typedef struct _DFR_GENERIC_REQUEST {
	DFR_GENERIC_REQUEST_HEADER Header;
	DFR_GENERIC_REQUEST_CONTENT Content;
} DFR_GENERIC_REQUEST, *PDFR_GENERIC_REQUEST;

typedef struct _DFR_UPDATE_FB_REQUEST_CONTENT {
	UINT16 Reserved0;
	UINT8 FrameId;
	UINT8 Reserved1;
	UINT32 Reserved2;
	UCHAR Reserved3[24];
	UINT16 BeginX;
	UINT16 BeginY;
	UINT16 Width;
	UINT16 Height;
	UINT32 BufferSize;
} DFR_UPDATE_FB_REQUEST_CONTENT, *PDFR_UPDATE_FB_REQUEST_CONTENT;

typedef struct _DFR_UPDATE_FB_REQUEST {
	DFR_GENERIC_REQUEST_HEADER Header;
	DFR_UPDATE_FB_REQUEST_CONTENT Content;
} DFR_UPDATE_FB_REQUEST, *PDFR_UPDATE_FB_REQUEST;
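As a sanity check against the captured transfer sizes, the content layout above can be mirrored with fixed-width types. DfrPixelBytes is a hypothetical helper, not part of the protocol; it just computes the raw pixel payload for a BGR24 update:

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of DFR_UPDATE_FB_REQUEST_CONTENT above, using fixed-width types.
 * No compiler padding is expected with this field order. */
typedef struct {
    uint16_t Reserved0;
    uint8_t  FrameId;
    uint8_t  Reserved1;
    uint32_t Reserved2;
    uint8_t  Reserved3[24];
    uint16_t BeginX;
    uint16_t BeginY;
    uint16_t Width;
    uint16_t Height;
    uint32_t BufferSize;
} DfrUpdateFbContent;

/* Raw pixel payload size for a BGR24 update covering Width x Height. */
static uint32_t DfrPixelBytes(uint16_t Width, uint16_t Height)
{
    return (uint32_t)Width * Height * 3; /* 3 bytes per pixel in BGR24 */
}
```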

Bootstrapping the blue Windows when you have random AArch64 devices in the backyard

A Nintendo Switch showing the "Windows Blue Logo"

Last year I mentioned my attempt to bootstrap Windows 10 on the DragonBoard 410c. This year I ported EDK2 to the Nintendo Switch and successfully booted the Windows 10 arm64 installation ramdisk (rs4, rs5, and 19H1 tested as of this writing). I will briefly introduce a common way to port EDK2 from an existing codebase (e.g. U-Boot), as well as cases of booting in EL2 (hypervisor).

Background

While this article applies to most ARM SoCs, the following content will use Tegra as the example. NVIDIA developed a few solutions for Windows on ARM in the Windows 8 era: Tegra 3 (Tegra30) and 4 (Tegra114). No later model has an official Windows BSP (Board Support Package) released publicly, due to the low market acceptance of those Windows RT products.

Despite that, generic AArch64 processors are capable of running Windows 10 without an additional HAL extension library if the following conditions are satisfied:

  • Architectural timer, described by an ACPI GTDT table. Either a CP15 or MMIO clock is okay.
  • Generic Interrupt Controller v2/v3 (we are not yet aware of v4 support), described by an ACPI MADT (APIC) table, or the Broadcom interrupt controller
  • AArch64 instruction set (the crypto extension is not required)
  • ARM Performance Monitor Unit, described by the ACPI MADT (APIC) table

One notable exception is the initial generation of Qualcomm Kryo (Snapdragon 820, 821), due to a faulty cache design in the large core cluster. Windows removed the required erratum workaround for it due to the complication of patch maintenance.

The Tegra X1 satisfies all conditions outlined above. I used an old-bootrom Nintendo Switch as my experiment platform since it is much cheaper than a Jetson TX1. Additionally, there is verified coreboot and U-Boot source code for these Tegra X1 devices, including the Nintendo Switch.

I assume you are familiar with the NVIDIA RCM Exploit (Fusee-Gelee) as well as Tegra Boot flow. If you are not familiar with Tegra Boot flow, please refer to Tegra Technical Reference Manual available on NVIDIA developer site.

Port U-Boot Code to EDK2

There are a few environment assumptions that need to be addressed while porting U-Boot device/driver code to EDK2:

  • Memory placement. While U-Boot runs in an AArch64 context, it only uses a small amount of memory at the bottom of RAM in most circumstances. EDK2/TianoCore loads everything as high as possible, per the UEFI specification. Certain peripheral operations are not 64-bit-addressing aware. Force-converting 64-bit pointers to 32-bit is lossless under U-Boot’s assumptions, but in EDK2 this might lead to issues. One case is SDMA (single-operation DMA): the Tegra SDHCI controller’s SDMA operations are not 64-bit-addressing aware. To address the issue, I slightly modified the DMA bounce buffer allocation library (also ported from U-Boot) to allocate bottom memory instead.
  • Syntax styles. U-Boot follows the Linux naming convention for functions and types; EDK2 follows the Windows style. It might be a good idea to write a shim providing functions like readl/writel as well as udelay/mdelay.
  • There is probably no need to port generic classes (e.g. udevice). You might not need them in the EDK2 context.
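A minimal sketch of such a shim is shown below. The MmioRead32/MmioWrite32 stand-ins are defined locally so the sketch compiles on its own; a real port would include MdePkg’s IoLib (and TimerLib for udelay/mdelay) instead:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins so this sketch compiles outside EDK2; in a real port,
 * use MdePkg's IoLib (MmioRead32/MmioWrite32) and TimerLib. */
static inline uint32_t MmioRead32(uintptr_t Address)
{
    return *(volatile uint32_t *)Address;
}
static inline uint32_t MmioWrite32(uintptr_t Address, uint32_t Value)
{
    *(volatile uint32_t *)Address = Value;
    return Value;
}

/* Linux-style accessors that ported U-Boot code expects. Note the swapped
 * argument order: writel(val, addr) vs. MmioWrite32(addr, val). */
#define readl(addr)        MmioRead32((uintptr_t)(addr))
#define writel(val, addr)  MmioWrite32((uintptr_t)(addr), (uint32_t)(val))

/* Round-trip self-test exercising the macros on an ordinary variable. */
static uint32_t ShimSelfTest(void)
{
    uint32_t reg = 0;
    writel(0xCAFEBABE, &reg);
    return readl(&reg);
}
```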

To save myself some time bootstrapping the microSD slot, I ported the clock and device frameworks from U-Boot to EDK2. Here are a few suggestions for porting U-Boot code to EDK2:

  • Address issues mentioned above.
  • Put device-specific definitions into the “Include” directory; use the PCD database when necessary.
  • Install these code services as DXE drivers whenever possible, and invoke them through protocols.
  • For board/machine-dependent code libraries (e.g. mach-tegra), it depends on the usage whether to integrate them into the driver or use an additional library instead.

From Device Tree to ACPI

Device Tree is the de-facto standard on ARM for describing the system and peripheral hierarchy. Windows RT introduced the intensive use of ACPI on ARM platforms. I will cover some required tables for a successful Windows startup on ARM platforms. For tables such as CSRT and DSDT, check out the Microsoft documentation.

GTDT (Generic Timer Description Table)

For SoCs with an architectural timer, ARM defines the GTDT table to describe platform timer information. In the device tree, an architectural timer may look like this:

timer {
	compatible = "arm,armv8-timer";
	interrupts = <GIC_PPI 13 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>,
	             <GIC_PPI 14 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>,
	             <GIC_PPI 11 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>,
	             <GIC_PPI 10 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_LOW)>;
	interrupt-parent = <&gic>;
};

And it looks like this in ACPI GTDT table:

....

[024h 0036   8]        Counter Block Address : FFFFFFFFFFFFFFFF
[02Ch 0044   4]                     Reserved : 00000000

[030h 0048   4]         Secure EL1 Interrupt : 0000001D
[034h 0052   4]    EL1 Flags (decoded below) : 00000002
                                Trigger Mode : 0
                                    Polarity : 1
                                   Always On : 0

[038h 0056   4]     Non-Secure EL1 Interrupt : 0000001E
[03Ch 0060   4]   NEL1 Flags (decoded below) : 00000002
                                Trigger Mode : 0
                                    Polarity : 1
                                   Always On : 0

[040h 0064   4]      Virtual Timer Interrupt : 0000001B
[044h 0068   4]     VT Flags (decoded below) : 00000002
                                Trigger Mode : 0
                                    Polarity : 1
                                   Always On : 0

[048h 0072   4]     Non-Secure EL2 Interrupt : 0000001A
[04Ch 0076   4]   NEL2 Flags (decoded below) : 00000002
                                Trigger Mode : 0
                                    Polarity : 1
                                   Always On : 0
[050h 0080   8]   Counter Read Block Address : FFFFFFFFFFFFFFFF

...
  • If your platform does not have MMIO architectural timer, write the address as 0xFFFFFFFFFFFFFFFF.
  • If you boot from EL2, you are required to supply all timer values. Otherwise only EL1 timers are needed.
  • PPIs start at interrupt ID 16. Add 16 to every interrupt number you have in the device tree. The four interrupts are Secure EL1, Non-secure EL1, virtual timer, and hypervisor, in that order.
  • You may have a platform watchdog; supply it in the GTDT table too (see the Qualcomm example). It is not mandatory for booting Windows, though.
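The interrupt-number translation in the notes above can be written down directly. These helpers just encode the GIC interrupt ID bases (PPIs occupy IDs 16–31, SPIs start at 32), matching the device tree and the GTDT dump (DT PPI 13 becomes GSIV 0x1D):

```c
#include <assert.h>
#include <stdint.h>

/* ACPI tables take absolute GIC interrupt IDs (GSIVs); device trees use
 * offsets within the PPI/SPI ranges. */
static uint32_t GsivFromPpi(uint32_t DtPpi) { return DtPpi + 16; }
static uint32_t GsivFromSpi(uint32_t DtSpi) { return DtSpi + 32; }
```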

MADT (Multiple APIC Description Table)

Most AArch64 SoC systems have one or more GIC-compatible interrupt controllers. Windows has inbox GIC support; all that is needed is supplying the proper information in the MADT table. The table also describes ARM Performance Monitor Unit information for the system’s reference. In the device tree, the GIC and PMU look like this:

gic: interrupt-controller@50041000 {
	compatible = "arm,gic-400";
	#interrupt-cells = <3>;
	interrupt-controller;
	reg = <0x0 0x50041000 0x0 0x1000>,
	      <0x0 0x50042000 0x0 0x2000>,
	      <0x0 0x50044000 0x0 0x2000>,
	      <0x0 0x50046000 0x0 0x2000>;
	interrupts = <GIC_PPI 9 (GIC_CPU_MASK_SIMPLE(4) | IRQ_TYPE_LEVEL_HIGH)>;
	interrupt-parent = <&gic>;
};

arm_pmu: arm-pmu {
	compatible = "arm,armv8-pmuv3";
	interrupts = <GIC_SPI 144 IRQ_TYPE_LEVEL_HIGH>,
	             <GIC_SPI 145 IRQ_TYPE_LEVEL_HIGH>,
	             <GIC_SPI 146 IRQ_TYPE_LEVEL_HIGH>,
	             <GIC_SPI 147 IRQ_TYPE_LEVEL_HIGH>;
};

An example of the MADT table can be found here.

  • In the MADT, each processor core has its own table entry. Make sure you have a matching CPU object in the DSDT, with an identical, unique UID and CPU interface ID.
  • If your platform supports ARM PSCI, the parking address field can be ignored.
  • The four registers in the GIC device tree node are the GIC distributor, GIC CPU interface, hypervisor GIC (GICH), and virtual GIC (GICV) base addresses.
  • You might need to supply the GIC redistributor address on GICv3 platforms.
  • SPI interrupt numbers start at 32. Add 32 to every performance interrupt number in the MADT.
  • The MPIDR value needs to be taken from platform resources.

DBG2 (Microsoft Debug Table 2)

Microsoft defines the DBG2 table for ARM platforms. Although Microsoft docs mark the DBG2 info as mandatory, you do not need to supply debug device information if you just want to boot Windows as a proof of concept :P. An empty DBG2 table is enough for booting.

For debugging purposes, it is necessary to define at least one debug device (8250/16550 serial or USB) in the DSDT and DBG2 tables. More information can be found here.

FADT (Fixed ACPI Description Table)

Indicate PSCI support and Hardware-reduced ACPI mode, and you are good to go.

Debugging ACPI

It’s incredibly difficult to debug early ACPI startup if you don’t have serial or debug access on the platform. Fortunately, Linux provides some utilities for this. On 5.0+ kernels, you can enable UEFI framebuffer early printk support to simplify the debugging process.

Conclusion

With much effort, Windows on ARM can run on a variety of AArch64 devices. There’s still much work between “just booted” and “usable”, and it may cost you countless nights to achieve your marvel, even though there will always be people asking you “why”:

https://www.reddit.com/r/SwitchHacks/comments/awukbx/windows_on_switch_being_worked_on/

Debugging early ARM ACPI bringup without UART

Sometimes it is not feasible to get UART access on consumer black-box devices (e.g. the Lumia 950 XL). For ARM ACPI debugging, the lack of UART access can make early boot debugging incredibly difficult.

Starting from Linux Kernel 5.0, it is now possible to enable FrameBuffer-based early kernel display. All you need to do is:

  • Enable earlyprintk and earlycon support. It is on by default.
  • Enable EFI FrameBuffer display device.
  • Enable EFI FrameBuffer Earlycon device.
  • (Optional) Enable PSCI checker to verify PSCI functionality.
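On a reasonably recent mainline kernel, the checklist above maps roughly to the config fragment below. The option names are my best recollection of mainline Kconfig symbols; verify them against your kernel tree:

```shell
# Hedged .config sketch -- confirm these symbols exist in your kernel version
CONFIG_SERIAL_EARLYCON=y   # earlycon core support
CONFIG_FB_EFI=y            # EFI framebuffer display device
CONFIG_EFI_EARLYCON=y      # EFI framebuffer earlycon device
CONFIG_PSCI_CHECKER=y      # optional: verify PSCI functionality at boot
```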

Then pass the following parameters in bootloader:

earlycon=efifb,mem

Then you are good to go.

Fix broken Windows Management Instrumentation

A colleague told me a Windows Server 2016 node entered an inconsistent state after an abnormal shutdown. The following symptoms were observed:

  • Explorer hangs with “loading…” text
  • Hyper-V Management couldn’t connect to the local server
  • Group Policy Update consequently failed
  • Telemetry metrics disappeared
  • WMI Management reported “RPC: the requested object does not exist” for object Root

A quick diagnosis indicated a component failure in Windows Management Instrumentation. To determine the source of the failure, I ran WMIDiag from Microsoft. The log showed a metadata failure:

.1526 21:47:43 (1) !! ERROR: WMI CONNECTION errors occured for the following namespaces: ………………………………………….. 20 ERROR(S)!
 .1527 21:47:43 (0) ** - Root, 0x80010114 - The requested object does not exist..
 .1528 21:47:43 (0) ** - Root, 0x80010114 - The requested object does not exist..
 .1529 21:47:43 (0) ** - Root/subscription, 0x80010114 - The requested object does not exist..
 .1530 21:47:43 (0) ** - Root/DEFAULT, 0x80010114 - The requested object does not exist..
 .1531 21:47:43 (0) ** - Root/CIMV2, 0x80010114 - The requested object does not exist..
 .1532 21:47:43 (0) ** - Root/CIMV2/Security, 0x80010114 - The requested object does not exist..
 .1533 21:47:43 (0) ** - Root/CIMV2/TerminalServices, 0x80010114 - The requested object does not exist..
 .1534 21:47:43 (0) ** - Root/nap, 0x80010114 - The requested object does not exist..
 .1535 21:47:43 (0) ** - Root/SECURITY, 0x80010114 - The requested object does not exist..
 .1536 21:47:43 (0) ** - Root/STANDARDCIMV2, 0x80010114 - The requested object does not exist..
 .1537 21:47:43 (0) ** - Root/RSOP, 0x80010114 - The requested object does not exist..
 .1538 21:47:43 (0) ** - Root/RSOP/User, 0x80010114 - The requested object does not exist..
 .1539 21:47:43 (0) ** - Root/RSOP/Computer, 0x80010114 - The requested object does not exist..
 .1540 21:47:43 (0) ** - Root/WMI, 0x80010114 - The requested object does not exist..
 .1541 21:47:43 (0) ** - Root/directory, 0x80010114 - The requested object does not exist..
 .1542 21:47:43 (0) ** - Root/directory/LDAP, 0x80010114 - The requested object does not exist..
 .1543 21:47:43 (0) ** - Root/Policy, 0x80010114 - The requested object does not exist..
 .1544 21:47:43 (0) ** - Root/Microsoft, 0x80010114 - The requested object does not exist..
 .1545 21:47:43 (0) ** - Root/Microsoft/HomeNet, 0x80010114 - The requested object does not exist..
 .1546 21:47:43 (0) ** - Root/aspnet, 0x80010114 - The requested object does not exist..

The documentation suggested performing a metadata registration. The following script was used for the metadata repair:

@echo on
cd /d c:\temp
if not exist %windir%\system32\wbem goto TryInstall
cd /d %windir%\system32\wbem
net stop winmgmt
winmgmt /kill
if exist Rep_bak rd Rep_bak /s /q
rename Repository Rep_bak
for %%i in (*.dll) do RegSvr32 -s %%i
for %%i in (*.exe) do call :FixSrv %%i
for %%i in (*.mof,*.mfl) do Mofcomp %%i
net start winmgmt
goto End

:FixSrv
if /I (%1) == (wbemcntl.exe) goto SkipSrv
if /I (%1) == (wbemtest.exe) goto SkipSrv
if /I (%1) == (mofcomp.exe) goto SkipSrv
%1 /RegServer

:SkipSrv
goto End

:TryInstall
if not exist wmicore.exe goto End
wmicore /s
net start winmgmt
:End

It will throw some errors; ignore them. Then reboot the server.

Status Update: Lumia950XLPkg

It’s been almost a year for Lumia950XLPkg and its derivative projects. A new touch-enabled graphical menu will be added in the coming weeks (I’ve posted a picture on Twitter).

UEFI: Finalized

There are a few more things to do (mostly bug fixes) after the PCIe initialization (the Talkman variant will be released later). Here’s the current backlog:

  • Touch-enabled menu for boot device selection and basic settings
  • Time synchronization with BootShim
  • Environment variable (e.g. MDP settings) passthrough
  • PCIe initialization for Talkman
  • ACPI fix

This UEFI project will be finalized in March or April, and then I will transfer ownership to the LumiaWoA organization.

Lumia UEFI menu, based on LittleVGL and EDK2

Mainline Linux & Android

Lumia950XLPkg makes it possible to run mainline Linux on the Lumia 950 XL. So far I’ve brought up major components including the touchscreen and Bluetooth. Wi-Fi will be available once I figure out how to declare a firmware-initialized PCIe bus in the device tree.

Debian on Lumia 950 XL demonstrating Bluetooth HCI status

Freedreno is also possible. However, it may take significant time to figure out the proper MIPI DSI commands for display panel enablement.

Other people are working on the Android side of the Lumia 950 XL project, but I am unable to disclose their progress at this moment.

Joining Microsoft / LinkedIn

I am excited to announce that I am joining Microsoft / LinkedIn in the coming summer. The employment may pose a potential CoI (conflict of interest) with projects I am currently working on. I hope I can continue making the next big thing 😛

Play “Overcooked” efficiently

I got a Nintendo Switch from my friend (for a research project). Meanwhile, I’ve been enjoying the game “Overcooked” on the Switch. In this game, you control cooks to perform a variety of tasks and deliver orders in time. If orders are delivered early, a tip is given. One to four players can play simultaneously.

It’s clear that you have to do everything as quickly as possible to achieve a high score. Every task (e.g. cutting meat) needs some time to complete. Certain tasks (e.g. frying) depend on other tasks. To eliminate unnecessary time costs (i.e. waiting for cutting to complete), I use the following strategies:

  • Minimize workers’ stall time (doing nothing). For example, it is not necessary for a worker to wait for the frying process (polling is not efficient). Like interrupts in modern machines, they can do something else, such as washing dishes or cutting meat, while the food cooks. Once the interrupt fires (frying completes), they enter the interrupt service routine: put the food on a plate. In most cases, the food is then ready to serve. Finally, they return to what they were doing.
  • Again, make sure everyone is doing something. This is especially important if you are playing with friends. You had better analyze the dependency chain and discuss strategies with your friends before starting the game. Of course, you should issue instructions to your friends during the game if necessary.

Not all kitchens are easy to deal with. Some have dynamic arrangements: contents may change their location during the game session. Some kitchens have no constant light source. Others have isolated workspaces with conveyor belts or tables for swapping materials (I call it a “bus”).

  • Tables for swapping are usually space-constrained. If you are playing with friends, you are probably simulating a symmetric multiprocessing system, and the bottleneck will be bus bandwidth. In such cases, consider the priority of materials. Once a transfer finishes, pick items up as soon as possible.
  • Conveyor belts are a high-latency bus, but they have relatively high bandwidth (hey DDR4, I am looking at you). In some kitchens (e.g. making burgers), you can put everything on the belt in batches and fetch in batches too.
  • Some conveyor belts feed into a trash can, which means materials must be fetched before they expire. But some cooking utensils reappear if they fall into the trash can, so you can prioritize the transfer of contents on the belt accordingly.
  • Try to achieve full-duplex transfers and prefetch to save time. Consider the following scenario: you have a pot that cooks rice on one side, and food materials (rice and flour tortillas) on the other side. The first time, you get rice and put it into the pot. Once the rice finishes, you carry the cooked rice to the other side and wrap it in a tortilla. Don’t fetch the tortilla separately in another transfer; if you really have to, instruct other cooks (if any) to prefetch some for you.
  • Prefetching might not work in all kitchens. In the case of cooking soup, mice will steal your food if it is left unattended for a while. But you can secure processed food in pots so it won’t get stolen.

Get familiar with your kitchen and good luck! (Well, it is a bit boring if you have learned Machine Architecture and Operating System internals).

The case of UEFI for Windows on ARM, and comparison with LK/ABoot

Nights before trips are always boring, so I decided to draft some words to pass the time. We have Windows 10 on ARM running on the Dragonboard 410c and the Lumia 950 XL (article in Chinese, sorry). It will be helpful to write down some firmware-related information from these platform bring-ups for future reference. Meanwhile, a comparison with Little Kernel, the common Linux Android bootloader (well, Qualcomm says so), will provide useful information for the Android on Lumia project.

I recommend you read this article if you are not familiar with UEFI.

Assumptions, assumptions

Compared to Linux, the Windows kernel assumes its platform firmware and bootloader (aka Windows Boot Manager) prepare the basic environment for successful kernel initialization. If certain components are not initialized, bugchecks may occur. Even if the system successfully launches, it may exhibit unexpected behaviors (weird things). An official document explains this in detail.

Little Kernel initializes basic hardware too (at the very least you need serial for debugging). Certain peripherals, including clocks, regulators, and USB, are also initialized for application purposes (e.g. Fastboot). Still, it initializes as few peripherals as possible; sometimes even the panel is not brought up (I’ve seen such a case on an Android phone).

In short, you have to do more for a successful Windows bring-up:

  • If you know certain components are already in a usable state, skip their initialization procedures. For example, on the Lumia 950 XL, our UEFI implementation does not need to initialize USB since our bootstrapper (Qualcomm UEFI) already did so.
  • If your platform has PCIe components, clock them up, set up regulators and mappings, etc.
  • Initialize at least one debug resource described in your DBG2 table (if applicable, which is likely on all ARM platforms).
  • Bring up the panel, set basic display parameters, and pass a framebuffer pointer to Windows.

So how about Linux? If your Linux platform uses DT instead of ACPI, you are likely not required to do most of the stuff Windows requires. On Qualcomm platforms, the Linux kernel will clock up the PCIe cores and set regulators and mappings to make them usable. If your platform uses standard ACPI and the platform drivers do not perform additional initialization, initialize these components in firmware.

Fill the hole

Both UEFI w/ ACPI and LK perform fix-up tasks before transferring control to the kernel. On Qualcomm platforms, chipset metadata (revision, foundry ID, etc.) is filled into the DSDT; certain logic in the DSDT depends on it. A typical Linux Android device ships with a large DT covering multiple variants. LK selects the best fit using the chipset ID/PMIC ID/board ID, then fills in some memory region information for kernel use.

The ACPI tables in the firmware for Windows 10 on ARM are pre-patched, so I did not implement any additional fix-up logic.

Multi-processor Startup, Again

Why am I discussing the thing again? Because it is important.

Little Kernel (and likely other Linux Android bootloaders) only uses a single processor during its lifecycle (a notable exception is the Raspberry Pi, which uses a spin table, except on the 3+). When it transfers control to Linux, Linux brings the other cores out of reset and makes them available for use.

Windows platforms that implement the ACPI Multi-processor Parking Protocol behave differently. Although the firmware uses a single core, the other CPU cores are brought out of reset and instructed to run a special piece of code. The code flow is like this:

parking:
    Wait for an interrupt.
    Am I the processor being woken up?
    If yes, jump to the address the OS gave me.
    If not, go back to parking.

(Interrupt acknowledgment and memory barriers ignored. Sorry, I don’t want to write assembly at 11 PM.)
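In C, the loop above might be sketched as follows. The mailbox layout here is illustrative only (the real parking protocol defines an exact page format, and the real loop is assembly with interrupt acknowledgment and barriers); the WFI step is abstracted into a callback so the logic stays visible:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mailbox -- NOT the spec's exact parking page layout. */
typedef struct {
    volatile uint32_t ProcessorId;  /* which core the OS is waking */
    volatile uint64_t JumpAddress;  /* where that core should go */
} ParkingMailbox;

typedef void (*WaitForInterruptFn)(void); /* WFI on real hardware */

/* Park until this core is woken; returns the OS-supplied entry point. */
static uint64_t ParkUntilWoken(ParkingMailbox *Mailbox, uint32_t MyId,
                               WaitForInterruptFn Wfi)
{
    for (;;) {
        Wfi();                                /* wait for an interrupt */
        if (Mailbox->ProcessorId == MyId &&   /* am I being woken up?  */
            Mailbox->JumpAddress != 0)
            return Mailbox->JumpAddress;      /* go where the OS said  */
        /* otherwise, back to parking */
    }
}

/* Tiny simulation standing in for an OS wake-up of core 1. */
static ParkingMailbox DemoMailbox;
static void DemoWake(void)
{
    DemoMailbox.JumpAddress = 0x80000000u;
    DemoMailbox.ProcessorId = 1;
}
```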

Because different platforms handle core startup differently (on Qualcomm platforms, TrustZone participates), booting the Linux kernel and starting cores the Linux way on a UEFI firmware that implements this protocol may fail. Someone told me he was unable to bring up the other three cores on the 640. That is reasonable: LK on recent Lumia phones is launched via a special UEFI application in Windows Boot Application form, and Qualcomm UEFI has already put the other three cores in a running state (in WFI). Neither LK nor Linux is aware of that (both assume a certain core state), so core startup fails.

Since it is not possible to ditch Qualcomm UEFI (unlike the exploit for first-generation Lumia WP8 devices), we have to conform to the parking protocol in AArch32 mode (you have PSCI for AArch64 SoCs):

  • Ignore the other cores (unicore is the best!)
  • Implement the parking protocol for unsupported systems (not too hard). Linux has the protocol support; you just have to enable it.
  • Go AArch64 and use PSCI (remember to use HVC mode for 8992/8994)

 

Good night (And to my girlfriend: If you see this article, sorry that I say “Good Night” too early.)

 

Give TianoCore/EDK2 on AArch64 a hand in 2018

Windows 10 PE running on a Snapdragon 410 processor.

Also posted here: Bringing up Windows 10 AArch64 on a $50 Single Board Computer

Windows on ARM is not a new topic. People have attempted to bring up Windows RT and Windows 10 on QEMU (ARM/AArch64 targets), and it even runs on the Raspberry Pi 3. Obviously it is not a Snapdragon 835-only thing; we can give it a hand on our own single-board computers.

This article covers some important details of the DragonBoard 410c SBC’s AArch64 UEFI implementation.

Contents

    • Windows Boot Requirements
    • Bootstrapping your own EDK2/TianoCore UEFI
    • Memory Allocation / Memory Management Unit
    • UEFI Flash Definition
    • First-stage Bootloader (Little Kernel)
    • Persistent NVRAM Support
    • A “Working” RTC
    • Multi-processor startup (PSCI)

Windows Boot Requirements (AArch64)

  • An AArch64 processor. It seems the AArch64 cryptography extension is required too (Raspberry Pi 3 randomly throws the UNSUPPORTED_PROCESSOR bugcheck; rs4 fixed the issue). The bugcheck is raised in an errata check (a hardcoded ID check).
  • For multi-processor systems, either the Microsoft ARM Multi-processor Parking Protocol or the ARM PSCI interface shall be implemented. All current Windows 10 IoT ARM32 platforms implement the former.
  • A working interrupt controller. Most AArch64 SoCs include an ARM GIC, so there’s little work to do here. The only exception I know of is the BCM2837; Windows has inbox Broadcom interrupt controller support (for the sake of Raspberry Pi). But if your SoC has an additional third-party interrupt controller, you need to supply your own HAL extension library. There is little documentation available for this, though…
  • A working processor timer. If there isn’t one, supply your own HAL extension library.
  • Complete ACPI 5.1/6.0 and UEFI 2.3+ implementations. Do not try to use Das U-Boot’s EFI implementation; it’s broken.

These requirements are fairly similar to the ARM SBBR certification requirements. If your SBC has a working EDK2/TianoCore UEFI, you are probably good to go. Bootstrapping your own EDK2 is pretty easy too.

Bootstrapping your own EDK2/TianoCore

The board I used (DragonBoard 410c) doesn’t have a known EDK2/TianoCore implementation, so I had to build my own. This repository for Raspberry Pi 3 is a good starting point and reference.

You need to do these things in UEFI:

  • Initialize serial output (for debugging) and the Memory Management Unit (MMU). Refer to your platform datasheet for the device memory address map.
  • Retrieve required information from the pre-UEFI environment and build Hand-off Blocks (HOBs) for the DXE phase.
  • Initialize the processor (exception vectors, etc.) in the DXE phase.
  • Initialize required peripherals (GPIO, GIC, eMMC, USB, RTC, display…) in the DXE phase.
  • Initialize UEFI services (e.g. variable services) in the DXE phase.
  • Jump to the BDS phase and start Windows Boot Manager or something else.

Memory Allocation / Memory Management Unit

Memory allocation is a platform-specific thing. Check your platform HRD to get an idea of the MMU and memory allocation. For Snapdragon 410, check out Qualcomm LM80-P0436-13.

UEFI Flash Definition

Our UEFI FD starts at 0x80200000. Update the corresponding tokens (the FD base address and size) in your platform definition and flash definition.
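For illustration, the token update might look like the following in the platform .dsc/.fdf. The PCD names follow ArmPkg conventions and the size value is made up; treat both as assumptions and check your own platform packages:

```shell
# Hypothetical PCD settings -- names and values must match your platform
gArmTokenSpaceGuid.PcdFdBaseAddress|0x80200000
gArmTokenSpaceGuid.PcdFdSize|0x00200000
```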

And the first piece of code should be your SEC initialization code (without relocation).

Little Kernel (mentioned below) is responsible for jumping to the UEFI FD at 0x80200000 and handing off execution. If you want, you can actually remove the Android-specific header and device tree validation in LK (apps/aboot.c).

First-stage bootloader (Little Kernel)

DragonBoard 410c uses an ARM Secure Monitor Call to switch to AArch64 mode (see Qualcomm LM80-P0436-1 for more information). The stock closed-source SBL does not recognize AArch64 ELF files (later models should). LK performs basic platform initialization (UART, eMMC, MMU, etc.). A modified LK variant also initializes the framebuffer for U-Boot; we can make it work for our UEFI too.
Windows requires the UEFI to provide a BGRA framebuffer. To achieve this, we need to modify the pixel unpack pattern in platform/msm_shared/mdp5.c:


case 32:
	/* Windows requires a BGRA FB */
	writel(0x000236FF, pipe_base + PIPE_SSPP_SRC_FORMAT);
	writel(0x03020001, pipe_base + PIPE_SSPP_SRC_UNPACK_PATTERN);
	break;

You can either specify a hard-coded address for the framebuffer, or use an arbitrary memory block to transfer information (pixel format, width, height, etc.) to UEFI. The UEFI SEC phase retrieves the information, allocates a HOB, and passes the information to the DXE phase. A simple framebuffer driver then retrieves the information from the HOB and initializes the UEFI Graphics Output Protocol. For optimal performance, map this memory block as write-through cached memory during MMU initialization.
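The hand-off block between LK and UEFI could be as simple as the struct below. Every field name here is invented for illustration; the point is that SEC only needs a fixed, agreed-upon layout at a known address:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical LK-to-UEFI framebuffer info block (illustrative layout). */
typedef struct {
    uint64_t FrameBufferBase;  /* physical address LK configured */
    uint32_t Width;            /* pixels per scanline */
    uint32_t Height;           /* number of scanlines */
    uint32_t PixelFormat;      /* e.g. a tag meaning BGRA8888 */
    uint32_t Pitch;            /* bytes per scanline */
} LkFrameBufferInfo;
```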

Persistent NVRAM Support

For persistent NVRAM support, it’s a good idea to use the eMMC as the storage device. This implementation demonstrates how to simulate NVRAM using eMMC and a piece of memory. I slightly modified it to make it work on Qualcomm devices:

  • If the eMMC NVRAM region is corrupted or uninitialized, provision it and perform a platform warm reset, so I don’t get a synchronous exception in the volatile variable initialization phase.
  • Modify dependency relationship to prevent “device not found” error in BlockRamVariable DXE initialization.

A “Working” RTC

Windows Boot Manager depends on a “working” real-time clock for miscellaneous purposes. APQ8016/MSM8916 has an RTC on its PMIC, the PM8916. To access RTC services, read/write SPMI registers (see Qualcomm LM80-P0436-36). If you are lazy, just use the Xen fake RTC from ArmVirtPkg.
To enable the PM8916 RTC, set SPMI register 0x6046 to the enabled state, then read 0x6048 and the three following registers.

Note: I implemented my own PMIC protocol, PM8916Protocol, that reads/writes PMIC registers on SPMI bus, slave #0. This RTC library is based on the Xen fake RTC library from ArmVirtPkg.
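Assuming the counter reads out least-significant byte first at 0x6048 (an assumption on my part; check LM80-P0436-36), assembling the four RTC bytes is straightforward:

```c
#include <assert.h>
#include <stdint.h>

/* Combine the four SPMI RTC read-data registers (0x6048..0x604B) into the
 * seconds counter, assuming the LSB lives at 0x6048. */
static uint32_t RtcSecondsFromBytes(const uint8_t Bytes[4])
{
    return (uint32_t)Bytes[0]         | ((uint32_t)Bytes[1] << 8) |
           ((uint32_t)Bytes[2] << 16) | ((uint32_t)Bytes[3] << 24);
}
```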

4KB / 64KB Page Table

Revised: On certain SoC platforms, runtime memory allocations do not comply with the 64KB alignment requirement. There are two solutions: either round these memory regions up to 64KB alignment, or edit MdePkg/Include/AArch64/ProcessorBind.h:


///
/// The stack alignment required for AARCH64
///
#define CPU_STACK_ALIGNMENT 16

///
/// Page allocation granularity for AARCH64
///
#define DEFAULT_PAGE_ALLOCATION_GRANULARITY (0x1000)

///
/// For the sake of our SBCs
///
#define RUNTIME_PAGE_ALLOCATION_GRANULARITY (0x1000)

ARM Erratum

I randomly hit crashes (synchronous exceptions) during my UEFI development. After some investigation, the problem seems to be related to load/store instructions (see ARM errata 835769 and 843419). To prevent these random crashes, add the two corresponding workaround flags to your GCC command line:
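The two erratum numbers map directly to GCC’s dedicated Cortex-A53 workaround options:

```shell
# Workarounds for the Cortex-A53 errata referenced above
-mfix-cortex-a53-835769
-mfix-cortex-a53-843419
```

The first inserts scheduling barriers around the affected multiply-accumulate sequences; the second passes the corresponding fix-up option through to the linker.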

Multi-Processor Startup (PSCI)

For platforms that implement ARM PSCI, indicate PSCI support in ACPI FADT table:


EFI_ACPI_6_0_HW_REDUCED_ACPI | EFI_ACPI_6_0_LOW_POWER_S0_IDLE_CAPABLE, // UINT32 Flags
{
EFI_ACPI_6_0_EMBEDDED_CONTROLLER,
0,
0,
EFI_ACPI_6_0_DWORD,
0x009020B4
}, // EFI_ACPI_6_0_GENERIC_ADDRESS_STRUCTURE ResetReg
1, // UINT8 ResetValue
EFI_ACPI_6_0_ARM_PSCI_COMPLIANT, // UINT16 ArmBootArchFlags
EFI_ACPI_6_0_FIXED_ACPI_DESCRIPTION_TABLE_MINOR_REVISION, // UINT8 MinorRevision

Typically you don’t need an HVC call for PSCI. If you use HVC anyway (and your platform doesn’t support HVC-based PSCI), you will get an INTERNAL_POWER_ERROR bugcheck with a first parameter of 0x0000BEEF.
If you indicate PSCI support, you don’t have to provide a parking protocol version in your ACPI MADT table. Simply set it to 0. Here’s one example:


[02Ch 0044 1] Subtable Type : 0B [Generic Interrupt Controller]
[02Dh 0045 1] Length : 50
[02Eh 0046 2] Reserved : 0000
[030h 0048 4] CPU Interface Number : 00000000
[034h 0052 4] Processor UID : 00000000
[038h 0056 4] Flags (decoded below) : 00000001
Processor Enabled : 1
Performance Interrupt Trigger Mode : 0
Virtual GIC Interrupt Trigger Mode : 0
[03Ch 0060 4] Parking Protocol Version : 00000000
[040h 0064 4] Performance Interrupt : 00000017
[044h 0068 8] Parked Address : 0000000080301000
[04Ch 0076 8] Base Address : 0000000000000000
[054h 0084 8] Virtual GIC Base Address : 0000000000000000
[05Ch 0092 8] Hypervisor GIC Base Address : 0000000000000000
[064h 0100 4] Virtual GIC Interrupt : 00000000
[068h 0104 8] Redistributor Base Address : 0000000000000000
[070h 0112 8] ARM MPIDR : 0000000000000000
[078h 0120 1] Efficiency Class : 00
[079h 0121 3] Reserved : 000000

See the ARM Juno reference platform to get some ideas about crafting ACPI tables.

That’s it! Welcome to Windows 10 Userland.

Spend some nights writing Windows drivers. 😛

Windows 10 PE runs on a Snapdragon 410 processor.

Hack TwinUI to force Windows Store Apps to run on low-resolution screens

Windows Store Apps on Lumia 640 XL.

Windows 8 and Windows 8.1 have a minimum screen-resolution constraint for Windows Store Apps (aka Metro Apps, or whatever they are called now). If the screen resolution doesn’t meet the requirement, the user sees a prompt indicating the resolution is too low for these applications.

However, on certain platforms (like phones and single-board computers), it is not convenient to change the resolution. Recently I have been trying Windows RT 8.1 on a Lumia 640 XL. Qualcomm hard-codes the resolution in the platform configuration, so I was unable to change it, and 1280 × 720 is not sufficient for Store Apps.

But there was one exception – the PC Settings (aka Immersive Control Panel) app. It always opens regardless of the current resolution. So how can I force other applications to launch?

Let’s turn to TwinUI.dll, one of the core components of the shell infrastructure. Start IDA Pro and load TwinUI with symbols from Microsoft. All Windows Store Apps are associated with a package family identifier, so let’s search for the PC Settings app’s identifier: windows.immersivecontrolpanel_cw5n1h2txyewy.

Bingo. We found it in some functions.

PC Settings Package Family ID is hardcoded in TwinUI.dll. This function has been patched by me, so it doesn’t reflect the actual situation you get from the official Microsoft binary.

By checking its references, we learn that the layout-checking routine verifies whether the app is a desktop application or the PC Settings app when the resolution doesn’t meet requirements. You can patch either the layout-checking routine or the PC Settings PFN verification routine. I decided to patch the second one, though patching the first is probably a better idea.

On the ARMv7-A platform, I simply patched the initial register store operation and the branch. The BLX call was replaced with a simple NOP (MOV R0, R0).

Patched function

There are two versions of the PC Settings check routine, so I needed to patch both; the other one is similar to this one. Patching the layout verification routine instead (actually the better idea, since this patch causes trouble when launching files from the desktop), or patching on other architectures, should follow the same approach.

A case where Hyper-V freezes the host operating system

Two days ago I set up a new virtual machine for applications that require isolation. When I put my computer into Connected Standby mode, I noticed the SoC fan didn’t stop. To determine whether it was a transient OS inconsistency or a persistent issue, I manually initiated a system restart. However, the system then froze at the login screen. Nothing responded, including the Ctrl + Alt + Delete key sequence. Forcing the computer off and on again barely improved the situation; after about ten attempts I managed to reach my desktop. There was no clue in Event Viewer: everything went well, then suddenly the system froze.

I remembered that an alternative OS loader entry was configured to bypass the hypervisor launch at startup. I selected this entry to see whether the freeze was related to Hyper-V; the test results indicated it was strongly related.

I tried to remember what I had done before configuring the virtual machine. I had removed a virtual switch bridged to the Surface Ethernet Adapter on my Surface Dock, then added a virtual switch bridged to the VMware NAT adapter (which works better with wireless networks). When I checked the adapters in Network and Sharing Center, the old virtual switch hadn’t been removed at all – and the “Remove” menu entry was unavailable. Finally, I removed the adapter from Device Manager, and the issue was resolved.

This issue is likely caused by an OS inconsistency. When the Hyper-V infrastructure attempts to initialize (bring up) all enabled network adapters, including Hyper-V virtual switches, the “removed” adapter is brought up and then enters a failure state due to its inconsistent configuration. Because the host operating system is itself a (privileged) virtual machine on Hyper-V, it didn’t even get a chance to record what happened at that point.

The good thing is that I fixed it by myself; the bad thing is that I missed an opportunity to report this bug and let Microsoft fix it.