====== Registers ======

The CPU registers are the storage locations closest to the processor. However, not all registers are meant to store data. There may be specialised registers that configure the processor or its additional modules. Microcontrollers integrate peripherals that perform communication tasks, convert analogue voltage signals into digital values and vice versa, and handle many other jobs. Each peripheral has dedicated registers to store its configuration and data. These registers are not located in the CPU itself; they are typically mapped into the memory address space, within an address range reserved for peripherals. Information about these peripheral module registers can be found in the reference manuals and/or the programmers' manuals. Some microcontrollers keep all of this information in the datasheets. This section focuses on the CPU and the registers closest to it in the ARMv8 architecture.

===== CPU registers =====

AArch64 provides 31 general-purpose registers (R0..R30), each 64 bits wide. The 64-bit view of a register is named X0 to X30, and the 32-bit view is named W0 to W30, as shown in the picture below. Both names refer to the same physical register; the programmer chooses whether to use it as a 32-bit or a 64-bit register. The term ‘Rn’ refers to the architectural registers and is not a name used in assembler code.

{{ :en:multiasm:paarm:registersizes.svg |}}

Note that accessing the W0 or W1 register does not give access to the upper 32 (most significant) bits. Also, when a W register is written, the top 32 bits of the corresponding 64-bit register are zeroed. There are no registers named R0 or R1 in assembler code: a 64-bit value must be addressed as X0, X1 (or any other register up to X30), and a 32-bit value is similarly addressed as W0, W1 and so on.
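The relation between the X and W names can be sketched as a small model in C. This is only an illustration that runs on any machine, not a description of the hardware: the ''gpr_t'' type and the helper names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one general-purpose register (hypothetical type):
   Xn is the full 64-bit value, Wn is a view of its lower 32 bits. */
typedef struct {
    uint64_t x;
} gpr_t;

/* Writing Xn replaces all 64 bits. */
static void write_x(gpr_t *r, uint64_t value)
{
    r->x = value;
}

/* Writing Wn stores the 32-bit value and zeroes the upper 32 bits. */
static void write_w(gpr_t *r, uint32_t value)
{
    r->x = (uint64_t)value;
}

/* Reading Wn returns only the lower 32 bits; the upper half is not visible. */
static uint32_t read_w(const gpr_t *r)
{
    return (uint32_t)r->x;
}

/* Returns the X value seen after a 64-bit write followed by a 32-bit write. */
static uint64_t x_after_w_write(uint64_t initial_x, uint32_t w_value)
{
    gpr_t r;
    write_x(&r, initial_x);
    write_w(&r, w_value);
    assert((r.x >> 32) == 0); /* the upper half must have been cleared */
    return r.x;
}
```

For example, starting from a register filled with ones and then writing ''0x12345678'' through the W name leaves ''0x0000000012345678'' in the X view in this model.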
These examples perform single 32-bit arithmetic operations:

{{ :en:multiasm:paarm:arthmetic32bit.svg |}}

These examples perform single 64-bit arithmetic operations:

{{ :en:multiasm:paarm:arithmetic64bit.svg |}}

Special registers such as the Stack Pointer (SP), Link Register (LR), and Program Counter (PC) are available. The X30 register serves as the Link Register (LR). The Stack Pointer SP and the Program Counter PC are no longer available as regular general-purpose registers. It is still possible to use the SP register with a limited set of data-processing instructions; its 32-bit view is named WSP. Unlike in ARMv7, the PC register is no longer accessible via data-processing instructions. The PC value can be read with the ‘ADR’ instruction, e.g. ‘ADR X17, .’: the dot ‘.’ means “here”, so register X17 is written with the address of the ADR instruction itself, which is the current PC value. Some branch instructions and some load/store operations implicitly use the value of the PC register. Note that the PC and SP are general-purpose registers in the A32 and T32 instruction sets, but this is not the case in the A64 instruction set.

Someone may have noticed that A64 instructions can address 32 registers in total, while there are only 31 general-purpose registers. There is no register named X31: depending on the instruction, register number 31 refers either to the current stack pointer or to the Zero Register. The Zero Register, named XZR or WZR (64 or 32 bits wide), always reads as zero and ignores writes.

^ ^ General-purpose registers ^ Dedicated registers ^ Explanation ^
| Architectural name | R0, R1, R2, R3, .., R29, R30 | SP, ZR | These names are mainly used in documentation about the architecture and instruction sets. |
| 64-bit | X0, X1, X2, X3, .., X29, X30 | SP, XZR | The ‘x’ stands for an extended word. All 64 bits are used. |
| 32-bit | W0, W1, W2, W3, .., W29, W30 | WSP, WZR | The ‘w’ stands for word. Only the bottom (least significant) 32 bits are used. |
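The behaviour of the Zero Register can be illustrated with a toy register file in C. This is a sketch under stated assumptions: the ''regfile_t'' type and the function names are hypothetical, and the model covers only the case where register number 31 is interpreted as XZR, not the case where it means SP.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_GPR 31u /* X0..X30 */
#define REG_ZR  31u /* register number 31, treated here as XZR only */

/* Hypothetical register file holding the 31 general-purpose registers. */
typedef struct {
    uint64_t x[NUM_GPR];
} regfile_t;

/* Reads register n; number 31 always reads as zero. */
static uint64_t read_x(const regfile_t *rf, unsigned n)
{
    assert(n <= REG_ZR);
    return (n == REG_ZR) ? 0 : rf->x[n];
}

/* Writes register n; a write to number 31 is silently discarded. */
static void write_x(regfile_t *rf, unsigned n, uint64_t value)
{
    assert(n <= REG_ZR);
    if (n != REG_ZR) {
        rf->x[n] = value;
    }
}
```

In this model, writing 99 to register 31 and reading it back still gives 0, while X5 keeps whatever was written to it.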
Example of adding two registers together with different register notations:

{{ :en:multiasm:paarm:exampleregnotations.svg |}}

The main difference between 64-bit and 32-bit register operations is that a 32-bit operation calculates the result using only the 32 least significant bits of each register (the 32 most significant bits are greyed out). In addition, writing the 32-bit result to the destination register zeroes its 32 most significant bits (the red zeroes).

ARMv8 has an additional set of 32 registers for floating-point and vector operations. These registers are 128 bits wide and, like the general-purpose registers, can be accessed in several ways. The letter in the register name identifies byte (Bx), half-word (Hx), single-word (Sx), double-word (Dx) and quad-word (Qx) access.

{{ :en:multiasm:paarm:vectorreg1.svg |}}

{{ :en:multiasm:paarm:vectorreg2.svg |}}

More information on these registers and on floating-point operations is given in the following section, “Advanced Assembly Programming”.

===== CPU Configuration =====

The Raspberry Pi 5 has an ARM Cortex-A76 processor with 4 CPU cores. Each core has its own stack pointers, status registers and other registers. Before looking at the CPU registers, some specifics must be explained. A single core has several execution levels: EL0, EL1, EL2, and EL3. In the datasheets, these execution levels are called Exception Levels: the levels at which the processor resources are managed. EL0 is the lowest level; all user applications are executed at this level. EL1 is meant for operating systems. EL2 is intended for a hypervisor that controls resources for the OS and the lower exception levels. EL3 is the highest level and is used by the secure monitor. The CPU's general-purpose registers are independent of the Exception Levels, but it is essential to understand at which Exception Level the code is executed. This is called “System configuration” because the processor has multiple cores, and each core has multiple exception levels.
To configure the system and access the system registers, the MRS and MSR instructions must be used. Note that registers with the suffix “_ELn” have a separate, banked copy in some or all of the levels, except for EL0. This suffix also defines the lowest exception level that can access the particular system register. Only a few system registers are accessible from EL0; the Cache Type Register (CTR_EL0) is one of them. CTR_EL0 is read-only, so it can only be read with MRS. A register that can be both read and written from EL0 is, for example, the condition flags register NZCV.

''MRS X0, CTR_EL0 /* Move CTR_EL0 into X0 (read only) */''

''MRS X1, NZCV /* Move NZCV into X1; now the value in X1 can be modified */''

''MSR NZCV, X1 /* Move X1 into NZCV; write back the modifications */''

{{:en:multiasm:paarm:exceptionlevels_new.jpg?600|}}

The image visualises all of the exception levels. The orange area is the so-called untrusted or non-secure state. The region with a blue background is the operating system with its parts and applications. User applications can request resources using SVC (supervisor calls); on Raspberry Pi OS (and others), these are called system calls (syscalls). From the EL2 perspective, the operating system is treated as a separate program running at exception level EL1. If a hypervisor is available, the OS may request resources via HVC (hypervisor calls), and the hypervisor can request resources from the secure monitor via SMC (secure monitor calls). On Raspberry Pi 5, the bootloader runs at EL3, loading memory and initialising the hardware. Then the operating system is started at EL1, and the rest of the applications in the OS run at EL0. Raspberry Pi 5 does not have hypervisor software, which is why Exception Level 2 is not used.

The green region is the Secure State, where only special secure applications and operating systems are executed. This may be used for system duplication, where two identical systems must run and the second one performs integrity checks and fault and error detection for the central system that runs in the non-secure state.
Note that the secure and non-secure states are isolated from each other, and resources can be shared only through the EL3 level.

To narrow the number of registers, only AArch64 registers will be considered here. There are many registers dedicated to the CPU. Specialised registers are left aside to limit the amount of information, and only those registers meant for program execution are reviewed. As there is more than one core, each core must have a dedicated status register. All registers that store some status on AArch64 CPU cores are collected in the table below. Do not be confused by the number of status registers listed. The registers whose names are in bold are the ones relevant to programming, because only those registers store the actual status of instruction execution.

^ Register ^ Description ^
| AFSR0_EL1..3 and AFSR1_EL1..3 | Auxiliary Fault Status Register 0/1 (EL1..EL3) provides additional fault information for exceptions taken to EL1, EL2 or EL3. |
| DBGAUTHSTATUS_EL1 | The Debug Authentication Status Register provides information about the debug authentication interface's state. |
| DISR_EL1 | The Deferred Interrupt Status Register records that an SError (System Error) exception has been consumed by an ESB (Error Synchronisation Barrier) instruction. |
| DSPSR_EL0 | The Debug Saved Program Status Register holds the saved process state for the Debug state. When entering the Debug state, PSTATE information is written to this register. Values are copied from this register to PSTATE on exiting the Debug state. |
| ERXGSR_EL1 | The Selected Error Record Group Status Register shows the status for the records in a group of error records. Accesses ERRGSR for the group of error records selected by ERRSELR_EL1.SEL[15:6]. |
| ERXSTATUS_EL1 | The Selected Error Record Primary Status Register accesses ERRSTATUS for the error record selected by ERRSELR_EL1.SEL. |
| **FPSR** | The Floating-point Status Register provides floating-point system status information. |
| ICH_EISR_EL2 | The Interrupt Controller End of Interrupt Status Register indicates which List registers have outstanding EOI (End Of Interrupt) maintenance interrupts. |
| ICH_ELRSR_EL2 | The Interrupt Controller Empty List Register Status Register can be used to locate a usable List register when the hypervisor delivers an interrupt to a VM (Virtual Machine). |
| IFSR32_EL2 | The Instruction Fault Status Register (EL2) allows access to the AArch32 IFSR register only from AArch64 state. Its value does not affect execution in AArch64 state. |
| ISR_EL1 | The Interrupt Status Register shows the pending status of IRQ and FIQ interrupts and SError exceptions. |
| MDCCSR_EL0 | The Monitor DCC Status Register is a read-only register containing control status flags for the DCC (Debug Communications Channel). |
| OSLSR_EL1 | The OS Lock Status Register provides the status of the OS Lock. |
| **SPSR_EL1..3** | The Saved Program Status Register (EL1..EL3) holds the saved process state when an exception is taken to EL1, EL2, or EL3. |
| **SPSR_abt** | The Saved Program Status Register (Abort mode) holds the saved process state when an exception is taken to Abort mode. |
| **SPSR_fiq** | The Saved Program Status Register (FIQ mode) holds the saved process state when an exception is taken to FIQ mode. |
| **SPSR_irq** | The Saved Program Status Register (IRQ mode) holds the saved process state when an exception is taken to IRQ mode. |
| **SPSR_und** | The Saved Program Status Register (Undefined mode) holds the saved process state when an exception is taken to Undefined mode. |
| TFSRE0_EL1 | The Tag Fault Status Register (EL0) holds accumulated Tag Check Faults occurring in EL0 that are not taken precisely. |
| TFSR_EL1..3 | The Tag Fault Status Register (EL1..EL3) holds accumulated Tag Check Faults occurring in EL1, EL2 or EL3 that are not taken precisely. |
| TRCAUTHSTATUS | The Trace Authentication Status Register provides information about the authentication interface's state for debugging. The CoreSight Architecture Specification offers more information. |
| TRCOSLSR | The Trace OS Lock Status Register returns the status of the Trace OS Lock. |
| TRCRSR | The Trace Resources Status Register is used to set or read the status of the resources. |
| TRCSSCSR | The Trace Single-shot Comparator Control Status Register returns the status of the corresponding Single-shot Comparator Control. |
| TRCSTATR | The Trace Status Register returns the trace unit status. |
| VDISR_EL2..3 | The Virtual Deferred Interrupt Status Register (EL2..EL3) records that an SError exception has been consumed by an ESB instruction executed at EL1 or EL2. |
Some of the status registers of an ARMv8 processor
You can see how many status registers this processor has. Not all of them are used during program execution. Many registers are related to debugging and resource management. On the Raspberry Pi, the OS and the bootloader have already configured all CPU cores and their registers. Trace registers are used only when hardware debugging is enabled, such as JTAG or TRACE32.

Summarising: on the Cortex-A76 (ARMv8.2-A) installed on the Raspberry Pi 5, the OS kernel runs at EL1. The OS is booted from the SD card without any hypervisor software, which means EL2 is not used, and the Linux OS is the only one running on the Raspberry Pi 5 board. All user software runs at EL0. However, any code executed by the OS kernel runs at EL1, and such code can be designed and executed as a kernel module. There are rules for creating a Linux kernel module: it must contain a function that initialises the module and a function that is called on exit, when the job is finished. The skeleton for the kernel module is given below in C. A GCC compiler is required to compile the code, and inline assembly can be written directly in the C code itself. At EL1, more system instructions are allowed than at EL0, but still not all of them: registers and instructions reserved for EL2 and EL3 remain inaccessible.

Linux kernel module example:

<code c>
// mymod.c
#include <linux/init.h>
#include <linux/module.h>

static int __init mymod_init(void)
{
    unsigned long el;

    /* Read CurrentEL and shift right by 2: the level is held in bits [3:2] */
    asm volatile(
        "mrs %0, CurrentEL\n\t"
        "lsr %0, %0, #2"
        : "=r"(el));

    /* el now contains the current exception level (expect 1) */
    pr_info("mymod: running in EL%lu\n", el);
    return 0;
}

static void __exit mymod_exit(void)
{
    pr_info("mymod: exit\n");
}

module_init(mymod_init);
module_exit(mymod_exit);
MODULE_LICENSE("GPL");
</code>

There are restrictions on the use of privileged instructions in the code. At EL0, executing a privileged instruction traps into the kernel. Note that switching between EL0 and EL1 is allowed only in the kernel and firmware. Writing firmware code would require access to the complete chip documentation, and at the moment, this documentation is confidential.
So, the only option left is to design a kernel module and use the available EL1 resources.
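The shift by 2 used in the module above follows from the layout of the CurrentEL register, which holds the exception level in bits [3:2]. The decoding step can be tried in plain user-space C on any machine. This is a minimal sketch: the function names and the role strings are assumptions made for this example, with the roles taken from the exception level description earlier in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts the exception level from a raw CurrentEL value:
   the level is stored in bits [3:2]. */
static unsigned decode_current_el(uint64_t raw)
{
    return (unsigned)((raw >> 2) & 0x3u);
}

/* Hypothetical helper: returns a short description of each level. */
static const char *el_role(unsigned el)
{
    static const char *const roles[] = {
        "EL0: user applications",
        "EL1: operating system kernel",
        "EL2: hypervisor",
        "EL3: secure monitor and firmware",
    };
    assert(el < 4);
    return roles[el];
}
```

A raw value of ''0x4'' therefore decodes to EL1, which is the value a kernel module running on the Raspberry Pi 5 would be expected to see.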