Chinese LoongArch Architecture Evaluation (Part 2 of 3)

October 3rd, 2025 by Brian

Previously, we discussed the dev board and Linux setup. Now, time to fuzz the instruction set.

Loongshaker

Overview

Our first goal was to search for hidden or buggy instructions in the new LoongArch architecture. LoongArch is a RISC architecture with fixed-width (32-bit) instructions, so the entire instruction space can be enumerated. We therefore ported armshaker to LoongArch. The resulting program, Loongshaker, uses a brute-force technique to fuzz all possible 32-bit instructions.

Fuzzer Details

Many of the possible 32-bit values will correspond to known, valid instructions. To speed up execution time, the fuzzer does not (by default) execute known instructions. Loongshaker uses libopcodes (part of GNU Binutils) and, optionally, Capstone to disassemble potential instructions. If they disassemble without error, they are not executed by default, although flags can force execution of all instructions if desired.

When Loongshaker encounters a 32-bit value that does not disassemble to a valid instruction, it attempts to execute that value as an instruction. If execution completes without raising SIGILL, the fuzzer marks the instruction as hidden and logs the result.
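The filtering logic described above can be sketched as follows. This is a minimal illustration, not Loongshaker's actual code: `find_hidden`, `disassembles`, and `raises_sigill` are hypothetical names, and the two callbacks stand in for the disassembler check and the real execution step.

```python
from typing import Callable, Iterable, List


def find_hidden(
    words: Iterable[int],
    disassembles: Callable[[int], bool],
    raises_sigill: Callable[[int], bool],
    force_all: bool = False,
) -> List[int]:
    """Return words that do not disassemble yet execute without SIGILL."""
    hidden = []
    for word in words:
        known = disassembles(word)
        if known and not force_all:
            continue  # documented instruction: not executed by default
        sigill = raises_sigill(word)  # hypothetical stand-in for execution
        if not known and not sigill:
            hidden.append(word)  # undocumented, but the CPU accepted it
    return hidden


# Toy stand-ins: words 0 and 1 are "documented", word 2 raises SIGILL.
toy_known = {0x0, 0x1}
toy_sigill = {0x2}
hidden = find_hidden(range(4), lambda w: w in toy_known, lambda w: w in toy_sigill)
print([hex(w) for w in hidden])
```

Passing `force_all=True` executes documented instructions as well, but only undocumented words are ever reported as hidden.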

The fuzzer supports using ptrace for execution of the instruction to reduce the risk of bringing down the main fuzzer process itself. It also supports setting register values before the instruction is executed and checking how registers change across instruction execution.
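One way to picture the register check is as a diff between two snapshots taken around the instruction. The sketch below is an illustration only: the register names and seed values are made up, and `diff_registers` is a hypothetical helper rather than part of Loongshaker.

```python
from typing import Dict, Tuple


def diff_registers(
    before: Dict[str, int], after: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Map each changed register name to its (old, new) value pair."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and after[name] != before[name]
    }


# Seeding every register with a distinct value makes a change easy to attribute.
before = {"r4": 0x1111, "r5": 0x2222, "r6": 0x3333}
after = {"r4": 0x5555, "r5": 0x2222, "r6": 0x3333}
print(diff_registers(before, after))
```

Repeating such a diff while varying one operand field at a time is one way to work out which bits of an unknown encoding select the source and destination registers.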

Fuzzer Results

After fuzzing the entire range of 32-bit values, we found the following hidden instructions. Our findings for the first two sets of instructions were later confirmed when we found a page documenting the same hidden instructions. We believe that page was published around the same time we made our findings.

1. Vector Floating Point Scale

The first set of hidden instructions found lies between 0x71448000 and 0x7144ffff. Analysis of register states before and after execution indicated that the instructions are of the form:

Bits  | 31:15  | 14:10 | 9:5 | 4:0
Field | opcode | Vk    | Vj  | Vd

where Vk, Vj, and Vd refer to vector register operands. This operation performs a single-precision floating point scale. That is, it scales the 4 single-precision floating point values in Vj by the 4 32-bit signed integer values in Vk and stores the result in Vd. Scaling refers to adjusting the exponent of a floating point value (i.e., scaling the floating point value x by the integer y performs the operation x * 2^y, or equivalently, adds y to x's exponent).

Full example: suppose Vk contains the value 0xffffffff_00000002_00000003_00000004 (note that 0xffffffff = -1), and Vj contains the value 0x3f800000_3f900000_3fc00000_3fe00000 (the encodings for the single-precision values 1.0, 1.125, 1.5, and 1.75). This instruction will set the destination register to 0x3f000000_40900000_41400000_41e00000, which are the encodings for the values 0.5, 4.5, 12.0, and 28.0. Notice that 0.5 = 1.0 * 2^-1, 4.5 = 1.125 * 2^2, 12.0 = 1.5 * 2^3, and 28.0 = 1.75 * 2^4.
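The worked example can be reproduced in software. The sketch below is an emulation under simplifying assumptions, not a description of the hardware: it treats each 128-bit register as a Python integer with lane 0 in the low 32 bits, uses `math.ldexp` for the scaling, and ignores overflow, NaN, and rounding-mode corner cases. The names `decode_3v` and `scale_single` are hypothetical.

```python
import math
import struct


def decode_3v(word: int):
    """Split an instruction word into its (Vk, Vj, Vd) register fields."""
    return (word >> 10) & 0x1F, (word >> 5) & 0x1F, word & 0x1F


def scale_single(vj: int, vk: int) -> int:
    """Scale four packed floats in vj by four packed signed ints in vk."""
    result = 0
    for lane in range(4):
        shift = 32 * lane
        bits = (vj >> shift) & 0xFFFFFFFF
        raw = (vk >> shift) & 0xFFFFFFFF
        # Reinterpret the 32-bit lane as a signed integer.
        exponent = raw - (1 << 32) if raw & 0x80000000 else raw
        value = struct.unpack("<f", struct.pack("<I", bits))[0]
        scaled = math.ldexp(value, exponent)  # value * 2**exponent
        out = struct.unpack("<I", struct.pack("<f", scaled))[0]
        result |= out << shift
    return result


vk = 0xFFFFFFFF_00000002_00000003_00000004
vj = 0x3F800000_3F900000_3FC00000_3FE00000
print(hex(scale_single(vj, vk)))
```

For the operands in the example, this returns the same 128-bit value reported above, 0x3f000000_40900000_41400000_41e00000.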

Instructions between 0x71450000 and 0x71457fff have the same form as the instructions above and perform the same operation. The only difference is that they operate on 64-bit data types (doubles and 64-bit integers) rather than 32-bit data types.
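A rough sketch of the 64-bit variant follows, together with a helper that tells the two opcode ranges apart. The function names are hypothetical, the operands are chosen only for illustration, and the expected values come from IEEE 754 arithmetic rather than from a hardware run. As before, overflow and NaN handling are ignored.

```python
import math
import struct


def scale_variant(word: int):
    """Classify a word by the two opcode ranges reported in this post."""
    if 0x71448000 <= word <= 0x7144FFFF:
        return "single"
    if 0x71450000 <= word <= 0x71457FFF:
        return "double"
    return None


def scale_double(vj: int, vk: int) -> int:
    """Scale two packed doubles in vj by two packed signed 64-bit ints in vk."""
    result = 0
    mask = 0xFFFFFFFFFFFFFFFF
    for lane in range(2):
        shift = 64 * lane
        bits = (vj >> shift) & mask
        raw = (vk >> shift) & mask
        # Reinterpret the 64-bit lane as a signed integer.
        exponent = raw - (1 << 64) if raw >> 63 else raw
        value = struct.unpack("<d", struct.pack("<Q", bits))[0]
        scaled = math.ldexp(value, exponent)  # value * 2**exponent
        out = struct.unpack("<Q", struct.pack("<d", scaled))[0]
        result |= out << shift
    return result


# Illustrative operands: 1.5 scaled by 3 (high lane), 1.0 scaled by -1 (low lane).
vj = 0x3FF8000000000000_3FF0000000000000
vk = 0x0000000000000003_FFFFFFFFFFFFFFFF
print(hex(scale_double(vj, vk)))
```

With these operands the high lane becomes 12.0 and the low lane becomes 0.5.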

2. Vector Fill

The next set of hidden instructions lies between 0x729b8000 and 0x729bfc7f. Analysis revealed that the instructions are of the form:

Bits  | 31:15  | 14:10 | 9:7    | 6:5 | 4:0
Field | opcode | Uk5   | opcode | Uj2 | Vd

where Vd is a vector register, and Uk5 and Uj2 are both unsigned immediate values. Note that bits 9:7 must be low for the instruction to execute. The operation this instruction performs depends on the value of Uj2 as follows:

When Uj2 is 0, this instruction fills Vd with bytes of increasing value, starting at Uk5, and repeating every 4 bytes. Example: the instruction 0x729ba000 (note Uk5=8) fills v0 (encoded as 0b00000) with the bytes 0x0b0a0908_0b0a0908_0b0a0908_0b0a0908.

When Uj2 is 1, the instruction performs similarly, but when repeating, the first byte is 1 larger. Example: the instruction 0x729bc420 (note Uk5=0x11 and Uj2=1) fills v0 with the bytes 0x17161514_16151413_15141312_14131211.

When Uj2 is 2, the instruction performs similarly to when Uj2 is 1. The only difference is that the starting value is Uk5 + 4. Example: the instruction 0x729bd840 (note Uk5=0x16 and Uj2=2) fills v0 with the bytes 0x201f1e1d_1f1e1d1c_1e1d1c1b_1d1c1b1a (note 0x1a = 0x16 + 4).

When Uj2 is 3, the instruction performs the same operation as when Uj2 is 0 but does not repeat. Example: the instruction 0x729b8460 (note Uj2=3 and Uk5=1) fills v0 with the bytes 0x100f0e0d_0c0b0a09_08070605_04030201.
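The four modes can be summarized in a small software model. This is a sketch inferred from the four examples above, not a specification: byte 0 is taken to be the least significant byte of the 128-bit value, and `decode_fill` and `fill_value` are hypothetical names.

```python
def decode_fill(word: int):
    """Return (Uk5, Uj2, Vd), or None if the word is outside the fill encoding."""
    if (word & 0xFFFF8000) != 0x729B8000:
        return None  # bits 31:15 do not match the opcode
    if (word >> 7) & 0x7:
        return None  # bits 9:7 must be low
    return (word >> 10) & 0x1F, (word >> 5) & 0x3, word & 0x1F


def fill_value(uk5: int, uj2: int) -> int:
    """Model the 128-bit value written to Vd for a given Uk5 and Uj2."""
    value = 0
    for i in range(16):
        col, row = i % 4, i // 4  # position within a 4-byte group, group index
        if uj2 == 0:
            byte = uk5 + col            # repeats every 4 bytes
        elif uj2 == 1:
            byte = uk5 + col + row      # each group starts 1 higher
        elif uj2 == 2:
            byte = uk5 + 4 + col + row  # as mode 1, starting at Uk5 + 4
        else:
            byte = uk5 + i              # 16 increasing bytes, no repeat
        value |= (byte & 0xFF) << (8 * i)
    return value


for word in (0x729BA000, 0x729BC420, 0x729BD840, 0x729B8460):
    uk5, uj2, vd = decode_fill(word)
    print(hex(word), f"v{vd}", hex(fill_value(uk5, uj2)))
```

Running the loop over the four example encodings reproduces the four byte patterns listed above.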

3. movgr2fcsr and movfcsr2gr

The movgr2fcsr instruction appears to be buggy on our OS/hardware combination. While running the fuzzer, it would bring down the whole system around the instruction 0x0114c000, which is the start of the movgr2fcsr instructions. After some debugging, we’ve determined that setting bit 6 of the fcsr register using one of these instructions causes a system crash. As per the official documentation, this bit shouldn’t have any significance. This site seems to indicate that bit 6 of the fcsr register is used for floating point stack mode, which is used for x86 binary translation. Our CPU does not have support for x86 binary translation, but it does have support for MIPS binary translation.

The root cause of the crash appears to be a line in the Linux kernel that is executed as part of context switching. This line contains the encoding for the instruction x86mftop, which is part of the x86 LBT extensions (which our CPU does not support). This instruction is only executed if bit 6 of the fcsr register is high, and there is a comment that reads "TM bit is always 0 if LBT not supported". By setting bit 6 of the fcsr manually (which we can do with the movgr2fcsr instruction without any special privileges), we break this assumption and cause the kernel to execute an illegal instruction, resulting in a crash. Though the code has changed slightly, the same bug appears to be present in the latest Linux kernel.

Additionally, the movgr2fcsr instruction will execute with any 5-bit immediate in the fcsr destination field, even though there are only 4 valid fcsrs per the manual. This caused the fuzzer to mark some variants of this instruction as hidden, since they went beyond the valid fcsr options. Executing movgr2fcsr with the 5-bit fcsr field set to 0x1f causes the same crash as above, regardless of the value in the source register.
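To keep a fuzzing run from taking the system down, it can help to flag the risky movgr2fcsr cases before executing them. The sketch below rests on assumptions: the field layout (fcsr selector in bits 4:0, source general-purpose register in bits 9:5) is our reading of the public ISA manual, the check is deliberately conservative, and `encode_movgr2fcsr` and `is_crash_prone` are hypothetical helpers.

```python
MOVGR2FCSR_BASE = 0x0114C000  # start of the movgr2fcsr range, per this post
VALID_FCSRS = range(4)        # only 4 fcsrs are valid per the manual
TM_BIT = 1 << 6               # fcsr bit 6, which the kernel assumes is 0


def encode_movgr2fcsr(fcsr_field: int, rj: int) -> int:
    """Build an instruction word (assumed layout: fcsr in 4:0, rj in 9:5)."""
    return MOVGR2FCSR_BASE | ((rj & 0x1F) << 5) | (fcsr_field & 0x1F)


def is_crash_prone(fcsr_field: int, source_value: int) -> bool:
    """Conservatively flag the two crash conditions observed in this post."""
    return fcsr_field == 0x1F or bool(source_value & TM_BIT)


# Selector values beyond the 4 valid fcsrs that still execute.
undocumented = [f for f in range(32) if f not in VALID_FCSRS]
print(len(undocumented), hex(encode_movgr2fcsr(0x1F, 4)))
print(is_crash_prone(0, 0x40), is_crash_prone(0, 0x1F), is_crash_prone(0x1F, 0))
```

The guard treats any source value with bit 6 set as risky regardless of the selector, which may skip more cases than strictly necessary.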

Similarly, the movfcsr2gr instruction also allows any 5-bit immediate in the fcsr field. When the field is beyond the 4 valid fcsrs, the instruction appears to simply zero the destination register.

The next step is microarchitectural attacks.