4 RISC-V Compressed ISA V1.9
Since the original RISC ISAs did not leave sufficient opcode space free to include these
unplanned compressed instructions, they were instead developed as complete new ISAs. This
meant compilers needed different code generators for the separate compressed ISAs. The first
compressed RISC ISA extensions (e.g., ARM Thumb and MIPS16) used only a fixed 16-bit
instruction size, which gave good reductions in static code size but caused an increase in dynamic
instruction count, which led to lower performance compared to the original fixed-width 32-bit
instruction size. This led to the development of a second generation of compressed RISC ISA
designs with mixed 16-bit and 32-bit instruction lengths (e.g., ARM Thumb2, microMIPS,
PowerPC VLE), so that performance was similar to pure 32-bit instructions but with significant
code size savings. Unfortunately, these different generations of compressed ISAs are incompatible
with each other and with the original uncompressed ISA, leading to significant complexity in
documentation, implementations, and software tools support.
Of the commonly used 64-bit ISAs, only PowerPC and microMIPS currently support a
compressed instruction format. It is surprising that the most popular 64-bit ISA for mobile
platforms (ARM v8) does not include a compressed instruction format given that static code
size and dynamic instruction fetch bandwidth are important metrics. Although static code size
is not a major concern in larger systems, instruction fetch bandwidth can be a major bottleneck
in servers running commercial workloads, which often have a large instruction working set.
Benefiting from 25 years of hindsight, RISC-V was designed to support compressed instructions
from the outset, leaving enough opcode space for RVC to be added as a simple extension
on top of the base ISA (along with many other extensions). The philosophy of RVC is to reduce
code size for embedded applications and to improve performance and energy-efficiency for all
applications due to fewer misses in the instruction cache. Waterman shows that RVC fetches
25%-30% fewer instruction bits, which reduces instruction cache misses by 20%-25%, or roughly
the same performance impact as doubling the instruction cache size [4].
1.3 Compressed Instruction Formats
Table 1.1 shows the eight compressed instruction formats. CR, CI, and CSS can use any of the
32 RVI registers, but CIW, CL, CS, and CB are limited to just 8 of them. Table 1.2 lists these
popular registers, which correspond to registers x8 to x15. Note that there is a separate version of
load and store instructions that use the stack pointer as the base address register, since saving to
and restoring from the stack are so prevalent, and that they use the CI and CSS formats to allow
access to all 32 data registers. CIW supplies an 8-bit immediate for the ADDI4SPN instruction.
The RISC-V ABI was changed to make the frequently used registers map to registers x8–x15.
This simplifies the decompression decoder by having a contiguous naturally aligned set of register
numbers, and is also compatible with the RV32E subset base specification, which only has 16
integer registers.
Compressed register-based floating-point loads and stores also use the CL and CS formats,
respectively, with the eight registers mapping to f8 to f15.
The standard RISC-V calling convention maps the most frequently used floating-point registers
to registers f8 to f15, which allows the same register decompression decoding as for integer
register numbers.
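The 3-bit register specifiers used by the CIW, CL, CS, and CB formats can be expanded to full 5-bit register numbers with a fixed offset. The following is a minimal Python sketch of that expansion, assuming only the x8 to x15 (or f8 to f15) mapping described above. The function name `expand_reg` is hypothetical and is not part of the specification.

```python
def expand_reg(rprime):
    """Expand a 3-bit compressed register specifier to a 5-bit register number.

    Specifier 0 selects x8 (or f8) and specifier 7 selects x15 (or f15).
    """
    if not 0 <= rprime <= 7:
        raise ValueError("compressed register specifier must fit in 3 bits")
    # The target range 8..15 is naturally aligned, so prepending the bits 01
    # is equivalent to adding 8.
    return 0b01000 | rprime
```

Because the register range is contiguous and naturally aligned, the expansion is a bit concatenation rather than an addition, and the same logic serves both the integer and floating-point register files.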
The formats were designed to keep bits for the two register source specifiers in the same place in all
instructions, while the destination register field can move. When the full 5-bit destination register
specifier is present, it is in the same place as in the 32-bit RISC-V encoding. Where immediates
are sign-extended, the sign-extension is always from bit 12. Immediate fields have been scrambled,
as in the base specification, to reduce the number of immediate muxes required.
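As an illustration of these placement rules, the sketch below extracts the register and immediate fields from a CI-format instruction such as C.ADDI. It assumes the CI layout of Table 1.1, in which bit 12 holds imm[5], bits 11:7 hold the full 5-bit rd/rs1 specifier, and bits 6:2 hold imm[4:0]. Other CI instructions arrange their immediate bits differently, so this is not a general decoder. The function name `decode_ci` is hypothetical.

```python
def decode_ci(insn):
    """Return (rd, imm) for a 16-bit CI-format word laid out like C.ADDI."""
    # Full 5-bit register specifier in bits 11:7, the same position as rd
    # in the 32-bit RISC-V encodings.
    rd = (insn >> 7) & 0x1F
    # Low five immediate bits, imm[4:0], are held in bits 6:2.
    imm = (insn >> 2) & 0x1F
    # Bit 12 holds imm[5], the sign bit of the 6-bit immediate.
    if (insn >> 12) & 1:
        imm -= 32
    return rd, imm
```

For example, `decode_ci(0x157D)` returns `(10, -1)`, the fields of C.ADDI x10, -1.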