### Hiding in the Shadows: Empowering ARM for Stealthy Virtual Machine Introspection ACSAC 2018

Sergej Proskurin, <sup>1</sup> Tamas Lengyel, <sup>3</sup> Marius Momeu, <sup>1</sup> Claudia Eckert, <sup>1</sup> and Apostolis Zarras <sup>2</sup>

<sup>1</sup>Technical University of Munich <sup>2</sup>Maastricht University <sup>3</sup>The Honeynet Project

6th of December 2018

### Stealthy malware analysis on ARM!

# Motivation & Background: Virtual Machine Introspection Recap






































#### Split-personality malware

• Employ anti-virtualization to reveal a VMM (red pills)

#### Perfect VM transparency is not feasible

Revealing a virtual environment alone is insufficient!

#### More interesting to know whether the system is being analyzed

 $\rightarrow$  Hide analysis artifacts from the guest






- **1** Intercept the guest in kernel-space
- **2** A stealthy single-stepping mechanism
- **3** Execute-only memory

#### Use instructions as a trigger to trap into the VMM

• E.g., software breakpoints (BRK/BKPT instruction)

#### Better: Secure Monitor Call instruction (SMC)

- Guest is not able to subscribe to SMC traps
- SMC traps do not have to be re-injected into the guest
- Can only be executed in the guest's kernel



| User-Space:                        |
|------------------------------------|
| [...]<br>mov x8, #0x3f<br>svc #0x0 |

#### Kernel-Space:

```
SyS_read:
stp x29, x30, [sp, #-64]!
mov x29, sp
stp x21, x22, [sp, #32]
[...]
```


| User-Space:                                                                                |
|--------------------------------------------------------------------------------------------|
| [...]<br>mov x8, #0x3f<br>svc #0x0                                                         |
| **Kernel-Space:**                                                                          |
| SyS_read:<br>smc #0x0 ← (injected)<br>mov x29, sp<br>stp x21, x22, [sp, #32]<br>[...]      |
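The injection shown above can be sketched as a 4-byte patch. This is a minimal illustration, not the authors' tooling: `encode_smc` and `inject_smc` are hypothetical helper names, the code buffer is a plain `bytearray` rather than guest memory, and the two instruction words stand in for the start of `SyS_read`. The only architectural fact it relies on is the A64 encoding of `smc #imm16` (`0xD4000003` with the immediate in bits 5 to 20).

```python
import struct

SMC_0 = 0xD4000003  # A64 encoding of `smc #0x0`
INSN_SIZE = 4       # every A64 instruction is 4 bytes wide


def encode_smc(imm16: int = 0) -> int:
    """Return the A64 encoding of `smc #imm16` (immediate lives in bits 5..20)."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("imm16 out of range")
    return SMC_0 | (imm16 << 5)


def inject_smc(code: bytearray, offset: int) -> bytes:
    """Overwrite the instruction at `offset` with `smc #0x0`.

    Returns the original 4 bytes so the caller can restore them later.
    """
    if offset % INSN_SIZE != 0:
        raise ValueError("A64 instructions are 4-byte aligned")
    original = bytes(code[offset:offset + INSN_SIZE])
    code[offset:offset + INSN_SIZE] = struct.pack("<I", encode_smc(0))
    return original


# Hypothetical stand-in for the first two instructions of SyS_read:
#   stp x29, x30, [sp, #-64]!   and   mov x29, sp
sys_read = bytearray(struct.pack("<II", 0xA9BC7BFD, 0x910003FD))
saved = inject_smc(sys_read, 0)
```

Keeping the displaced instruction (`saved`) is what later allows it to be executed from a second copy of the page instead of being written back in place.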


#### Issues: How to remain stealthy and in control?

- Removing tap points introduces race conditions
- No hardware support for stealthy single-stepping


- Attackers can reveal the analysis framework through ARM's hardware single-stepping
- $\rightarrow$  We need a novel, stealthy single-stepping scheme

#### Leverage the fixed-width ISA for single-stepping

- Locate instruction boundaries without a disassembler
- Use two SMCs to single-step one instruction
- $\rightarrow$  Multi-vCPU safe!
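The two-SMC idea can be sketched as a small simulation. It is a simplified model under stated assumptions, not the authors' implementation: instructions are strings in Python lists, `make_pages` and `run` are hypothetical names, and the stepped instruction is assumed not to branch, so the second SMC at `pc + 4` is always reached.

```python
INSN_SIZE = 4          # fixed-width ISA: every instruction occupies 4 bytes
SMC = "smc #0"


def make_pages(original, tap_pc):
    """Build two patched copies of a code page for a tap point at tap_pc."""
    i = tap_pc // INSN_SIZE
    next_i = i + 1     # no disassembler needed: the next instruction is at pc + 4
    if next_i >= len(original):
        raise ValueError("tap point must not be the last instruction of the page")
    tapped_page = list(original)
    tapped_page[i] = SMC            # first SMC: the tap point itself
    step_page = list(original)
    step_page[next_i] = SMC         # second SMC: ends the single step
    return tapped_page, step_page


def run(original, tap_pc):
    """Execute the page once and record what runs and when the VMM is entered."""
    tapped_page, step_page = make_pages(original, tap_pc)
    active, pc, trace, events = tapped_page, 0, [], []
    while pc < len(original) * INSN_SIZE:
        insn = active[pc // INSN_SIZE]
        if insn == SMC:
            if active is tapped_page:
                events.append(("tap", pc))
                active = step_page        # run the original instruction next
            else:
                events.append(("step-done", pc))
                active = tapped_page      # re-arm the tap point
            continue                      # re-fetch at the same pc from the other copy
        trace.append(insn)
        pc += INSN_SIZE
    return trace, events


code = ["stp x29, x30, [sp, #-64]!", "mov x29, sp", "stp x21, x22, [sp, #32]"]
trace, events = run(code, tap_pc=0)
```

In this model the guest executes exactly the original instruction stream, while the VMM is entered twice per tap point: once at the tap and once after the single step.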

#### How do we hide injected SMC instructions?

Employ Second Level Address Translation (SLAT)

Figure (Original-View): GFN1 maps to MFN1 (original page) and GFN2 maps to MFN2 (backup page); both mappings are marked (x). Guest physical memory (GFN 1, GFN 2) is translated to machine physical memory (MFN 1, MFN 2).

| Original Page (GFN1 → MFN1) | Backup Page (GFN2 → MFN2) |
|-----------------------------|---------------------------|
| SMC                         | Instr 1                   |
| Instr 2                     | SMC                       |
| Instr 3                     |                           |


#### Xen physical to machine (p2m) subsystem

- Uses Second Level Address Translation (SLAT)
- Translates guest-physical to machine-physical addresses
- Represents a single view on the guest's physical memory

Xen p2m allows controlling access permissions of the guest's physical memory

Hide injected SMC instructions by withdrawing read permissions

#### Issue: Permissions must be relaxed on integrity checks

- Walking the page tables is slow
- Another vCPU can access the memory without notifying the VMM

# Req. 2: (Stealthy) Single-Stepping





#### Xen alternate p2m (altp2m) subsystem

- Maintains different views on the guest's physical memory
- Allows allocating and assigning different memory views to vCPUs
- → Switch views instead of relaxing permissions in a global view!










#### Xen alternate p2m (altp2m) subsystem

- Allows remapping the same guest-physical page frame to different machine-physical page frames
- $\rightarrow$ Facilitates, e.g., SMC injections in selected views
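The view concept can be modelled in a few lines. This is only a conceptual sketch: the class and method names (`View`, `Domain`, `create_view`, `remap`, `switch`, `translate`) are hypothetical and do not correspond to Xen's actual altp2m interface, and frame numbers are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """One memory view: guest frame number (GFN) -> machine frame number (MFN)."""
    gfn_to_mfn: dict


@dataclass
class Domain:
    views: list                                    # views[0] acts as the original view
    vcpu_view: dict = field(default_factory=dict)  # vCPU id -> index of its active view

    def create_view(self) -> int:
        """Allocate a new view that starts as a copy of the original view."""
        self.views.append(View(dict(self.views[0].gfn_to_mfn)))
        return len(self.views) - 1

    def remap(self, view_idx: int, gfn: int, mfn: int) -> None:
        """Point a GFN at a different MFN in one view only."""
        self.views[view_idx].gfn_to_mfn[gfn] = mfn

    def switch(self, vcpu: int, view_idx: int) -> None:
        """Assign a view to a single vCPU; other vCPUs are unaffected."""
        self.vcpu_view[vcpu] = view_idx

    def translate(self, vcpu: int, gfn: int) -> int:
        return self.views[self.vcpu_view.get(vcpu, 0)].gfn_to_mfn[gfn]


dom = Domain(views=[View({0x10: 0x100})])
execute_view = dom.create_view()
dom.remap(execute_view, 0x10, 0x200)   # 0x200: shadow copy holding the SMC
dom.switch(vcpu=0, view_idx=execute_view)
```

After this setup, vCPU 0 resolves GFN `0x10` to the shadow copy while vCPU 1 still sees the original frame, which is the property that avoids relaxing permissions in a single global view.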





#### Issue: No ARM support

- Xen altp2m is exclusively used on Intel CPUs







Google Summer of Code

























































## Req. 3: Execute-only Memory on AArch64: Stealthy Single-Stepping



#### Putting everything together (on AArch64)

- Allocate two additional views: Execute- and Step-View
- Duplicate the original page twice
  - Replace instr 1 with SMC in shadow-copy'
  - Replace instr 2 with SMC in shadow-copy"
- Map both duplicates as execute-only

On read-requests, switch to the Original-View

Satisfies integrity checks
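How the three views interact on a read can be sketched as follows. This is a toy model, not the authors' code: the page holds placeholder 4-byte "instructions" (`ins1`, `ins2`, `ins3`), the `VCpu` class is hypothetical, and switching back to the previous view right after the read is an assumed simplification of whatever the VMM does once the faulting access has completed.

```python
import hashlib

SMC = b"\x03\x00\x00\xd4"                 # little-endian bytes of `smc #0x0`
ORIGINAL = b"ins1" + b"ins2" + b"ins3"    # placeholder instruction bytes

VIEWS = {
    # view name: (page contents, permissions)
    "original": (ORIGINAL, "rx"),
    "execute": (SMC + ORIGINAL[4:], "x"),                 # shadow-copy'
    "step": (ORIGINAL[:4] + SMC + ORIGINAL[8:], "x"),     # shadow-copy''
}


class VCpu:
    def __init__(self):
        self.view = "execute"
        self.faults = []

    def fetch(self, offset: int) -> bytes:
        """Instruction fetch: allowed in every view."""
        data, _ = VIEWS[self.view]
        return data[offset:offset + 4]

    def read(self, offset: int, length: int) -> bytes:
        """Data read: traps on execute-only pages and is served from the Original-View."""
        data, perms = VIEWS[self.view]
        if "r" not in perms:
            self.faults.append((self.view, offset))
            previous, self.view = self.view, "original"
            data, _ = VIEWS[self.view]
            result = data[offset:offset + length]
            self.view = previous          # assumed: return to the patched view afterwards
            return result
        return data[offset:offset + length]


cpu = VCpu()
checksum = hashlib.sha256(cpu.read(0, len(ORIGINAL))).hexdigest()
```

The checksum computed by the guest matches the unmodified page even though the vCPU keeps fetching the injected SMC, which is the sense in which integrity checks are satisfied.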





#### **1** Intercept the guest in kernel-space

- ✓ Secure Monitor Call (SMC) instruction

#### **2** A stealthy single-stepping mechanism

- ✗ ARM has no hardware support for stealthy single-stepping
- ✓ Stealthy single-stepping via Xen altp2m (when combined with execute-only memory)

#### **3** Execute-only memory

- ✓ AArch64
- ✗ AArch32 lacks execute-only memory
- ✓ Splitting the TLBs to hide injected tap points on AArch32

# Evaluation: System Setup

#### Build the foundation for stealthy monitoring on ARM

- Implement Xen altp2m for ARM
- Equip DRAKVUF and LibVMI with our single-stepping primitives

#### Use DRAKVUF to trace system calls inside the guest VM

- HiKey LeMaker development board
- Guest runs a Linux v4.15 kernel
- Xen v4.11







Table: Monitoring overhead (OHD) of DRAKVUF utilizing Hardware-SS, Double-SMC-SS, and Split-TLB-SS primitives measured by Lmbench 3.0, in msec.

| Benchmark      | w/o     | Hardware-SS | (OHD)    | Double-SMC Step-View | (OHD)    | Double-SMC Backup Page | (OHD)    | Split-TLB Step-View | (OHD)    | Split-TLB Backup Page | (OHD)    |
|----------------|---------|-------------|----------|----------------------|----------|------------------------|----------|---------------------|----------|-----------------------|----------|
| fork+execve    | 1383.33 | 6053.67     | 4.38 ×   | 5567.33              | 4.02 ×   | 6033.00                | 4.36 ×   | 26690.66            | 19.29 ×  | 17057.00              | 12.33 ×  |
| fork+exit      | 377.43  | 835.52      | 2.21 ×   | 787.14               | 2.09 ×   | 924.83                 | 2.45 ×   | 5910.83             | 15.66 ×  | 4225.83               | 11.20 ×  |
| fork+/bin/sh   | 3249.17 | 12542.00    | 3.86 ×   | 11672.67             | 3.59 ×   | 12737.33               | 3.92 ×   | 53134.66            | 16.35 ×  | 34231.33              | 10.54 ×  |
| fstat          | 0.62    | 94.94       | 152.57 × | 78.65                | 126.40 × | 84.20                  | 135.81 × | 103.52              | 166.97 × | 75.33                 | 121.06 × |
| mem read       | 1745.00 | 1692.33     | 0.97 ×   | 1692.33              | 0.97 ×   | 1738.00                | 1.00 ×   | 1730.33             | 0.99 ×   | 1735.33               | 0.99 ×   |
| mem write      | 4687.67 | 4310.00     | 0.92 ×   | 4308.33              | 0.92 ×   | 4715.00                | 1.00 ×   | 4575.33             | 0.98 ×   | 4602.00               | 0.98 ×   |
| open/close     | 5.44    | 202.67      | 37.25 ×  | 158.33               | 29.11 ×  | 179.26                 | 35.95 ×  | 269.67              | 49.57 ×  | 184.65                | 33.94 ×  |
| page fault     | 1.49    | 1.72        | 1.15 ×   | 1.74                 | 1.16 ×   | 1.62                   | 1.09 ×   | 1.90                | 1.28 ×   | 1.91                  | 1.28 ×   |
| pipe lat       | 12.26   | 371.92      | 30.34 ×  | 344.83               | 28.13 ×  | 425.28                 | 34.69 ×  | 955.53              | 77.94 ×  | 482.60                | 39.36 ×  |
| read           | 0.67    | 95.21       | 141.14 × | 79.10                | 117.27 × | 84.06                  | 125.46 × | 99.34               | 148.27 × | 75.39                 | 111.77 × |
| select 500 fd  | 28.33   | 124.62      | 4.40 ×   | 110.23               | 3.89 ×   | 114.51                 | 4.04 ×   | 124.47              | 4.39 ×   | 113.85                | 4.02 ×   |
| signal handle  | 4.34    | 189.67      | 43.70 ×  | 150.33               | 34.64 ×  | 154.13                 | 35.51 ×  | 178.00              | 41.01 ×  | 158.33                | 36.48 ×  |
| signal install | 0.51    | 95.00       | 186.27 × | 72.00                | 141.18 × | 75.13                  | 147.31 × | 89.07               | 174.65 × | 73.73                 | 144.58 × |
| stat           | 2.63    | 99.97       | 38.06 ×  | 80.73                | 30.74 ×  | 85.30                  | 32.43 ×  | 105.58              | 40.14 ×  | 83.57                 | 31.82 ×  |
| syscall        | 0.31    | 94.21       | 299.05 × | 75.15                | 238.55 × | 83.49                  | 269.32 × | 98.48               | 317.68 × | 78.84                 | 250.26 × |
| write          | 0.47    | 95.34       | 203.32 × | 76.82                | 163.81 × | 83.86                  | 178.43 × | 103.22              | 219.62 × | 73.77                 | 157.31 × |
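The OHD columns read as the ratio of the monitored time to the unmonitored (w/o) time. A short check on the fork+execve row, with `overhead` as a hypothetical helper name:

```python
def overhead(monitored_ms: float, baseline_ms: float) -> float:
    """Overhead factor: monitored time divided by the unmonitored baseline."""
    return monitored_ms / baseline_ms


# fork+execve row of the table
baseline = 1383.33
hardware_ss = overhead(6053.67, baseline)            # rounds to 4.38
double_smc_step_view = overhead(5567.33, baseline)   # rounds to 4.02
split_tlb_step_view = overhead(26690.66, baseline)   # rounds to 19.29
```

For rows with very small baselines (for example fstat at 0.62), recomputing the ratio from the printed values gives slightly different factors, presumably because the printed baseline is itself rounded.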



#### Establish the foundation for stealthy malware analysis on ARM

- Introduce Xen altp2m to ARM
- Stealthy single-stepping approach for AArch{32|64}
- De-synchronize the TLB architecture on AArch32

#### DRAKVUF on ARM is open-source:

- https://github.com/drakvuf-on-arm/drakvuf-on-arm
- https://youtu.be/mfhZBBdC-Jg (Demo!)

#### ARM does not support stealthy single-stepping

 $\rightarrow$  Attackers can infer the presence of the analysis framework

AArch32: Use hardware breakpoints ("mismatching") for single-stepping

- The CPU generates a debug event on instructions following the breakpoint
- The number of hardware breakpoints is finite

AArch64: Use Software-Step exceptions (set MDSCR\_EL1.SS and PSTATE.SS of EL1)

- ARM forbids direct access to PSTATE.SS in all exception levels
- PSTATE.SS is spilled into the guest-accessible SPSR\_EL1 on exceptions
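Why the spilled bit matters can be shown with a short check a guest kernel could perform. This is a sketch under assumptions: the bit positions (SS at bit 0 of MDSCR\_EL1 and bit 21 of SPSR\_EL1) are taken from the ARMv8-A register layout and should be verified against the architecture manual, and the register values are made-up examples rather than values read from hardware.

```python
MDSCR_EL1_SS = 1 << 0    # software-step enable bit in MDSCR_EL1 (assumed position)
SPSR_EL1_SS = 1 << 21    # saved PSTATE.SS in SPSR_EL1 (assumed position)


def guest_sees_single_step(spsr_el1: int) -> bool:
    """Return True if the saved program status reveals active software stepping."""
    return bool(spsr_el1 & SPSR_EL1_SS)


# Made-up example values of SPSR_EL1 as the guest kernel would read them.
spsr_normal = 0x3C5
spsr_stepped = 0x3C5 | SPSR_EL1_SS
```

A guest that finds the bit set without having requested stepping itself can infer that something outside the guest is single-stepping it.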



#### Xen altp2m exclusively used on Intel

- The VMCS has capacity for up to 512 EPTPs (memory views)
- Introduced to Xen to add support for the EPTP Switching functionality
  - Combine VMFUNC instruction with Virtualization Exceptions #VE
  - → No additional VM Exit overhead on memory violations!

#### External monitors can use altp2m

 $\rightarrow$  Unique tool for VMI applications

## Appendix 2: No Execute-only Memory on AArch32: Splitting the TLBs

#### AArch32 does not support execute-only memory

Code pages must be executable and readable

ARM uses VMIDs as TLB tags to isolate translations

→ Allocate two views with the same VMID to de-synchronize the iTLB from the dTLB

Prime the TLBs in Original-View:

- iTLB holds the SMC from Execute-View
- dTLB holds instr 1 from Original-View
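The effect of the primed TLBs can be modelled with two lookup tables. This is a conceptual sketch only: the `Tlb` class, the frame numbers, and the VMID value are hypothetical, real TLBs evict entries (so priming has to be repeated), and how each TLB is filled on real hardware is abstracted into direct `fill` calls.

```python
class Tlb:
    """Toy TLB: caches (VMID, GFN) -> MFN translations."""

    def __init__(self):
        self.entries = {}

    def fill(self, vmid: int, gfn: int, mfn: int) -> None:
        self.entries[(vmid, gfn)] = mfn

    def lookup(self, vmid: int, gfn: int):
        return self.entries.get((vmid, gfn))


MEMORY = {0x100: "instr 1", 0x200: "smc #0"}   # MFN -> content at the tap point
ORIGINAL_VIEW = {0x10: 0x100}                  # GFN -> MFN
EXECUTE_VIEW = {0x10: 0x200}
VMID = 7                                       # both views share one VMID

itlb, dtlb = Tlb(), Tlb()

# Priming: the same (VMID, GFN) tag ends up with different translations.
itlb.fill(VMID, 0x10, EXECUTE_VIEW[0x10])
dtlb.fill(VMID, 0x10, ORIGINAL_VIEW[0x10])


def fetch(gfn: int) -> str:
    """Instruction fetch goes through the iTLB."""
    return MEMORY[itlb.lookup(VMID, gfn)]


def load(gfn: int) -> str:
    """Data read goes through the dTLB."""
    return MEMORY[dtlb.lookup(VMID, gfn)]
```

Because both views carry the same VMID, the hardware has no tag to tell the two cached translations apart: `fetch(0x10)` yields the SMC while `load(0x10)` yields the original instruction.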


