px-fwlib 0.10.0
Cross-platform embedded library and documentation for 8/16/32-bit microcontrollers generated with Doxygen 1.9.2
01 Flashing an LED in assembler

1. Introduction

KEEP CALM and DON'T PANIC :)

As part of the journey to becoming a true embedded hero you have to take a peek under the hood to see what is really happening on the bare metal level. It may sound crazy to dive right into the deep end on the assembly / machine code level first, but it will provide insight on what the C code is trying to achieve and how to debug tricky situations like optimized code, hard faults, stack overflows, etc. You do not need to be proficient at writing assembly at an expert level, just know enough to single step debug and follow the assembly flow.

Reference(s):

2. Brief description

This example flashes the LED at 1 Hz (500 ms on; 500 ms off). On the PX-HER0 Board, the LED is wired to Port H pin 0. PH0 must be configured as a digital output pin. Set PH0 high to enable the LED and set PH0 low to disable the LED.

3. Single step debug

Go right ahead and start a debugging session! There's no better teacher than single stepping through the code and observing the processor core and peripheral registers.

4. Assembly instruction tips

If you need more info regarding an instruction, search for it in [2]. For example, on line 121, the instruction subs r0, #1 is used. Searching for "subs", an explanation is found in [2] "3.5.1 ADC, ADD, RSB, SBC, and SUB" (page 54).

The "sub" part of the instruction means that it is a subtraction operation and the suffix "s" means that the condition code flags will be updated on the result of the operation. For example, if the result is zero, the Zero Flag (Z) will be set.

The condition flags (N = Negative Flag, Z = Zero Flag, C = Carry Flag, V = Overflow Flag) are stored in the Application Program Status Register (APSR) and can also be viewed in the Registers window during debugging (you may need to scroll down to view it).

On line 122 the instruction "bne _delay_loop" is used. The "b" part means that it is a branch instruction and the suffix "ne" (not equal) means that the branch should be taken if the Zero Flag (Z) is not set. See [2] "Table 17. Condition code suffixes" (page 44) for a summary.
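The flag behaviour described above can also be tried out on a PC. The following is a minimal C sketch, not part of px-fwlib: the `flags_t` type and the `subs()`, `bne_taken()` and `count_loop_iterations()` helpers are hypothetical names introduced purely for illustration. It models how "subs" updates the N, Z, C and V flags and how "bne" uses the Zero Flag to decide whether to branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the APSR condition flags (illustration only) */
typedef struct
{
    bool n; /* Negative: bit 31 of the result is set */
    bool z; /* Zero: the result is zero */
    bool c; /* Carry: for a subtraction, set when NO borrow occurred */
    bool v; /* Overflow: the signed result does not fit in 32 bits */
} flags_t;

/* Model of "subs rd, #imm": subtract and update the condition flags */
uint32_t subs(uint32_t rd, uint32_t imm, flags_t *f)
{
    uint32_t result = rd - imm;

    f->n = (result >> 31) != 0;
    f->z = (result == 0);
    f->c = (rd >= imm);
    f->v = (((rd ^ imm) & (rd ^ result)) >> 31) != 0;
    return result;
}

/* Model of "bne": the branch is taken while the Zero Flag is clear */
bool bne_taken(const flags_t *f)
{
    return !f->z;
}

/* Count how many times the body of a "subs / bne" loop executes */
uint32_t count_loop_iterations(uint32_t start)
{
    flags_t  f;
    uint32_t r0 = start;
    uint32_t n  = 0;

    do
    {
        r0 = subs(r0, 1, &f);
        n++;
    }
    while (bne_taken(&f));
    return n;
}
```

Calling `subs(1, 1, &f)` leaves `f.z` set, so `bne_taken(&f)` returns false and the loop ends, which mirrors what the Registers window shows on the last pass through the delay loop.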

When inspecting the Extended Listing File "flashing_led.lss" generated by the disassembler you will notice that some instructions have a ".n" or ".w" suffix, for example:

800003e:   e7f6        b.n 800002e <_main_loop>

The ".n" suffix simply means that the narrow 16-bit version of the branch instruction has been used. The ".w" means that the wide 32-bit version of the instruction has been used. Observe that the narrow version of the instruction is encoded in two bytes of machine code: 0xe7 and 0xf6.
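To see how those two bytes encode the branch target, here is a small C sketch that decodes a narrow unconditional branch by hand. It is an illustration only: the helper names `read_le16()` and `thumb_b_n_target()` are hypothetical, and the sketch assumes the 16-bit encoding where the top five bits are 11100 and the lower 11 bits are a signed halfword offset relative to the instruction address plus 4.

```c
#include <stdint.h>

/* Assemble a halfword from two bytes stored in little-endian order */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode the target address of a narrow unconditional Thumb branch ("b.n").
 * Returns 0 if the halfword is not that encoding.
 */
uint32_t thumb_b_n_target(uint32_t instr_addr, uint16_t instr)
{
    int32_t offset;

    /* The top 5 bits must be 0b11100 */
    if ((instr & 0xf800) != 0xe000)
    {
        return 0;
    }
    /* imm11 counts halfwords: multiply by 2 to get a byte offset ... */
    offset = (int32_t)(instr & 0x07ff) << 1;
    /* ... and sign-extend the result from 12 bits */
    if (offset & 0x0800)
    {
        offset -= 0x1000;
    }
    /* In Thumb state the PC reads as the instruction address + 4 */
    return instr_addr + 4 + (uint32_t)offset;
}
```

For the listing line above, `thumb_b_n_target(0x0800003e, 0xe7f6)` gives 0x0800002e, the address of `_main_loop`. In memory the halfword is stored low byte first, so `read_le16()` on the bytes F6 E7 yields 0xe7f6.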

5. Building the project with a Makefile and Linker script

The project is built with an introductory Makefile to demonstrate that it's not that hard to understand and use. For a gentle introduction to Make, see 7.2 How to understand and modify Makefiles.

The linker places the code, data and variables into the right memory locations using the introductory linker script "stm32l072xb.ld".

See [1] "2.2 Memory organization" (page 57) for more information.

6. Machine code

The final executable "flashing_led.bin" is only 108 bytes of machine code:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  00 50 00 20 15 00 00 08 10 00 00 08 12 00 00 08
00000010  FE E7 FE E7 0E 48 01 68 0E 4A 11 43 01 60 0E 48
00000020  01 68 0E 4A 11 40 0E 4A 11 43 01 60 0D 48 0C 49
00000030  01 60 00 F0 05 F8 0C 49 01 60 00 F0 01 F8 F6 E7
00000040  01 B5 04 20 00 04 C0 46 01 38 FC D1 01 BD 00 00
00000050  2C 10 02 40 80 00 00 00 00 1C 00 50 FD FF FF FF
00000060  01 00 00 00 18 1C 00 50 00 00 01 00

The example has been crafted to demonstrate specific concepts and it could have been made even smaller!

The first 4 bytes (00 50 00 20), assembled in little-endian order as a 32-bit value (0x2000_5000), tell the processor core to set the Stack Pointer register (SP / R13) to the end of SRAM on reset. The STM32L072RB has 20 KB of SRAM (20 x 1024 = 20480 = 0x5000), which starts at 0x2000_0000.

The next 32-bit value (0x0800_0015, stored as the bytes 15 00 00 08) tells the processor core to set the Program Counter register (PC / R15) to the start of the main() function on reset. This is the address of the first instruction that will be executed after reset. Flash memory starts at 0x0800_0000, but is also mapped to 0x0000_0000.

See [2] "2.3.4 Vector table" (page 29) for more info.

The address of main() is actually 0x0800_0014, but the least significant bit is set to indicate to the processor core that it is jumping to a 16-bit Thumb assembly instruction and not a 32-bit ARM instruction. This is done to make the code compatible with other ARM processor cores that do support switching between ARM and Thumb instructions.

It will result in a Hard Fault if the least significant bit is not set.
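The decoding of the first two vector table entries can be reproduced with a few lines of C. This is a minimal sketch for a PC, using the first 8 bytes of the hex dump above; the helper names (`read_le32()`, `sram_end()`, `vector_to_code_addr()`, `vector_is_thumb()`) are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* First 8 bytes of "flashing_led.bin" as listed in the hex dump */
const uint8_t vector_bytes[8] =
{
    0x00, 0x50, 0x00, 0x20, /* Initial Stack Pointer value */
    0x15, 0x00, 0x00, 0x08, /* Reset vector */
};

/* Assemble a 32-bit word from four bytes stored in little-endian order */
uint32_t read_le32(const uint8_t *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* End of SRAM = start address + size in bytes */
uint32_t sram_end(uint32_t start, uint32_t size_kb)
{
    return start + size_kb * 1024u;
}

/* Bit 0 of a vector is the Thumb flag; clear it to get the code address */
uint32_t vector_to_code_addr(uint32_t vector)
{
    return vector & ~(uint32_t)1;
}

/* The Thumb flag (bit 0) must be set for the vector to be usable */
int vector_is_thumb(uint32_t vector)
{
    return (vector & 1u) != 0;
}
```

With these helpers, `read_le32(&vector_bytes[0])` matches `sram_end(0x20000000, 20)`, and `vector_to_code_addr(read_le32(&vector_bytes[4]))` gives the flash address at which main() starts.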

7. Peripheral access

The communication interface with peripherals is memory mapped: the processor core communicates with a peripheral by writing to or reading from specific memory addresses.

[1] "9. General-purpose I/Os (GPIO)" (page 234) documents how to use the GPIO peripheral. Feel free to skip ahead and scan [1] "9.4 GPIO registers" (page 243) first. After all, this part documents the actual interface to the peripheral (the "buttons & lights" section).

The memory map of peripherals is listed in [1] "2.2.2 Memory map and register boundary addresses" (page 58). The base address of GPIOH is 0x5000_1C00. The offset of GPIOx_MODER register is 0x00, so the address of GPIOH_MODER is 0x5000_1C00. The address of GPIOH_BSRR register is 0x5000_1C18 (offset = 0x18).

Assume that a peripheral's clock is disabled on start up, unless the documentation proves otherwise.

As a general rule, you always need to enable a peripheral's clock before you can use it. Imagine an ARM Cortex microcontroller as a mansion with a large number of rooms. A frugal person would not leave all the lights on, but would only switch on the lights in the rooms that are in use; otherwise the electricity bill would be huge. Likewise, the clocks to peripherals are disabled by default to save power. It is also important to know that a higher clock frequency uses more power and therefore it is desirable to run a peripheral at the lowest acceptable frequency. The lower peripheral frequency may incur a communication penalty, as the faster processor core has to wait for the slower peripheral register to return a valid value.

The first step is thus to enable the clock to GPIOH by setting bit 7 in the RCC_IOPENR register (address 0x4002_102C). See [1] "7.3.12 GPIO clock enable register (RCC_IOPENR)" (page 202).

PH0's mode must be changed from analog to digital output by setting GPIOH_MODER[1:0] to 01. See [1] "9.4.1 GPIO port mode register (GPIOx_MODER) (x =A..E and H)" on page 243.

The LED is enabled by setting PH0 high. This is accomplished by writing a 1 to GPIOH_BSRR[0].

The LED is disabled by setting PH0 low. This is accomplished by writing a 1 to GPIOH_BSRR[16].
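The register addresses and bit patterns described in this section can be checked with a short C sketch that runs on a PC. It does not touch any hardware and it is not the px-fwlib API: the macro and function names below are hypothetical, chosen only to mirror the register names in the text.

```c
#include <stdint.h>

/* Values taken from the text above */
#define GPIOH_BASE      0x50001c00u
#define GPIO_MODER_OFS  0x00u
#define GPIO_BSRR_OFS   0x18u

/* Register address = peripheral base address + register offset */
uint32_t reg_addr(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* RCC_IOPENR: set the clock enable bit of a port (bit 7 for Port H) */
uint32_t iopenr_enable(uint32_t iopenr, unsigned bit)
{
    return iopenr | (1u << bit);
}

/* GPIOx_MODER: each pin has a 2-bit field; 01 = general purpose output */
uint32_t moder_set_output(uint32_t moder, unsigned pin)
{
    moder &= ~(3u << (pin * 2)); /* Clear both mode bits of the pin */
    moder |=  (1u << (pin * 2)); /* Set the field to 01 */
    return moder;
}

/* GPIOx_BSRR: writing a 1 to bit [pin] sets the output high */
uint32_t bsrr_set(unsigned pin)
{
    return 1u << pin;
}

/* GPIOx_BSRR: writing a 1 to bit [pin + 16] sets the output low */
uint32_t bsrr_reset(unsigned pin)
{
    return 1u << (pin + 16);
}
```

For PH0 this yields the same constants that the assembly source loads: 0x00000080 for the clock enable, 0x00000001 to switch the LED on and 0x00010000 to switch it off. On the target, such a value would be written through a `volatile uint32_t *` that points at the register address.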

8. Delay loop

A ~500 ms delay is achieved by wasting a large number of instruction clock cycles in an empty count down loop. The counter value can be calculated if the processor core clock frequency is known as well as how many clock cycles each instruction takes. See HERE for an instruction clock cycle summary.
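As a worked example of that calculation, the C sketch below (for a PC, not the target) uses the figures from the source code comments: 4 clock cycles per loop iteration (nop = 1, subs = 1, taken bne = 2) and a core clock of 2.1 MHz. The function names are hypothetical, and the sketch is an approximation: it ignores the call overhead and the fact that the final, untaken branch is shorter.

```c
#include <stdint.h>

/* Cycles per loop iteration: nop (1) + subs (1) + bne taken (2) */
#define DELAY_LOOP_CYCLES 4u

/* Approximate delay in microseconds produced by "count" loop iterations */
uint32_t delay_loop_us(uint32_t count, uint32_t core_clk_hz)
{
    uint64_t cycles = (uint64_t)count * DELAY_LOOP_CYCLES;

    return (uint32_t)((cycles * 1000000u) / core_clk_hz);
}

/* Approximate loop count needed for a delay in milliseconds */
uint32_t delay_loop_count(uint32_t delay_ms, uint32_t core_clk_hz)
{
    return (uint32_t)(((uint64_t)delay_ms * core_clk_hz)
                      / (1000u * DELAY_LOOP_CYCLES));
}
```

With these assumptions, `delay_loop_count(500, 2100000)` gives 262 500 iterations. The source code uses 0x00040000 (262 144) instead, because that value can be built with two 16-bit instructions; `delay_loop_us(0x00040000, 2100000)` shows that it still comes out at roughly 499 ms.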

9. Stack

The delay() function also provides an opportunity to observe the registers and stack when it is called using a "bl" instruction (branch with link).

When the branch is taken, the return address is stored in the Link Register (LR / R14). The initial value of the Stack Pointer (SP / R13) is 0x2000_5000. The push {r0, lr} instruction stores the two register values on the stack (in little-endian order) and SP decreases to 0x2000_4FF8.

See [2] "2.1.2 Stacks" (page 12) and also [2] "3.6 Branch and control instructions" (page 65) for more info.
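The stack behaviour can be modelled in a few lines of C on a PC. This is a hypothetical sketch for illustration only: SRAM is represented by a plain array, and the helper names `push_r0_lr()`, `pop_r0_pc()` and `stack_read()` are made up. It assumes a full-descending stack in which the lowest-numbered register ends up at the lowest address.

```c
#include <stdint.h>

#define SRAM_START 0x20000000u
#define SRAM_END   0x20005000u

/* Hypothetical model of the 20 KB SRAM as an array of 32-bit words */
uint32_t sram[(SRAM_END - SRAM_START) / 4];

/* Read the 32-bit word stored at an SRAM address */
uint32_t stack_read(uint32_t addr)
{
    return sram[(addr - SRAM_START) / 4];
}

/* Model of "push {r0, lr}": SP decreases by 8, then both words are stored */
uint32_t push_r0_lr(uint32_t sp, uint32_t r0, uint32_t lr)
{
    sp -= 8;                               /* Two 32-bit words */
    sram[(sp - SRAM_START) / 4]     = r0;  /* R0 at the lower address */
    sram[(sp - SRAM_START) / 4 + 1] = lr;  /* LR at the higher address */
    return sp;
}

/* Model of "pop {r0, pc}": both words are loaded, then SP increases by 8 */
uint32_t pop_r0_pc(uint32_t sp, uint32_t *r0, uint32_t *pc)
{
    *r0 = sram[(sp - SRAM_START) / 4];
    *pc = sram[(sp - SRAM_START) / 4 + 1]; /* Saved LR goes straight to PC */
    return sp + 8;
}
```

Starting from SP = 0x2000_5000, `push_r0_lr()` returns 0x2000_4FF8, with the saved R0 at 0x2000_4FF8 and the saved LR at 0x2000_4FFC. `pop_r0_pc()` reverses this and returns SP to 0x2000_5000, which is how delay() returns to its caller.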

10. Source code

File(s):

  • arch/arm/stm32/tutorials/01_asm/flashing_led.S
1/* =============================================================================
2 ____ ___ ____ ___ _ _ ___ __ __ ___ __ __ TM
3 | _ \ |_ _| / ___| / _ \ | \ | | / _ \ | \/ | |_ _| \ \/ /
4 | |_) | | | | | | | | | | \| | | | | | | |\/| | | | \ /
5 | __/ | | | |___ | |_| | | |\ | | |_| | | | | | | | / \
6 |_| |___| \____| \___/ |_| \_| \___/ |_| |_| |___| /_/\_\
7
8 Copyright (c) 2019 Pieter Conradie <https://piconomix.com>
9
10 License: MIT
11 https://github.com/piconomix/px-fwlib/blob/master/LICENSE.md
12
13============================================================================= */
14
15/* Use 16-bit Thumb instruction set */
16.thumb
17/* Use modern UAL (Unified Assembler Language) common syntax for ARM and Thumb instructions */
18.syntax unified
19
20/* RCC_IOPENR (Reset and Clock Control - GPIO clock enable register) address */
21.equ RCC_IOPENR, 0x4002102c
22/* GPIOH_MODER (GPIO port H mode register) address */
23.equ GPIOH_MODER, 0x50001c00
24/* GPIOH_BSRR (GPIO port H bit set/reset register) address */
25.equ GPIOH_BSRR, 0x50001c18
26
27/* ISR_VECTOR section contains the ARM vector table; see linker script "stm32l072xb.ld" */
28.section .isr_vector,"a",%progbits
29.type vector_table, %object
30vector_table:
31 .word 0x20005000 /* Set SP (Stack Pointer) to end of SRAM */
32 .word main /* Set PC (Program Counter) to main() function address */
33 .word nmi_handler /* Address of NMI (Non Maskable Interrupt) handler function */
34 .word hard_fault_handler /* Address of Hard Fault handler function */
35
36/* TEXT section holds executable instructions and constant data; see linker script "stm32l072xb.ld" */
37.text
38
39/* nmi_handler() function */
40.global nmi_handler
41.type nmi_handler, %function
42nmi_handler:
43 /* Loop here forever */
44 b nmi_handler
45
46/* hard_fault_handler() function */
47.global hard_fault_handler
48.type hard_fault_handler, %function
49hard_fault_handler:
50 /* Loop here forever */
51 b hard_fault_handler
52
53/* main() function */
54.global main
55.type main, %function
56main:
57 /* Read RCC_IOPENR register value */
58 ldr r0, =RCC_IOPENR
59 ldr r1, [r0]
60 /* Set RCC_IOPENR register bit 7 to enable clock of Port H */
61 ldr r2, =0x00000080
62 orrs r1, r2
63 str r1, [r0]
64 /* Read GPIOH_MODER register value */
65 ldr r0, =GPIOH_MODER
66 ldr r1, [r0]
67 /* Set MODE0[1:0] = 01 (General purpose output mode) */
68 ldr r2, =0xfffffffd
69 ands r1, r2
70 ldr r2, =0x00000001
71 orrs r1, r2
72 /* Write updated GPIOH_MODER register value */
73 str r1, [r0]
74
75 /* Load R0 with GPIOH_BSRR address */
76 ldr r0, =GPIOH_BSRR
77_main_loop:
78 /* Set PH0 output to enable LED */
79 ldr r1, =0x00000001
80 str r1, [r0]
81 /* Call delay function; return address is stored in LR (Link Register / R14) */
82 bl delay
83 /* Clear PH0 output to disable LED */
84 ldr r1, =0x00010000
85 str r1, [r0]
86 /* Call delay function; return address is stored in LR (Link Register / R14) */
87 bl delay
88 /* Loop forever */
89 b _main_loop
90
91/*
92 * delay() function:
93 *
94 * Wait ~ 500 ms. Default system clock is 2.1 MHz after startup from reset
95 * which means that one instruction clock cycle is 476 ns.
96 */
97.global delay
98.type delay, %function
99delay:
100 /*
101 * Save R0 and LR (Link Register / R14) register content on stack (SP / R13).
102 * LR contains the program address to return to when function finishes.
103 */
104 push {r0, lr}
105 /*
106 * Set R0 to 0x00040000 (262 144) by loading it with 0x04 and shifting
107 * it left by 16 bits (multiplying it with 2^16 = 65 536)
108 */
109 movs r0, #0x04
110 lsls r0, #16
111 /*
112 * These two 16-bit Thumb instructions use less program memory space than
113 * using a "ldr r0, =0x00040000" instruction, because the latter generates a
114 * 16-bit instruction plus a 32-bit constant value that is stored in
115 * program memory (at the end in the literal pool).
116 */
117
118 /* Decrement R0 until it is zero; this is the actual delay loop */
119_delay_loop:
120 nop /* 1 instruction clock cycle */
121 subs r0, #1 /* 1 instruction clock cycle */
122 bne _delay_loop /* 2 instruction clock cycles if branch is taken */
123 /*
124 * Restore R0 saved value from stack and restore LR (Link Register / R14)
125 * saved value directly into PC (Program Counter / R15) to return.
126 */
127 pop {r0, pc}
128
129/* Place literal pool (constants) here after the main() and delay() function */
130.pool
131