
lwip, submac: deadlock on user transmission #21843

@mguetschow

Description

Sending any (user) packet over the 802.15.4 SubMAC using the lwIP stack deadlocks. I traced the issue down to a race condition between the main thread, which requests the BH (bottom half) before setting the FSM state to PREPARE, and the lwip_netdev_mux thread, which will happily try to handle the BH while the FSM state is still RX.
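The interleaving can be illustrated with a heavily simplified, single-threaded C model. Everything in it is hypothetical: the state and event names only mirror the debug log, the transition table is an assumption (it is not RIOT's actual SubMAC table), and an invalid transition is assumed to simply drop the event. The `preempted` flag stands in for the lwip_netdev_mux thread running in the window between the BH request and the store of the new state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the SubMAC FSM.
 * State/event names mirror the debug log; transitions are assumptions. */
typedef enum { ST_RX, ST_PREPARE, ST_TX, ST_INVALID } state_t;
typedef enum { EV_REQUEST_TX, EV_BH, EV_TX_DONE } event_t;

/* Pure transition function: next state for (state, event). */
static state_t transition(state_t s, event_t ev)
{
    if (s == ST_RX && ev == EV_REQUEST_TX) { return ST_PREPARE; }
    if (s == ST_PREPARE && ev == EV_BH)    { return ST_TX; }
    if (s == ST_TX && ev == EV_TX_DONE)    { return ST_RX; }
    return ST_INVALID;                     /* e.g. RX--(BH)->INVALID */
}

typedef struct {
    state_t state;
    int dropped;     /* events that hit an invalid transition */
} fsm_t;

/* Process one event. Assumption of this model: an invalid transition
 * drops the event and leaves the state unchanged. */
static void fsm_process(fsm_t *f, event_t ev)
{
    state_t next = transition(f->state, ev);
    if (next == ST_INVALID) {
        f->dropped++;
    }
    else {
        f->state = next;
    }
}

/* Caller-side TX request. The BH is requested while handling
 * REQUEST_TX, i.e. before the new state is stored. If `preempted`
 * is true, the other thread handles that BH inside this window. */
static void request_tx(fsm_t *f, bool preempted)
{
    state_t next = transition(f->state, EV_REQUEST_TX);
    assert(next == ST_PREPARE);
    if (preempted) {
        fsm_process(f, EV_BH);   /* runs while state is still RX */
        f->state = next;         /* PREPARE is stored too late */
    }
    else {
        f->state = next;         /* PREPARE is stored first */
        fsm_process(f, EV_BH);   /* BH now sees PREPARE */
    }
}

/* True if the model reached ST_TX, i.e. a TX_DONE could follow;
 * otherwise a caller waiting for TX_DONE would block forever. */
static bool tx_will_complete(const fsm_t *f)
{
    return f->state == ST_TX;
}
```

With `preempted` set to false, the BH observes PREPARE and the model advances to TX. With `preempted` set to true, the BH is consumed in RX and dropped, the model ends up in PREPARE with no BH left to process, and a caller waiting for TX_DONE would block, which matches the symptom in the log.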

This does not happen with GNRC as all submac interaction happens on a separate thread there.

Steps to reproduce the issue

Print the name of the active thread with every debug print:

diff --git a/core/lib/include/debug.h b/core/lib/include/debug.h
index 620de78267..1d4bdef600 100644
--- a/core/lib/include/debug.h
+++ b/core/lib/include/debug.h
@@ -121,7 +121,7 @@ extern "C" {
  * @details If a variable is only accessed by `DEBUG()`, the compiler will
  *          warn about unused variables when `ENABLE_DEBUG` is set to `0`.
  */
-#define DEBUG(...) do { if (ENABLE_DEBUG) { DEBUG_PRINT(__VA_ARGS__); } } while (0)
+#define DEBUG(...) do { if (ENABLE_DEBUG) { puts(thread_get_name(thread_get_active())); DEBUG_PRINT(__VA_ARGS__); } } while (0)
 
 /**
  * @def DEBUG_PUTS
@@ -129,7 +129,7 @@ extern "C" {
  * @brief Print debug information to stdout using puts(), so no stack size
  *        restrictions do apply.
  */
-#define DEBUG_PUTS(str) do { if (ENABLE_DEBUG) { puts(str); } } while (0)
+#define DEBUG_PUTS(str) do { if (ENABLE_DEBUG) { puts(thread_get_name(thread_get_active())); puts(str); } } while (0)
 /** @} */
 
 /**

Enable debug prints (set ENABLE_DEBUG to 1) in cpu/nrf52/radio/nrf802154/nrf802154_radio.c, drivers/netdev_ieee802154_submac/netdev_ieee802154_submac.c and pkg/lwip/contrib/netdev/lwip_netdev.c.

Run LWIP_IPV6=1 make -C examples/networking/coap/gcoap_dtls BOARD=nrf52840dk flash term -j

Expected results

No race condition. All SubMAC processing should probably happen on a single thread.
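One way to get single-thread handling (an assumption, not necessarily how this would be fixed in RIOT) is for callers never to run FSM handlers themselves, but to post events to a FIFO drained by one thread, with handlers posting follow-up events such as the BH request instead of running them inline. The sketch below is hypothetical: the names, the transitions, and the TX_DONE that stands in for the radio's completion interrupt are all illustrative. It is a single-threaded model that only shows the ordering; a real implementation would need a thread-safe queue (for example, one protected by a mutex).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: all FSM events go through one FIFO that a
 * single worker drains, so handlers never interleave. */
typedef enum { ST_RX, ST_PREPARE, ST_TX } state_t;
typedef enum { EV_REQUEST_TX, EV_BH, EV_TX_DONE } event_t;

#define QLEN 8

typedef struct {
    event_t buf[QLEN];
    size_t head, tail;    /* head: next to pop, tail: next free slot */
    state_t state;
    int invalid;          /* events received in an unexpected state */
} mac_t;

static void post(mac_t *m, event_t ev)
{
    assert(m->tail - m->head < QLEN);   /* queue must not be full */
    m->buf[m->tail++ % QLEN] = ev;
}

/* Handle one event. Side effects such as the BH request are posted
 * to the same queue, so they run only after the new state is stored. */
static void handle(mac_t *m, event_t ev)
{
    if (m->state == ST_RX && ev == EV_REQUEST_TX) {
        post(m, EV_BH);           /* deferred, not run inline */
        m->state = ST_PREPARE;
    }
    else if (m->state == ST_PREPARE && ev == EV_BH) {
        post(m, EV_TX_DONE);      /* stand-in for the radio finishing */
        m->state = ST_TX;
    }
    else if (m->state == ST_TX && ev == EV_TX_DONE) {
        m->state = ST_RX;
    }
    else {
        m->invalid++;
    }
}

/* Worker loop body: drain all pending events in FIFO order and
 * return how many were handled. */
static size_t drain(mac_t *m)
{
    size_t n = 0;
    while (m->head != m->tail) {
        handle(m, m->buf[m->head++ % QLEN]);
        n++;
    }
    return n;
}
```

Posting a single REQUEST_TX and draining the queue handles REQUEST_TX, BH and TX_DONE in that order, so the BH always observes PREPARE and no event hits an invalid transition.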

Actual results

coap get coap://[fe80::1]/st
2025-11-04 14:23:33,375 # coap get coap://[fe80::1]/
2025-11-04 14:23:33,379 # gcoap_cli: sending msg ID 64789, 6 bytes
2025-11-04 14:23:33,380 # main
2025-11-04 14:23:33,387 # IEEE802154 submac: ieee802154_submac_process_ev(): IEEE802154_FSM_STATE_RX + REQUEST_TX
2025-11-04 14:23:33,388 # main
2025-11-04 14:23:33,391 # [nrf802154] Device state: DISABLED
2025-11-04 14:23:33,391 # main
2025-11-04 14:23:33,394 # [nrf802154] Send a packet
2025-11-04 14:23:33,394 # main
2025-11-04 14:23:33,399 # [nrf802154] send: putting 64 bytes into the frame buffer
2025-11-04 14:23:33,399 # main
2025-11-04 14:23:33,406 # IEEE802154 submac: ieee802154_submac_bh_request(): post NETDEV_EVENT_ISR
2025-11-04 14:23:33,406 # main
2025-11-04 14:23:33,409 # [lwip_netdev] NETDEV_EVENT_ISR
2025-11-04 14:23:33,410 # lwip_netdev_mux
2025-11-04 14:23:33,413 # [lwip_netdev] handle netdev isr
2025-11-04 14:23:33,414 # lwip_netdev_mux
2025-11-04 14:23:33,419 # IEEE802154 submac: _isr(): NETDEV_SUBMAC_FLAGS_BH_REQUEST
2025-11-04 14:23:33,421 # lwip_netdev_mux
2025-11-04 14:23:33,428 # IEEE802154 submac: ieee802154_submac_process_ev(): IEEE802154_FSM_STATE_RX + BH
2025-11-04 14:23:33,429 # lwip_netdev_mux
2025-11-04 14:23:33,431 # RX--(BH)->INVALID
2025-11-04 14:23:38,382 # gcoap: timeout for msg ID 64789

The system then deadlocks, because the main thread waits for a TX_DONE event that never arrives.

Versions

Current master.

Metadata

Assignees: No one assigned
Labels: Area: network, Type: bug
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
