Syscall emulation can wake up a thread waiting on a futex at the wrong address, which leads the simulation to fail with "Exiting @ tick 18446744073709551615 because simulate() limit reached"

Description

Patch at: https://gem5-review.googlesource.com/c/public/gem5/+/29777

gem5 d9cb548d83fa81858599807f54b52e5be35a6b03, full CLI:

Test executable source code: https://github.com/cirosantilli/linux-kernel-module-cheat/blob/6275f70ed8862d8fe4e58ca4524a6994d254be35/userland/posix/pthread_barrier.c That version of the repository uniquely specifies the Buildroot toolchain used to build the executable. However, I can only reproduce the problem with dynamic linking (timing coincidence?), so there is no point in sharing the executable here.

At the end of the traces, we see for example:

Here cpu5 sleeps on address 4268172, and just afterwards cpu6 tries to wake up a thread sleeping on 274876716432, which wrongly wakes up cpu5. This kind of "waking up the wrong thread" event likely leaves a thread sleeping forever on the previous address, hence the failure.
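To make the expected behavior concrete, here is a minimal sketch of per-address futex wait queues. It is a toy model, not gem5's actual implementation: the `FutexTable` class and its method names are hypothetical and only illustrate the invariant that a wake on one address must never wake a thread sleeping on another address. The two addresses are the ones from the trace described above.

```python
from collections import defaultdict, deque


class FutexTable:
    """Toy model of per-address futex wait queues (hypothetical, not gem5 code)."""

    def __init__(self):
        # Maps a futex word address to the FIFO queue of thread ids sleeping on it.
        self._waiters = defaultdict(deque)

    def wait(self, addr, tid):
        """Record that thread `tid` went to sleep on the futex word at `addr`."""
        self._waiters[addr].append(tid)

    def wake(self, addr, count):
        """Wake at most `count` threads sleeping on `addr` and return their ids."""
        woken = []
        queue = self._waiters.get(addr)  # .get() does not create a new entry
        while queue and len(woken) < count:
            woken.append(queue.popleft())
        if queue is not None and not queue:
            del self._waiters[addr]  # drop empty queues
        return woken

    def sleeping_on(self, addr):
        """Return the ids of the threads currently sleeping on `addr`."""
        return list(self._waiters.get(addr, ()))


table = FutexTable()
table.wait(4268172, "cpu5")          # cpu5 sleeps on address 4268172
woken = table.wake(274876716432, 1)  # cpu6 wakes waiters on a different address
print(woken)                         # prints []
print(table.sleeping_on(4268172))    # prints ['cpu5']
```

In this model the wake on 274876716432 returns no threads and cpu5 stays asleep on 4268172, which is the opposite of what the trace shows.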

I think I have a patch for this; I'll publish it soon.

Looking above, we confirm that there is actually an LLSC event that woke up CPU1

which was earlier sleeping on a futex:

Minimized static example

I later found one, attached as tmp.out, built with the same toolchain as mentioned previously:

Environment

None

Attachments

1

Created May 12, 2020 at 5:35 PM
Updated June 25, 2020 at 3:57 PM