Software prefetches may forward from stores

Description

Hi all,

Issue

I ran into this assertion failure while running a SPEC2k17 SimPoint on a very large O3 configuration:

build/ARM/cpu/o3/mem_dep_unit_impl.hh:601: MemDepUnit<MemDepPred, Impl>::MemDepEntryPtr& MemDepUnit<MemDepPred, Impl>::findInHash(const DynInstConstPtr&) [with MemDepPred = StoreSet; Impl = O3CPUImpl; MemDepUnit<MemDepPred, Impl>::MemDepEntryPtr = std::shared_ptr<MemDepUnit<StoreSet, O3CPUImpl>::MemDepEntry>; MemDepUnit<MemDepPred, Impl>::DynInstConstPtr = RefCountingPtr<const BaseO3DynInst<O3CPUImpl> >]: Assertion `hash_it != memDepHash.end()' failed.

The problem is that an ARM PRFM PLDL1KEEP partially overlaps the address of an older store, which causes it to be rescheduled (in gem5 terminology). From the point of view of the memory dependency tracker, however, the instruction has completed, so it is removed from the hash map. When the prefetch comes back to issue, its entry can no longer be found in the memory dependency tracker and the assertion fires.
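The sequence can be reproduced with a toy model of the tracker. This is a minimal sketch, not gem5 code: `ToyMemDepUnit`, `ToyMemDepEntry` and their methods are hypothetical stand-ins that keep only the hash-map bookkeeping, and a failed lookup returns `nullptr` instead of asserting so the failure can be observed. It mirrors the log lines "Rescheduling mem inst", "Completed mem instruction" and "Could not find [sn:...] in memdep".

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical, heavily simplified stand-in for gem5's MemDepUnit: it only
// tracks which in-flight memory instructions (by sequence number) it knows.
using SeqNum = std::uint64_t;

struct ToyMemDepEntry {
    bool issued = false;
};

class ToyMemDepUnit {
  public:
    void insert(SeqNum sn) { hash_[sn] = ToyMemDepEntry{}; }

    // Plays the role of findInHash(); gem5 asserts on a miss, this returns nullptr.
    ToyMemDepEntry *find(SeqNum sn) {
        auto it = hash_.find(sn);
        return it == hash_.end() ? nullptr : &it->second;
    }

    // Returns false where gem5 would hit the assertion.
    bool issue(SeqNum sn) {
        ToyMemDepEntry *e = find(sn);
        if (!e)
            return false;
        e->issued = true;
        return true;
    }

    // A rescheduled instruction keeps its entry and waits to issue again.
    bool reschedule(SeqNum sn) {
        ToyMemDepEntry *e = find(sn);
        if (!e)
            return false;
        e->issued = false;
        return true;
    }

    // Completion erases the entry: the tracker considers the instruction done.
    void complete(SeqNum sn) { hash_.erase(sn); }

  private:
    std::unordered_map<SeqNum, ToyMemDepEntry> hash_;
};
```

Replaying the order seen for the prefetch (issue, reschedule, complete, then issue again) makes the final lookup miss, which is the condition the real `findInHash()` asserts on.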

See the debug log:


24631852137354: global: DynInst: [sn:81436244] Instruction created. Instcount for system.switch_cpus = 505
24631852137354: system.switch_cpus.fetch: [tid:0] Instruction PC 0x45cd50 (0) created [sn:81436244].
24631852137354: system.switch_cpus: Adding [sn:81436244] to cpu inst list
24631852137354: system.switch_cpus.fetch: [tid:0] [sn:81436244] Sending instruction to decode from fetch queue. Fetch queue size: 5.
24631852138353: system.switch_cpus.decode: [tid:0] Processing instruction [sn:81436244] with PC (0x45cd50=>0x45cd54).(0=>1)
24631852138686: system.switch_cpus.rename: [tid:0] Processing instruction [sn:81436244] with PC (0x45cd50=>0x45cd54).(0=>1).
24631852138686: global: [sn:81436244] has 1 ready out of 1 sources. RTI 0)
24631852139685: system.switch_cpus.iew: [tid:0] Issue: Adding PC (0x45cd50=>0x45cd54).(0=>1) [sn:81436244] [tid:0] to IQ.
24631852139685: system.switch_cpus.iew.lsq: Inserting load PC (0x45cd50=>0x45cd54).(0=>1), idx:75 [sn:81436244]
24631852139685: system.switch_cpus.iq: Adding instruction [sn:81436244] PC (0x45cd50=>0x45cd54).(0=>1) to the IQ.
24631852139685: system.switch_cpus.memDep0: No dependency for inst PC (0x45cd50=>0x45cd54).(0=>1) [sn:81436244].
24631852139685: system.switch_cpus.memDep0: Adding instruction [sn:81436244] to the ready list.
24631852139685: system.switch_cpus.iq: Instruction is ready to issue, putting it onto the ready list, PC (0x45cd50=>0x45cd54).(0=>1) opclass:47 [sn:81436244].
24631852139685: system.switch_cpus.iq: Thread 0: Issuing instruction PC (0x45cd50=>0x45cd54).(0=>1) [sn:81436244]
24631852139685: system.switch_cpus.memDep0: Issuing instruction PC 0x45cd50 [sn:81436244].
24631852140018: system.switch_cpus.iew: Execute: Processing PC (0x45cd50=>0x45cd54).(0=>1), [tid:0] [sn:81436244].
24631852140018: system.switch_cpus.iew.lsq: Executing load PC (0x45cd50=>0x45cd54).(0=>1), [sn:81436244]
24631852140018: system.switch_cpus.iq: Rescheduling mem inst [sn:81436244]
24631852140018: system.switch_cpus.iew.lsq: Load [sn:81436244] not executed from fault
24631852140018: system.switch_cpus.iew: Sending instructions to commit, [sn:81436244] PC (0x45cd50=>0x45cd54).(0=>1).
24631852140018: system.switch_cpus.memDep0: Completed mem instruction PC (0x45cd50=>0x45cd54).(0=>1) [sn:81436244].
24631852140018: system.switch_cpus.iq: Completing mem instruction PC: (0x45cd50=>0x45cd54).(0=>1) [sn:81436244]
24631852140018: system.switch_cpus.memDep0: Could not find [sn:81436244] in memdep.
24631852140018: system.switch_cpus.iew.lsq: Executing load PC (0x45cd50=>0x45cd54).(0=>1), [sn:81436244]
24631852140018: global: RegFile: Access to int register 34, has data 0x7f0d583ff9
24631852140018: system.switch_cpus.iew.lsq: Read called, load idx: 75, store idx: 0, storeHead: 77 addr: 0 split
24631852140018: system.switch_cpus.iq: Rescheduling mem inst [sn:81436244]
24631852140018: system.switch_cpus.iew.lsq: Load-store forwarding mis-match. Store idx 79 to load addr 0x7f0d583ff9
24631852140018: system.switch_cpus.iew.lsq: Load [sn:81436244] not executed from fault

gem5 commit


I am using commit 4ef1f17e0f9c6f15b8ad63eff7a2d07025c29709 (Nov 11, 2020) as the base for my changes. I believe the issue is still present in current master, but I am not 100% sure because the exact conditions have to be reproduced. In master's code, software prefetches still attempt to match against Store Queue data.

Local fix

Checking whether the instruction is a prefetch before searching the store queue (STQ) seems to fix the issue. A software prefetch does not return data, so IMHO this is safe. However, the proper fix may be to not add software prefetches to the memory dependency tracker at all.
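The local fix can be sketched as an early exit in the store-queue search. This is a minimal illustration under stated assumptions, not the actual gem5 LSQ code: `ToyStore`, `ToyLoad`, `ForwardResult` and `checkStoreQueue()` are hypothetical, stores are assumed to be ordered oldest to youngest and all older than the load, and the store address used in the usage example is invented (the log only gives the load address 0x7f0d583ff9).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplified types; they are not gem5's LSQ classes.
struct ToyStore {
    std::uint64_t addr;
    unsigned size;
};

struct ToyLoad {
    std::uint64_t addr;
    unsigned size;
    bool isSwPrefetch;  // e.g. an ARM PRFM PLDL1KEEP
};

enum class ForwardResult { NoMatch, Forward, Reschedule };

// Assumes `stq` holds stores oldest to youngest, all older than `ld`.
ForwardResult checkStoreQueue(const ToyLoad &ld, const std::vector<ToyStore> &stq) {
    // Local fix: a software prefetch returns no data, so it never needs to
    // forward from, or wait on, an older store.
    if (ld.isSwPrefetch)
        return ForwardResult::NoMatch;

    const std::uint64_t ldEnd = ld.addr + ld.size;
    // Search the youngest store first, since it holds the most recent data.
    for (auto it = stq.rbegin(); it != stq.rend(); ++it) {
        const std::uint64_t stEnd = it->addr + it->size;
        const bool overlaps = it->addr < ldEnd && ld.addr < stEnd;
        if (!overlaps)
            continue;
        // Full coverage can forward; a partial overlap forces a reschedule.
        const bool covers = it->addr <= ld.addr && ldEnd <= stEnd;
        return covers ? ForwardResult::Forward : ForwardResult::Reschedule;
    }
    return ForwardResult::NoMatch;
}
```

With an assumed 8-byte store at 0x7f0d583ff8, an 8-byte access at 0x7f0d583ff9 only partially overlaps it: a regular load is rescheduled, while the prefetch skips the search entirely. The alternative fix, keeping software prefetches out of the memory dependency tracker, is not modeled here.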

Environment

  • Python 3.7.3

  • g++ (Debian 8.3.0-6) 8.3.0

Activity

Jason Lowe-Power
March 29, 2021, 9:20 PM

This definitely looks like a bug! But I’m not sure how to fix it. Any suggestions or contributions would be appreciated.

Assignee

Unassigned

Reporter

Arthur Perais

Priority

Low

Components