According to the OR arch manual, if l.sys occurs in a delay slot, EPCR is set to the "Address of just executed jump instruction" (pp. 254-255). So on l.rfe, execution resumes at the jump, the l.sys in its delay slot runs again, and we end up in an infinite loop executing the system call over and over. I could verify this behaviour on the Verilog model using the following program:
_main:
    l.addi  r3, r0, 12
    l.j     foo
    l.sys   1
foo:
    l.jr    r9
    l.nop
We'll need to fix this in the OR1K spec, then. Either we specify that when l.sys executes in a delay slot, EPCR is set to the jump's target address, or we state that l.sys is not allowed in a delay slot. I prefer the latter.
The compiler never generates system calls, so there's no worry with compiled code, just assembly-level code like your example.
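For anyone who wants to see the two EPCR choices side by side without firing up the Verilog model, here is a rough Python sketch of the example program. It is a toy model and not the real OR1K pipeline. The assumptions are mine: the syscall handler does nothing and immediately executes l.rfe, l.jr r9 simply ends the run, and the policy names "jump" and "target" are made up for illustration.

```python
# Toy model only: this is NOT the real OR1K pipeline or exception logic.
# Addresses and the instruction encoding below are illustrative assumptions.
PROGRAM = {
    0x00: ("addi",),     # _main: l.addi r3, r0, 12
    0x04: ("j", 0x0C),   #        l.j   foo
    0x08: ("sys",),      #        l.sys 1    (delay slot of l.j)
    0x0C: ("jr",),       # foo:   l.jr  r9   (treated as "return": stop here)
    0x10: ("nop",),      #        l.nop
}


def run(epcr_policy, max_syscalls=5):
    """Return how many times l.sys executes, capped at max_syscalls.

    epcr_policy (hypothetical names):
      "jump"   -> EPCR = address of the jump (behaviour quoted from the manual)
      "target" -> EPCR = jump target address (the first proposed spec change)
    """
    pc = 0x00
    jump_pc = None       # address of the jump whose delay slot we are in
    jump_target = None   # where that jump goes once the delay slot is done
    syscalls = 0

    while syscalls < max_syscalls:
        op = PROGRAM[pc]
        in_delay_slot = jump_pc is not None

        if op[0] == "jr":
            break  # program returns to its caller; end of the model

        if op[0] == "j":
            # Remember the jump and fall through into its delay slot.
            jump_pc, jump_target = pc, op[1]
            pc += 4
            continue

        if op[0] == "sys":
            syscalls += 1
            if not in_delay_slot:
                epcr = pc + 4
            elif epcr_policy == "jump":
                epcr = jump_pc
            else:
                epcr = jump_target
            # Assumed handler: does no work, then l.rfe resumes at EPCR.
            pc = epcr
            jump_pc = jump_target = None
            continue

        # Ordinary instruction: take the pending jump if in a delay slot.
        pc = jump_target if in_delay_slot else pc + 4
        jump_pc = jump_target = None

    return syscalls


print(run("jump"))    # 5: hits the cap, the system call never stops repeating
print(run("target"))  # 1: the system call runs once and execution reaches foo
```

With the "jump" policy the count only stops because of the artificial cap, which matches the loop seen on the Verilog model. Forbidding l.sys in a delay slot sidesteps the question entirely, so the model has nothing to show for that option.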