I found that the read data is sampled one clock cycle too late.
The root of this behaviour is in atahost_pio_tctrl.vhd. In process T2proc, DIOR follows IORDY_done after one cycle. In process gen_dstrb, dstrb also follows IORDY_done after one cycle. That is, the external read cycle terminates on the same clock edge on which dstrb rises.
The bug itself then occurs in atahost_pio_actrl.vhd, process gen_PIOq, where DDi is registered synchronously while dstrb is active, i.e. one clock cycle after the external read cycle has been signalled as terminated.
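To make the timing easier to follow, here is a small cycle-level model in Python. It is only a sketch of the behaviour described above, not the core's actual VHDL. The signal names come from the source files, but everything else is an assumption made for illustration: the function simulate, the capture_early switch, DIOR modelled as active-high, and a device that drives DDi only while DIOR is asserted (no hold time). The capture_early variant is a hypothetical alternative for comparison, not a verified fix.

```python
def simulate(iordy_done_cycle=3, n_cycles=8, device_data=0xABCD,
             capture_early=False):
    """Simplified model: every register updates on the rising clock edge.

    Assumptions (not taken from the VHDL): DIOR is modelled active-high,
    and the device drives DDi only while DIOR is asserted.
    """
    dior = True    # external read strobe currently asserted
    dstrb = False  # internal data strobe
    pioq = 0x0000  # register that captures DDi
    trace = []
    for cycle in range(n_cycles):
        iordy_done = (cycle == iordy_done_cycle)
        # Bus value during this cycle; None means the device has released it.
        ddi = device_data if dior else None
        trace.append({"cycle": cycle, "iordy_done": iordy_done,
                      "dior": dior, "dstrb": dstrb, "ddi": ddi})
        # Rising edge: compute all next-state values from current values.
        capture = iordy_done if capture_early else dstrb
        next_pioq = ddi if capture else pioq
        next_dstrb = iordy_done            # dstrb follows IORDY_done by one cycle
        next_dior = dior and not iordy_done  # DIOR ends one cycle after IORDY_done
        pioq, dstrb, dior = next_pioq, next_dstrb, next_dior
    return trace, pioq


if __name__ == "__main__":
    trace, captured = simulate()
    for row in trace:
        print(row)
    print("captured (as described):", captured)
    print("captured (capture_early):", simulate(capture_early=True)[1])
```

In this model dstrb is high in the first cycle in which DIOR is already inactive, so the register samples a bus the device is no longer assumed to drive. Whether real hardware returns wrong data depends on how long the device holds the data after DIOR is released relative to the clock period.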
Marc