The latency seems to be larger than what the overview reports. Measuring from the rising clock edge on which enable is active to the rising edge of ready, I get the following with the non-pipelined version:
Add = 25, Sub = 26, Mul = 29, Div = 76
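
In case the counting convention matters when comparing against the overview, here is a minimal sketch of the measurement described above. It assumes the figures are clock cycles and that both signals are sampled once per rising clock edge; the function name and the example trace are hypothetical and not taken from the core or its testbench.

```python
def latency_cycles(enable, ready):
    """Count clock cycles from the first sampled edge where enable is high
    to the first later edge where ready has risen (0 -> 1).

    Both arguments are sequences of 0/1 values, one sample per rising
    clock edge (an assumption about how the waveform was captured).
    """
    start = next((i for i, e in enumerate(enable) if e), None)
    if start is None:
        return None  # enable was never active in this trace
    for i in range(start + 1, len(ready)):
        if ready[i] and not ready[i - 1]:
            return i - start
    return None  # ready never rose within the trace


# Hypothetical trace: enable is seen high at edge 2, ready rises at edge 7.
enable = [0, 0, 1, 0, 0, 0, 0, 0, 0]
ready = [1, 1, 0, 0, 0, 0, 0, 1, 1]
print(latency_cycles(enable, ready))  # prints 5
```

Counting from the edge after enable, or to the edge before ready, would shift each figure by one cycle, so an off-by-one difference from the overview may just be a difference in convention.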