The idea is to remove the TX clock generation from the core and just source the clock from an external pin. That would give the user more control; for example, the user could source it directly from a PLL rather than using your clock dividing logic.
Then both the RX and TX buffers should operate in DDR mode to achieve the maximum speed of 400 Mbaud on an FPGA.
There is already an external TX clock input. The clock dividing logic can be bypassed by setting the division factor to 1.
The receiver uses DDR but the transmitter currently does not. Using DDR for TX could certainly improve the maximum TX bit rate. It should be possible, for example, to write a 3rd implementation of spwxmit which uses DDR.
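To make the arithmetic behind this concrete, here is a rough Python sketch (not code from the core). It assumes the line rate is simply the TX clock divided by the division factor, doubled when the output register is DDR. The names `bit_rate_mbit` and `ddr_split` are made up for illustration, and how a DDR spwxmit would really combine with the divider is an assumption.

```python
def bit_rate_mbit(clock_mhz, division_factor=1, ddr=False):
    """Line rate in Mbit/s for a transmitter clocked at clock_mhz.

    Assumption: one bit per divided clock cycle in SDR mode and
    two bits per divided clock cycle in DDR mode.
    """
    if division_factor < 1:
        raise ValueError("division_factor must be >= 1")
    bits_per_cycle = 2 if ddr else 1
    return clock_mhz / division_factor * bits_per_cycle


def ddr_split(bits):
    """Split a bit sequence into the two streams a DDR output register needs.

    Even-indexed bits go out on the rising edge, odd-indexed bits on the
    falling edge (the edge assignment is an illustrative choice).
    """
    bits = list(bits)
    return bits[0::2], bits[1::2]


if __name__ == "__main__":
    # Division factor 1 bypasses the divider: 200 MHz gives 200 Mbit/s SDR.
    print(bit_rate_mbit(200, division_factor=1, ddr=False))
    # The same 200 MHz clock with a DDR output stage gives 400 Mbit/s.
    print(bit_rate_mbit(200, division_factor=1, ddr=True))
    print(ddr_split([1, 0, 1, 1, 0, 0]))
```

Under these assumptions, a non-DDR transmitter needs a 400 MHz TX clock for 400 Mbit/s, while a DDR transmitter would need only 200 MHz.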
However, I will not be able to work on this any time soon. First I'd like to improve the receiver in order to achieve 400 Mbit/s in Virtex FPGAs. (I believe the transmitter can already do 400 Mbit/s in Virtex.) But if you have code you'd like to share, you are very welcome.