OpenCores

About the mult instruction: Multiplication instructions must be placed in the assembly code so that they are fetched by pipeline 1. Right after the mult, place a nop (or an instruction that does not depend on the multiplication's result), and put the mflo/mfhi after that nop. See f8o.lst (remove the nops!) in the benchmark folder for an example of a stream-oriented program (an FFT). Note that the input samples must be checked beforehand in math software to make sure the results will not be real (non-integer) numbers, since the core does not handle floating-point numbers. Also take a look at m6x.lst (6x6 matrix multiplication) for an example of a memory-access-oriented program, which does not need nops to give the multiplication result time to become stable.
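The mult placement rule can be checked mechanically before assembling. The sketch below is a hypothetical Python helper, not a tool from this project. It assumes (this is not stated above) that instructions are issued in pairs and that pipeline 1 takes the instruction at the even index of each pair (0, 2, 4, ...). It flags a mult in a pipeline-2 slot, a mflo/mfhi placed directly after the mult, and a mult whose result is never read with mflo/mfhi.

```python
# Hypothetical scheduling checker for the mult rule.
# ASSUMPTION: pipeline 1 fetches the instruction at each even index,
# pipeline 2 the one at each odd index. Adjust if the core differs.

HILO_READERS = {"mflo", "mfhi"}


def opcode(instr: str) -> str:
    """Return the mnemonic of an assembly line such as 'mult $t0, $t1'."""
    return instr.split()[0].lower() if instr.strip() else ""


def check_mult_schedule(program: list[str]) -> list[str]:
    """Return a list of problems found; an empty list means the rule holds."""
    problems = []
    for i, instr in enumerate(program):
        if opcode(instr) != "mult":
            continue
        # Rule 1: mult must sit in a pipeline-1 slot (assumed: even index).
        if i % 2 != 0:
            problems.append(f"{i}: mult is not in a pipeline-1 slot")
        # Rule 2: the next instruction must not read HI/LO yet (use a nop
        # or an independent instruction instead).
        if i + 1 < len(program) and opcode(program[i + 1]) in HILO_READERS:
            problems.append(f"{i + 1}: {opcode(program[i + 1])} directly after mult")
        # Rule 3: a mflo/mfhi should follow somewhere to read the result.
        if not any(opcode(p) in HILO_READERS for p in program[i + 1:]):
            problems.append(f"{i}: no mflo/mfhi after mult")
    return problems


good = ["mult $t0, $t1", "nop", "mflo $t2", "addu $t3, $t2, $t2"]
bad = ["nop", "mult $t0, $t1", "mflo $t2", "nop"]
print(check_mult_schedule(good))  # []
print(check_mult_schedule(bad))   # two problems: odd slot, mflo right after mult
```

The checker only looks at HI/LO readers for the "independent instruction" slot, since in MIPS-style code mflo/mfhi are the instructions that consume a mult result.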

About the mult2 instruction: This instruction is meant for operands whose significant bits fit in the 16 least significant bits of the 32-bit source registers; otherwise the result will be wrong. Both pipelines can fetch it, since it does not need a mflo instruction and writes the 32-bit result directly to the register bank. After it, insert a nop or an instruction that does not depend on the multiplication's result.
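To see why operands wider than 16 bits give a wrong result, here is a small Python model. It is a hypothetical sketch, not the core's actual logic: it assumes mult2 multiplies only the low 16 bits of each source as unsigned values and keeps a 32-bit product. The text above does not say whether mult2 is signed or unsigned, so treat that part as an assumption.

```python
# Hypothetical model of mult2.
# ASSUMPTION: only the 16 least significant bits of each 32-bit source are
# used, treated as unsigned, and the product is stored as 32 bits.
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mult2_model(rs: int, rt: int) -> int:
    """Assumed mult2 behaviour: low 16 bits of each operand, 32-bit result."""
    return ((rs & MASK16) * (rt & MASK16)) & MASK32


def mult2_is_safe(rs: int, rt: int) -> bool:
    """True when both operands fit in 16 bits, so the model's result is exact."""
    return 0 <= rs <= MASK16 and 0 <= rt <= MASK16


for a, b in [(300, 400), (65535, 65535), (70000, 2)]:
    # Compare the modelled hardware result with the exact product.
    print(a, b, mult2_is_safe(a, b), mult2_model(a, b), a * b)
```

Under this model, 70000 * 2 yields 8928 instead of 140000, because the upper bits of 70000 are discarded. Two 16-bit unsigned values always produce a product that fits in 32 bits, which is consistent with mult2 writing a single 32-bit result.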

About the sw and lw instructions: When a piece of code contains more than one of these instructions, they must all be placed at adjacent addresses, however many there are. An even count is preferable, because each pipeline then executes the same number of memory access instructions. If the count is odd, one pipeline would execute one more memory access instruction than the other, so an extra lw or sw instruction should be placed after the last memory access instruction in the code to give both pipelines the same number. The exception is when there is only one sw instruction; see the matrix multiplication program in the benchmarks folder. The two pipelines must always be executing instructions at adjacent addresses so that they cooperate to finish the program faster. They must not drift apart from each other, or the program will fail.
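The adjacency and balancing rule can also be sketched as a small pre-assembly pass. This Python helper is hypothetical: the padding instruction `lw $zero, 0($sp)` is an assumed harmless dummy access (a load into `$zero` is discarded on MIPS-style cores), not something the text above prescribes. It checks that all lw/sw instructions are contiguous and, when their count is odd, inserts one extra access right after the last one, leaving a single sw untouched as the documented exception.

```python
# Hypothetical helper for the lw/sw rule.
# ASSUMPTION: "lw $zero, 0($sp)" is a harmless dummy access usable as
# padding; replace it with whatever filler access suits the program.
MEM_OPS = {"lw", "sw"}
PAD = "lw $zero, 0($sp)"


def opcode(instr: str) -> str:
    """Return the mnemonic of an assembly line such as 'lw $t0, 0($a0)'."""
    return instr.split()[0].lower() if instr.strip() else ""


def balance_memory_block(program: list[str]) -> list[str]:
    """Check that lw/sw are at adjacent addresses and pad them to an even count."""
    idx = [i for i, p in enumerate(program) if opcode(p) in MEM_OPS]
    out = list(program)
    if not idx:
        return out
    # Documented exception: a single sw needs no padding.
    if len(idx) == 1 and opcode(program[idx[0]]) == "sw":
        return out
    if idx != list(range(idx[0], idx[-1] + 1)):
        raise ValueError("lw/sw instructions are not at adjacent addresses")
    if len(idx) % 2 == 1:
        # Odd count: add one access after the last one so that both
        # pipelines execute the same number of memory instructions.
        out.insert(idx[-1] + 1, PAD)
    return out


block = ["lw $t0, 0($a0)", "lw $t1, 4($a0)", "lw $t2, 8($a0)", "addu $t3, $t0, $t1"]
print(balance_memory_block(block))
```

For the three-load block above, the result gains the padding load at index 3, just before the `addu`, so the run of memory accesses has an even length of four.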