or1k/trunk/rtems-20020807/c/src/lib/libbsp/powerpc/shared/bootloader/em86real.S - Rev 1765
/*
 *  em86real.S
 *
 *  Copyright (C) 1998, 1999 Gabriel Paubert, paubert@iram.es
 *
 *  Modified to compile in RTEMS development environment
 *  by Eric Valette
 *
 *  Copyright (C) 1999 Eric Valette. valette@crf.canon.fr
 *
 *  The license and distribution terms for this file may be
 *  found in the file LICENSE in this distribution or at
 *  http://www.OARcorp.com/rtems/license.html.
 *
 * em86real.S,v 1.2 2002/07/25 13:51:22 joel Exp
 */

/* If the symbol __BOOT__ is defined, a slightly different version is
 * generated to be compiled with the -m relocatable option
 */

#ifdef __BOOT__
#include "bootldr.h"
/* It is impossible to gather statistics in the boot version */
#undef EIP_STATS
#endif

/*
 *
 * Given the size of this code, it deserves a few comments on how it works,
 * and why it was implemented the way it is.
 *
 * The goal is to have a real mode i486SX emulator to initialize hardware,
 * mostly graphics boards, by interpreting ROM BIOSes. The choice of a 486SX
 * is logical since this is the lowest processor that PCI ROM BIOSes must run
 * on.
 *
 * The goal of this emulator is not performance, but a small enough memory
 * footprint to include it in a bootloader.
 *
 * It is actually likely to be comparable to a 25MHz 386DX on a 200MHz 603e !
 * This is not as serious as it seems since most of the BIOS code performs
 * a lot of accesses to I/O and non-cacheable memory spaces. For such
 * instructions, the execution time is often dominated by bus accesses.
 * Statistics of the code also show that it spends a large fraction of
 * the time in loops waiting for vertical retrace or programs one of the
 * timers and waits for the count to go down to zero.
 * This type of loop
 * runs emulated at the same speed as on 5 GHz Pentium IV++ ;)
 *
 */

/*
 * Known bugs or differences with a real 486SX (real mode):
 * - segment limits are not enforced (too costly)
 * - xchg instructions with memory are not locked
 * - lock prefixes are not implemented at all
 * - long divides implemented but perhaps still buggy
 * - miscellaneous system instructions not implemented
 *   (some probably cannot be implemented)
 * - neither control nor debug registers are implemented for the time being
 *   (debug registers are impossible to implement at a reasonable cost)
 */

/* Code options, put them on the compiler command line */
/* #define EIP_STATS */	/* EIP based profiling */
/* #undef EIP_STATS */

/*
 * Implementation notes:
 *
 * A) flags emulation.
 *
 * The most important decisions when it comes to obtaining a reasonable speed
 * are related to how the EFLAGS register is emulated.
 *
 * Note: the code to set up flags is complex, but it is only seldom
 * executed since cmp and test instructions use much faster flag evaluation
 * paths. For example the overflow flag is almost only needed for pushf and
 * int. Comparison results only involve (SF^OF) or (SF^OF)+ZF and the
 * implementation is fast in this case.
 *
 * Rarely used flags: AC, NT and IOPL are kept in a memory EFLAGS image.
 * All other flags are either kept explicitly in PPC cr (DF, IF, and TF) or
 * lazily evaluated from the state of 4 registers called flags, result, op1,
 * op2, and sometimes the cr itself. The emulation has been designed for
 * minimal overhead for the common case where the flags are never used. With
 * few exceptions, all instructions that set flags leave the result of the
 * computation in a register called result, and operands are taken from op1
 * and op2 registers.
 * However a few instructions like cmp, test and bit tests
 * (bt/btc/btr/bts/bsf/bsr) explicitly set cr bits to short circuit
 * condition code evaluation of conditional instructions.
 *
 * As a very brief summary:
 *
 * - the result of the last flag setting operation is often either in the
 *   result register or in op2 after increment or decrement instructions
 *   because result and op1 may be needed to compute the carry.
 *
 * - compare instructions leave the result of the unsigned comparison
 *   in cr4 and of signed comparison in cr6. This means that:
 *   - cr4[0]=CF (short circuit for jc/jnc)
 *   - cr4[1]=~(CF+ZF) (short circuit for ja/jna)
 *   - cr6[0]=(OF^SF) (short circuit for jl/jnl)
 *   - cr6[1]=~((SF^OF)+ZF) (short circuit for jg/jng)
 *   - cr6[2]=ZF (short circuit for jz/jnz)
 *
 * - test instructions set flags in cr6 and clear overflow. This means that:
 *   - cr6[0]=SF=(SF^OF) (short circuit for jl/jnl/js/jns)
 *   - cr6[1]=~((SF^OF)+ZF) (short circuit for jg/jng)
 *   - cr6[2]=ZF (short circuit for jz/jnz)
 *
 * All flags may be lazily evaluated from several values kept in registers:
 *
 *	Flag:	Depends upon:
 *	OF	result, op1, op2, flags[INCDEC_FIELD,SUBTRACTING,OF_STATE_MASK]
 *	SF	result, op2, flags[INCDEC_FIELD,RES_SIZE]
 *	ZF	result, op2, cr6[2], flags[INCDEC_FIELD,RES_SIZE,ZF_PROTECT]
 *	AF	op1, op2, flags[INCDEC_FIELD,SUBTRACTING,CF_IN]
 *	PF	result, op2, flags[INCDEC_FIELD]
 *	CF	result, op1, flags[CF_STATE_MASK, CF_IN]
 *
 * The order of the fields in the flags register has been chosen so that a
 * single rlwimi is necessary for common instructions that do not affect all
 * flags. (See the code for inc/dec emulation).
 *
 *
 * B) opcodes and prefixes.
 *
 * The register called opcode holds in its low order 8 bits the opcode
 * (second byte if the first byte is 0x0f). More precisely it holds the
 * last byte fetched before the modrm byte or the immediate operand(s)
 * of the instruction, if any. High order 24 bits are zero unless the
 * instruction has prefixes.
 * These higher order bits have the following
 * meaning:
 *	0x80000000	segment override prefix
 *	0x00001000	repnz prefix (0xf2)
 *	0x00000800	repz prefix (0xf3)
 *	0x00000400	address size prefix (0x67)
 *	0x00000200	operand size prefix (0x66)
 * (bit 0x1000 and 0x800 cannot be set simultaneously)
 *
 * Therefore if there is a segment override the value will be very
 * negative (between 0x80000000 and 0x800016ff), if there is no segment
 * override, the value will be between 0 and 0x16ff. The reason for
 * this choice will be understood in the next part.
 *
 * C) addressing mode description tables.
 *
 * the encoding of the modrm bytes (especially in 16 bit mode) is quite
 * complex. Hence a table, indexed by the five useful bits of the modrm
 * byte is used to simplify decoding. Here is a description:
 *
 *	bit mask	meaning
 *	0x80000000	use ss as default segment register
 *	0x00004000	means that this addressing mode needs a base register
 *			(set for all entries except sib and displacement-only)
 *	0x00002000	set if preceding is not set
 *	0x00001000	set if an sib follows
 *	0x00000700	base register to use (16 and 32 bit)
 *	0x00000080	set in 32 bit addressing mode table, cleared in 16 bit
 *			(so extsb mask,entry; ori mask,mask,0xffff gives a mask)
 *	0x00000070	kludge field, possible values are
 *			0: 16 bit addressing mode without index
 *			10: 32 bit addressing mode
 *			60: 16 bit addressing mode with %si as index
 *			70: 16 bit addressing mode with %di as index
 *
 * This convention leads to the following special values used to check for
 * sib present and displacement-only, which happen to be the three lowest
 * values in the table (unsigned):
 *	0x00003090	sib follows (implies it is a 32 bit mode)
 *	0x00002090	32 bit displacement-only
 *	0x00002000	16 bit displacement-only
 *
 * This means that all entries are either very negative in the 0x80002000
 * range if the segment defaults to ss or higher than 0x2000 if it defaults
 * to ds. Combined with the value in opcode this gives the following table:
 *	opcode		entry		entry>opcode ?	segment to use
 *	positive	positive	yes		ds (default)
 *	negative	positive	yes		overridden by prefix
 *	positive	negative	no		ss
 *	negative	negative	yes		overridden by prefix
 *
 * Hence a simple comparison allows to check for the need to override
 * the current base with ss, i.e., when ss is the default base and the
 * instruction has no override prefix.
 *
 * D) BUGS
 *
 * This software is obviously bug-free :-). Nevertheless, if you encounter
 * an interesting feature, mail me a note, if possible with a detailed
 * instruction example showing where and how it fails.
 *
 */

/* Now the details of flag evaluation with the necessary macros */

/* Alignment check is toggleable so the system believes it is a 486, but
CPUID is not to avoid unnecessary complexities. However, alignment
is actually never checked (real mode is CPL 0 anyway). */
#define AC86 13		/* Can only be toggled */
#define VM86 14		/* Not used for now */
#define RF86 15		/* Not emulated precisely */
/* Actually NT and IOPL are kept in memory */
#define NT86 17
#define IOPL86 18	/* Actually 18 and 19 */
#define OF86 20
#define DF86 21
#define IF86 22
#define TF86 23
#define SF86 24
#define ZF86 25
#define AF86 27
#define PF86 29
#define CF86 31

/* Where the less important flags are placed in PPC cr */
#define RF 20		/* Suppress trap flag: cr5[0] */
#define DF 21		/* Direction flag: cr5[1] */
#define IF 22		/* Interrupt flag: cr5[2] */
#define TF 23		/* Single step flag: cr5[3] */

/* Now the flags which are frequently used */
/*
 * CF_IN is a copy of the input carry with PPC polarity,
 * it is cleared for add, set for sub and cmp,
 * equal to the x86 carry for adc and to its complement for sbb.
 * it is used to evaluate AF and CF.
 */
#define CF_IN 0x80000000

/* #define GET_CF_IN(dst) rlwinm dst,flags,1,0x01 */

/* CF_IN_CR set in flags means that cr4[0] is a copy of carry bit */
#define CF_IN_CR 0x40000000

#define EVAL_CF andis. r3,flags,(CF_IN_CR)>>16; beql- _eval_cf

/*
 * CF_STATE tells how to compute the carry bit.
 * NOTRESULT16 and NOTRESULT8 are never set explicitly,
 * but they may happen after a cmc instruction.
 */
#define CF 16		/* cr4[0] */
#define CF_LOCATION 0x30000000
#define CF_ZERO 0x00000000
#define CF_EXPLICIT 0x00000000
#define CF_COMPLEMENT 0x08000000	/* Indeed a polarity bit */
#define CF_STATE_MASK (CF_LOCATION|CF_COMPLEMENT)
#define CF_VALUE 0x08000000
#define CF_SET 0x08000000
#define CF_RES32 0x10000000
#define CF_NOTRES32 0x18000000
#define CF_RES16 0x20000000
#define CF_NOTRES16 0x28000000
#define CF_RES8 0x30000000
#define CF_NOTRES8 0x38000000

#define CF_ADDL CF_RES32
#define CF_SUBL CF_NOTRES32
#define CF_ADDW CF_RES16
#define CF_SUBW CF_RES16
#define CF_ADDB CF_RES8
#define CF_SUBB CF_RES8

#define CF_ROTCNT(dst) rlwinm dst,flags,7,0x18
#define CF_POL(dst,pos) rlwinm dst,flags,(36-pos)%32,pos,pos
#define CF_POL_INSERT(dst,pos) \
	rlwimi dst,flags,(36-pos)%32,pos,pos
#define RES2CF(dst) rlwinm dst,result,8,7,15

/*
 * OF_STATE tells how to compute the overflow bit. When the low order bit
 * is set (OF_EXPLICIT), it means that OF is the exclusive or of the
 * two other bits. For the reason of this choice, see rotate instructions.
 */
#define OF 1		/* Only after EVAL_OF */
#define OF_STATE_MASK 0x07000000
#define OF_INCDEC 0x00000000
#define OF_EXPLICIT 0x01000000
#define OF_ZERO 0x01000000
#define OF_VALUE 0x04000000
#define OF_SET 0x04000000
#define OF_ONE 0x05000000
#define OF_XOR 0x06000000
#define OF_ARITHL 0x06000000
#define OF_ARITHW 0x02000000
#define OF_ARITHB 0x04000000

#define EVAL_OF rlwinm. r3,flags,6,0,1; bngl+ _eval_of; andis. r3,flags,OF_VALUE>>16

/* See _eval_of to see how this can be used */
#define OF_ROTCNT(dst) rlwinm dst,flags,10,0x1c

/*
 * SIGNED_IN_CR means that cr6 is set as after a signed compare:
 * - cr6[0] is SF^OF for jl/jnl/setl/setnl...
 * - cr6[1] is ~((SF^OF)+ZF) for jg/jng/setg/setng...
 * - cr6[2] is ZF (ZF_IN_CR is always set if this bit is set)
 */
#define SLT 24		/* cr6[0], signed less than */
#define SGT 25		/* cr6[1], signed greater than */
#define SIGNED_IN_CR 0x00800000

#define EVAL_SIGNED andis. r3,flags,SIGNED_IN_CR>>16; beql- _eval_signed

/*
 * Above in CR means that cr4 is set as after an unsigned compare:
 * - cr4[0] is CF (CF_IN_CR is also set)
 * - cr4[1] is ~(CF+ZF) (ZF_IN_CR is also set)
 */
#define ABOVE 17	/* cr4[1] */
#define ABOVE_IN_CR 0x00400000

#define EVAL_ABOVE andis. r3,flags,ABOVE_IN_CR>>16; beql- _eval_above

/* SF_IN_CR means cr6[0] is a copy of SF. It implies ZF_IN_CR is also set */
#define SF 24		/* cr6[0] */
#define SF_IN_CR 0x00200000

#define EVAL_SF andis. r3,flags,SF_IN_CR>>16; beql- _eval_sf_zf

/* ZF_IN_CR means cr6[2] is a copy of ZF. */
#define ZF 26
#define ZF_IN_CR 0x00100000

#define EVAL_ZF andis. r3,flags,ZF_IN_CR>>16; beql- _eval_sf_zf
#define ZF2ZF86(s,d) rlwimi d,s,ZF-ZF86,ZF86,ZF86
#define ZF862ZF(reg) rlwimi reg,reg,32+ZF86-ZF,ZF,ZF

/*
 * ZF_PROTECT means cr6[2] is the only valid value for ZF.
 * This is necessary
 * because some infrequent instructions may leave SF and ZF in an apparently
 * inconsistent state (both set): sahf, popf and the few (not implemented)
 * instructions that only affect ZF.
 */
#define ZF_PROTECT 0x00080000

/* The parity is always evaluated when it is needed */
#define PF 0		/* Only after EVAL_PF */
#define EVAL_PF bl _eval_pf

/* This field gives the shift amount to use to evaluate SF
and ZF when ZF_PROTECT is not set */
#define RES_SIZE_MASK 0x00060000
#define RESL 0x00000000
#define RESW 0x00040000
#define RESB 0x00060000

#define RES_SHIFT(dst) rlwinm dst,flags,18,0x18

/* SUBTRACTING is set if the last flag setting instruction was sub/sbb/cmp,
used to evaluate OF and AF */
#define SUBTRACTING 0x00010000
#define GET_ADDSUB(dst) rlwinm dst,flags,16,0x01

/* rotate (rcl/rcr/rol/ror) affect CF and OF but not other flags */
#define ROTATE_MASK (CF_IN_CR|CF_STATE_MASK|ABOVE_IN_CR|OF_STATE_MASK|SIGNED_IN_CR)
#define ROTATE_FLAGS rlwimi flags,one,24,ROTATE_MASK

/*
 * INCDEC_FIELD has at most one bit set when the last flag setting instruction
 * was either inc or dec (which do not affect the carry).
 * When one of these
 * bits is set, it affects the way OF, SF, ZF, AF, and PF are evaluated.
 */
#define INCDEC_FIELD 0x0000ff00

#define DECB_SHIFT 8
#define INCB_SHIFT 9
#define DECW_SHIFT 10
#define INCW_SHIFT 11
#define DECL_SHIFT 14
#define INCL_SHIFT 15

#define INCDEC_MASK (OF_STATE_MASK|SIGNED_IN_CR|ABOVE_IN_CR|SF_IN_CR|\
	ZF_IN_CR|ZF_PROTECT|RES_SIZE_MASK|SUBTRACTING|\
	INCDEC_FIELD)

/* Operations to perform to tell where the flags are after inc or dec */
#define INC_FLAGS(BWL) rlwimi flags,one,INC##BWL##_SHIFT,INCDEC_MASK
#define DEC_FLAGS(BWL) rlwimi flags,one,DEC##BWL##_SHIFT,INCDEC_MASK

/* How the flags are set after arithmetic operations */
#define FLAGS_ADD(BWL) (CF_ADD##BWL|OF_ARITH##BWL|RES##BWL)
#define FLAGS_SBB(BWL) (CF_SUB##BWL|OF_ARITH##BWL|RES##BWL|SUBTRACTING)
#define FLAGS_SUB(BWL) FLAGS_SBB(BWL)|CF_IN
#define FLAGS_CMP(BWL) FLAGS_SUB(BWL)|ZF_IN_CR|CF_IN_CR|SIGNED_IN_CR|ABOVE_IN_CR

/* How the flags are set after logical operations */
#define FLAGS_LOG(BWL) (CF_ZERO|OF_ZERO|RES##BWL)
#define FLAGS_TEST(BWL) FLAGS_LOG(BWL)|ZF_IN_CR|SIGNED_IN_CR|SF_IN_CR

/* How the flags are set after bt/btc/btr/bts. */
#define FLAGS_BTEST CF_IN_CR|CF_ADDL|OF_ZERO|RESL

/* How the flags are set after bsf/bsr. */
#define FLAGS_BSRCH(WL) CF_ZERO|OF_ZERO|RES##WL|ZF_IN_CR

/* How the flags are set after logical right shifts */
#define FLAGS_SHR(BWL) (CF_EXPLICIT|OF_ARITH##BWL|RES##BWL)

/* How the flags are set after double length shifts */
#define FLAGS_DBLSH(WL) (CF_EXPLICIT|OF_ARITH##WL|RES##WL)

/* How the flags are set after multiplies */
#define FLAGS_MUL (CF_EXPLICIT|OF_EXPLICIT)

#define SET_FLAGS(fl) lis flags,(fl)>>16
#define ADD_FLAGS(fl) addis flags,flags,(fl)>>16

/*
 * We are always off by one when compared with Intel's eip, this shortens
 * code by allowing the next byte to be loaded with lbzu x,1(eip).
 * The register
 * called eip actually contains csbase+eip, and thus should be called lip
 * for linear ip.
 */

/*
 * Reason codes passed to the C part of the emulator, this includes all
 * instructions which may change the current code segment. These definitions
 * will soon go into a separate include file. Codes 0 to 255 correspond
 * directly to the interrupt/trap that has to be generated.
 */

#define code_divide_err 0
#define code_trap 1
#define code_int3 3
#define code_into 4
#define code_bound 5
#define code_ud 6
#define code_dna 7	/* FPU not available */

#define code_iretw 256	/* Interrupt returns */
#define code_iretl 257
#define code_lcallw 258	/* Far calls and jumps */
#define code_lcalll 259
#define code_ljmpw 260
#define code_ljmpl 261
#define code_lretw 262	/* Far returns */
#define code_lretl 263
#define code_softint 264	/* int $xx */
#define code_lock 265	/* Lock prefix */
/* Codes 1024 to 2047 are used for I/O port access instructions:
 - The three LSB define the port size (1, 2 or 4)
 - bit of weight 512 means out if set, in if clear
 - bit of weight 256 means ins/outs if set, in/out if clear
 - bit of weight 128 means use 32 bit addresses if set, 16 bit if clear
   (only used for ins/outs instructions, always clear for in/out)
 */
#define code_inb 1024+1
#define code_inw 1024+2
#define code_inl 1024+4
#define code_outb 1024+512+1
#define code_outw 1024+512+2
#define code_outl 1024+512+4
#define code_insb_a16 1024+256+1
#define code_insw_a16 1024+256+2
#define code_insl_a16 1024+256+4
#define code_outsb_a16 1024+512+256+1
#define code_outsw_a16 1024+512+256+2
#define code_outsl_a16 1024+512+256+4
#define code_insb_a32 1024+256+128+1
#define code_insw_a32 1024+256+128+2
#define code_insl_a32 1024+256+128+4
#define code_outsb_a32 1024+512+256+128+1
#define code_outsw_a32 1024+512+256+128+2
#define code_outsl_a32 1024+512+256+128+4

#define state 31
/* r31 (state) is a pointer to a structure describing the emulated x86
processor, its layout is the following:

first the general purpose registers, they are in little endian byte order

offset	name
   0	eax/ax/al
   1	ah
   4	ecx/cx/cl
   5	ch
   8	edx/dx/dl
   9	dh
  12	ebx/bx/bl
  13	bh
  16	esp/sp
  20	ebp/bp
  24	esi/si
  28	edi/di
*/

#define AL 0
#define AX 0
#define EAX 0
#define AH 1
#define CL 4
#define CX 4
#define ECX 4
#define DX 8
#define EDX 8
#define BX 12
#define EBX 12
#define SP 16
#define ESP 16
#define BP 20
#define EBP 20
#define SI 24
#define ESI 24
#define DI 28
#define EDI 28

/*
then the rest of the machine state, big endian !

offset	name
  32	essel	segment register selectors (values)
  36	cssel
  40	sssel
  44	dssel
  48	fssel
  52	gssel
  56	eipimg	true eip (register named eip is csbase+eip)
  60	eflags	eip and eflags only valid when C code running !
  64	esbase	segment registers bases
  68	csbase
  72	ssbase
  76	dsbase
  80	fsbase
  84	gsbase
  88	iobase	For I/O instructions, I/O space virtual base
  92	ioperm	I/O permission bitmap pointer
  96	reason	Reason code when calling external emulator
 100	nexteip	eip past instruction for external emulator
 104	parm1	parameter for external emulator
 108	parm2	parameter for external emulator
 112	_opcode	current opcode register for external emulator
 116	_base	segment register base for external emulator
 120	_offset	instruction operand offset
	More internal state was dumped here for debugging in first versions

 128	vbase	where the 1Mb memory is mapped
 132	cntimg	instruction counter
 136	scratch
 192	eipstat	array of 32k unsigned long pairs for eip stats
*/

#define essel 32
#define cssel 36
#define sssel 40
#define dssel 44
#define fssel 48
#define gssel 52
#define eipimg 56
#define eflags 60
#define esbase 64
#define csbase 68
#define ssbase 72
#define dsbase 76
#define fsbase 80
#define gsbase 84
#define iobase 88
#define ioperm 92
#define reason 96
#define nexteip 100
#define parm1 104
#define parm2 108
#define _opcode 112
#define _base 116
#define _offset 120
#define vbase 128
#define cntimg 132
#ifdef EIP_STATS
#define eipstat 192
#endif

/* Global registers */

/* Some segment register bases are permanently kept in registers since they
are often used: these are csb, esb and ssb because they are
required for jumps, string instructions, and
pushes/pops/calls/rets.
dsbase is not kept in a register but loaded from memory to allow somewhat
more parallelism in the main emulation loop.
*/

#define one 30		/* Constant one, so pervasive */
#define ssb 29
#define csb 28
#define esb 27
#define eip 26		/* That one is indeed csbase+(e)ip-1 */
#define result 25	/* For the use of result, op1, op2 */
#define op1 24		/* see the section on flag emulation */
#define op2 23
#define opbase 22	/* default opcode table */
#define flags 21	/* See earlier description */
#define opcode 20	/* Opcode */
#define opreg 19	/* Opcode extension/register number */
/* base is reloaded with the base of the ds segment at the beginning of
every instruction, it is modified by segment override prefixes, when
the default base segment is ss, or when the modrm byte specifies a
register operand */
#define base 18		/* Instruction's operand segment base */
#define offset 17	/* Instruction's memory operand offset */
/* used to address a table telling how to decode the addressing mode
specified by the modrm byte */
#define adbase 16	/* addressing mode table */
/* Following registers are used only as dedicated temporaries during decoding,
they are free for use during emulation */
/*
 * ceip (current eip) is only in use when we call the external emulator for
 * instructions that fault. Note that it is forbidden to change flags before
 * the check for the fault happens (divide by zero...) ! ceip is also used
 * when measuring timing.
 */
#define ceip 15

/* A register used to measure timing information (when enabled) */
#ifdef EIP_STATS
#define tstamp 14
#endif

#define count 12	/* Instruction counter. */

#define r0 0
#define r1 1		/* PPC Stack pointer. */
#define r3 3
#define r4 4
#define r5 5
#define r6 6
#define r7 7

/* Macros to read code stream */
#define NEXTBYTE(dest) lbzu dest,1(eip)
#define NEXTWORD(dest) lhbrx dest,eip,one; la eip,2(eip)
#define NEXTDWORD(dest) lwbrx dest,eip,one; la eip,4(eip)
#define NEXT b nop
#define GOTNEXT b gotopcode

#ifdef __BOOT__
	START_GOT
	GOT_ENTRY(_jtables)
	GOT_ENTRY(jtab_www)
	GOT_ENTRY(adtable)
	END_GOT
#else
	.text
#endif
	.align 2
	.global em86_enter
	.type em86_enter,@function
em86_enter:	stwu r1,-96(r1)		# allocate stack
	mflr r0
	stmw 14,24(r1)
	mfcr r4
	stw r0,100(r1)
	mr state,r3
	stw r4,20(r1)
#ifdef __BOOT__
/* We need this since r30 is the default GOT pointer */
#define r30 30
	GET_GOT

/* The relocation of these tables is explicit, this could be done
 * automatically with fixups but would add more than 8kb in the fixup tables.
 */
	lwz r3,GOT(_jtables)
	lwz r4,_endjtables-_jtables(r3)
	sub. r4,r3,r4
	beq+ 1f
	li r0,((_endjtables-_jtables)>>2)+1
	addi r3,r3,-4
	mtctr r0
0:	lwzu r5,4(r3)
	add r5,r5,r4
	stw r5,0(r3)
	bdnz 0b
1:	lwz adbase,GOT(adtable)
	lwz opbase,GOT(jtab_www)
/* Now r30 is only used as constant 1 */
#undef r30
	li one,1		# pervasive constant
#else
	lis opbase,jtab_www@ha
	lis adbase,adtable@ha
	li one,1		# pervasive constant
	addi opbase,opbase,jtab_www@l
	addi adbase,adbase,adtable@l
#ifdef EIP_STATS
	li ceip,0
	mftb tstamp
#endif
#endif
/* We branch back here when calling an external function tells us to resume */
restart:	lwz r3,eflags(state)
	lis flags,(OF_EXPLICIT|ZF_IN_CR|ZF_PROTECT|SF_IN_CR)>>16
	lwz csb,csbase(state)
	extsb result,r3		# SF/PF
	rlwinm op1,r3,31,0x08	# AF
	lwz eip,eipimg(state)
	ZF862ZF(r3)		# cr6
	addi op2,op1,0		# AF
	lwz ssb,ssbase(state)
	rlwimi flags,r3,15,OF_VALUE	# OF
	rlwimi r3,r3,32+RF86-RF,RF,RF	# RF
	lwz esb,esbase(state)
	ori result,result,0xfb	# PF
	mtcrf 0x06,r3		# RF/DF/IF/TF/SF/ZF
	lbzux opcode,eip,csb
	rlwimi flags,r3,27,CF_VALUE	# CF
	xori result,result,0xff	# PF
	lwz count,cntimg(state)
	GOTNEXT			# start the emulator

/* Now return */
exit:	lwz r0,100(r1)
	lwz r4,20(r1)
	mtlr r0
	lmw 14,24(r1)
	mtcr r4
	addi r1,r1,96
	blr

trap:	crmove 0,RF
	crclr RF
	bt- 0,resume
	sub ceip,eip,csb
	li r3,code_trap
complex:	addi eip,eip,1
	stw r3,reason(state)
	sub eip,eip,csb
	stw op1,240(state)
	stw op2,244(state)
	stw result,248(state)
	stw flags,252(state)
	stw r4,parm1(state)
	stw r5,parm2(state)
	stw opcode,_opcode(state)
	bl _eval_flags
	stw base,_base(state)
	stw eip,nexteip(state)
	stw r3,eflags(state)
	mr r3,state
	stw offset,_offset(state)
	stw ceip,eipimg(state)
	stw count,cntimg(state)
	bl em86_trap
	cmpwi r3,0
	bne exit
	b restart

/* Main loop */
/*
 * The two LSB of each entry in the main table mean the following:
 * 00: indirect opcode: modrm follows and the three middle bits are an
 *     opcode extension. The entry points to another jump table.
 * 01: direct instruction, branch directly to the routine.
 * 10: modrm specifies byte size memory and register operands.
 * 11: modrm specifies word/long memory and register operands.
 *
 * The modrm byte, if present, is always loaded in r7.
 *
 * Note: most "mr x,y" instructions have been replaced by "addi x,y,0" since
 * the latter can be executed in the second integer unit on 603e.
 */

/*
 * This code is a very good example of absolutely unmaintainable code.
 * It was actually much easier to write than it is to understand !
 * If my computations are right, the maximum path length from fetching
 * the opcode to exiting to the actual instruction execution is
 * 46 instructions (for non-prefixed, single byte opcode instructions).
 *
 */
	.align 5
#ifdef EIP_STATS
nop:	NEXTBYTE(opcode)
gotopcode:	slwi r3,opcode,2
	bt- TF,trap
resume:	lwzx r4,opbase,r3
	addi r5,state,eipstat+4
	clrlslwi r6,ceip,17,3
	mtctr r4
	lwzux r7,r5,r6
	slwi. r0,r4,30		# two lsb of table entry
	sub r7,r7,tstamp
	lwz r6,-4(r5)
	mftb tstamp
	addi r6,r6,1
	sub ceip,eip,csb
	stw r6,-4(r5)
	add r7,r7,tstamp
	lwz base,dsbase(state)
	stw r7,0(r5)
#else
nop:	NEXTBYTE(opcode)
gotopcode:	slwi r3,opcode,2
	bt- TF,trap
resume:	lwzx r4,opbase,r3
	sub ceip,eip,csb
	mtctr r4
	slwi. r0,r4,30		# two lsb of table entry
	lwz base,dsbase(state)
	addi count,count,1
#endif
	bgtctr-			# for instructions without modrm

/* modrm byte present */
	NEXTBYTE(r7)		# modrm byte
	cmplwi cr1,r7,192
	rlwinm opreg,r7,31,0x1c
	beq- cr0,8f		# extended opcode
/* modrm with middle 3 bits specifying a register (non prefixed) */
	rlwinm r0,r4,3,0x8
	li r4,0x1c0d
	rlwimi opreg,r7,27,0x01
	srw r4,r4,r0
	and opreg,opreg,r4
	blt cr1,9f
/* modrm with 2 register operands */
1:	rlwinm offset,r7,2,0x1c
	addi base,state,0
	rlwimi offset,r7,30,0x01
	and offset,offset,r4
	bctr

/* Prefixes: first segment overrides */
	.align 4
_es:	NEXTBYTE(r7); addi base,esb,0
	oris opcode,opcode,0x8000; b 2f
_cs:	NEXTBYTE(r7); addi base,csb,0
	oris opcode,opcode,0x8000; b 2f
_fs:	NEXTBYTE(r7); lwz base,fsbase(state)
	oris opcode,opcode,0x8000; b 2f
_gs:	NEXTBYTE(r7); lwz base,gsbase(state)
	oris opcode,opcode,0x8000; b 2f
_ss:	NEXTBYTE(r7); addi base,ssb,0
	oris opcode,opcode,0x8000; b 2f
_ds:	NEXTBYTE(r7)
	oris opcode,opcode,0x8000; b 2f

/* Lock (unimplemented) and repeat prefixes */
_lock:	li r3,code_lock; b complex
_repnz:	NEXTBYTE(r7); rlwimi opcode,one,12,0x1800; b 2f
_repz:	NEXTBYTE(r7); rlwimi opcode,one,11,0x1800; b 2f

/* Operand and address size prefixes */
	.align 4
_opsize:	NEXTBYTE(r7); ori opcode,opcode,0x200
	rlwinm r3,opcode,2,0x1ffc; b 2f
_adsize:	NEXTBYTE(r7); ori opcode,opcode,0x400
	rlwinm r3,opcode,2,0x1ffc; b 2f

_twobytes:	NEXTBYTE(r7); addi r3,r3,0x400
2:	rlwimi r3,r7,2,0x3fc
	lwzx r4,opbase,r3
	rlwimi opcode,r7,0,0xff
	mtctr r4
	slwi. r0,r4,30
	bgtctr-			# direct instruction
/* modrm byte in a prefixed instruction */
	NEXTBYTE(r7)		# modrm byte
	cmpwi cr1,r7,192
	rlwinm opreg,r7,31,0x1c
	beq- 6f
/* modrm with middle 3 bits specifying a register (prefixed) */
	rlwinm r0,r4,3,0x8
	li r4,0x1c0d
	rlwimi opreg,r7,27,0x01
	srw r4,r4,r0
	and opreg,opreg,r4
	bnl cr1,1b		# 2 register operands
/* modrm specifying memory with prefix */
3:	rlwinm r3,r3,27,0xff80
	rlwimi adbase,r7,2,0x1c
	extsh r3,r3
	rlwimi r3,r7,31,0x60
	lwzx r4,r3,adbase
	cmpwi cr1,r4,0x3090
	bnl+ cr1,10f
/* displacement only addressing modes */
4:	cmpwi r4,0x2000
	bne 5f
	NEXTWORD(offset)
	bctr
5:	NEXTDWORD(offset)
	bctr
/* modrm with opcode extension (prefixed) */
6:	lwzx r4,r4,opreg
	mtctr r4
	blt cr1,3b
/* modrm with opcode extension and register operand */
7:	rlwinm offset,r7,2,0x1c
	addi base,state,0
	rlwinm r0,r4,3,0x8
	li r4,0x1c0d
	rlwimi offset,r7,30,0x01
	srw r4,r4,r0
	and offset,offset,r4
	bctr
/* modrm with opcode extension (non prefixed) */
8:	lwzx r4,r4,opreg
	mtctr r4
/* FIXME ? We continue fetching even if the opcode extension is undefined.
 * It shouldn't do any harm on real mode emulation anyway, and for ROM
 * BIOS emulation, we are supposed to read valid code.
 */
	bnl cr1,7b
/* modrm specifying memory without prefix */
9:	rlwimi adbase,r7,2,0x1c	# memory addressing mode computation
	rlwinm r3,r7,31,0x60
	lwzx r4,r3,adbase
	cmplwi cr1,r4,0x3090
	blt- cr1,4b		# displacement only addressing mode
10:	rlwinm. r0,r7,24,0,1	# three cases distinguished
	beq- cr1,15f		# an sib follows
	rlwinm r3,r4,30,0x1c	# 16bit/32bit/%si index/%di index
	cmpwi cr1,r3,8		# set cr1 as early as possible
	rlwinm r6,r4,26,0x1c	# base register
	lwbrx offset,state,r6	# load the base register
	beq cr0,14f		# no displacement
	cmpw cr2,r4,opcode	# check for ss as default base
	bgt cr0,12f		# byte offset
	beq cr1,11f		# 32 bit displacement
	NEXTWORD(r5)		# 16 bit displacement
	bgt cr1,13f		# d16(base,index)
/* d16(base) */
	add offset,offset,r5
	clrlwi offset,offset,16
	bgtctr cr2
	addi base,ssb,0
	bctr
/* d32(base) */
11:	NEXTDWORD(r5)
	add offset,offset,r5
	bgtctr cr2
	addi base,ssb,0
	bctr
/* 8 bit displacement */
12:	NEXTBYTE(r5)
	extsb r5,r5
	bgt cr1,13f
/* d8(base) */
	extsb r6,r4
	add offset,offset,r5
	ori r6,r6,0xffff
	and offset,offset,r6
	bgtctr cr2
	addi base,ssb,0
	bctr
/* d8(base,index) and d16(base,index) share this code ! */
13:	lhbrx r3,state,r3
	add offset,offset,r5
	add offset,offset,r3
	clrlwi offset,offset,16
	bgtctr cr2
	addi base,ssb,0
	bctr
/* no displacement: only indexed modes may use ss as default base */
14:	beqctr cr1		# 32 bit register indirect
	clrlwi offset,offset,16
	bltctr cr1		# 16 bit register indirect
/* (base,index) */
	lhbrx r3,state,r3	# 16 bit [{bp,bx}+{si,di}]
	cmpw cr2,r4,opcode	# check for ss as default base
	add offset,offset,r3
	clrlwi offset,offset,16	# wrap the sum to 16 bits, as at label 13
	bgtctr+ cr2
	addi base,ssb,0
	bctr
/* sib modes, note that the size of the offset can be known from cr0 */
15:	NEXTBYTE(r7)		# get sib
	rlwinm r3,r7,31,0x1c	# index
	rlwinm offset,r7,2,0x1c	# base
	cmpwi cr1,r3,ESP	# has index ?
	bne cr0,18f		# base+d8/d32
	cmpwi offset,EBP
	beq 17f			# d32(,index,scale)
	xori r4,one,0xcc01	# build 0x0000cc00
	rlwnm r4,r4,offset,0,1	# 0 or 0xc0000000
	lwbrx offset,state,offset
	cmpw cr2,r4,opcode	# use ss ?
	beq- cr1,16f		# no index
/* (base,index,scale) */
	lwbrx r3,state,r3
	srwi r6,r7,6
	slw r3,r3,r6
	add offset,offset,r3
	bgtctr cr2
	addi base,ssb,0
	bctr
/* (base), in practice only (%esp) is coded this way */
16:	bgtctr cr2
	addi base,ssb,0
	bctr
/* d32(,index,scale) */
17:	NEXTDWORD(offset)
	beqctr- cr1		# no index: very unlikely
	lwbrx r3,state,r3
	srwi r6,r7,6
	slw r3,r3,r6
	add offset,offset,r3
	bctr
/* 8 or 32 bit displacement */
18:	xori r4,one,0xcc01	# build 0x0000cc00
	rlwnm r4,r4,offset,0,1	# 0 or 0xc0000000
	lwbrx offset,state,offset
	cmpw cr2,r4,opcode	# use ss ?
	bgt cr0,20f		# 8 bit offset
/* 32 bit displacement */
	NEXTDWORD(r5)
	beq- cr1,21f
/* d(base,index,scale) */
19:	lwbrx r3,state,r3
	add offset,offset,r5
	add offset,offset,r3
	bgtctr cr2
	addi base,ssb,0
	bctr
/* 8 bit displacement */
20:	NEXTBYTE(r5)
	extsb r5,r5
	bne+ cr1,19b
/* d(base), in practice base is %esp */
21:	add offset,offset,r5
	bgtctr- cr2
	addi base,ssb,0
	bctr

/*
 * Flag evaluation subroutines: they have not been written for performance
 * since they are not often used in practice. The rule of the game was to
 * write them with as few branches as possible.
 * The first routines evaluate either one or 2 (ZF and SF simultaneously)
 * flags and do not use r0 and r7.
 * The more complex routines (_eval_above, _eval_signed and _eval_flags)
 * call the former ones, using r0 as a return address save register and
 * r7 as a safe temporary.
 */

/*
 * _eval_sf_zf evaluates simultaneously SF and ZF unless ZF is already valid
 * and protected because it is possible, although it is exceptional, to have
 * SF and ZF set at the same time after a few instructions which may leave the
 * flags in this apparently inconsistent state: sahf, popf, iret and the few
 * (for now unimplemented) instructions which only affect ZF (lar, lsl, arpl,
 * cmpxchg8b). This also solves the obscure case of ZF set and PF clear.
 * On return: SF=cr6[0], ZF=cr6[2].
 */

_eval_sf_zf:	andis. r5,flags,ZF_PROTECT>>16
	rlwinm r3,flags,0,INCDEC_FIELD
	RES_SHIFT(r4)
	cntlzw r3,r3
	slw r4,result,r4
	srwi r5,r3,5		# ? use result : use op1
	rlwinm r3,r3,2,0x18
	oris flags,flags,(SF_IN_CR|SIGNED_IN_CR|ZF_IN_CR)>>16
	neg r5,r5		# mux result/op2
	slw r3,op2,r3
	and r4,r4,r5
	andc r3,r3,r5
	xoris flags,flags,(SIGNED_IN_CR)>>16
	bne- 1f			# 12 instructions between set
	or r3,r3,r4		# and test, good for folding
	cmpwi cr6,r3,0
	blr
1:	or. r3,r3,r4
	crmove SF,0
	blr

/*
 * _eval_cf may be called at any time, no other flag is affected.
 * On return: CF=cr4[0], r3= CF ? 0x100:0 = CF<<8.
 */
_eval_cf:	addc r3,flags,flags	# CF_IN to xer[ca]
	RES2CF(r4)		# get 8 or 16 bit carry
	subfe r3,result,op1	# generate PPC carry for
	CF_ROTCNT(r5)		# preceding operation
	addze r3,r4		# put carry into LSB
	CF_POL(r4,23)		# polarity & 0x100
	oris flags,flags,(CF_IN_CR|ABOVE_IN_CR)>>16
	rlwnm r3,r3,r5,23,23	# shift carry there
	xor r3,r3,r4		# CF <<8
	xoris flags,flags,(ABOVE_IN_CR)>>16
	cmplw cr4,one,r3	# sets cr4[0]
	blr

/*
 * eval_of returns the overflow flag in OF_STATE field, which will be
 * either 001 (OF clear) or 101 (OF set), it is only called when the two
 * low order bits of OF_STATE are not 01 (otherwise it will work but
 * it is an elaborate variant of a nop with a few registers destroyed)
 * The code multiplexes several sources in a branchless way, was fun to write.
 */
_eval_of:	GET_ADDSUB(r4)		# 0(add)/1(sub)
	rlwinm r3,flags,0,INCDEC_FIELD
	neg r4,r4		# 0(add)/-1(sub)
	eqv r5,result,op1	# result[]==op1[] (bit by bit)
	cntlzw r3,r3		# inc/dec
	xor r4,r4,op2		# true sign of op2
	oris r5,r5,0x0808	# bits to clear
	clrlwi r6,r3,31		# 0(inc)/1(dec)
	eqv r4,r4,op1		# op1[]==op2[] (bit by bit)
	add r6,op2,r6		# add 1 if dec
	rlwinm r3,r3,2,0x18	# incdec_shift
	andc r4,r4,r5		# arithmetic overflow
	slw r3,r6,r3		# shifted inc/dec result
	addis r3,r3,0x8000	# compare with 0x80000000
	ori r4,r4,0x0808	# bits to set
	cntlzw r3,r3		# 32 if inc/dec overflow
	OF_ROTCNT(r6)
	rlwimi r4,r3,18,0x00800000	# insert inc/dec overflow
	rlwimi flags,one,24,OF_STATE_MASK
	rlwnm r3,r4,r6,8,8	# get field
	rlwimi flags,r3,3,OF_VALUE	# insert OF
	blr

/*
 * _eval_pf will always be called when needed (complex but infrequent),
 * there are a few quirks for a branchless solution.
 * On return: PF=cr0[0], PF=MSB(r3)
 */
_eval_pf:	rlwinm r3,flags,0,INCDEC_FIELD
	rotrwi r4,op2,4		# from inc/dec
	rotrwi r5,result,4	# from result
	cntlzw r3,r3		# use result if 32
	xor r4,r4,op2
	xor r5,r5,result
	rlwinm r3,r3,26,0,0	# 32 becomes 0x80000000
	clrlwi r4,r4,28
	lis r6,0x9669		# constant to shift
	clrlwi r5,r5,28
	rlwnm r4,r6,r4,0,0	# parity from inc/dec
	rlwnm r5,r6,r5,0,0	# parity from result
	andc r4,r4,r3		# select which one
	and r5,r5,r3
	add. r3,r4,r5		# and test to simplify
	blr			# returns in r3 and cr0 set.

/*
 * _eval_af will always be called when needed (complex but infrequent):
 * - if after inc, af is set when 4 low order bits of op1 are 0
 * - if after dec, af is set when 4 low order bits of op1 are 1
 *   (or 0 after adding 1 as implemented here)
 * - if after add/sub/adc/sbb/cmp af is set from sum of 4 LSB of op1
 *   and 4 LSB of op2 (possibly complemented) plus carry in.
 * - other instructions leave AF undefined so the returned value is irrelevant.
 * Returned value must be masked with 0x10, since all other bits are undefined.
 * This branchless code is perhaps not the most efficient, but quite parallel.
 */
_eval_af:	rlwinm r3,flags,0,INCDEC_FIELD
	clrlwi r5,op2,28	# 4 LSB of op2
	addc r4,flags,flags	# carry_in
	GET_ADDSUB(r6)
	cntlzw r3,r3		# if inc/dec 16..23 else 32
	neg r6,r6		# add/sub
	clrlwi r4,r3,31		# if dec 1 else 0
	xor r5,r5,r6		# conditionally complement
	clrlwi r6,op1,28	# 4 LSB of op1
	add r4,op2,r4		# op2+(dec ? 1 : 0)
	clrlwi r4,r4,28		# 4 LSB of op2+(dec ? 1 : 0)
	adde r5,r6,r5		# op1+cy_in+(op2/~op2)
	cntlzw r4,r4		# 28..31 if not AF, 32 if set
	andc r5,r5,r3		# masked AF from add/sub...
	andc r4,r3,r4		# masked AF from inc/dec
	or r3,r4,r5
	blr

/*
 * _eval_above will only be called if ABOVE_IN_CR is not set.
 * On return: ZF=cr6[2], CF=cr4[0], ABOVE=cr4[1]
 */
_eval_above:	andis. r3,flags,ZF_IN_CR>>16
	mflr r0
	beql+ _eval_sf_zf
	andis. r3,flags,CF_IN_CR>>16
	beql+ _eval_cf
	mtlr r0
	oris flags,flags,ABOVE_IN_CR>>16
	crnor ABOVE,ZF,CF
	blr

/* _eval_signed may only be called when signed_in_cr is clear ! */
_eval_signed:	andis. r3,flags,SF_IN_CR>>16
	mflr r0
	beql+ _eval_sf_zf
/* SF_IN_CR and ZF_IN_CR are set, SIGNED_IN_CR is clear */
	rlwinm. r3,flags,5,0,1
	xoris flags,flags,(SIGNED_IN_CR|SF_IN_CR)>>16
	bngl+ _eval_of
	andis. r3,flags,OF_VALUE>>16
	mtlr r0
	crxor SLT,SF,OF
	crnor SGT,SLT,ZF
	blr

_eval_flags:	mflr r0
	bl _eval_cf
	li r7,2
	rlwimi r7,r3,24,CF86,CF86	# 2 if CF clear, 3 if set
	bl _eval_pf
	andis. r4,flags,SF_IN_CR>>16
	rlwimi r7,r3,32+PF-PF86,PF86,PF86
	bl _eval_af
	rlwimi r7,r3,0,AF86,AF86
	beql+ _eval_sf_zf
	mfcr r3
	rlwinm. r4,flags,5,0,1
	rlwimi r7,r3,0,DF86,SF86
	ZF2ZF86(r3,r7)
	bngl+ _eval_of
	mtlr r0
	lis r4,0x0004
	lwz r3,eflags(state)
	addi r4,r4,0x7000
	rlwimi r7,flags,17,OF86,OF86
	and r3,r3,r4
	or r3,r3,r7
	blr

/* Quite simple for real mode, input in r4, returns in r3. */
_segment_load:	lwz r5,vbase(state)
	rlwinm r3,r4,4,0xffff0	# segment selector * 16
	add r3,r3,r5
	blr

/* To allow I/O port virtualization if necessary, code for exception in r3,
port number in r4 */
_check_port:	lwz r5,ioperm(state)
	rlwinm r6,r4,29,0x1fff	# 0 to 8kB
	lis r0,0xffff
	lhbrx r5,r5,r6
	clrlwi r6,r4,29		# modulo 8
	rlwnm r0,r0,r3,0x0f	# 1, 3, or 0xf
	slw r0,r0,r6
	and. r0,r0,r5
	bne- complex
	blr

/*
 * Instructions are in approximate functional order:
 * 1) move, exchange, lea, push/pop, pusha/popa
 * 2) cbw/cwde/cwd/cdq, zero/sign extending moves, in/out
 * 3) arithmetic: add/sub/adc/sbb/cmp/inc/dec/neg
 * 4) logical: and/or/xor/test/not/bt/btc/btr/bts/bsf/bsr
 * 5) jump, call, ret
 * 6) string instructions and xlat
 * 7) rotate/shift/mul/div
 * 8) segment register, far jumps, calls and rets, interrupts
 * 9) miscellaneous (flags, bcd,...)
 */

#define MEM offset,base
#define REG opreg,state
#define SELECTORS 32
#define SELBASES 64

/* Immediate moves */
movb_imm_reg:	rlwinm opreg,opcode,2,28,29; lbz r3,1(eip)
	rlwimi opreg,opcode,30,31,31; lbzu opcode,2(eip)
	stbx r3,REG; GOTNEXT

movw_imm_reg:	lhz r3,1(eip); clrlslwi opreg,opcode,29,2; lbzu opcode,3(eip)
	sthx r3,REG; GOTNEXT

movl_imm_reg:	lwz r3,1(eip); clrlslwi opreg,opcode,29,2; lbzu opcode,5(eip)
	stwx r3,REG; GOTNEXT

movb_imm_mem:	lbz r0,1(eip); cmpwi opreg,0
	lbzu opcode,2(eip); bne- ud
	stbx r0,MEM; GOTNEXT

movw_imm_mem:	lhz r0,1(eip); cmpwi opreg,0
	lbzu opcode,3(eip); bne- ud
	sthx r0,MEM; GOTNEXT

movl_imm_mem:	lwz r0,1(eip); cmpwi opreg,0
	lbzu 
opcode,5(eip); bne- udstwx r0,MEM; GOTNEXT/* The special short form moves between memory and al/ax/eax */movb_al_a32: lwbrx offset,eip,one; lbz r0,AL(state); lbzu opcode,5(eip)stbx r0,MEM; GOTNEXTmovb_al_a16: lhbrx offset,eip,one; lbz r0,AL(state); lbzu opcode,3(eip)stbx r0,MEM; GOTNEXTmovw_ax_a32: lwbrx offset,eip,one; lhz r0,AX(state); lbzu opcode,5(eip)sthx r0,MEM; GOTNEXTmovw_ax_a16: lhbrx offset,eip,one; lhz r0,AX(state); lbzu opcode,3(eip)sthx r0,MEM; GOTNEXTmovl_eax_a32: lwbrx offset,eip,one; lwz r0,EAX(state); lbzu opcode,5(eip)stwx r0,MEM; GOTNEXTmovl_eax_a16: lhbrx offset,eip,one; lwz r0,EAX(state); lbzu opcode,3(eip)stwx r0,MEM; GOTNEXTmovb_a32_al: lwbrx offset,eip,one; lbzu opcode,5(eip); lbzx r0,MEMstb r0,AL(state); GOTNEXTmovb_a16_al: lhbrx offset,eip,one; lbzu opcode,3(eip); lbzx r0,MEMstb r0,AL(state); GOTNEXTmovw_a32_ax: lwbrx offset,eip,one; lbzu opcode,5(eip); lhzx r0,MEMsth r0,AX(state); GOTNEXTmovw_a16_ax: lhbrx offset,eip,one; lbzu opcode,3(eip); lhzx r0,MEMsth r0,AX(state); GOTNEXTmovl_a32_eax: lwbrx offset,eip,one; lbzu opcode,5(eip); lwzx r0,MEMstw r0,EAX(state); GOTNEXTmovl_a16_eax: lhbrx offset,eip,one; lbzu opcode,3(eip); lwzx r0,MEMstw r0,EAX(state); GOTNEXT/* General purpose move (all are exactly 4 instructions long) */.align 4movb_reg_mem: lbzx r0,REGNEXTBYTE(opcode)stbx r0,MEMGOTNEXTmovw_reg_mem: lhzx r0,REGNEXTBYTE(opcode)sthx r0,MEMGOTNEXTmovl_reg_mem: lwzx r0,REGNEXTBYTE(opcode)stwx r0,MEMGOTNEXTmovb_mem_reg: lbzx r0,MEMNEXTBYTE(opcode)stbx r0,REGGOTNEXTmovw_mem_reg: lhzx r0,MEMNEXTBYTE(opcode)sthx r0,REGGOTNEXTmovl_mem_reg: lwzx r0,MEMNEXTBYTE(opcode)stwx r0,REGGOTNEXT/* short form exchange ax/eax with register */xchgw_ax_reg: clrlslwi opreg,opcode,29,2lhz r3,AX(state)lhzx r4,REGsthx r3,REGsth r4,AX(state)NEXTxchgl_eax_reg: clrlslwi opreg,opcode,29,2lwz r3,EAX(state)lwzx r4,REGstwx r3,REGstw r4,EAX(state)NEXT/* General exchange (unlocked!) 
*/
xchgb_reg_mem: lbzx r3,MEM
        lbzx r4,REG
        NEXTBYTE(opcode)
        stbx r3,REG
        stbx r4,MEM
        GOTNEXT
xchgw_reg_mem: lhzx r3,MEM
        lhzx r4,REG
        sthx r3,REG
        sthx r4,MEM
        NEXT
xchgl_reg_mem: lwzx r3,MEM
        lwzx r4,REG
        stwx r3,REG
        stwx r4,MEM
        NEXT
/* lea, one of the simplest instructions */
leaw: cmpw base,state
        beq- ud
        sthbrx offset,REG
        NEXT
leal: cmpw base,state
        beq- ud
        stwbrx offset,REG
        NEXT
/* Short form pushes and pops */
pushw_sp_reg: li r3,SP
        lhbrx r4,state,r3
        clrlslwi opreg,opcode,29,2
        lhzx r0,REG
        addi r4,r4,-2
        sthbrx r4,state,r3
        clrlwi r4,r4,16
        sthx r0,ssb,r4
        NEXT
pushl_sp_reg: li r3,SP
        lhbrx r4,state,r3
        clrlslwi opreg,opcode,29,2
        lwzx r0,REG
        addi r4,r4,-4
        sthbrx r4,state,r3
        clrlwi r4,r4,16
        stwx r0,ssb,r4
        NEXT
popw_sp_reg: li r3,SP
        lhbrx r4,state,r3
        clrlslwi opreg,opcode,29,2
        lhzx r0,ssb,r4
        addi r4,r4,2 # order is important in case of pop sp
        sthbrx r4,state,r3
        sthx r0,REG
        NEXT
popl_sp_reg: li r3,SP
        lhbrx r4,state,r3
        clrlslwi opreg,opcode,29,2
        lwzx r0,ssb,r4
        addi r4,r4,4
        sthbrx r4,state,r3
        stwx r0,REG
        NEXT
/* Push immediate */
pushw_sp_imm: li r3,SP
        lhbrx r4,state,r3
        lhz r0,1(eip)
        addi r4,r4,-2
        sthbrx r4,state,r3
        clrlwi r4,r4,16
        lbzu opcode,3(eip)
        sthx r0,ssb,r4
        GOTNEXT
pushl_sp_imm: li r3,SP
        lhbrx r4,state,r3
        lwz r0,1(eip)
        addi r4,r4,-4
        sthbrx r4,state,r3
        clrlwi r4,r4,16
        lbzu opcode,5(eip)
        stwx r0,ssb,r4
        GOTNEXT
pushw_sp_imm8: li r3,SP
        lhbrx r4,state,r3
        lbz r0,1(eip) # only one immediate byte
        addi r4,r4,-2
        sthbrx r4,state,r3
        clrlwi r4,r4,16
        lbzu opcode,2(eip)
        extsb r0,r0
        sthbrx r0,ssb,r4 # sign extended value, stored little endian
        GOTNEXT
pushl_sp_imm8: li r3,SP
        lhbrx r4,state,r3
        lbz r0,1(eip) # only one immediate byte
        addi r4,r4,-4
        sthbrx r4,state,r3
        clrlwi r4,r4,16
        lbzu opcode,2(eip)
        extsb r0,r0
        stwbrx r0,ssb,r4 # sign extended value, stored little endian
        GOTNEXT
/* General push/pop */
pushw_sp: lhbrx r0,MEM
        li r3,SP
        lhbrx r4,state,r3
        addi r4,r4,-2
        sthbrx r4,state,r3
        clrlwi r4,r4,16
        sthbrx r0,r4,ssb
        NEXT
pushl_sp: lwbrx r0,MEM
        li r3,SP
        lhbrx r4,state,r3
        addi r4,r4,-4
        sthbrx r4,state,r3
        clrlwi r4,r4,16
        stwbrx r0,r4,ssb
        NEXT
/* pop is an exception: with 32 bit addressing modes it is possible
to compute the wrong address when esp is used as base. But 16 bit
addressing modes are safe */
popw_sp_a16: cmpw cr1,opreg,0 # first check the opcode
        li r3,SP
        lhbrx r4,state,r3
        bne- cr1,ud
        lhzx r0,ssb,r4
        addi r4,r4,2
        sthx r0,MEM
        sthbrx r4,state,r3
        NEXT
popl_sp_a16: cmpw cr1,opreg,0
        li r3,SP
        lhbrx r4,state,r3
        bne- cr1,ud
        lwzx r0,ssb,r4
        addi r4,r4,4 # a 32 bit pop releases 4 bytes
        stwx r0,MEM
        sthbrx r4,state,r3
        NEXT
/* 32 bit addressing modes for pop not implemented for now. */
        .equ popw_sp_a32,unimpl
        .equ popl_sp_a32,unimpl
/* pusha/popa */
pushaw_sp: li r3,SP
        li r0,8
        lhbrx r4,r3,state
        mtctr r0
        addi r5,state,-4
1: addi r4,r4,-2
        lhzu r6,4(r5)
        clrlwi r4,r4,16
        sthx r6,ssb,r4
        bdnz 1b
        sthbrx r4,r3,state # new sp
        NEXT
pushal_sp: li r3,SP
        li r0,8
        lhbrx r4,r3,state
        mtctr r0
        addi r5,state,-4
1: addi r4,r4,-4
        lwzu r6,4(r5)
        clrlwi r4,r4,16
        stwx r6,ssb,r4
        bdnz 1b
        sthbrx r4,r3,state # new sp
        NEXT
popaw_sp: li r3,SP
        li r0,8
        lhbrx r4,state,r3
        mtctr r0
        addi r5,state,32
1: lhzx r6,ssb,r4
        addi r4,r4,2
        sthu r6,-4(r5)
        clrlwi r4,r4,16
        bdnz 1b
        sthbrx r4,r3,state # updated sp
        NEXT
popal_sp: li r3,SP
        lis r0,0xef00 # mask to skip esp
        lhbrx r4,state,r3
        addi r5,state,32
1: add. r0,r0,r0
        lwzx r6,ssb,r4
        addi r4,r4,4
        stwu r6,-4(r5)
        clrlwi r4,r4,16
        blt 1b
        addi r6,r6,-4
        beq 2f
        addi r4,r4,4
        clrlwi r4,r4,16
        b 1b
2: sthbrx r4,state,r3 # updated sp
        NEXT
/* Moves with zero or sign extension: first the special cases */
cbw: lbz r3,AL(state)
        extsb r3,r3
        sthbrx r3,AX,state
        NEXT
cwde: lhbrx r3,AX,state
        extsh r3,r3
        stwbrx r3,EAX,state
        NEXT
cwd: lbz r3,AH(state)
        extsb r3,r3
        srwi r3,r3,8 # get sign bits
        sth r3,DX(state)
        NEXT
cdq: lwbrx r3,EAX,state
        srawi r3,r3,31
        stw r3,EDX(state) # byte order unimportant !
        NEXT
/* The moves with zero or sign extension are special since the source
and destination are not the same size. The register describing the destination
is modified to take this into account. 
*/
movsbw: lbzx r3,MEM
        rlwimi opreg,opreg,4,0x10
        extsb r3,r3
        rlwinm opreg,opreg,0,0x1c
        sthbrx r3,REG
        NEXT
movsbl: lbzx r3,MEM
        rlwimi opreg,opreg,4,0x10
        extsb r3,r3
        rlwinm opreg,opreg,0,0x1c
        stwbrx r3,REG
        NEXT
        .equ movsww, movw_mem_reg
movswl: lhbrx r3,MEM
        extsh r3,r3
        stwbrx r3,REG
        NEXT
movzbw: lbzx r3,MEM
        rlwimi opreg,opreg,4,0x10
        rlwinm opreg,opreg,0,0x1c
        sthbrx r3,REG
        NEXT
movzbl: lbzx r3,MEM
        rlwimi opreg,opreg,4,0x10
        rlwinm opreg,opreg,0,0x1c
        stwbrx r3,REG
        NEXT
        .equ movzww, movw_mem_reg
movzwl: lhbrx r3,MEM
        stwbrx r3,REG
        NEXT
/* Byte swapping */
bswap: clrlslwi opreg,opcode,29,2 # extract reg from opcode
        lwbrx r0,REG
        stwx r0,REG
        NEXT
/* Input/output */
inb_port_al: NEXTBYTE(r4)
        b 1f
inb_dx_al: li r4,DX
        lhbrx r4,r4,state
1: li r3,code_inb
        bl _check_port
        lwz r3,iobase(state)
        lbzx r5,r4,r3
        eieio
        stb r5,AL(state)
        NEXT
inw_port_ax: NEXTBYTE(r4)
        b 1f
inw_dx_ax: li r4,DX
        lhbrx r4,r4,state
1: li r3,code_inw
        bl _check_port
        lwz r3,iobase(state)
        lhzx r5,r4,r3
        eieio
        sth r5,AX(state)
        NEXT
inl_port_eax: NEXTBYTE(r4)
        b 1f
inl_dx_eax: li r4,DX
        lhbrx r4,r4,state
1: li r3,code_inl
        bl _check_port
        lwz r3,iobase(state)
        lwzx r5,r4,r3
        eieio
        stw r5,EAX(state)
        NEXT
outb_al_port: NEXTBYTE(r4)
        b 1f
outb_al_dx: li r4,DX
        lhbrx r4,r4,state
1: li r3,code_outb
        bl _check_port
        lwz r3,iobase(state)
        lbz r5,AL(state)
        stbx r5,r4,r3
        eieio
        NEXT
outw_ax_port: NEXTBYTE(r4)
        b 1f
outw_ax_dx: li r4,DX
        lhbrx r4,r4,state
1: li r3,code_outw
        bl _check_port
        lwz r3,iobase(state)
        lhz r5,AX(state)
        sthx r5,r4,r3
        eieio
        NEXT
outl_eax_port: NEXTBYTE(r4)
        b 1f
outl_eax_dx: li r4,DX
        lhbrx r4,r4,state
1: li r3,code_outl
        bl _check_port
        lwz r3,iobase(state) # keep the port number in r4
        lwz r5,EAX(state)
        stwx r5,r4,r3
        eieio
        NEXT
/* Macro used for add and sub */
#define ARITH(op,fl) \
op##b_reg_mem: lbzx op1,MEM; SET_FLAGS(fl(B)); lbzx op2,REG; \
        op result,op1,op2; \
        stbx result,MEM; NEXT; \
op##w_reg_mem: lhbrx op1,MEM; SET_FLAGS(fl(W)); lhbrx op2,REG; \
        op result,op1,op2; \
        sthbrx result,MEM; NEXT; \
op##l_reg_mem: lwbrx op1,MEM; SET_FLAGS(fl(L)); lwbrx op2,REG; \
        op result,op1,op2; \
        stwbrx result,MEM; NEXT; \
op##b_mem_reg: lbzx op2,MEM; SET_FLAGS(fl(B)); lbzx 
op1,REG; \op result,op1,op2; \stbx result,REG; NEXT; \op##w_mem_reg: lhbrx op2,MEM; SET_FLAGS(fl(W)); lhbrx op1,REG; \op result,op1,op2; \sthbrx result,REG; NEXT; \op##l_mem_reg: lwbrx op2,MEM; SET_FLAGS(fl(L)); lwbrx op1,REG; \op result,op1,op2; \stwbrx result,REG; NEXT; \op##b_imm_al: addi base,state,0; li offset,AL; \op##b_imm: lbzx op1,MEM; SET_FLAGS(fl(B)); lbz op2,1(eip); \op result,op1,op2; \lbzu opcode,2(eip); \stbx result,MEM; GOTNEXT; \op##w_imm_ax: addi base,state,0; li offset,AX; \op##w_imm: lhbrx op1,MEM; SET_FLAGS(fl(W)); lhbrx op2,eip,one; \op result,op1,op2; \lbzu opcode,3(eip); \sthbrx result,MEM; GOTNEXT; \op##w_imm8: lbz op2,1(eip); SET_FLAGS(fl(W)); lhbrx op1,MEM; \extsb op2,op2; clrlwi op2,op2,16; \op result,op1,op2; \lbzu opcode,2(eip); \sthbrx result,MEM; GOTNEXT; \op##l_imm_eax: addi base,state,0; li offset,EAX; \op##l_imm: lwbrx op1,MEM; SET_FLAGS(fl(L)); lwbrx op2,eip,one; \op result,op1,op2; lbzu opcode,5(eip); \stwbrx result,MEM; GOTNEXT; \op##l_imm8: lbz op2,1(eip); SET_FLAGS(fl(L)); lwbrx op1,MEM; \extsb op2,op2; lbzu opcode,2(eip); \op result,op1,op2; \stwbrx result,MEM; GOTNEXTARITH(add, FLAGS_ADD)ARITH(sub, FLAGS_SUB)#define adc(result, op1, op2) adde result,op1,op2#define sbb(result, op1, op2) subfe result,op2,op1#define ARITH_WITH_CARRY(op, fl) \op##b_reg_mem: lbzx op1,MEM; bl carryfor##op; lbzx op2,REG; \ADD_FLAGS(fl(B)); op(result, op1, op2); \stbx result,MEM; NEXT; \op##w_reg_mem: lhbrx op1,MEM; bl carryfor##op; lhbrx op2,REG; \ADD_FLAGS(fl(W)); op(result, op1, op2); \sthbrx result,MEM; NEXT; \op##l_reg_mem: lwbrx op1,MEM; bl carryfor##op; lwbrx op2,REG; \ADD_FLAGS(fl(L)); op(result, op1, op2); \stwbrx result,MEM; NEXT; \op##b_mem_reg: lbzx op1,MEM; bl carryfor##op; lbzx op2,REG; \ADD_FLAGS(fl(B)); op(result, op1, op2); \stbx result,REG; NEXT; \op##w_mem_reg: lhbrx op1,MEM; bl carryfor##op; lhbrx op2,REG; \ADD_FLAGS(fl(W)); op(result, op1, op2); \sthbrx result,REG; NEXT; \op##l_mem_reg: lwbrx op1,MEM; bl carryfor##op; lwbrx 
op2,REG; \ADD_FLAGS(fl(L)); op(result, op1, op2); \stwbrx result,REG; NEXT; \op##b_imm_al: addi base,state,0; li offset,AL; \op##b_imm: lbzx op1,MEM; bl carryfor##op; lbz op2,1(eip); \ADD_FLAGS(fl(B)); lbzu opcode,2(eip); op(result, op1, op2); \stbx result,MEM; GOTNEXT; \op##w_imm_ax: addi base,state,0; li offset,AX; \op##w_imm: lhbrx op1,MEM; bl carryfor##op; lhbrx op2,eip,one; \ADD_FLAGS(fl(W)); lbzu opcode,3(eip); op(result, op1, op2); \sthbrx result,MEM; GOTNEXT; \op##w_imm8: lbz op2,1(eip); bl carryfor##op; lhbrx op1,MEM; \extsb op2,op2; ADD_FLAGS(fl(W)); clrlwi op2,op2,16; \lbzu opcode,2(eip); op(result, op1, op2); \sthbrx result,MEM; GOTNEXT; \op##l_imm_eax: addi base,state,0; li offset,EAX; \op##l_imm: lwbrx op1,MEM; bl carryfor##op; lwbrx op2,eip,one; \ADD_FLAGS(fl(L)); lbzu opcode,5(eip); op(result, op1, op2); \stwbrx result,MEM; GOTNEXT; \op##l_imm8: lbz op2,1(eip); SET_FLAGS(fl(L)); lwbrx op1,MEM; \extsb op2,op2; lbzu opcode,2(eip); \op(result, op1, op2); \stwbrx result,MEM; GOTNEXTcarryforadc: addc r3,flags,flags # CF_IN to xer[ca]RES2CF(r4) # get 8 or 16 bit carrysubfe r3,result,op1 # generate PPC carry forCF_ROTCNT(r5) # preceding operationaddze r3,r4 # 32 bit carry in LSBCF_POL(r4,23) # polarityrlwnm r3,r3,r5,0x100 # shift carry therexor flags,r4,r3 # CF86 ? 0x100 : 0addic r4,r3,0xffffff00 # set xer[ca]rlwinm flags,r3,23,CF_INblrARITH_WITH_CARRY(adc, FLAGS_ADD)/* for sbb the input carry must be the complement of the x86 carry */carryforsbb: addc r3,flags,flags # CF_IN to xer[ca]RES2CF(r4) # 8/16 bit carry from resultsubfe r3,result,op1CF_ROTCNT(r5)addze r3,r4CF_POL(r4,23)rlwnm r3,r3,r5,0x100eqv flags,r4,r3 # CF86 ? 
0xfffffeff:0xffffffffaddic r4,r3,1 # set xer[ca]rlwinm flags,r3,23,CF_IN # keep only the carryblrARITH_WITH_CARRY(sbb, FLAGS_SBB)cmpb_reg_mem: lbzx op1,MEMSET_FLAGS(FLAGS_CMP(B))lbzx op2,REGextsb r3,op1cmplw cr4,op1,op2extsb r4,op2sub result,op1,op2cmpw cr6,r3,r4NEXTcmpw_reg_mem: lhbrx op1,MEMSET_FLAGS(FLAGS_CMP(W))lhbrx op2,REGextsh r3,op1cmplw cr4,op1,op2extsh r4,op2sub result,op1,op2cmpw cr6,r3,r4NEXTcmpl_reg_mem: lwbrx op1,MEMSET_FLAGS(FLAGS_CMP(L))lwbrx op2,REGcmplw cr4,op1,op2sub result,op1,op2cmpw cr6,op1,op2NEXTcmpb_mem_reg: lbzx op2,MEMSET_FLAGS(FLAGS_CMP(B))lbzx op1,REGextsb r4,op2cmplw cr4,op1,op2extsb r3,op1sub result,op1,op2cmpw cr6,r3,r4NEXTcmpw_mem_reg: lhbrx op2,MEMSET_FLAGS(FLAGS_CMP(W))lhbrx op1,REGextsh r4,op2cmplw cr4,op1,op2extsh r3,op1sub result,op1,op2cmpw cr6,r3,r4NEXTcmpl_mem_reg: lwbrx op2,MEMSET_FLAGS(FLAGS_CMP(L))lwbrx op1,REGcmpw cr6,op1,op2sub result,op1,op2cmplw cr4,op1,op2NEXTcmpb_imm_al: addi base,state,0li offset,ALcmpb_imm: lbzx op1,MEMSET_FLAGS(FLAGS_CMP(B))lbz op2,1(eip)extsb r3,op1cmplw cr4,op1,op2lbzu opcode,2(eip)extsb r4,op2sub result,op1,op2cmpw cr6,r3,r4GOTNEXTcmpw_imm_ax: addi base,state,0li offset,AXcmpw_imm: lhbrx op1,MEMSET_FLAGS(FLAGS_CMP(W))lhbrx op2,eip,oneextsh r3,op1cmplw cr4,op1,op2lbzu opcode,3(eip)extsh r4,op2sub result,op1,op2cmpw cr6,r3,r4GOTNEXTcmpw_imm8: lbz op2,1(eip)SET_FLAGS(FLAGS_CMP(W))lhbrx op1,MEMextsb r4,op2extsh r3,op1lbzu opcode,2(eip)clrlwi op2,r4,16cmpw cr6,r3,r4sub result,op1,op2cmplw cr4,op1,op2GOTNEXTcmpl_imm_eax: addi base,state,0li offset,EAXcmpl_imm: lwbrx op1,MEMSET_FLAGS(FLAGS_CMP(L))lwbrx op2,eip,onecmpw cr6,op1,op2lbzu opcode,5(eip)sub result,op1,op2cmplw cr4,op1,op2GOTNEXTcmpl_imm8: lbz op2,1(eip)SET_FLAGS(FLAGS_CMP(L))lwbrx op1,MEMextsb op2,op2lbzu opcode,2(eip)cmpw cr6,op1,op2sub result,op1,op2cmplw cr4,op1,op2GOTNEXT/* Increment and decrement */incb: lbzx op2,MEMINC_FLAGS(B)addi op2,op2,1stbx op2,MEMNEXTincw_reg: clrlslwi opreg,opcode,29,2 # extract reg from opcodelhbrx 
op2,REG
        INC_FLAGS(W)
        addi op2,op2,1
        sthbrx op2,REG
        NEXT
incw: lhbrx op2,MEM
        INC_FLAGS(W)
        addi op2,op2,1
        sthbrx op2,MEM
        NEXT
incl_reg: clrlslwi opreg,opcode,29,2
        lwbrx op2,REG
        INC_FLAGS(L)
        addi op2,op2,1
        stwbrx op2,REG # store all 32 bits
        NEXT
incl: lwbrx op2,MEM
        INC_FLAGS(L)
        addi op2,op2,1
        stwbrx op2,MEM
        NEXT
decb: lbzx op2,MEM
        DEC_FLAGS(B)
        addi op2,op2,-1
        stbx op2,MEM
        NEXT
decw_reg: clrlslwi opreg,opcode,29,2 # extract reg from opcode
        lhbrx op2,REG
        DEC_FLAGS(W)
        addi op2,op2,-1
        sthbrx op2,REG
        NEXT
decw: lhbrx op2,MEM
        DEC_FLAGS(W)
        addi op2,op2,-1
        sthbrx op2,MEM
        NEXT
decl_reg: clrlslwi opreg,opcode,29,2
        lwbrx op2,REG
        DEC_FLAGS(L)
        addi op2,op2,-1
        stwbrx op2,REG # store all 32 bits
        NEXT
decl: lwbrx op2,MEM
        DEC_FLAGS(L)
        addi op2,op2,-1
        stwbrx op2,MEM
        NEXT
negb: lbzx op2,MEM
        SET_FLAGS(FLAGS_SUB(B))
        neg result,op2
        li op1,0
        stbx result,MEM
        NEXT
negw: lhbrx op2,MEM
        SET_FLAGS(FLAGS_SUB(W))
        neg result,op2
        li op1,0
        sthbrx result,MEM
        NEXT
negl: lwbrx op2,MEM
        SET_FLAGS(FLAGS_SUB(L))
        subfic result,op2,0
        li op1,0
        stwbrx result,MEM
        NEXT
/* Macro used to generate code for OR/AND/XOR */
#define LOGICAL(op) \
op##b_reg_mem: lbzx op1,MEM; SET_FLAGS(FLAGS_LOG(B)); lbzx op2,REG; \
        op result,op1,op2; \
        stbx result,MEM; NEXT; \
op##w_reg_mem: lhbrx op1,MEM; SET_FLAGS(FLAGS_LOG(W)); lhbrx op2,REG; \
        op result,op1,op2; \
        sthbrx result,MEM; NEXT; \
op##l_reg_mem: lwbrx op1,MEM; SET_FLAGS(FLAGS_LOG(L)); lwbrx op2,REG; \
        op result,op1,op2; \
        stwbrx result,MEM; NEXT; \
op##b_mem_reg: lbzx op1,MEM; SET_FLAGS(FLAGS_LOG(B)); lbzx op2,REG; \
        op result,op1,op2; \
        stbx result,REG; NEXT; \
op##w_mem_reg: lhbrx op2,MEM; SET_FLAGS(FLAGS_LOG(W)); lhbrx op1,REG; \
        op result,op1,op2; \
        sthbrx result,REG; NEXT; \
op##l_mem_reg: lwbrx op2,MEM; SET_FLAGS(FLAGS_LOG(L)); lwbrx op1,REG; \
        op result,op1,op2; \
        stwbrx result,REG; NEXT; \
op##b_imm_al: addi base,state,0; li offset,AL; \
op##b_imm: lbzx op1,MEM; SET_FLAGS(FLAGS_LOG(B)); lbz op2,1(eip); \
        op result,op1,op2; lbzu opcode,2(eip); \
        stbx result,MEM; GOTNEXT; \
op##w_imm_ax: addi base,state,0; li offset,AX; \
op##w_imm: lhbrx op1,MEM; SET_FLAGS(FLAGS_LOG(W)); lhbrx op2,eip,one; \
        op 
result,op1,op2; lbzu opcode,3(eip); \sthbrx result,MEM; GOTNEXT; \op##w_imm8: lbz op2,1(eip); SET_FLAGS(FLAGS_LOG(W)); lhbrx op1,MEM; \extsb op2,op2; lbzu opcode,2(eip); \op result,op1,op2; \sthbrx result,MEM; GOTNEXT; \op##l_imm_eax: addi base,state,0; li offset,EAX; \op##l_imm: lwbrx op1,MEM; SET_FLAGS(FLAGS_LOG(L)); lwbrx op2,eip,one; \op result,op1,op2; lbzu opcode,5(eip); \stwbrx result,MEM; GOTNEXT; \op##l_imm8: lbz op2,1(eip); SET_FLAGS(FLAGS_LOG(L)); lwbrx op1,MEM; \extsb op2,op2; lbzu opcode,2(eip); \op result,op1,op2; \stwbrx result,MEM; GOTNEXTLOGICAL(or)LOGICAL(and)LOGICAL(xor)testb_reg_mem: lbzx op1,MEMSET_FLAGS(FLAGS_TEST(B))lbzx op2,REGand result,op1,op2extsb r3,resultcmpwi cr6,r3,0NEXTtestw_reg_mem: lhbrx op1,MEMSET_FLAGS(FLAGS_TEST(W))lhbrx op2,REGand result,op1,op2extsh r3,resultcmpwi cr6,r3,0NEXTtestl_reg_mem: lwbrx r3,MEMSET_FLAGS(FLAGS_TEST(L))lwbrx r4,REGand result,op1,op2cmpwi cr6,result,0NEXTtestb_imm_al: addi base,state,0li offset,ALtestb_imm: lbzx op1,MEMSET_FLAGS(FLAGS_TEST(B))lbz op2,1(eip)and result,op1,op2lbzu opcode,2(eip)extsb r3,resultcmpwi cr6,r3,0GOTNEXTtestw_imm_ax: addi base,state,0li offset,AXtestw_imm: lhbrx op1,MEMSET_FLAGS(FLAGS_TEST(W))lhbrx op2,eip,oneand result,op1,op2lbzu opcode,3(eip)extsh r3,resultcmpwi cr6,r3,0GOTNEXTtestl_imm_eax: addi base,state,0li offset,EAXtestl_imm: lwbrx op1,MEMSET_FLAGS(FLAGS_TEST(L))lwbrx op2,eip,oneand result,r3,r4lbzu opcode,5(eip)cmpwi cr6,result,0GOTNEXT/* Not does not affect flags */notb: lbzx r3,MEMxori r3,r3,255stbx r3,MEMNEXTnotw: lhzx r3,MEMxori r3,r3,65535sthx r3,MEMNEXTnotl: lwzx r3,MEMnot r3,r3stwx r3,MEMNEXTboundw: lhbrx r4,REGli r3,code_boundlhbrx r5,MEMaddi offset,offset,2extsh r4,r4lhbrx r6,MEMextsh r5,r5cmpw r4,r5extsh r6,r6blt- complexcmpw r4,r6ble+ nopb complexboundl: lwbrx r4,REGli r3,code_boundlwbrx r5,MEMaddi offset,offset,4lwbrx r6,MEMcmpw r4,r5blt- complexcmpw r4,r6ble+ nopb complex/* Bit test and modify instructions *//* Common routine: bit index in op2, returns 
memory value in r3, mask in op2,
and of mask and value in op1. CF flag is set as with 32 bit add when bit is
non zero since result (which is cleared) will be less than op1, and in cr4,
all other flags are undefined from Intel doc. Here OF and SF are cleared
and ZF is set as a side effect of result being cleared. */
_setup_bitw: cmpw base,state
        SET_FLAGS(FLAGS_BTEST)
        extsh op2,op2
        beq- 1f
        srawi r4,op2,4
        add offset,offset,r4
1: clrlwi op2,op2,28 # true bit index
        lhbrx r3,MEM
        slw op2,one,op2 # build mask
        li result,0 # implicitly sets CF
        and op1,r3,op2 # if result<op1
        cmplw cr4,result,op1 # sets CF in cr4
        blr
_setup_bitl: cmpw base,state
        SET_FLAGS(FLAGS_BTEST)
        beq- 1f
        srawi r4,op2,5
        add offset,offset,r4
1: lwbrx r3,MEM
        rotlw op2,one,op2 # build mask
        li result,0
        and op1,r3,op2
        cmplw cr4,result,op1
        blr
/* Immediate forms of bit tests are infrequent since logical instructions are often faster */
btw_imm: NEXTBYTE(op2)
        b 1f
btw_reg_mem: lhbrx op2,REG
1: bl _setup_bitw
        NEXT
btl_imm: NEXTBYTE(op2)
        b 1f
btl_reg_mem: lhbrx op2,REG
1: bl _setup_bitl
        NEXT
btcw_imm: NEXTBYTE(op2)
        b 1f
btcw_reg_mem: lhbrx op2,REG
1: bl _setup_bitw
        xor r3,r3,op2
        sthbrx r3,MEM
        NEXT
btcl_imm: NEXTBYTE(op2)
        b 1f
btcl_reg_mem: lhbrx op2,REG
1: bl _setup_bitl
        xor r3,r3,op2
        stwbrx r3,MEM # store the modified value, not the cleared result
        NEXT
btrw_imm: NEXTBYTE(op2)
        b 1f
btrw_reg_mem: lhbrx op2,REG
1: bl _setup_bitw
        andc r3,r3,op2
        sthbrx r3,MEM
        NEXT
btrl_imm: NEXTBYTE(op2)
        b 1f
btrl_reg_mem: lhbrx op2,REG
1: bl _setup_bitl
        andc r3,r3,op2
        stwbrx r3,MEM
        NEXT
btsw_imm: NEXTBYTE(op2)
        b 1f
btsw_reg_mem: lhbrx op2,REG
1: bl _setup_bitw
        or r3,r3,op2
        sthbrx r3,MEM
        NEXT
btsl_imm: NEXTBYTE(op2)
        b 1f
btsl_reg_mem: lhbrx op2,REG
1: bl _setup_bitl
        or r3,r3,op2
        stwbrx r3,MEM
        NEXT
/* Bit string search instructions, only ZF is defined after these, and the
result value is not defined when the bit field is zero. 
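The arithmetic used by the routines that follow this comment can be sketched
in C as below. This is only an illustration, not part of the emulator: clz32
stands in for the PowerPC cntlzw instruction, and the names clz32, bsf32,
bsr32 and bit_search_self_check are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Leading zero count with the cntlzw convention: 32 when x is zero.
static uint32_t clz32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

// bsf: x & -x keeps only the lowest set bit, whose index is 31 - clz.
static uint32_t bsf32(uint32_t x, int *zf)
{
    uint32_t lowest = x & (0u - x);
    *zf = (x == 0);
    return 31u - clz32(lowest);
}

// bsr: index of the highest set bit, again 31 - clz.
static uint32_t bsr32(uint32_t x, int *zf)
{
    *zf = (x == 0);
    return 31u - clz32(x);
}

static void bit_search_self_check(void)
{
    int zf;
    assert(bsf32(0x28u, &zf) == 3 && zf == 0);
    assert(bsr32(0x28u, &zf) == 5 && zf == 0);
    bsf32(0u, &zf);
    assert(zf == 1);
}
```

With a zero operand both helpers return 0xffffffff (31 - 32), which is
harmless here since the result is undefined in that case and only ZF matters.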
*/
bsfw: lhbrx result,MEM
        SET_FLAGS(FLAGS_BSRCH(W))
        neg r3,result
        cmpwi cr6,result,0 # sets ZF
        and r3,r3,result # keep only LSB
        cntlzw r3,r3
        subfic r3,r3,31
        sthbrx r3,REG
        NEXT
bsfl: lwbrx result,MEM
        SET_FLAGS(FLAGS_BSRCH(L))
        neg r3,result
        cmpwi cr6,result,0 # sets ZF
        and r3,r3,result # keep only LSB
        cntlzw r3,r3
        subfic r3,r3,31
        stwbrx r3,REG
        NEXT
bsrw: lhbrx result,MEM
        SET_FLAGS(FLAGS_BSRCH(W))
        cntlzw r3,result
        cmpwi cr6,result,0
        subfic r3,r3,31
        sthbrx r3,REG
        NEXT
bsrl: lwbrx result,MEM
        SET_FLAGS(FLAGS_BSRCH(L))
        cntlzw r3,result
        cmpwi cr6,result,0
        subfic r3,r3,31
        stwbrx r3,REG
        NEXT
/* Unconditional jumps, first the indirect then the relative ones */
jmpw: lhbrx eip,MEM
        lbzux opcode,eip,csb
        GOTNEXT
jmpl: lwbrx eip,MEM
        lbzux opcode,eip,csb
        GOTNEXT
sjmp_w: lbz r3,1(eip)
        sub eip,eip,csb
        addi eip,eip,2 # EIP after instruction
        extsb r3,r3
        add eip,eip,r3
        clrlwi eip,eip,16 # modulo 64k
        lbzux opcode,eip,csb
        GOTNEXT
jmp_w: lhbrx r3,eip,one # eip now off by 3
        sub eip,eip,csb
        addi r3,r3,3 # compensate
        add eip,eip,r3
        clrlwi eip,eip,16
        lbzux opcode,eip,csb
        GOTNEXT
sjmp_l: lbz r3,1(eip)
        addi eip,eip,2
        extsb r3,r3
        lbzux opcode,eip,r3
        GOTNEXT
jmp_l: lwbrx r3,eip,one # Simple
        addi eip,eip,5
        lbzux opcode,eip,r3
        GOTNEXT
/* The conditional jumps: although it should not happen,
byte relative jumps (sjmp) may wrap around in 16 bit mode */
#define NOTTAKEN_S lbzu opcode,2(eip); GOTNEXT
#define NOTTAKEN_W lbzu opcode,3(eip); GOTNEXT
#define NOTTAKEN_L lbzu opcode,5(eip); GOTNEXT
#define CONDJMP(cond, eval, flag) \
sj##cond##_w: EVAL_##eval; bt flag,sjmp_w; NOTTAKEN_S; \
j##cond##_w: EVAL_##eval; bt flag,jmp_w; NOTTAKEN_W; \
sj##cond##_l: EVAL_##eval; bt flag,sjmp_l; NOTTAKEN_S; \
j##cond##_l: EVAL_##eval; bt flag,jmp_l; NOTTAKEN_L; \
sjn##cond##_w: EVAL_##eval; bf flag,sjmp_w; NOTTAKEN_S; \
jn##cond##_w: EVAL_##eval; bf flag,jmp_w; NOTTAKEN_W; \
sjn##cond##_l: EVAL_##eval; bf flag,sjmp_l; NOTTAKEN_S; \
jn##cond##_l: EVAL_##eval; bf flag,jmp_l; NOTTAKEN_L
CONDJMP(o, OF, OF)
CONDJMP(c, CF, CF)
CONDJMP(z, ZF, ZF)
CONDJMP(a, ABOVE, ABOVE)
CONDJMP(s, SF, SF)
CONDJMP(p, PF, PF)
CONDJMP(g, SIGNED, SGT)
CONDJMP(l, SIGNED, SLT)
jcxz_w: lhz r3,CX(state); cmpwi r3,0; beq- sjmp_w; NOTTAKEN_S
jcxz_l: lhz r3,CX(state); cmpwi r3,0; beq- sjmp_l; NOTTAKEN_S
jecxz_w: lwz r3,ECX(state); cmpwi r3,0; beq- sjmp_w; NOTTAKEN_S
jecxz_l: lwz r3,ECX(state); cmpwi r3,0; beq- sjmp_l; NOTTAKEN_S
/* Note that loop is somewhat strange: the data size attribute gives
the size of eip, and the address size selects whether the counter is cx or ecx.
This is the same for jcxz/jecxz. */
loopw_w: li opreg,CX
        lhbrx r0,REG
        sub. r0,r0,one
        sthbrx r0,REG
        bne+ sjmp_w
        NOTTAKEN_S
loopl_w: li opreg,ECX
        lwbrx r0,REG
        sub. r0,r0,one
        stwbrx r0,REG
        bne+ sjmp_w
        NOTTAKEN_S
loopw_l: li opreg,CX
        lhbrx r0,REG
        sub. r0,r0,one
        sthbrx r0,REG
        bne+ sjmp_l
        NOTTAKEN_S
loopl_l: li opreg,ECX
        lwbrx r0,REG
        sub. r0,r0,one
        stwbrx r0,REG
        bne+ sjmp_l
        NOTTAKEN_S
loopzw_w: li opreg,CX
        lhbrx r0,REG
        EVAL_ZF
        sub. r0,r0,one
        sthbrx r0,REG
        bf ZF,1f
        bne+ sjmp_w
1: NOTTAKEN_S
loopzl_w: li opreg,ECX
        lwbrx r0,REG
        EVAL_ZF
        sub. r0,r0,one # the counter was loaded into r0
        stwbrx r0,REG
        bf ZF,1f
        bne+ sjmp_w
1: NOTTAKEN_S
loopzw_l: li opreg,CX
        lhbrx r0,REG
        EVAL_ZF
        sub. r0,r0,one
        sthbrx r0,REG
        bf ZF,1f
        bne+ sjmp_l
1: NOTTAKEN_S
loopzl_l: li opreg,ECX
        lwbrx r0,REG
        EVAL_ZF
        sub. r0,r0,one
        stwbrx r0,REG
        bf ZF,1f
        bne+ sjmp_l
1: NOTTAKEN_S
loopnzw_w: li opreg,CX
        lhbrx r0,REG
        EVAL_ZF
        sub. r0,r0,one
        sthbrx r0,REG
        bt ZF,1f
        bne+ sjmp_w
1: NOTTAKEN_S
loopnzl_w: li opreg,ECX
        lwbrx r0,REG
        EVAL_ZF
        sub. r0,r0,one
        stwbrx r0,REG
        bt ZF,1f
        bne+ sjmp_w
1: NOTTAKEN_S
loopnzw_l: li opreg,CX
        lhbrx r0,REG
        EVAL_ZF
        sub. r0,r0,one
        sthbrx r0,REG
        bt ZF,1f
        bne+ sjmp_l
1: NOTTAKEN_S
loopnzl_l: li opreg,ECX
        lwbrx r0,REG
        EVAL_ZF
        sub. 
r0,r0,onestwbrx r0,REGbt ZF,1fbne+ sjmp_l1: NOTTAKEN_S/* Memory indirect calls are rare enough to limit code duplication */callw_sp_mem: lhbrx r3,MEMsub r4,eip,csbaddi r4,r4,1 # r4 is now return addressb 1f.equ calll_sp_mem, unimplcallw_sp: lhbrx r3,eip,onesub r4,eip,csbaddi r4,r4,3 # r4 is return addressadd r3,r4,r31: clrlwi eip,r3,16li r5,SPlhbrx r6,state,r5 # get spaddi r6,r6,-2lbzux opcode,eip,csbsthbrx r6,state,r5 # update spclrlwi r6,r6,16sthbrx r4,ssb,r6 # push return addressGOTNEXT.equ calll_sp, unimplretw_sp_imm: li opreg,SPlhbrx r4,REGlhbrx r6,eip,oneaddi r5,r4,2lhbrx eip,ssb,r4lbzux opcode,eip,csbadd r5,r5,r6sthbrx r5,REGGOTNEXT.equ retl_sp_imm, unimplretw_sp: li opreg,SPlhbrx r4,REGaddi r5,r4,2lhbrx eip,ssb,r4lbzux opcode,eip,csbsthbrx r5,REGGOTNEXT.equ retl_sp, unimpl/* Enter is a mess, and the description in Intel documents is actually wrong* in most revisions (all PPro/PII I have but the old Pentium is Ok) !*/enterw_sp: lhbrx r0,eip,one # Stack space to allocateli opreg,SPlhbrx r3,REG # SPli r7,BPlbzu r4,3(eip) # nesting leveladdi r3,r3,-2lhbrx r5,state,r7 # Original BPclrlwi r3,r3,16sthbrx r5,ssb,r3 # Push BPandi. r4,r4,31 # modulo 32 and testmr r6,r3 # Save frame pointer to tempbeq 3fmtctr r4 # iterate level-1 timesb 2f1: addi r5,r5,-2 # copy list of frame pointersclrlwi r5,r5,16lhzx r4,ssb,r5addi r3,r3,-2clrlwi r3,r3,16sthx r4,ssb,r32: bdnz 1baddi r3,r3,-2 # save current frame pointerclrlwi r3,r3,16sthbrx r6,ssb,r33: sthbrx r6,state,r7 # New BPsub r3,r3,r0sthbrx r3,REG # Save new stack pointerNEXT.equ enterl_sp, unimplleavew_sp: li opreg,BPlhbrx r3,REG # Stack = BPaddi r4,r3,2 #lhzx r3,ssb,r3li opreg,SPsthbrx r4,REG # New Stacksth r3,BP(state) # Popped BPNEXT.equ leavel_sp, unimpl/* String instructions: first a generic setup routine, which exits earlyif there is a repeat prefix with a count of 0 */#define STRINGSRC base,offset#define STRINGDST esb,opreg_setup_stringw: li offset,SI #rlwinm. 
r3,opcode,19,0,1 # lt=repnz, gt= repz, eq noneli opreg,DIlhbrx offset,state,offset # load sili r3,1 # no repeatlhbrx opreg,state,opreg # load dibeq 1f # no repeatli r3,CXlhbrx r3,state,r3 # load CXcmpwi r3,0beq nop # early exit here !1: mtctr r3 # ctr=CX or 1li r7,1 # stridebflr+ DFli r7,-1 # change stride signblr/* Ending routine to update all changed registers (goes directly to NEXT) */_finish_strw: li r4,SIsthbrx offset,state,r4 # update sili r4,DIsthbrx opreg,state,r4 # update dibeq nopmfctr r3li r4,CXsthbrx r3,state,r4 # update cxNEXTlodsb_a16: bl _setup_stringw1: lbzx r0,STRINGSRC # [rep] lodsbadd offset,offset,r7clrlwi offset,offset,16bdnz 1bstb r0,AL(state)b _finish_strwlodsw_a16: bl _setup_stringwslwi r7,r7,11: lhzx r0,STRINGSRC # [rep] lodswadd offset,offset,r7clrlwi offset,offset,16bdnz 1bsth r0,AX(state)b _finish_strwlodsl_a16: bl _setup_stringwslwi r7,r7,21: lwzx r0,STRINGSRC # [rep] lodsladd offset,offset,r7clrlwi offset,offset,16bdnz 1bstw r0,EAX(state)b _finish_strwstosb_a16: bl _setup_stringwlbz r0,AL(state)1: stbx r0,STRINGDST # [rep] stosbadd opreg,opreg,r7clrlwi opreg,opreg,16bdnz 1bb _finish_strwstosw_a16: bl _setup_stringwlhz r0,AX(state)slwi r7,r7,11: sthx r0,STRINGDST # [rep] stoswadd opreg,opreg,r7clrlwi opreg,opreg,16bdnz 1bb _finish_strwstosl_a16: bl _setup_stringwlwz r0,EAX(state)slwi r7,r7,21: stwx r0,STRINGDST # [rep] stosladd opreg,opreg,r7clrlwi opreg,opreg,16bdnz 1bb _finish_strwmovsb_a16: bl _setup_stringw1: lbzx r0,STRINGSRC # [rep] movsbadd offset,offset,r7stbx r0,STRINGDSTclrlwi offset,offset,16add opreg,opreg,r7clrlwi opreg,opreg,16bdnz 1bb _finish_strwmovsw_a16: bl _setup_stringwslwi r7,r7,11: lhzx r0,STRINGSRC # [rep] movswadd offset,offset,r7sthx r0,STRINGDSTclrlwi offset,offset,16add opreg,opreg,r7clrlwi opreg,opreg,16bdnz 1bb _finish_strwmovsl_a16: bl _setup_stringwslwi r7,r7,21: lwzx r0,STRINGSRC # [rep] movsladd offset,offset,r7stwx r0,STRINGDSTclrlwi offset,offset,16add opreg,opreg,r7clrlwi opreg,opreg,16bdnz 1bb 
_finish_strw

/* At least on a Pentium, repeated string I/O instructions check for
access port permission even if count is 0 ! So the order of the check is not
important. */
insb_a16:	li r4,DX
	li r3,code_insb_a16
	lhbrx r4,state,r4
	bl _check_port
	bl _setup_stringw
	lwz base,iobase(state)
1:	lbzx r0,base,r4	# [rep] insb
	eieio
	stbx r0,STRINGDST
	add opreg,opreg,r7
	clrlwi opreg,opreg,16
	bdnz 1b
	b _finish_strw

insw_a16:	li r4,DX
	li r3,code_insw_a16
	lhbrx r4,state,r4
	bl _check_port
	bl _setup_stringw
	lwz base,iobase(state)
	slwi r7,r7,1
1:	lhzx r0,base,r4	# [rep] insw
	eieio
	sthx r0,STRINGDST
	add opreg,opreg,r7
	clrlwi opreg,opreg,16
	bdnz 1b
	b _finish_strw

insl_a16:	li r4,DX
	li r3,code_insl_a16
	lhbrx r4,state,r4
	bl _check_port
	bl _setup_stringw
	lwz base,iobase(state)
	slwi r7,r7,2
1:	lwzx r0,base,r4	# [rep] insl
	eieio
	stwx r0,STRINGDST
	add opreg,opreg,r7
	clrlwi opreg,opreg,16
	bdnz 1b
	b _finish_strw

outsb_a16:	li r4,DX
	li r3,code_outsb_a16
	lhbrx r4,state,r4
	bl _check_port
	bl _setup_stringw
	lwz r6,iobase(state)
1:	lbzx r0,STRINGSRC	# [rep] outsb
	add offset,offset,r7
	stbx r0,r6,r4
	clrlwi offset,offset,16
	eieio
	bdnz 1b
	b _finish_strw

outsw_a16:	li r4,DX
	li r3,code_outsw_a16
	lhbrx r4,state,r4
	bl _check_port
	bl _setup_stringw
	li r5,DX
	lwz r6,iobase(state)
	slwi r7,r7,1
1:	lhzx r0,STRINGSRC	# [rep] outsw
	add offset,offset,r7
	sthx r0,r6,r4
	clrlwi offset,offset,16
	eieio
	bdnz 1b
	b _finish_strw

outsl_a16:	li r4,DX
	li r3,code_outsl_a16
	lhbrx r4,state,r4
	bl _check_port
	bl _setup_stringw
	lwz r6,iobase(state)
	slwi r7,r7,2
1:	lwzx r0,STRINGSRC	# [rep] outsl
	add offset,offset,r7
	stwx r0,r6,r4
	clrlwi offset,offset,16
	eieio
	bdnz 1b
	b _finish_strw

cmpsb_a16:	bl _setup_stringw
	SET_FLAGS(FLAGS_CMP(B))
	blt 3f	# repnz prefix
1:	lbzx op1,STRINGSRC	# [repz] cmpsb
	add offset,offset,r7
	lbzx op2,STRINGDST
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi offset,offset,16
	clrlwi opreg,opreg,16
	bdnzt CF+2,1b
2:	extsb r3,op1
	extsb r4,op2
	cmpw cr6,r3,r4
	sub result,op1,op2
	b _finish_strw

3:	lbzx op1,STRINGSRC	# repnz cmpsb
	add offset,offset,r7
	lbzx op2,STRINGDST
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi offset,offset,16
	clrlwi opreg,opreg,16
	bdnzf CF+2,3b
	b 2b

cmpsw_a16:	bl _setup_stringw
	SET_FLAGS(FLAGS_CMP(W))
	slwi r7,r7,1
	blt 3f	# repnz prefix
1:	lhbrx op1,STRINGSRC	# [repz] cmpsw
	add offset,offset,r7
	lhbrx op2,STRINGDST
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi offset,offset,16
	clrlwi opreg,opreg,16
	bdnzt CF+2,1b
2:	extsh r3,op1
	extsh r4,op2
	cmpw cr6,r3,r4
	sub result,op1,op2
	b _finish_strw

3:	lhbrx op1,STRINGSRC	# repnz cmpsw
	add offset,offset,r7
	lhbrx op2,STRINGDST
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi offset,offset,16
	clrlwi opreg,opreg,16
	bdnzf CF+2,3b
	b 2b

cmpsl_a16:	bl _setup_stringw
	SET_FLAGS(FLAGS_CMP(L))
	slwi r7,r7,2
	blt 3f	# repnz prefix
1:	lwbrx op1,STRINGSRC	# [repz] cmpsl
	add offset,offset,r7
	lwbrx op2,STRINGDST
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi offset,offset,16
	clrlwi opreg,opreg,16
	bdnzt CF+2,1b
2:	cmpw cr6,op1,op2
	sub result,op1,op2
	b _finish_strw

3:	lwbrx op1,STRINGSRC	# repnz cmpsl
	add offset,offset,r7
	lwbrx op2,STRINGDST
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi offset,offset,16
	clrlwi opreg,opreg,16
	bdnzf CF+2,3b
	b 2b

scasb_a16:	bl _setup_stringw
	lbzx op1,AL,state	# AL
	SET_FLAGS(FLAGS_CMP(B))
	bgt 3f	# repz prefix
1:	lbzx op2,STRINGDST	# [repnz] scasb
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi opreg,opreg,16
	bdnzf CF+2,1b
2:	extsb r3,op1
	extsb r4,op2
	cmpw cr6,r3,r4
	sub result,op1,op2
	b _finish_strw

3:	lbzx op2,STRINGDST	# repz scasb
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi opreg,opreg,16
	bdnzt CF+2,3b
	b 2b

scasw_a16:	bl _setup_stringw
	lhbrx op1,AX,state
	SET_FLAGS(FLAGS_CMP(W))
	slwi r7,r7,1
	bgt 3f	# repz prefix
1:	lhbrx op2,STRINGDST	# [repnz] scasw
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi opreg,opreg,16
	bdnzf CF+2,1b
2:	extsh r3,op1
	extsh r4,op2
	cmpw cr6,r3,r4
	sub result,op1,op2
	b _finish_strw

3:	lhbrx op2,STRINGDST	# repz scasw
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi opreg,opreg,16
	bdnzt CF+2,3b
	b 2b

scasl_a16:	bl _setup_stringw
	lwbrx op1,EAX,state
	SET_FLAGS(FLAGS_CMP(L))
	slwi r7,r7,2
	bgt 3f	# repz prefix
1:	lwbrx op2,STRINGDST	# [repnz] scasl
	add opreg,opreg,r7
	cmplw cr4,op1,op2
	clrlwi opreg,opreg,16
	bdnzf
CF+2,1b2: cmpw cr6,op1,op2sub result,op1,op2b _finish_strw3: lwbrx op2,STRINGDST # repz scasladd opreg,opreg,r7cmplw cr4,op1,op2clrlwi opreg,opreg,16bdnzt CF+2,3bb 2b.equ lodsb_a32, unimpl.equ lodsw_a32, unimpl.equ lodsl_a32, unimpl.equ stosb_a32, unimpl.equ stosw_a32, unimpl.equ stosl_a32, unimpl.equ movsb_a32, unimpl.equ movsw_a32, unimpl.equ movsl_a32, unimpl.equ insb_a32, unimpl.equ insw_a32, unimpl.equ insl_a32, unimpl.equ outsb_a32, unimpl.equ outsw_a32, unimpl.equ outsl_a32, unimpl.equ cmpsb_a32, unimpl.equ cmpsw_a32, unimpl.equ cmpsl_a32, unimpl.equ scasb_a32, unimpl.equ scasw_a32, unimpl.equ scasl_a32, unimplxlatb_a16: li offset,BXlbz r3,AL(state)lhbrx offset,offset,stateadd r3,r3,baselbzx r3,r3,offsetstb r3,AL(state)NEXT.equ xlatb_a32, unimpl/** Shift and rotates: note the oddity that rotates do not affect SF/ZF/AF/PF* but shifts do. Also testing has indicated that rotates with a count of zero* do not affect any flag. The documentation specifies this for shifts but* is more obscure for rotates. The overflow flag setting is only specified* when count is 1, otherwise OF is undefined which simplifies emulation.*//** The rotates through carry are among the most difficult instructions,* they are implemented as a shift of 2*n+some bits depending on case.* First the left rotates through carry.*//* Byte rcl is performed on 18 bits (17 actually used) in a single register */rclb_imm: NEXTBYTE(r3)b 1frclb_cl: lbz r3,CL(state)b 1frclb_1: li r3,11: lbzx r0,MEMandi. 
r3,r3,31 # count%32addc r4,flags,flags # CF_IN->xer[ca]RES2CF(r6)subfe r4,result,op1mulli r5,r3,29 # 29=ceil(256/9)CF_ROTCNT(r7)addze r6,r6CF_POL_INSERT(r0,23)srwi r5,r5,8 # count/9rlwnm r6,r6,r7,0x100xor r0,r0,r6 # (23)0:CF:data8rlwimi r5,r5,3,26,28 # 9*(count/9)rlwimi r0,r0,23,0,7 # CF:(data8):(14)0:CF:data8sub r3,r3,r5 # count%9beq- nop # no flags changed if count 0ROTATE_FLAGSrlwnm r0,r0,r3,0x000001ff # (23)0:NewCF:Result8rlwimi flags,r0,19,CF_VALUEstbx r0,MEMrlwimi flags,r0,18,OF_XORNEXT/* Word rcl is performed on 33 bits (CF:data16:CF:(15 MSB of data16) */rclw_imm: NEXTBYTE(r3)b 1frclw_cl: lbz r3,CL(state)b 1frclw_1: li r3,11: lhbrx r0,MEMandi. r3,r3,31 # count=count%32addc r4,flags,flagsRES2CF(r6)subfe r4,result,op1addi r5,r3,15 # modulo 17: >=32 if >=17CF_ROTCNT(r7)addze r6,r6addi r7,r7,8CF_POL_INSERT(r0,15)srwi r5,r5,5 # count/17rlwnm r6,r6,r7,0x10000rlwimi r5,r5,4,27,27 # 17*(count/17)xor r0,r0,r6 # (15)0:CF:data16sub r3,r3,r5 # count%17rlwinm r4,r0,15,0xffff0000 # CF:(15 MSB of data16):(16)0slw r0,r0,r3 # New carry and MSBsrlwnm r4,r4,r3,16,31 # New LSBsbeq- nop # no flags changed if count 0ROTATE_FLAGSadd r0,r0,r4 # resultrlwimi flags,r0,11,CF_VALUEsthbrx r0,MEMrlwimi flags,r0,10,OF_XORNEXT/* Longword rcl only needs 64 bits because the maximum rotate count is 31 ! */rcll_imm: NEXTBYTE(r3)b 1frcll_cl: lbz r3,CL(state)b 1frcll_1: li r3,11: lwbrx r0,MEMandi. r3,r3,31 # count=count%32addc r4,r4,flags # ~XER[CA]RES2CF(r6)subfe r4,result,op1CF_ROTCNT(r7)addze r6,r6srwi r4,r0,1 # 0:(31 MSB of data32)addi r7,r7,23CF_POL_INSERT(r4,0)rlwnm r6,r6,r7,0,0beq- nop # no flags changed if count 0subfic r5,r3,32xor r4,r4,r6ROTATE_FLAGSslw r0,r0,r3 # New MSBssrw r5,r4,r5 # New LSBsrlwnm r4,r4,r3,0,0 # New Carryadd r0,r0,r5 # resultrlwimi flags,r4,28,CF_VALUErlwimi flags,r0,27,OF_XORstwbrx r0,MEMNEXT/* right rotates through carry are even worse because PPC only has a leftrotate instruction. 
Somewhat tough when combined with modulo 9, 17, or33 operation and the rules of OF and CF flag settings. *//* Byte rcr is performed on 17 bits */rcrb_imm: NEXTBYTE(r3)b 1frcrb_cl: lbz r3,CL(state)b 1frcrb_1: li r3,11: lbzx r0,MEMandi. r3,r3,31 # count%32addc r4,flags,flags # cf_in->xer[ca]RES2CF(r6)mulli r5,r3,29 # 29=ceil(256/9)subfe r4,result,op1CF_ROTCNT(r7)addze r6,r6CF_POL_INSERT(r0,23)srwi r5,r5,8 # count/9rlwimi r0,r0,9,0x0001fe00 # (15)0:data8:0:data8rlwnm r6,r6,r7,0x100rlwimi r5,r5,3,26,28 # 9*(count/9)xor r0,r0,r6 # (15)0:data8:CF:data8sub r3,r3,r5 # count%9beq- nop # no flags changed if count 0ROTATE_FLAGSsrw r0,r0,r3 # (23)junk:NewCF:Result8rlwimi flags,r0,19,CF_VALUE|OF_XORstbx r0,MEMNEXT/* Word rcr is a 33 bit right shift with a quirk, because the 33rd bitis only needed when the rotate count is 16 and rotating left or rightby 16 a 32 bit quantity is the same ! */rcrw_imm: NEXTBYTE(r3)b 1frcrw_cl: lbz r3,CL(state)b 1frcrw_1: li r3,11: lhbrx r0,MEMandi. r3,r3,31 # count%32addc r4,flags,flags # cf_in->xer[ca]RES2CF(r6)subfe r4,result,op1addi r5,r3,15 # >=32 if >=17CF_ROTCNT(r7)addze r6,r6addi r7,r7,8CF_POL_INSERT(r0,15)srwi r5,r5,5 # count/17rlwnm r6,r6,r7,0x10000rlwinm r7,r0,16,0x01 # MSB of data16rlwimi r0,r0,17,0xfffe0000 # (15 MSB of data16):0:data16rlwimi r5,r5,4,27,27 # 17*(count/17)xor r0,r0,r6 # (15 MSB of data16):CF:data16sub r3,r3,r5 # count%17beq- nop # no flags changed if count 0srw r0,r0,r3 # shift rightrlwnm r7,r7,r3,0x10000 # just in case count=16ROTATE_FLAGSadd r0,r0,r7 # junk15:NewCF:result16rlwimi flags,r0,11,CF_VALUE|OF_XORsthbrx r0,MEMNEXT/* Longword rcr need only 64 bits since the rotate count is limited to 31 */rcrl_imm: NEXTBYTE(r3)b 1frcrl_cl: lbz r3,CL(state)b 1frcrl_1: li r3,11: lwbrx r0,MEMandi. 
r3,r3,31 # count%32addc r4,flags,flagsRES2CF(r6)subfe r4,result,op1CF_ROTCNT(r7)slwi r4,r0,1 # (31MSB of data32):0addze r6,r6addi r7,r7,24CF_POL_INSERT(r4,31)rlwnm r6,r6,r7,0x01beq- nop # no flags changed if count 0subfic r7,r3,32xor r4,r4,r6srw r0,r0,r3 # Result LSBslw r5,r4,r7 # Result MSBsrw r4,r4,r3 # NewCF in LSBadd r0,r0,r5 # resultrlwimi flags,r4,27,CF_VALUEstwbrx r0,MEMrlwimi flags,r0,27,OF_XORNEXT/* After the rotates through carry, normal rotates are so simple ! */rolb_imm: NEXTBYTE(r3)b 1frolb_cl: lbz r3,CL(state)b 1frolb_1: li r3,11: lbzx r0,MEMandi. r4,r3,31 # count%32 == 0 ?clrlwi r3,r3,29 # count%8rlwimi r0,r0,24,0xff000000 # replicate for shift inbeq- nop # no flags changed if count 0ROTATE_FLAGSrotlw r0,r0,r3rlwimi flags,r0,27,CF_VALUE # New CFstbx r0,MEMrlwimi flags,r0,26,OF_XOR # New OF (CF xor MSB)NEXTrolw_imm: NEXTBYTE(r3)b 1frolw_cl: lbz r3,CL(state)b 1frolw_1: li r3,11: lhbrx r0,MEMandi. r3,r3,31rlwimi r0,r0,16,0,15 # duplicatebeq- nop # no flags changed if count 0ROTATE_FLAGSrotlw r0,r0,r3 # result word duplicatedrlwimi flags,r0,27,CF_VALUE # New CFsthbrx r0,MEMrlwimi flags,r0,26,OF_XOR # New OF (CF xor MSB)NEXTroll_imm: NEXTBYTE(r3)b 1froll_cl: lbz r3,CL(state)b 1froll_1: li r3,11: lwbrx r0,MEMandi. r3,r3,31beq- nop # no flags changed if count 0ROTATE_FLAGSrotlw r0,r0,r3 # resultrlwimi flags,r0,27,CF_VALUE # New CFstwbrx r0,MEMrlwimi flags,r0,26,OF_XOR # New OF (CF xor MSB)NEXTrorb_imm: NEXTBYTE(r3)b 1frorb_cl: lbz r3,CL(state)b 1frorb_1: li r3,11: lbzx r0,MEMandi. r4,r3,31 # count%32 == 0 ?clrlwi r3,r3,29 # count%8rlwimi r0,r0,8,0x0000ff00 # replicate for shift inbeq- nop # no flags changed if count 0ROTATE_FLAGSsrw r0,r0,r3rlwimi flags,r0,20,CF_VALUEstbx r0,MEMrlwimi flags,r0,19,OF_XORNEXTrorw_imm: NEXTBYTE(r3)b 1frorw_cl: lbz r3,CL(state)b 1frorw_1: li r3,11: lhbrx r0,MEMandi. 
r4,r3,31clrlwi r3,r3,28 # count %16rlwimi r0,r0,16,0xffff0000 # duplicatebeq- nop # no flags changed if count 0ROTATE_FLAGSsrw r0,r0,r3 # junk16:result16rlwimi flags,r0,12,CF_VALUEsthbrx r0,MEMrlwimi flags,r0,11,OF_XORNEXTrorl_imm: NEXTBYTE(r3)b 1frorl_cl: lbz r3,CL(state)b 1frorl_1: li r3,11: lwbrx r0,MEMandi. r4,r3,31neg r3,r3beq- nop # no flags changed if count 0ROTATE_FLAGSrotlw r0,r0,r3 # resultrlwimi flags,r0,28,CF_VALUEstwbrx r0,MEMrlwimi flags,r0,27,OF_XORNEXT/* Right arithmetic shifts: they clear OF whenever count!=0 */#define SAR_FLAGS CF_ZERO|OF_ZERO|RESLsarb_imm: NEXTBYTE(r3)b 1fsarb_cl: lbz r3,CL(state)b 1fsarb_1: li r3,11: lbzx r4,MEMandi. r3,r3,31addi r5,r3,-1extsb r4,r4beq- nop # no flags changed if count 0SET_FLAGS(SAR_FLAGS)sraw result,r4,r3srw r5,r4,r5stbx result,MEMrlwimi flags,r5,27,CF_VALUENEXTsarw_imm: NEXTBYTE(r3)b 1fsarw_cl: lbz r3,CL(state)b 1fsarw_1: li r3,11: lhbrx r4,MEMandi. r3,r3,31addi r5,r3,-1extsh r4,r4beq- nop # no flags changed if count 0SET_FLAGS(SAR_FLAGS)sraw result,r4,r3srw r5,r4,r5sthbrx result,MEMrlwimi flags,r5,27,CF_VALUENEXTsarl_imm: NEXTBYTE(r3)b 1fsarl_cl: lbz r3,CL(state)b 1fsarl_1: li r3,11: lwbrx r4,MEMandi. r3,r3,31addi r5,r3,-1beq- nop # no flags changed if count 0SET_FLAGS(SAR_FLAGS)sraw result,r4,r3srw r5,r4,r5stwbrx result,MEMrlwimi flags,r5,27,CF_VALUENEXT/* Left shifts are quite easy: they use the flag mechanism of add */shlb_imm: NEXTBYTE(r3)b 1fshlb_cl: lbz r3,CL(state)b 1fshlb_1: li r3,11: andi. r3,r3,31beq- nop # no flags changed if count 0lbzx op1,MEMSET_FLAGS(FLAGS_ADD(B))slw result,op1,r3addi op2,op1,0 # for OF computation only !stbx result,MEMNEXTshlw_imm: NEXTBYTE(r3)b 1fshlw_cl: lbz r3,CL(state)b 1fshlw_1: li r3,11: andi. r3,r3,31beq- nop # no flags changed if count 0lhbrx op1,MEMSET_FLAGS(FLAGS_ADD(W))slw result,op1,r3addi op2,op1,0 # for OF computation only !sthbrx result,MEMNEXT/* That one may be wrong */shll_imm: NEXTBYTE(r3)b 1fshll_cl: lbz r3,CL(state)b 1fshll_1: li r3,11: andi. 
r3,r3,31beq- nop # no flags changed if count 0lwbrx op1,MEMaddi r4,r3,-1SET_FLAGS(FLAGS_ADD(L))slw result,op1,r3addi op2,op1,0 # for OF computation only !slw op1,op1,r4 # for CF computationstwbrx result,MEMNEXT/* Right shifts are quite complex, because of funny flag rules ! */shrb_imm: NEXTBYTE(r3)b 1fshrb_cl: lbz r3,CL(state)b 1fshrb_1: li r3,11: andi. r3,r3,31beq- nop # no flags changed if count 0lbzx op1,MEMaddi r4,r3,-1SET_FLAGS(FLAGS_SHR(B))srw result,op1,r3srw r4,op1,r4li op2,-1 # for OF computation only !stbx result,MEMrlwimi flags,r4,27,CF_VALUE # Set CFNEXTshrw_imm: NEXTBYTE(r3)b 1fshrw_cl: lbz r3,CL(state)b 1fshrw_1: li r3,11: andi. r3,r3,31beq- nop # no flags changed if count 0lhbrx op1,MEMaddi r4,r3,-1SET_FLAGS(FLAGS_SHR(W))srw result,op1,r3srw r4,op1,r4li op2,-1 # for OF computation only !sthbrx result,MEMrlwimi flags,r4,27,CF_VALUE # Set CFNEXTshrl_imm: NEXTBYTE(r3)b 1fshrl_cl: lbz r3,CL(state)b 1fshrl_1: li r3,11: andi. r3,r3,31beq- nop # no flags changed if count 0lwbrx op1,MEMaddi r4,r3,-1SET_FLAGS(FLAGS_SHR(L))srw result,op1,r3srw r4,op1,r4li op2,-1 # for OF computation only !stwbrx result,MEMrlwimi flags,r4,27,CF_VALUE # Set CFNEXT/* Double length shifts, shldw uses FLAGS_ADD for simplicity */shldw_imm: NEXTBYTE(r3)b 1fshldw_cl: lbz r3,CL(state)1: andi. r3,r3,31beq- noplhbrx op1,MEMSET_FLAGS(FLAGS_ADD(W))lhbrx op2,REGrlwimi op1,op2,16,0,15 # op2:op1addi op2,op1,0rotlw result,op1,r3sthbrx result,MEMNEXTshldl_imm: NEXTBYTE(r3)b 1fshldl_cl: lbz r3,CL(state)1: andi. r3,r3,31beq- noplwbrx op1,MEMSET_FLAGS(FLAGS_DBLSH(L))lwbrx op2,REGsubfic r4,r3,32slw result,op1,r3srw r4,op2,r4rotlw r3,op1,r3or result,result,r4addi op2,op1,0rlwimi flags,r3,27,CF_VALUEstwbrx result,MEMNEXTshrdw_imm: NEXTBYTE(r3)b 1fshrdw_cl: lbz r3,CL(state)1: andi. 
r3,r3,31beq- noplhbrx op1,MEMSET_FLAGS(FLAGS_DBLSH(W))lhbrx op2,REGaddi r4,r3,-1rlwimi op1,op2,16,0,15 # op2:op1addi op2,op1,0srw result,op1,r3srw r4,op1,r4sthbrx result,MEMrlwimi flags,r4,27,CF_VALUENEXTshrdl_imm: NEXTBYTE(r3)b 1fshrdl_cl: lbz r3,CL(state)1: andi. r3,r3,31beq- noplwbrx op1,MEMSET_FLAGS(FLAGS_DBLSH(L))lwbrx op2,REGsubfic r4,r3,32srw result,op1,r3addi r3,r3,-1slw r4,op2,r4srw r3,op1,r3or result,result,r4addi op2,op1,0rlwimi flags,r3,27,CF_VALUEstwbrx result,MEMNEXT/* One operand multiplies: with result double the operand size, unsigned */mulb: lbzx op2,MEMlbz op1,AL(state)mullw result,op1,op2SET_FLAGS(FLAGS_MUL)subfic r3,result,255sthbrx result,AX,staterlwimi flags,r3,0,CF_VALUE|OF_VALUENEXTmulw: lhbrx op2,MEMlhbrx op1,AX,statemullw result,op1,op2SET_FLAGS(FLAGS_MUL)li r4,DXsrwi r3,result,16sthbrx result,AX,stateneg r5,r3sthbrx r3,r4,state # DXrlwimi flags,r5,0,CF_VALUE|OF_VALUENEXTmull: lwbrx op2,MEMlwbrx op1,EAX,statemullw result,op1,op2mulhwu. r3,op1,op2SET_FLAGS(FLAGS_MUL)stwbrx result,EAX,stateli r4,EDXstwbrx r3,r4,statebeq+ noporis flags,flags,(CF_SET|OF_SET)>>16NEXT/* One operand multiplies: with result double the operand size, signed */imulb: lbzx op2,MEMextsb op2,op2lbz op1,AL(state)extsb op1,op1mullw result,op1,op2SET_FLAGS(FLAGS_MUL)extsb r3,resultsthbrx result,AX,statecmpw r3,resultbeq+ noporis flags,flags,(CF_SET|OF_SET)>>16NEXTimulw: lhbrx op2,MEMextsh op2,op2lhbrx op1,AX,stateextsh op1,op1mullw result,op1,op2SET_FLAGS(FLAGS_MUL)li r3,DXextsh r4,resultsrwi r5,result,16sthbrx result,AX,statecmpw r4,resultsthbrx r5,r3,statebeq+ noporis flags,flags,(CF_SET|OF_SET)>>16NEXTimull: lwbrx op2,MEMSET_FLAGS(FLAGS_MUL)lwbrx op1,EAX,stateli r3,EDXmulhw r4,op1,op2mullw result,op1,op2stwbrx r4,r3,statesrawi r3,result,31cmpw r3,r4beq+ noporis flags,flags,(CF_SET|OF_SET)>>16NEXT/* Other multiplies */imulw_mem_reg: lhbrx op2,REGextsh op2,op2b 1fimulw_imm: NEXTWORD(op2)extsh op2,op2b 1fimulw_imm8: NEXTBYTE(op2)extsb op2,op21: lhbrx op1,MEMextsh 
op1,op1mullw result,op1,op2SET_FLAGS(FLAGS_MUL)extsh r3,resultsthbrx result,REGcmpw r3,resultbeq+ noporis flags,flags,(CF_SET|OF_SET)>>16NEXT # SF/ZF/AF/PF undefined !imull_mem_reg: lwbrx op2,REGb 1fimull_imm: NEXTDWORD(op2)b 1fimull_imm8: NEXTBYTE(op2)extsb op2,op21: lwbrx op1,MEMmullw result,op1,op2SET_FLAGS(FLAGS_MUL)mulhw r3,op1,op2srawi r4,result,31stwbrx result,REGcmpw r3,r4beq+ noporis flags,flags,(CF_SET|OF_SET)>>16NEXT # SF/ZF/AF/PF undefined !/* aad is indeed a multiply */aad: NEXTBYTE(r3)lbz op1,AH(state)lbz op2,AL(state)mullw result,op1,r3 # AH*immSET_FLAGS(FLAGS_LOG(B)) # SF/ZF/PF from resultadd result,result,op2 # AH*imm+ALslwi r3,result,8sth r3,AX(state) # AH=0NEXT # OF/AF/CF undefined/* Unsigned divides: we may destroy all flags */divb: lhbrx r4,AX,statelbzx r3,MEMsrwi r5,r4,8cmplw r5,r3bnl- _divide_errordivwu r5,r4,r3mullw r3,r5,r3sub r3,r4,r3stb r5,AL(state)stb r3,AH(state)NEXTdivw: li opreg,DXlhbrx r4,AX,statelhbrx r5,REGlhbrx r3,MEMinsrwi r4,r5,16,0cmplw r5,r3bnl- _divide_errordivwu r5,r4,r3mullw r3,r5,r3sub r3,r4,r3sthbrx r5,AX,statesthbrx r3,REGNEXTdivl: li opreg,EDX # Not yet fully implementedlwbrx r3,MEMlwbrx r4,REGlwbrx r5,EAX,statecmplw r4,r3bnl- _divide_errorcmplwi r4,0bne- 1fdivwu r4,r5,r3mullw r3,r4,r3stwbrx r4,EAX,statesub r3,r5,r3stwbrx r3,REGNEXT/* full implementation of 64:32 unsigned divide, slow but rarely used */1: bl _div_64_32stwbrx r5,EAX,statestwbrx r4,REGNEXT/** Divide r4:r5 by r3, quotient in r5, remainder in r4.* The algorithm is stupid because it won't be used very often.*/_div_64_32: li r7,32mtctr r71: cmpwi r4,0 # always subtract in caseaddc r5,r5,r5 # MSB is setadde r4,r4,r4blt 2fcmplw r4,r3blt 3f2: sub r4,r4,r3addi r5,r5,13: bdnz 1b/* Signed divides: we may destroy all flags */idivb: lbzx r3,MEMlhbrx r4,AX,statecmpwi r3,0beq- _divide_errordivw r5,r4,r3extsb r7,r5mullw r3,r5,r3cmpw r5,r7sub r3,r4,r3bne- _divide_errorstb r5,AL(state)stb r3,AH(state)NEXTidivw: li opreg,DXlhbrx r4,AX,statelhbrx r5,REGlhbrx r3,MEMinsrwi 
r4,r5,16,0cmpwi r3,0beq- _divide_errordivw r5,r4,r3extsh r7,r5mullw r3,r5,r3cmpw r5,r7sub r3,r4,r3bne- _divide_errorsthbrx r5,AX,statesthbrx r3,REGNEXTidivl: li opreg,EDX # Not yet fully implementedlwbrx r3,MEMlwbrx r5,EAX,statecmpwi cr1,r3,0lwbrx r4,REGsrwi r7,r5,31beq- _divide_erroradd. r7,r7,r4bne- 1f # EDX not sign extension of EAXdivw r4,r5,r3xoris r7,r5,0x8000 # only overflow case isorc. r7,r7,r3 # 0x80000000 divided by -1mullw r3,r4,r3beq- _divide_errorstwbrx r4,EAX,statesub r3,r5,r3stwbrx r3,REGNEXT/* full 64 by 32 signed divide, checks for overflow might be right now */1: srawi r6,r4,31 # absolute value of r4:r5srawi r0,r3,31 # absolute value of r3xor r5,r5,r6xor r3,r3,r0subfc r5,r6,r5xor r4,r4,r6sub r3,r3,r0subfe r4,r6,r4xor r0,r0,r6 # sign of resultcmplw r4,r3 # coarse overflow detectionbnl- _divide_error # (probably not necessary)bl _div_64_32xor r5,r5,r0 # apply sign to resultsub r5,r5,r0xor. r7,r0,r5 # wrong sign: overflowxor r4,r4,r6 # apply sign to remainderblt- _divide_errorstwbrx r5,EAX,statesub r4,r4,r6stwbrx r4,REGNEXT/* aam is indeed a divide */aam: NEXTBYTE(r3)lbz r4,AL(state)cmpwi r3,0beq- _divide_error # zero dividedivwu op2,r4,r3 # AL/imm8SET_FLAGS(FLAGS_LOG(B)) # SF/ZF/PF from ALmullw r3,op2,r3 # (AL/imm8)*imm8stb op2,AH(state)sub result,r4,r3 # AL-imm8*(AL/imm8)stb result,AL(state)NEXT # OF/AF/CF undefined_divide_error: li r3,code_divide_errb complex/* Instructions dealing with segment registers */pushw_sp_sr: li r3,SPrlwinm opreg,opcode,31,27,29addi r5,state,SELECTORS+2lhbrx r4,state,r3lhzx r0,r5,opregaddi r4,r4,-2sthbrx r4,state,r3clrlwi r4,r4,16sthbrx r0,r4,ssbNEXTpushl_sp_sr: li r3,SPrlwinm opreg,opcode,31,27,29addi r5,state,SELECTORS+2lhbrx r4,state,r3lhzx r0,r5,opregaddi r4,r4,-4sthbrx r4,state,r3clrlwi r4,r4,16stwbrx r0,r4,ssbNEXTmovl_sr_mem: cmpwi opreg,20addi opreg,opreg,SELECTORS+2cmpw cr1,base,state # Only registers are sensitivebgt- ud # to word/longword differencelhzx r0,REGbne cr1,1fstwbrx r0,MEM # Actually a 
register
	NEXT

movw_sr_mem:	cmpwi opreg,20	# SREG 0 to 5 only
	addi opreg,opreg,SELECTORS+2
	bgt- ud
	lhzx r0,REG
1:	sthbrx r0,MEM
	NEXT

/* Now the instructions that modify the segment registers, note that
move/pop to ss disable interrupts and traps for one instruction ! */
popl_sp_sr:	li r6,4
	b 1f
popw_sp_sr:	li r6,2
1:	li r7,SP
	rlwinm opreg,opcode,31,27,29
	lhbrx offset,state,r7
	addi opreg,opreg,SELBASES
	lhbrx r4,ssb,offset	# new selector
	add offset,offset,r6
	bl _segment_load
	sthbrx offset,state,r7	# update sp
	cmpwi opreg,8	# is ss ?
	stwux r3,REG
	stw r4,SELECTORS-SELBASES(opreg)
	lwz esb,esbase(state)
	bne+ nop
	lwz ssb,ssbase(state)	# pop ss
	crmove RF,TF	# prevent traps
	NEXT

movw_mem_sr:	cmpwi opreg,20
	addi r7,state,SELBASES
	bgt- ud
	cmpwi opreg,4	# CS illegal
	beq- ud
	lhbrx r4,MEM
	bl _segment_load
	stwux r3,r7,opreg
	cmpwi opreg,8
	stw r4,SELECTORS-SELBASES(r7)
	lwz esb,esbase(state)
	bne+ nop
	lwz ssb,ssbase(state)
	crmove RF,TF	# prevent traps
	NEXT

	.equ movl_mem_sr, movw_mem_sr

/* The encoding of les/lss/lds/lfs/lgs is strange, opcode is c4/b2/c5/b4/b5
for es/ss/ds/fs/gs which are sreg 0/2/3/4/5. And obviously there is
no lcs instruction, it's called a far jump. */
ldlptrl:	lwzux r7,MEM
	li r4,4
	bl 1f
	stwx r7,REG
	NEXT

ldlptrw:	lhzux r7,MEM
	li r4,2
	bl 1f
	sthx r7,REG
	NEXT

1:	cmpw base,state
	lis r3,0xc011	# es/ss/ds/fs/gs
	rlwinm r5,opcode,2,0x0c	# 00/08/04/00/04
	mflr r0
	addi r3,r3,0x4800	# r3=0xc0114800
	rlwimi r5,opcode,0,0x10	# 00/18/04/10/14
	lhbrx r4,r4,offset
	rlwnm opcode,r3,r5,0x1c	# 00/08/0c/10/14 = sreg*4 !
	beq- ud	# Only mem operands allowed !
	bl _segment_load
	addi r5,opcode,SELBASES
	stwux r3,r5,state
	mtlr r0
	stw r4,SELECTORS-SELBASES(r5)
	lwz esb,esbase(state)	# keep shadow state in sync
	lwz ssb,ssbase(state)
	blr

/* Instructions that may modify the current code segment: the next optimization
 * might be to avoid calling C code when the code segment does not change. But
 * it's probably not worth the effort.
 */
/* Far calls, jumps and returns */
lcall_w:	NEXTWORD(r4)
	NEXTWORD(r5)
	li r3,code_lcallw
	b complex

lcall_l:	NEXTDWORD(r4)
	NEXTWORD(r5)
	li r3,code_lcalll
	b complex

lcallw:	lhbrx r4,MEM
	addi offset,offset,2
	lhbrx r5,MEM
	li r3,code_lcallw
	b complex

lcalll:	lwbrx r4,MEM
	addi offset,offset,4
	lhbrx r5,MEM
	li r3,code_lcalll
	b complex

ljmp_w:	NEXTWORD(r4)
	NEXTWORD(r5)
	li r3,code_ljmpw
	b complex

ljmp_l:	NEXTDWORD(r4)
	NEXTWORD(r5)
	li r3,code_ljmpl
	b complex

ljmpw:	lhbrx r4,MEM
	addi offset,offset,2
	lhbrx r5,MEM
	li r3,code_ljmpw
	b complex

ljmpl:	lwbrx r4,MEM
	addi offset,offset,4
	lhbrx r5,MEM
	li r3,code_ljmpl
	b complex

lretw_imm:	NEXTWORD(r4)
	b 1f
lretw:	li r4,0
1:	li r3,code_lretw
	b complex

lretl_imm:	NEXTWORD(r4)
	b 1f
lretl:	li r4,0
1:	li r3,code_lretl
	b complex

/* Interrupts */
int:	li r3,code_softint	# handled by C code
	NEXTBYTE(r4)
	b complex

int3:	li r3,code_int3	# handled by C code
	b complex

into:	EVAL_OF
	bf+ OF,nop
	li r3,code_into
	b complex	# handled by C code

iretw:	li r3,code_iretw	# handled by C code
	b complex

iretl:	li r3,code_iretl
	b complex

/* Miscellaneous flag control instructions */
clc:	oris flags,flags,(CF_IN_CR|CF_STATE_MASK|ABOVE_IN_CR)>>16
	xoris flags,flags,(CF_IN_CR|CF_STATE_MASK|ABOVE_IN_CR)>>16
	NEXT

cmc:	oris flags,flags,(CF_IN_CR|ABOVE_IN_CR)>>16
	xoris flags,flags,(CF_IN_CR|CF_COMPLEMENT|ABOVE_IN_CR)>>16
	NEXT

stc:	oris flags,flags,\
		(CF_IN_CR|CF_LOCATION|CF_COMPLEMENT|ABOVE_IN_CR)>>16
	xoris flags,flags,(CF_IN_CR|CF_LOCATION|ABOVE_IN_CR)>>16
	NEXT

cld:	crclr DF
	NEXT

std:	crset DF
	NEXT

cli:	crclr IF
	NEXT

sti:	crset IF
	NEXT

lahf:	bl _eval_flags
	stb r3,AH(state)
	NEXT

sahf:	andis.
r3,flags,OF_EXPLICIT>>16lbz r0,AH(state)beql+ _eval_of # save OF just in caserlwinm op1,r0,31,0x08 # AFrlwinm flags,flags,0,OF_STATE_MASKextsb result,r0 # SF/PFZF862ZF(r0)oris flags,flags,(ZF_PROTECT|ZF_IN_CR|SF_IN_CR)>>16addi op2,op1,0 # AFori result,result,0x00fb # set all except PFmtcrf 0x02,r0 # SF/ZFrlwimi flags,r0,27,CF_VALUE # CFxori result,result,0x00ff # 00 if PF set, 04 if clearNEXTpushfw_sp: bl _eval_flagsli r4,SPlhbrx r5,r4,stateaddi r5,r5,-2sthbrx r5,r4,stateclrlwi r5,r5,16sthbrx r3,ssb,r5NEXTpushfl_sp: bl _eval_flagsli r4,SPlhbrx r5,r4,stateaddi r5,r5,-4sthbrx r5,r4,stateclrlwi r5,r5,16stwbrx r3,ssb,r5NEXTpopfl_sp: li r4,SPlhbrx r5,r4,statelwbrx r3,ssb,r5addi r5,r5,4stw r3,eflags(state)sthbrx r5,r4,stateb 1fpopfw_sp: li r4,SPlhbrx r5,r4,statelhbrx r3,ssb,r5addi r5,r5,2sth r3,eflags+2(state)sthbrx r5,r4,state1: rlwinm op1,r3,31,0x08 # AFxori result,r3,4 # PFZF862ZF(r3) # cr6lis flags,(OF_EXPLICIT|ZF_PROTECT|ZF_IN_CR|SF_IN_CR)>>16addi op2,op1,0 # AFrlwinm result,result,0,0x04 # PFrlwimi flags,r3,27,CF_VALUE # CFmtcrf 0x6,r3 # IF,DF,TF,SF,ZFrlwimi result,r3,24,0,0 # SFrlwimi flags,r3,15,OF_VALUE # OFNEXT/* SETcc is slightly faster for setz/setnz */setz: EVAL_ZFbt ZF,1f0: cmpwi opreg,0bne- udstbx opreg,MEMNEXTsetnz: EVAL_ZFbt ZF,0b1: cmpwi opreg,0bne- udstbx one,MEMNEXT#define SETCC(cond, eval, flag) \set##cond: EVAL_##eval; bt flag,1b; b 0b; \setn##cond: EVAL_##eval; bt flag,0b; b 1bSETCC(c, CF, CF)SETCC(a, ABOVE, ABOVE)SETCC(s, SF, SF)SETCC(g, SIGNED, SGT)SETCC(l, SIGNED, SLT)SETCC(o, OF, OF)SETCC(p, PF, PF)/* No wait for a 486SX */.equ wait, nop/* ARPL is not recognized in real mode */.equ arpl, ud/* clts and in general control and debug registers are not implemented */.equ clts, unimplaaa: lhbrx r0,AX,statebl _eval_afrlwinm r3,r3,0,0x10SET_FLAGS(FLAGS_ADD(W))rlwimi r3,r0,0,0x0fli r4,0x106addi r3,r3,-10srwi r3,r3,16 # carry ? 0 : 0xffffandc op1,r4,r3 # carry ? 
0x106 : 0
	add result,r0,op1
	rlwinm result,result,0,28,23	# clear high half of AL
	li op2,10	# sets AF indirectly
	sthbrx r3,AX,state	# OF/SF/ZF/PF undefined !
	rlwimi result,op1,8,0x10000	# insert CF
	NEXT

aas:	lhbrx r0,AX,state
	bl _eval_af
	rlwinm r3,r3,0,0x10
	SET_FLAGS(FLAGS_ADD(W))
	rlwimi r3,r0,0,0x0f	# AF:AL&0x0f
	li r4,0x106
	addi r3,r3,-10
	srwi r3,r3,16	# carry ? 0 : 0xffff
	andc op1,r4,r3	# carry ? 0x106 : 0
	sub result,r0,op1
	rlwinm result,result,0,28,23	# clear high half of AL
	li op2,10	# sets AF indirectly
	sthbrx r3,AX,state	# OF/SF/ZF/PF undefined !
	rlwimi result,op1,8,0x10000	# insert CF
	NEXT

daa:	lbz r0,AL(state)
	bl _eval_af
	rlwinm r7,r3,0,0x10
	bl _eval_cf	# r3=CF<<8
	rlwimi r7,r0,0,0x0f
	SET_FLAGS(FLAGS_ADD(B))
	addi r4,r7,-10
	rlwinm r4,r4,3,0x06	# 6 if AF or >9, 0 otherwise
	srwi op1,r7,1	# 0..4, no AF, 5..f AF set
	add r0,r0,r4	# conditional add
	li op2,11	# sets AF depending on op1
	or r0,r0,r3
	subfic r3,r0,159
	rlwinm r3,r3,7,0x60	# mask value to add
	add result,r0,r3	# final result for SF/ZF/PF
	stb result,AL(state)
	rlwimi result,r3,2,0x100	# set CF if added
	NEXT

das:	lbz r0,AL(state)
	bl _eval_af
	rlwinm r7,r3,0,0x10
	bl _eval_cf
	rlwimi r7,r0,0,0x0f
	SET_FLAGS(FLAGS_ADD(B))
	addi r4,r7,-10
	rlwinm r4,r4,3,0x06
	srwi op1,r7,1	# 0..4, no AF, 5..f AF set
	sub r0,r0,r4	# conditional subtract
	li op2,11	# sets AF depending on op1
	or r4,r0,r3	# insert CF
	addi r3,r4,-160
	rlwinm r3,r3,7,0x60	# mask value to add
	sub result,r4,r3	# final result for SF/ZF/PF
	stb result,AL(state)
	rlwimi result,r3,2,0x100	# set CF
	NEXT

/* 486 specific instructions */

/* For cmpxchg, only the zero flag is important */
cmpxchgb:	lbz op1,AL(state)
	SET_FLAGS(FLAGS_SUB(B)|ZF_IN_CR)
	lbzx op2,MEM
	cmpw cr6,op1,op2
	sub result,op1,op2
	bne cr6,1f
	lbzx r3,REG	# success: swap
	stbx r3,MEM
	NEXT
1:	stb op2,AL(state)
	NEXT

cmpxchgw:	lhbrx op1,AX,state
	SET_FLAGS(FLAGS_SUB(W)|ZF_IN_CR)
	lhbrx op2,MEM
	cmpw cr6,op1,op2
	sub result,op1,op2
	bne cr6,1f
	lhzx r3,REG	# success: swap
	sthx r3,MEM
	NEXT
1:	sthbrx op2,AX,state
	NEXT

cmpxchgl:	lwbrx op1,EAX,state
	SET_FLAGS(FLAGS_SUB(L)|ZF_IN_CR|SIGNED_IN_CR)
	lwbrx op2,MEM
	cmpw
cr6,op1,op2sub result,op1,op2bne cr6,1flwzx r3,REG # success: swapstwx r3,MEMNEXT1: stwbrx op2,EAX,stateNEXTxaddb: lbzx op2,MEMSET_FLAGS(FLAGS_ADD(B))lbzx op1,REGadd result,op1,op2stbx result,MEMstbx op2,REGNEXTxaddw: lhbrx op2,MEMSET_FLAGS(FLAGS_ADD(W))lhbrx op1,REGadd result,op1,op2sthbrx result,MEMsthbrx op2,REGNEXTxaddl: lwbrx op2,MEMSET_FLAGS(FLAGS_ADD(L))lwbrx op1,REGadd result,op1,op2stwbrx result,MEMstwbrx op2,REGNEXT/* All FPU instructions skipped. This is a 486 SX ! */esc: li r3,code_dna # DNA interruptb complex.equ hlt, unimpl # Cannot stop.equ invd, unimpl/* Undefined in real address mode */.equ lar, ud.equ lgdt, unimpl.equ lidt, unimpl.equ lldt, ud.equ lmsw, unimpl/* protected mode only */.equ lsl, ud.equ ltr, ud.equ movl_cr_reg, unimpl.equ movl_reg_cr, unimpl.equ movl_dr_reg, unimpl.equ movl_reg_dr, unimpl.equ sgdt, unimpl.equ sidt, unimpl.equ sldt, ud.equ smsw, unimpl.equ str, udud: li r3,code_udli r4,0b complexunimpl: li r3,code_udli r4,1b complex.equ verr, ud.equ verw, ud.equ wbinvd, unimplem86_end:.size em86_enter,em86_end-em86_enter#ifdef __BOOT__.data#define ENTRY(x,t) .long x+t-_jtables#else.section .rodata#define ENTRY(x,t) .long x+t#endif#define BOP(x) ENTRY(x,2) /* Byte operation with mod/rm byte */#define WLOP(x) ENTRY(x,3) /* 16 or 32 bit operation with mod/rm byte */#define EXTOP(x) ENTRY(x,0) /* Opcode with extension in mod/rm byte */#define OP(x) ENTRY(x,1) /* Direct one byte opcode/prefix *//* A few macros for the main table */#define gen6(op, wl, axeax) \BOP(op##b##_reg_mem); WLOP(op##wl##_reg_mem); \BOP(op##b##_mem_reg); WLOP(op##wl##_mem_reg); \OP(op##b##_imm_al); OP(op##wl##_imm_##axeax)#define rep7(l,t) \ENTRY(l,t); ENTRY(l,t); ENTRY(l,t); ENTRY(l,t); \ENTRY(l,t); ENTRY(l,t); ENTRY(l,t)#define rep8(l) l ; l; l; l; l; l; l; l;#define allcond(pfx, sfx, t) \ENTRY(pfx##o##sfx, t); ENTRY(pfx##no##sfx, t); \ENTRY(pfx##c##sfx, t); ENTRY(pfx##nc##sfx, t); \ENTRY(pfx##z##sfx, t); ENTRY(pfx##nz##sfx, t); \ENTRY(pfx##na##sfx, t); 
ENTRY(pfx##a##sfx, t); \ENTRY(pfx##s##sfx, t); ENTRY(pfx##ns##sfx, t); \ENTRY(pfx##p##sfx, t); ENTRY(pfx##np##sfx, t); \ENTRY(pfx##l##sfx, t); ENTRY(pfx##nl##sfx, t); \ENTRY(pfx##ng##sfx, t); ENTRY(pfx##g##sfx, t)/* single/double register sign extensions and other oddities */#define h2sextw cbw /* Half to Single sign extension */#define s2dextw cwd /* Single to Double sign extension */#define h2sextl cwde#define s2dextl cdq#define j_a16_cxz_w jcxz_w#define j_a32_cxz_w jecxz_w#define j_a16_cxz_l jcxz_l#define j_a32_cxz_l jecxz_l#define loopa16_w loopw_w#define loopa16_l loopw_l#define loopa32_w loopl_w#define loopa32_l loopl_l#define loopnza16_w loopnzw_w#define loopnza16_l loopnzw_l#define loopnza32_w loopnzl_w#define loopnza32_l loopnzl_l#define loopza16_w loopzw_w#define loopza16_l loopzw_l#define loopza32_w loopzl_w#define loopza32_l loopzl_l/* No FP support *//* Addressing mode table */.align 5# (%bx,%si), (%bx,%di), (%bp,%si), (%bp,%di)adtable: .long 0x00004360, 0x00004370, 0x80004560, 0x80004570# (%si), (%di), o16, (%bx).long 0x00004600, 0x00004700, 0x00002000, 0x00004300# o8(%bx,%si), o8(%bx,%di), o8(%bp,%si), o8(%bp,%di).long 0x00004360, 0x00004370, 0x80004560, 0x80004570# o8(%si), o8(%di), o8(%bp), o8(%bx).long 0x00004600, 0x00004700, 0x80004500, 0x00004300# o16(%bx,%si), o16(%bx,%di), o16(%bp,%si), o16(%bp,%di).long 0x00004360, 0x00004370, 0x80004560, 0x80004570# o16(%si), o16(%di), o16(%bp), o16(%bx).long 0x00004600, 0x00004700, 0x80004500, 0x00004300# register addressing modes do not use the table.long 0, 0, 0, 0, 0, 0, 0, 0#now 32 bit modes# (%eax), (%ecx), (%edx), (%ebx).long 0x00004090, 0x00004190, 0x00004290, 0x00004390# sib, o32, (%esi), (%edi).long 0x00003090, 0x00002090, 0x00004690, 0x00004790# o8(%eax), o8(%ecx), o8(%edx), o8(%ebx).long 0x00004090, 0x00004190, 0x00004290, 0x00004390# sib, o8(%ebp), o8(%esi), o8(%edi).long 0x00003090, 0x80004590, 0x00004690, 0x00004790# o32(%eax), o32(%ecx), o32(%edx), o32(%ebx).long 0x00004090, 0x00004190, 
0x00004290, 0x00004390
# sib, o32(%ebp), o32(%esi), o32(%edi)
        .long 0x00003090, 0x80004590, 0x00004690, 0x00004790
# register addressing modes do not use the table
        .long 0, 0, 0, 0, 0, 0, 0, 0

#define jtable(wl, awl, spesp, axeax, name ) \
        .align 5; \
jtab_##name: gen6(add, wl, axeax); \
        OP(push##wl##_##spesp##_sr); \
        OP(pop##wl##_##spesp##_sr); \
        gen6(or, wl, axeax); \
        OP(push##wl##_##spesp##_sr); \
        OP(_twobytes); \
        gen6(adc, wl, axeax); \
        OP(push##wl##_##spesp##_sr); \
        OP(pop##wl##_##spesp##_sr); \
        gen6(sbb, wl, axeax); \
        OP(push##wl##_##spesp##_sr); \
        OP(pop##wl##_##spesp##_sr); \
        gen6(and, wl, axeax); OP(_es); OP(daa); \
        gen6(sub, wl, axeax); OP(_cs); OP(das); \
        gen6(xor, wl, axeax); OP(_ss); OP(aaa); \
        gen6(cmp, wl, axeax); OP(_ds); OP(aas); \
        rep8(OP(inc##wl##_reg)); \
        rep8(OP(dec##wl##_reg)); \
        rep8(OP(push##wl##_##spesp##_reg)); \
        rep8(OP(pop##wl##_##spesp##_reg)); \
        OP(pusha##wl##_##spesp); OP(popa##wl##_##spesp); \
        WLOP(bound##wl); WLOP(arpl); \
        OP(_fs); OP(_gs); OP(_opsize); OP(_adsize); \
        OP(push##wl##_##spesp##_imm); WLOP(imul##wl##_imm); \
        OP(push##wl##_##spesp##_imm8); WLOP(imul##wl##_imm8); \
        OP(insb_##awl); OP(ins##wl##_##awl); \
        OP(outsb_##awl); OP(outs##wl##_##awl); \
        allcond(sj,_##wl,1); \
        EXTOP(grp1b_imm); EXTOP(grp1##wl##_imm); \
        EXTOP(grp1b_imm); EXTOP(grp1##wl##_imm8); \
        BOP(testb_reg_mem); WLOP(test##wl##_reg_mem); \
        BOP(xchgb_reg_mem); WLOP(xchg##wl##_reg_mem); \
        BOP(movb_reg_mem); WLOP(mov##wl##_reg_mem); \
        BOP(movb_mem_reg); WLOP(mov##wl##_mem_reg); \
        WLOP(mov##wl##_sr_mem); WLOP(lea##wl); \
        WLOP(mov##wl##_mem_sr); WLOP(pop##wl##_##spesp##_##awl); \
        OP(nop); rep7(xchg##wl##_##axeax##_reg,1); \
        OP(h2sext##wl); OP(s2dext##wl); \
        OP(lcall_##wl); OP(wait); \
        OP(pushf##wl##_##spesp); OP(popf##wl##_##spesp); \
        OP(sahf); OP(lahf); \
        OP(movb_##awl##_al); OP(mov##wl##_##awl##_##axeax); \
        OP(movb_al_##awl); OP(mov##wl##_##axeax##_##awl); \
        OP(movsb_##awl); OP(movs##wl##_##awl); \
        OP(cmpsb_##awl); OP(cmps##wl##_##awl); \
        OP(testb_imm_al); OP(test##wl##_imm_##axeax); \
        OP(stosb_##awl); OP(stos##wl##_##awl); \
        OP(lodsb_##awl); OP(lods##wl##_##awl); \
        OP(scasb_##awl); OP(scas##wl##_##awl); \
        rep8(OP(movb_imm_reg)); \
        rep8(OP(mov##wl##_imm_reg)); \
        EXTOP(shiftb_imm); EXTOP(shift##wl##_imm); \
        OP(ret##wl##_##spesp##_imm); OP(ret##wl##_##spesp); \
        WLOP(ldlptr##wl); WLOP(ldlptr##wl); \
        BOP(movb_imm_mem); WLOP(mov##wl##_imm_mem); \
        OP(enter##wl##_##spesp); OP(leave##wl##_##spesp); \
        OP(lret##wl##_imm); OP(lret##wl); \
        OP(int3); OP(int); OP(into); OP(iret##wl); \
        EXTOP(shiftb_1); EXTOP(shift##wl##_1); \
        EXTOP(shiftb_cl); EXTOP(shift##wl##_cl); \
        OP(aam); OP(aad); OP(ud); OP(xlatb_##awl); \
        rep8(OP(esc)); \
        OP(loopnz##awl##_##wl); OP(loopz##awl##_##wl); \
        OP(loop##awl##_##wl); OP(j_##awl##_cxz_##wl); \
        OP(inb_port_al); OP(in##wl##_port_##axeax); \
        OP(outb_al_port); OP(out##wl##_##axeax##_port); \
        OP(call##wl##_##spesp); OP(jmp_##wl); \
        OP(ljmp_##wl); OP(sjmp_##wl); \
        OP(inb_dx_al); OP(in##wl##_dx_##axeax); \
        OP(outb_al_dx); OP(out##wl##_##axeax##_dx); \
        OP(_lock); OP(ud); OP(_repnz); OP(_repz); \
        OP(hlt); OP(cmc); \
        EXTOP(grp3b); EXTOP(grp3##wl); \
        OP(clc); OP(stc); OP(cli); OP(sti); \
        OP(cld); OP(std); \
        EXTOP(grp4b); EXTOP(grp5##wl##_##spesp); \
        /* Here we start the table for twobyte instructions */ \
        OP(ud); OP(ud); WLOP(lar); WLOP(lsl); \
        OP(ud); OP(ud); OP(clts); OP(ud); \
        OP(invd); OP(wbinvd); OP(ud); OP(ud); \
        OP(ud); OP(ud); OP(ud); OP(ud); \
        rep8(OP(ud)); \
        rep8(OP(ud)); \
        OP(movl_cr_reg); OP(movl_reg_cr); \
        OP(movl_dr_reg); OP(movl_reg_dr); \
        OP(ud); OP(ud); OP(ud); OP(ud); \
        rep8(OP(ud)); \
        /* .long wrmsr, rdtsc, rdmsr, rdpmc; */\
        rep8(OP(ud)); \
        rep8(OP(ud)); \
        /* allcond(cmov, wl); */ \
        rep8(OP(ud)); rep8(OP(ud)); \
        rep8(OP(ud)); rep8(OP(ud)); \
        /* MMX Start */ \
        rep8(OP(ud)); rep8(OP(ud)); \
        rep8(OP(ud)); rep8(OP(ud)); \
        /* MMX End */ \
        allcond(j,_##wl, 1); \
        allcond(set,,2); \
        OP(push##wl##_##spesp##_sr); OP(pop##wl##_##spesp##_sr); \
        OP(ud) /* cpuid */; WLOP(bt##wl##_reg_mem); \
        WLOP(shld##wl##_imm); WLOP(shld##wl##_cl); \
        OP(ud); OP(ud); \
        OP(push##wl##_##spesp##_sr); OP(pop##wl##_##spesp##_sr); \
        OP(ud) /* rsm */; WLOP(bts##wl##_reg_mem); \
        WLOP(shrd##wl##_imm); WLOP(shrd##wl##_cl); \
        OP(ud); WLOP(imul##wl##_mem_reg); \
        BOP(cmpxchgb); WLOP(cmpxchg##wl); \
        WLOP(ldlptr##wl); WLOP(btr##wl##_reg_mem); \
        WLOP(ldlptr##wl); WLOP(ldlptr##wl); \
        WLOP(movzb##wl); WLOP(movzw##wl); \
        OP(ud); OP(ud); \
        EXTOP(grp8##wl); WLOP(btc##wl##_reg_mem); \
        WLOP(bsf##wl); WLOP(bsr##wl); \
        WLOP(movsb##wl); WLOP(movsw##wl); \
        BOP(xaddb); WLOP(xadd##wl); \
        OP(ud); OP(ud); \
        OP(ud); OP(ud); OP(ud); OP(ud); \
        rep8(OP(bswap)); \
        /* MMX Start */ \
        rep8(OP(ud)); rep8(OP(ud)); \
        rep8(OP(ud)); rep8(OP(ud)); \
        rep8(OP(ud)); rep8(OP(ud)); \
        /* MMX End */

        .align 5 /* 8kb of tables, 32 byte aligned */
_jtables: jtable(w, a16, sp, ax, www) /* data16, addr16 */
        jtable(l, a16, sp, eax, lww) /* data32, addr16 */
        jtable(w, a32, sp, ax, wlw) /* data16, addr32 */
        jtable(l, a32, sp, eax, llw) /* data32, addr32 */
/* The other possible combinations are only required by protected mode
   code using a big stack segment */

/* Here are the auxiliary tables for opcode extensions, note that
   all entries get 2 or 3 added.
*/
#define grp1table(bwl,t,s8) \
grp1##bwl##_imm##s8:; \
        ENTRY(add##bwl##_imm##s8,t); ENTRY(or##bwl##_imm##s8,t); \
        ENTRY(adc##bwl##_imm##s8,t); ENTRY(sbb##bwl##_imm##s8,t); \
        ENTRY(and##bwl##_imm##s8,t); ENTRY(sub##bwl##_imm##s8,t); \
        ENTRY(xor##bwl##_imm##s8,t); ENTRY(cmp##bwl##_imm##s8,t)

        grp1table(b,2,)
        grp1table(w,3,)
        grp1table(w,3,8)
        grp1table(l,3,)
        grp1table(l,3,8)

#define shifttable(bwl,t,c) \
shift##bwl##_##c:; \
        ENTRY(rol##bwl##_##c,t); ENTRY(ror##bwl##_##c,t); \
        ENTRY(rcl##bwl##_##c,t); ENTRY(rcr##bwl##_##c,t); \
        ENTRY(shl##bwl##_##c,t); ENTRY(shr##bwl##_##c,t); \
        OP(ud); ENTRY(sar##bwl##_##c,t)

        shifttable(b,2,1)
        shifttable(w,3,1)
        shifttable(l,3,1)
        shifttable(b,2,cl)
        shifttable(w,3,cl)
        shifttable(l,3,cl)
        shifttable(b,2,imm)
        shifttable(w,3,imm)
        shifttable(l,3,imm)

#define grp3table(bwl,t) \
grp3##bwl: ENTRY(test##bwl##_imm,t); OP(ud); \
        ENTRY(not##bwl,t); ENTRY(neg##bwl,t); \
        ENTRY(mul##bwl,t); ENTRY(imul##bwl,t); \
        ENTRY(div##bwl,t); ENTRY(idiv##bwl,t)

        grp3table(b,2)
        grp3table(w,3)
        grp3table(l,3)

grp4b: BOP(incb); BOP(decb); \
        OP(ud); OP(ud); \
        OP(ud); OP(ud); \
        OP(ud); OP(ud)

#define grp5table(wl,spesp) \
grp5##wl##_##spesp: \
        WLOP(inc##wl); WLOP(dec##wl); \
        WLOP(call##wl##_##spesp##_mem); WLOP(lcall##wl##); \
        WLOP(jmp##wl); WLOP(ljmp##wl); \
        WLOP(push##wl##_##spesp); OP(ud)

        grp5table(w,sp)
        grp5table(l,sp)

#define grp8table(wl) \
grp8##wl: OP(ud); OP(ud); OP(ud); OP(ud); \
        WLOP(bt##wl##_imm); WLOP(bts##wl##_imm); \
        WLOP(btr##wl##_imm); WLOP(btc##wl##_imm)

        grp8table(w)
        grp8table(l)

#ifdef __BOOT__
_endjtables: .long 0 /* Points to _jtables after relocation */
#endif
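The auxiliary tables above each hold eight entries because x86 opcode-extension groups are selected by the 3-bit reg field (bits 5..3) of the ModRM byte that follows the opcode. The following is a minimal sketch in C of that second-level lookup, for illustration only. The names `modrm_ext`, `decode_group` and the string tables are hypothetical and are not part of the emulator, which dispatches through offset tables in PowerPC assembly rather than through strings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name tables, in the same order as the grp1, shift and grp3
 * tables in the assembly source ("ud" marks an undefined slot). */
static const char *const grp1_names[8] = {
    "add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"
};
static const char *const shift_names[8] = {
    "rol", "ror", "rcl", "rcr", "shl", "shr", "ud", "sar"
};
static const char *const grp3_names[8] = {
    "test", "ud", "not", "neg", "mul", "imul", "div", "idiv"
};

/* Extract the 3-bit opcode-extension field (bits 5..3) of a ModRM byte. */
static unsigned modrm_ext(uint8_t modrm)
{
    return (modrm >> 3) & 7u;
}

/* Return the mnemonic selected by a group opcode plus its ModRM byte,
 * or NULL when the opcode is not one of the groups covered by this sketch. */
static const char *decode_group(uint8_t opcode, uint8_t modrm)
{
    unsigned ext = modrm_ext(modrm);

    switch (opcode) {
    case 0x80: case 0x81: case 0x82: case 0x83:   /* grp1 with immediate */
        return grp1_names[ext];
    case 0xC0: case 0xC1:                         /* shift by immediate */
    case 0xD0: case 0xD1:                         /* shift by 1 */
    case 0xD2: case 0xD3:                         /* shift by %cl */
        return shift_names[ext];
    case 0xF6: case 0xF7:                         /* grp3 */
        return grp3_names[ext];
    default:
        return NULL;
    }
}
```

For instance, the byte pair `F7 D8` (`neg %eax`) has a ModRM reg field of 3, so `decode_group(0xF7, 0xD8)` selects the fourth slot of the grp3 table. The sketch ignores operand size and the grp4, grp5 and grp8 tables, which follow the same indexing scheme.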
