i386 x40
by RickCHodgin on Nov 12, 2014 |
RickCHodgin
Posts: 5 Joined: Nov 12, 2014 Last seen: Mar 28, 2022 |
I have an idea for an i386-based core extended out to 40 bits, allowing for a terabyte of addressable RAM. I drop support for real mode and virtual-8086 mode, and deal only with protected mode. For backward compatibility with existing real-mode BIOS, a version of Bochs can be ported which emulates the real-mode environment.

The default boot-up environment would be 32-bit mode. An extension would be created to allow 40-bit mode:

    ; An atomic operation
    cli
    mov     EAX,CR0
    or      EAX,0x100       ; Enable 40-bit loads
    mov     CR0,EAX
    ; Setup 40-bit environment
    mov     EAX,CR0
    or      EAX,0x200       ; Enable 40-bit processing mode
    mov     CR0,EAX
    sti
    ; Now in 40-bit mode
    ; A new 40-bit W-series register base is created (wide registers)

-----

I have also come up with the idea for a WEX prefix (instead of the REX prefix). WEX stands for "Window Extensions" and relates to register windows. The same EAX..EDX/ESI/EDI are available, but they are extended out to 40 bits, making them WAX..WDX/WSI/WDI and WBP/WSP/WIP. They operate with the same Mod/R/RM and SIB instruction encodings, but they map through a register window that is set by staking a claim to some of the unused EFLAGS bits.

This allows a large register set to be used: one-byte WEX prefix opcodes are injected before instruction groups, and they set bits in the EFLAGS register. These bits are sticky and persist until changed, or until a branching instruction resets them. This differs from REX in that the prefixes are not required on every instruction.
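The sticky behavior of the WEX prefix can be illustrated with a small Python model. This is only a sketch under stated assumptions: the token format and the function name `windows_in_effect` are hypothetical, and the choice that a branch executes under the current window and then resets to the base window is one reading of the description above, not a confirmed detail of the design.

```python
# Hypothetical model of sticky WEX prefixes; names are illustrative only.
WEX_FIRST, WEX_LAST = 0x40, 0x4F   # one-byte prefixes 0100.0000 .. 0100.1111
WEX_BASE = 0x40                    # base window (WAX..WDX/WSI/WDI)


def windows_in_effect(stream):
    """Return (instruction, window opcode) pairs for a token stream.

    Each token is ("wex", opcode), ("op", name) or ("branch", name).
    Assumption: a branch runs under the current window, then the window
    resets to the base window.
    """
    window = WEX_BASE
    seen = []
    for kind, value in stream:
        if kind == "wex":
            if not WEX_FIRST <= value <= WEX_LAST:
                raise ValueError(f"not a WEX opcode: {value:#x}")
            window = value                 # sticky: persists for later ops
        elif kind == "op":
            seen.append((value, window))
        elif kind == "branch":
            seen.append((value, window))
            window = WEX_BASE              # assumed reset on branch
        else:
            raise ValueError(f"unknown token kind: {kind}")
    return seen


stream = [("op", "add"), ("wex", 0x46), ("op", "mov"), ("op", "sub"),
          ("branch", "jmp"), ("op", "inc")]
for name, window in windows_in_effect(stream):
    print(f"{name:4s} window={window:#04x}")
```

Only one prefix byte is spent on the `mov`/`sub` group here, which is the saving over a per-instruction REX-style prefix.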
Here's the WEX table with opcode and proposed instruction name thus far:

WEX opcode  Window Extension                   Always Same  Instruction
----------  ---------------------------------  -----------  ------------
0100.0000   WAX  WBX  WCX  WDX  WSI  WDI       WBP  WSP     wexbase
0100.0001   WAX  WBX  WCX  WDX  W1   W7        WBP  WSP     wexsd1
0100.0010   WAX  WBX  WCX  WDX  W13  W19       WBP  WSP     wexsd13
0100.0011   WAX  WBX  WCX  WDX  W25  W31       WBP  WSP     wexsd25
0100.0100   WAX  W1   W7   W13  W19  W25       WBP  WSP     wexa1
0100.0101   WBX  W31  W37  W43  W49  W55       WBP  WSP     wexa31
0100.0110   W1   W2   W3   W4   W5   W6        WBP  WSP     wex1
0100.0111   W7   W8   W9   W10  W11  W12       WBP  WSP     wex7
0100.1000   W13  W14  W15  W16  W17  W18       WBP  WSP     wex13
0100.1001   W19  W20  W21  W22  W23  W24       WBP  WSP     wex19
0100.1010   W25  W26  W27  W28  W29  W30       WBP  WSP     wex25
0100.1011   W31  W32  W33  W34  W35  W36       WBP  WSP     wex31
0100.1100   W37  W38  W39  W40  W41  W42       WBP  WSP     wex37
0100.1101   W43  W44  W45  W46  W47  W48       WBP  WSP     wex43
0100.1110   W49  W50  W51  W52  W53  W54       WBP  WSP     wex49
0100.1111   W55  W56  W57  W58  W59  W60       WBP  WSP     wex55

A small handful of new instructions would also be defined which would spill WAX..WDX/WSI/WDI out to one of the W-number bases (W1..W6, W7..W12, W13..W18, etc.), and then fill back into the wexbase registers.

-----

In addition, a new, simpler paging model would be created. Rather than holding a page table base address, CR3 would point to a new structure which holds accessed, dirty, and not-present bits for each selector that's loaded. These bits relate to the granularity size of the descriptor table, which by default in 40-bit mode would yield 1MB pages, though there would be 8 new bits available in the descriptor which could be purposed to allow for smaller or larger pages based on the application's anticipated memory needs and usage model.

Pages in this system are not physically absent, and requested addresses within the selector's limit are not rearranged into page-mapped linear addresses. Instead, everything would be translated completely linearly.
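As a rough illustration of the linear model, the following Python sketch tracks accessed, dirty, and not-present bits per 1MB granule of one segment and translates offsets by plain addition. The class name `SegmentStatus`, its fields, and the exception types are hypothetical; the post does not specify the layout of the structure CR3 points to, so this is a behavioral model only.

```python
# Hypothetical behavioral model; the real status structure is not specified.
GRANULE = 1 << 20            # 1MB default granularity in 40-bit mode
ADDR_MASK = (1 << 40) - 1    # 40-bit linear addresses


class GranuleNotPresent(Exception):
    """Raised when an access hits a granule flagged not present."""


class SegmentStatus:
    def __init__(self, base, limit, granule=GRANULE):
        self.base, self.limit, self.granule = base, limit, granule
        count = limit // granule + 1          # limit is the last valid offset
        self.accessed = [False] * count
        self.dirty = [False] * count
        self.not_present = [False] * count

    def translate(self, offset, write=False):
        """Return the linear address for offset, updating status bits."""
        if not 0 <= offset <= self.limit:
            raise ValueError("offset exceeds segment limit")
        index = offset // self.granule
        if self.not_present[index]:
            raise GranuleNotPresent(index)
        self.accessed[index] = True
        if write:
            self.dirty[index] = True
        # No page-table walk: translation is base + offset.
        return (self.base + offset) & ADDR_MASK


seg = SegmentStatus(base=1 << 36, limit=4 * GRANULE - 1)
print(hex(seg.translate(GRANULE + 5, write=True)))
print(seg.accessed, seg.dirty)
```

The point of the sketch is that the status bits are pure bookkeeping: the address computation never consults them except to refuse a granule that is flagged not present.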
This would exist because we are entering a time when runtime environments have abundant memory, so providing linear real memory for even huge demands would not be difficult in a 40-bit address model.

And because we're using an extension of the i386 protected mode architecture, FS and GS would be segment selectors again. This allows for some additional sticky bits which map each of the loaded DS, ES, FS, and GS registers to an alternate logical selector. Operations that use the standard DS: and ES: selectors today could then be redirected to other selectors, like FS: and GS:, without having to actually change or reload the selectors (and their underlying descriptors). Only a register mapping needs to be loaded into some new bits in EFLAGS which hold the translation.

-----

And there are some more extensions I've considered. I'm working on the specific details relating to the internals in a document that's similar to the Intel IA-32 instruction manual format, specifically volume 3. But the general model/idea is here. You can read a little more about it (specifically more about the paging model) on comp.arch under the [Core proposal, i386 40-bit dubbed "x40"] thread.

-----

I would be curious for feedback and thoughts. An OpenCores.org user, Aleksander Osman from Poland, already has a 486SX (no FPU) design implemented that is FPGA-proven (see http://opencores.org/project,ao486,Overview).

I look forward to hearing some feedback. Thank you.

Best regards, Rick C. Hodgin
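The selector remapping described in the post above can be sketched in Python as a small packed field. Everything about the layout is an assumption made for illustration: two bits per logical register, packed DS, ES, FS, GS from the low bits upward, with hypothetical helper names `remap_field` and `effective_segment`. The post only says that some new EFLAGS bits hold the translation.

```python
# Hypothetical bit layout: 2 bits per logical segment register, DS lowest.
SEGS = ("DS", "ES", "FS", "GS")
IDENTITY = sum(i << (2 * i) for i in range(4))   # each register maps to itself


def remap_field(mapping):
    """Pack {logical: physical} overrides into an 8-bit remap field."""
    field = IDENTITY
    for logical, physical in mapping.items():
        shift = 2 * SEGS.index(logical)
        field = (field & ~(0b11 << shift)) | (SEGS.index(physical) << shift)
    return field


def effective_segment(field, logical):
    """Return the segment register actually used for a logical reference."""
    return SEGS[(field >> (2 * SEGS.index(logical))) & 0b11]


field = remap_field({"DS": "FS", "ES": "GS"})
for name in SEGS:
    print(name, "->", effective_segment(field, name))
```

Switching the mapping is then a single write of the field, while the selectors and their descriptors stay loaded and untouched.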
RE: i386 x40
by RickCHodgin on Nov 15, 2014 |
RickCHodgin
Posts: 5 Joined: Nov 12, 2014 Last seen: Mar 28, 2022 |
RE: i386 x40
by RickCHodgin on Nov 21, 2014 |
RickCHodgin
Posts: 5 Joined: Nov 12, 2014 Last seen: Mar 28, 2022 |
I have been teaching myself Verilog and am getting ready to begin work on a series of small CPU cores (James 4:15, Lord willing). Here is the post from comp.arch:

I've been verilogging myself up and will soon begin working on a tiny CPU project called Oppie (for "little opcode"). I am beginning this independently from any published tutorials (apart from the samples on EDA Playground and some examination of Aleksander Osman's ao486 CPU). Any help/advice would be appreciated.

My work will be published on GitHub in the oppie sub-directories (when they're created): GitHub i386-x40

-----

I plan on several stages, the first two being:

(1) Oppie-1, a simple CPU able to execute eight opcodes, processing 8 bits at a time, with 11-bit absolute addresses and 10-bit relative addresses. I plan on four 8-bit registers, 2 KB of RAM, and these instructions:

8-bit Regs        Flags
--------------    ---------
r1 -- 00000000    zero?
r2 -- 00000000    carry?
r3 -- 00000000
r4 -- 00000000

Opcode ASM Instruction   Bytes   Opcode Bit Encoding (x=unused, 0)
----------------------   ------  ---------------------------------
* Register encodings are 2 bits
mov reg8,[address]       2       000 .r8.000:00000000
mov reg8,reg8            1       001 .x.r8d.r8s         (dest,src)
add reg8,reg8            1       010 .x.r8d.r8s
adc reg8,reg8            1       011 .x.r8d.r8s
mov [address],reg8       2       100 .r8.000:00000000
cmp reg8,reg8            1       101 .x.r8l.r8r         (left,right)
jz  +/- 1KB              2       110 .xx.s.00:00000000
jmp +/- 1KB              2       111 .xx.s.00:00000000

(2) Oppie-2, a CPU with eight 32-bit registers and additional instructions supporting 32-bit unsigned integer processing.

-----

Future plans, all subject to change:

(3) Oppie-3 will add a stack pointer and stack.
(4) Oppie-4 will add integer support and interrupts.
(5) Oppie-5 will add multiple tasks / processes.
(6) Oppie-6 will add multiple cores.

The ultimate target of i386-x40 is a four-core design, with each core able to address 1 TB of memory connected to that core, and all cores able to share an additional 1 TB of common memory.
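The Oppie-1 bit encodings listed in the post can be exercised with a short Python encoder. This is a sketch, not the project's own tooling. It assumes registers r1..r4 encode as 0..3, that the branch `s` bit is a sign bit followed by a 10-bit magnitude, and that unused `x` bits are zero; the function and mnemonic names (`mov_load`, `mov_store`, and so on) are hypothetical.

```python
# Hypothetical encoder for the Oppie-1 table; names and conventions assumed.
OPCODES = {
    "mov_load": 0b000,   # mov reg8,[address]
    "mov": 0b001,        # mov reg8,reg8
    "add": 0b010,
    "adc": 0b011,
    "mov_store": 0b100,  # mov [address],reg8
    "cmp": 0b101,
    "jz": 0b110,
    "jmp": 0b111,
}


def _check_reg(reg):
    if not 0 <= reg <= 3:
        raise ValueError("register index must be 0..3 (r1..r4)")


def encode_reg_reg(op, dst, src):
    """One byte: ooo.x.dd.ss"""
    _check_reg(dst)
    _check_reg(src)
    return bytes([(OPCODES[op] << 5) | (dst << 2) | src])


def encode_mem(op, reg, address):
    """Two bytes: ooo.rr.aaa : aaaaaaaa (11-bit absolute address)"""
    _check_reg(reg)
    if not 0 <= address < 2048:
        raise ValueError("address must fit in 11 bits")
    return bytes([(OPCODES[op] << 5) | (reg << 3) | (address >> 8),
                  address & 0xFF])


def encode_branch(op, offset):
    """Two bytes: ooo.xx.s.mm : mmmmmmmm (assumed sign + 10-bit magnitude)"""
    if not -1023 <= offset <= 1023:
        raise ValueError("offset must be within +/- 1023")
    sign = 1 if offset < 0 else 0
    magnitude = abs(offset)
    return bytes([(OPCODES[op] << 5) | (sign << 2) | (magnitude >> 8),
                  magnitude & 0xFF])


program = (encode_mem("mov_load", 0, 0x123)   # mov r1,[0x123]
           + encode_reg_reg("add", 0, 1)      # add r1,r2
           + encode_branch("jz", -5))         # jz back 5 bytes
print(program.hex(" "))
```

The top three bits of the first byte always carry the opcode, so a decoder can dispatch on `byte >> 5` before looking at any other field.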
Best regards, Rick C. Hodgin
RE: i386 x40
by RickCHodgin on Nov 27, 2014 |
RickCHodgin
Posts: 5 Joined: Nov 12, 2014 Last seen: Mar 28, 2022 |
Much has happened in development. You can track progress on what I am now calling LibSF 386-x40 at a new sub-directory below. The general goals for the Oppie-1 thru Oppie-7 cores have been defined at the link below (including simple diagrams in PNG or Dia Diagram Editor format).

On the advice of someone from comp.arch, I have switched from Verilog to C/C++ for developing the basic core designs and getting the logic debugged. Only Oppie-1 has been translated into Verilog, and it has not been debugged yet (Nov.27.2014).

Work proceeds on re-defining the LibSF 386-x40's opcodes. In general it will use the same environment, with some modifications to the FPU (no stack), and I am adding a SIMD unit: a 4-wide 32-bit floating point / 32-bit integer processor with limited 64-bit fp and integer support.

Top-level sub-directory: LibSF 386-x40 on GitHub
Current development on the Oppie cores: Oppie-1 thru Oppie-7 (planned)
Current development on the Oppie-1 debugger (called "Debo-1"): Oppie-1 C Simulation

Best regards, Rick C. Hodgin
RE: i386 x40
by ultro on Jan 15, 2015 |
ultro
Posts: 1 Joined: Oct 21, 2012 Last seen: May 14, 2024 |
Hi,
Interesting idea. Are you going for a flat address mode? Typically such a large memory space goes along with paging support via an MMU.

About the MMU: did you know that on x86-32 you can physically address 64 GB with PAE or PSE-36? Still, OSes limit each process's logical address space to 32 bits (4 GB), and Linux uses only PAE.

Dealing with 64 bits, even with a subset of x86-64 functionality, is probably also worth considering.

Good luck, rgds
RE: i386 x40
by RickCHodgin on Jan 15, 2015 |
RickCHodgin
Posts: 5 Joined: Nov 12, 2014 Last seen: Mar 28, 2022 |
Hi,
interesting idea, you are going for flat address mode? typically such large memory space goes along with page support with MMU?

It is a flat address mode, and I do not support paging in the traditional sense as it exists today on the i386/AMD64 models. However, I've created something like paging called the memory range monitor, which allows the same effect as paging in software. There is an optional interrupt on read and on write, based on a bitmap flag at the granularity of the memory range, allowing for "page" sizes of 512 bytes, 4KB, 1MB, and 4MB. If the bit is set for the read or write monitor, it will signal the interrupt before the read or write takes place, allowing for that block to be populated by the OS with its missing data, or to save a 'before' copy for later comparison, etc. See the images on GitHub, specifically the paging_cr0_cr4.png file.
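A minimal Python sketch of the memory range monitor follows, assuming one read bit and one write bit per block and a handler that is called before the access completes. The class `RangeMonitor`, its attribute names, and the handler signature are hypothetical; the actual control-register layout is in the GitHub images and is not reproduced here.

```python
# Hypothetical software model of the memory range monitor.
ALLOWED_GRANULES = (512, 4 * 1024, 1 << 20, 4 << 20)   # 512B, 4KB, 1MB, 4MB


class RangeMonitor:
    def __init__(self, size, granule, on_fault):
        if granule not in ALLOWED_GRANULES:
            raise ValueError("unsupported granule size")
        self.mem = bytearray(size)
        self.granule = granule
        blocks = -(-size // granule)          # ceiling division
        self.watch_read = [False] * blocks    # read-monitor bitmap
        self.watch_write = [False] * blocks   # write-monitor bitmap
        self.on_fault = on_fault              # stands in for the interrupt

    def read(self, addr):
        block = addr // self.granule
        if self.watch_read[block]:
            self.on_fault(self, "read", block)    # signalled before the read
        return self.mem[addr]

    def write(self, addr, value):
        block = addr // self.granule
        if self.watch_write[block]:
            self.on_fault(self, "write", block)   # signalled before the write
        self.mem[addr] = value


events = []


def demand_fill(mon, kind, block):
    """Example handler: populate the block, then stop monitoring it."""
    start = block * mon.granule
    mon.mem[start:start + mon.granule] = b"\xAB" * mon.granule
    mon.watch_read[block] = False
    events.append((kind, block))


mon = RangeMonitor(size=2048, granule=512, on_fault=demand_fill)
mon.watch_read[1] = True
first = mon.read(600)     # triggers the handler, which fills block 1
second = mon.read(601)    # no handler call: the bit was cleared
print(hex(first), hex(second), events)
```

A write handler could instead copy the block aside before the store lands, which corresponds to the 'before' copy use mentioned in the reply.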
about MMU:
Do you know on x86-32 you can address physically 64GByte with pae or pse36. Still, OSes limit logical process to 32bit/4Gbyte, and linux uses only pae.

Yes. PAE allows more memory per machine, but still only 4GB per process/task. The 386-x40 allows for true 40-bit compute, and 40-bit addressing when enabled. By default it maintains 32-bit compute and 32-bit addressing. I do not support traditional paging in any regard; the new paging model is always in use.
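For reference, the address-space sizes being compared follow directly from the address widths. The short Python check below uses a hypothetical helper name, `addressable_bytes`, and binary units.

```python
# Address width -> addressable bytes (binary units).
GIB = 1 << 30
TIB = 1 << 40


def addressable_bytes(bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << bits


for label, bits in (("32-bit (per process under PAE)", 32),
                    ("36-bit (PAE/PSE-36 physical)", 36),
                    ("40-bit (x40)", 40)):
    print(f"{label}: {addressable_bytes(bits) // GIB} GiB")
```

The 40-bit space is 16 times the 36-bit physical limit and 256 times the 32-bit per-process limit.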
Probably, dealing with 64bits even with a subset of x86-64 functionality is also to consider.

I've tried to figure out what would be required from x86-64 that I do not possess in the LibSF 386-x40 design. Apart from virtualization, I think the simplicity of the redesign is greatly appealing. We'll see though. It's a lot of work. I am currently working on an Altera Cyclone V-based product. Oppie-1 works in simulation, though it is still buggy. I've devoted my time to other projects lately, but will come back to LibSF 386-x40 from time to time.
Good luck,
rgds

Thank you! :-)

Best regards, Rick C. Hodgin