OpenCores
i386 x40
by RickCHodgin on Nov 12, 2014
RickCHodgin
Posts: 5
Joined: Nov 12, 2014
Last seen: Mar 28, 2022
I have an idea for an i386-based core extended to 40 bits, allowing 1 TB (2^40 bytes) of addressable RAM.

I would drop support for real mode and virtual-8086 mode and deal only with protected mode. For backward compatibility with existing real-mode BIOS code, a version of Bochs could be ported to emulate the real-mode environment. The default boot-up environment would be 32-bit mode, and an extension would be created to enable 40-bit mode:

    ; An atomic operation (interrupts stay disabled throughout)
    cli
    mov  EAX,CR0
    or   EAX,0x100     ; Enable 40-bit loads
    mov  CR0,EAX

    ; Setup 40-bit environment

    mov  EAX,CR0
    or   EAX,0x200     ; Enable 40-bit processing mode
    mov  CR0,EAX
    sti
    ; Now in 40-bit mode
    ; A new 40-bit W-series register base is created (wide registers)
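
As a rough illustration of the mode switch above, here is a minimal Python sketch of the two proposed CR0 bits. The bit values 0x100 and 0x200 come from the listing; the names `CR0_X40_LOADS`, `CR0_X40_MODE`, `address_bits`, and `addressable_bytes` are made up for this example, and the rule that both bits must be set before 40-bit addressing applies is an assumption rather than part of the proposal.

```python
# Hypothetical model of the proposed CR0 extension bits (values from the listing).
CR0_X40_LOADS = 0x100  # bit 8: enable 40-bit loads
CR0_X40_MODE = 0x200   # bit 9: enable 40-bit processing mode


def address_bits(cr0):
    """Return the address width in effect for a given CR0 value.

    ASSUMPTION: 40-bit addressing applies only once both bits are set.
    """
    both = CR0_X40_LOADS | CR0_X40_MODE
    return 40 if (cr0 & both) == both else 32


def addressable_bytes(cr0):
    """Size of the address space implied by the current width."""
    return 1 << address_bits(cr0)


cr0 = 0x00000001          # protected mode only: the 32-bit default
cr0 |= CR0_X40_LOADS      # first half of the sequence
cr0 |= CR0_X40_MODE       # second half of the sequence
print(address_bits(cr0), "bits,", addressable_bytes(cr0) // 2**40, "TiB")
```

The arithmetic is the point here: 2^40 bytes is the one terabyte mentioned at the top of the post, against 2^32 bytes (4 GB) in the default mode.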

-----
I have also come up with the idea for a WEX prefix (instead of the REX prefix). WEX stands for "Window Extensions" and relates to register windows. The same EAX..EDX/ESI/EDI registers are available, but they are extended to 40 bits, making them WAX..WDX/WSI/WDI, along with WBP/WSP/WIP. They use the same ModR/M and SIB instruction encodings, but they map through a register window that is selected by claiming some of the unused EFLAGS bits. This allows a large register set to be used: a one-byte WEX prefix opcode is injected before a group of instructions and sets those bits in the EFLAGS register. The bits are then sticky and persist until changed, or until a branching instruction resets them. This differs from REX in that a prefix is not required on every instruction.

Here's the WEX table with opcode and proposed instruction name thus far:
    WEX opcode          Window Extension             Always Same   Instruction
    ----------   ---------------------------------   -----------   ------------
    0100.0000    WAX   WBX   WCX   WDX   WSI   WDI     WBP  WSP      wexbase
    0100.0001    WAX   WBX   WCX   WDX    W1    W7     WBP  WSP      wexsd1
    0100.0010    WAX   WBX   WCX   WDX   W13   W19     WBP  WSP      wexsd13
    0100.0011    WAX   WBX   WCX   WDX   W25   W31     WBP  WSP      wexsd25
    0100.0100    WAX    W1    W7   W13   W19   W25     WBP  WSP      wexa1
    0100.0101    WBX   W31   W37   W43   W49   W55     WBP  WSP      wexa31
    0100.0110     W1    W2    W3    W4    W5    W6     WBP  WSP      wex1
    0100.0111     W7    W8    W9   W10   W11   W12     WBP  WSP      wex7
    0100.1000    W13   W14   W15   W16   W17   W18     WBP  WSP      wex13
    0100.1001    W19   W20   W21   W22   W23   W24     WBP  WSP      wex19
    0100.1010    W25   W26   W27   W28   W29   W30     WBP  WSP      wex25
    0100.1011    W31   W32   W33   W34   W35   W36     WBP  WSP      wex31
    0100.1100    W37   W38   W39   W40   W41   W42     WBP  WSP      wex37
    0100.1101    W43   W44   W45   W46   W47   W48     WBP  WSP      wex43
    0100.1110    W49   W50   W51   W52   W53   W54     WBP  WSP      wex49
    0100.1111    W55   W56   W57   W58   W59   W60     WBP  WSP      wex55
A small handful of new instructions would also be defined to spill WAX..WDX/WSI/WDI out to one of the W-number bases (W1..W6, W7..W12, W13..W18, etc.), and then fill them back into the wexbase registers.
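
To make the table easier to experiment with, here is a minimal Python sketch that turns it into a lookup. The rows are copied from the table as posted. The assumption, not stated in the post, is that the six windowed columns correspond to the AX, BX, CX, DX, SI, and DI slots selected by the usual x86 3-bit register numbers; `WINDOWS`, `SLOT`, `FIXED`, and `resolve` are names invented for this example.

```python
# Columns in table order: the AX, BX, CX, DX, SI, DI slots.
BASE = ["WAX", "WBX", "WCX", "WDX", "WSI", "WDI"]


def w(*nums):
    """Build register names such as W1, W7 from their numbers."""
    return [f"W{n}" for n in nums]


WINDOWS = {
    0x40: BASE,                             # wexbase
    0x41: BASE[:4] + w(1, 7),               # wexsd1
    0x42: BASE[:4] + w(13, 19),             # wexsd13
    0x43: BASE[:4] + w(25, 31),             # wexsd25
    0x44: ["WAX"] + w(1, 7, 13, 19, 25),    # wexa1
    0x45: ["WBX"] + w(31, 37, 43, 49, 55),  # wexa31 (first column as posted)
}
# wex1 .. wex55: ten windows of six consecutive registers, W1..W60.
for i in range(10):
    WINDOWS[0x46 + i] = w(*range(1 + 6 * i, 7 + 6 * i))

# ASSUMPTION: standard x86 register numbers pick the table column.
SLOT = {0: 0, 3: 1, 1: 2, 2: 3, 6: 4, 7: 5}  # AX, BX, CX, DX, SI, DI
FIXED = {4: "WSP", 5: "WBP"}                 # the "Always Same" columns


def resolve(wex_opcode, reg):
    """Map a 3-bit register number to a wide register under a WEX window."""
    if reg in FIXED:
        return FIXED[reg]
    return WINDOWS[wex_opcode][SLOT[reg]]


print(resolve(0x41, 6), resolve(0x4F, 7), resolve(0x44, 4))
```

Because the window bits are sticky, a decoder built this way would keep the last `wex_opcode` as state and reuse it for every following instruction until another prefix or a branch changes it.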

-----
In addition, a new, simpler paging model would be created. Rather than holding a page table base address, CR3 would point to a new structure which holds accessed, dirty, and not-present bits for each selector that's loaded. These bits relate to the granularity size of the descriptor table, which by default in 40-bit mode would yield 1MB pages, though 8 new bits available in the descriptor could be used to allow smaller or larger pages based on the application's anticipated memory needs and usage model. Pages in this system are never physically absent, and addresses requested within the selector's limit are not rearranged into page-mapped linear addresses; instead, everything is translated completely linearly. This is feasible because runtime environments now have so much memory available that providing linear real memory, even for huge demands, would not be difficult in a 40-bit address model.
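
A minimal Python sketch of how such a scheme might behave, under several assumptions: the class name `SelectorTracking`, its fields, and the choice to raise an exception on a not-present granule are all invented for illustration, and the real layout of the structure that CR3 points to is not specified in this post. Only the 1MB default granule and the purely linear base + offset translation come from the description above.

```python
GRANULE = 1 << 20  # 1MB default granule in 40-bit mode, per the description


class SelectorTracking:
    """Hypothetical per-selector status bits (what CR3 would point to)."""

    def __init__(self, base, limit):
        self.base, self.limit = base, limit
        count = limit // GRANULE + 1
        self.accessed = [False] * count
        self.dirty = [False] * count
        self.not_present = [False] * count

    def translate(self, offset, write=False):
        """Return base + offset; no page-table remapping is involved."""
        if offset > self.limit:
            raise IndexError("offset beyond selector limit")
        granule = offset // GRANULE
        if self.not_present[granule]:
            # ASSUMPTION: a not-present granule is reported to the OS.
            raise LookupError(f"granule {granule} marked not present")
        self.accessed[granule] = True
        if write:
            self.dirty[granule] = True
        return self.base + offset


seg = SelectorTracking(base=0x10_0000_0000, limit=4 * GRANULE - 1)
print(hex(seg.translate(0x123456, write=True)), seg.dirty)
```

The translated address is always a fixed offset from the base, so the status bits only record activity; they never change where an access lands.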

And because this is an extension of the i386 protected mode architecture, FS and GS would be true segment selectors again. Some additional sticky bits would map each of the loaded DS, ES, FS, and GS registers to an alternate logical selector. Operations that today require DS: or ES: could then be redirected to other selectors, such as FS: and GS:, without changing or reloading the selectors (and their underlying descriptors); only a register mapping would be loaded into some new bits in EFLAGS which hold the translation.
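
One way to picture the remapping is a small bit field with one entry per segment register. The Python sketch below assumes 2 bits per register (8 bits in total) and treats the field as a standalone value; the post does not say how many EFLAGS bits would be used or where they would sit, so the layout and the names `IDENTITY`, `set_remap`, and `effective` are purely illustrative.

```python
SEGS = ("DS", "ES", "FS", "GS")

# ASSUMPTION: 2 bits per register, DS in bits 1..0 up to GS in bits 7..6.
IDENTITY = 0b11_10_01_00  # every register maps to itself


def set_remap(field, seg, target):
    """Return a new remap field in which `seg` resolves to `target`."""
    shift = 2 * SEGS.index(seg)
    return (field & ~(0b11 << shift)) | (SEGS.index(target) << shift)


def effective(field, seg):
    """Name the selector actually used when an instruction references `seg`."""
    return SEGS[(field >> (2 * SEGS.index(seg))) & 0b11]


field = set_remap(IDENTITY, "DS", "FS")  # DS: accesses now go through FS
print(effective(field, "DS"), effective(field, "ES"))
```

Changing the mapping is a single write to the field, which is the saving described above: no selector is reloaded and no descriptor is fetched again.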

-----
And there are some more extensions I've considered. I'm working on the specific details relating to the internals in a document that's similar to the Intel IA-32 instruction manual format, specifically volume 3. But, the general model/idea is here.

You can read a little more about it (specifically more about the paging model) on comp.arch under the [Core proposal, i386 40-bit dubbed "x40"] thread.

-----
I would be curious for feedback and thoughts. An OpenCores.org user, Aleksander Osman from Poland, already has a 486SX (no FPU) design implemented that is FPGA-proven (see http://opencores.org/project,ao486,Overview).

I look forward to hearing some feedback. Thank you.

Best regards,
Rick C. Hodgin

RE: i386 x40
by RickCHodgin on Nov 15, 2014
You can track my progress at:

        GitHub i386x40

Best regards,
Rick C. Hodgin
RE: i386 x40
by RickCHodgin on Nov 21, 2014
I have been teaching myself Verilog and am getting ready to begin work on a series of small CPU cores (James 4:15, Lord willing). Here is the post from comp.arch:

I've been verilogging myself up and will soon begin working on a tiny
CPU project called Oppie (for "little opcode").  I am beginning this
independently from any published tutorials (apart from the samples on
EDA Playground and some examination of Aleksander Osman's ao486 CPU).

Any help/advice would be appreciated.  My work will be published on
GitHub in the oppie sub-directories (when they're created):
        GitHub i386-x40
-----
I plan on several stages, the first two being:

    (1)  Oppie-1, a simple CPU able to execute eight opcodes,
         processing 8 bits at a time, with 11-bit absolute
         addresses and 10-bit relative addresses.

         I plan on four 8-bit registers, 2 KB of RAM, and
         these instructions:

               8-bit Regs          Flags
             --------------      ---------
             r1 -- 00000000        zero?
             r2 -- 00000000        carry?
             r3 -- 00000000
             r4 -- 00000000

                              Opcode
         ASM Instruction       Bytes    Opcode Bit Encoding (x=unused, 0)
         -------------------  ------    ---------------------------------
                                        * Register encodings are 2 bits
         mov  reg8,[address]     2      000 .r8.000:00000000
         mov  reg8,reg8          1      001 .x.r8d.r8s  (dest,src)
         add  reg8,reg8          1      010 .x.r8d.r8s
         adc  reg8,reg8          1      011 .x.r8d.r8s
         mov  [address],reg8     2      100 .r8.000:00000000
         cmp  reg8,reg8          1      101 .x.r8l.r8r  (left,right)
         jz   +/- 1KB            2      110 .xx.s.00:00000000
         jmp  +/- 1KB            2      111 .xx.s.00:00000000

    (2)  Oppie-2, a CPU with eight 32-bit registers and additional
         instructions supporting 32-bit unsigned integer
         processing.
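
The Oppie-1 encoding table in stage (1) is small enough to model directly. The following Python sketch is not the Debo-1 simulator or any published Oppie code; it is a toy encoder and interpreter built from the table, with these assumptions labeled in the comments: r1..r4 are encoded as 0..3, the jump "s" bit is a sign over a 10-bit magnitude measured from the next instruction, add/adc/cmp update the flags as shown, and a jump to itself is used as a halt idiom.

```python
# 3-bit opcode numbers, in the order listed in the encoding table.
MOV_LOAD, MOV_RR, ADD, ADC, MOV_STORE, CMP, JZ, JMP = range(8)


def enc_mem(op, reg, addr):
    """Two-byte form ooo.rr.aaa:aaaaaaaa with an 11-bit absolute address."""
    assert 0 <= reg < 4 and 0 <= addr < 2048
    word = (op << 13) | (reg << 11) | addr
    return bytes([word >> 8, word & 0xFF])


def enc_rr(op, dst, src):
    """One-byte form ooo.x.dd.ss; the unused bit is written as 0."""
    assert 0 <= dst < 4 and 0 <= src < 4
    return bytes([(op << 5) | (dst << 2) | src])


def enc_jump(op, disp):
    """Two-byte form ooo.xx.s.mm:mmmmmmmm.

    ASSUMPTION: sign bit plus 10-bit magnitude, relative to the next opcode.
    """
    assert -1023 <= disp <= 1023
    word = (op << 13) | ((1 << 10) if disp < 0 else 0) | abs(disp)
    return bytes([word >> 8, word & 0xFF])


def run(ram, max_steps=1000):
    """Execute from address 0 until a jump to itself (ASSUMED halt idiom)."""
    regs, zero, carry, pc = [0, 0, 0, 0], False, False, 0
    for _ in range(max_steps):
        first = ram[pc]
        op = first >> 5
        if op in (MOV_RR, ADD, ADC, CMP):          # one-byte register forms
            dst, src = (first >> 2) & 3, first & 3
            pc += 1
            if op == MOV_RR:
                regs[dst] = regs[src]
            elif op == CMP:                        # ASSUMED flag behaviour
                zero, carry = regs[dst] == regs[src], regs[dst] < regs[src]
            else:
                total = regs[dst] + regs[src] + (carry if op == ADC else 0)
                regs[dst], carry = total & 0xFF, total > 0xFF
                zero = regs[dst] == 0
            continue
        word = (first << 8) | ram[pc + 1]          # two-byte forms
        start, pc = pc, pc + 2
        if op in (MOV_LOAD, MOV_STORE):
            reg, addr = (word >> 11) & 3, word & 0x7FF
            if op == MOV_LOAD:
                regs[reg] = ram[addr]
            else:
                ram[addr] = regs[reg]
        elif op == JMP or zero:                    # jmp always, jz if zero set
            disp = word & 0x3FF
            pc += -disp if word & 0x400 else disp
            if pc == start:
                break
    return regs, zero, carry


ram = bytearray(2048)                              # 2 KB of RAM
program = (enc_mem(MOV_LOAD, 0, 0x100) + enc_mem(MOV_LOAD, 1, 0x101)
           + enc_rr(ADD, 0, 1) + enc_mem(MOV_STORE, 0, 0x102)
           + enc_jump(JMP, -2))
ram[:len(program)] = program
ram[0x100], ram[0x101] = 200, 100
print(run(ram), ram[0x102])
```

The demo program loads two bytes, adds them in r1, stores the 8-bit result, and halts. Since 200 + 100 does not fit in 8 bits, it also exercises the carry flag that adc would consume in a multi-byte addition.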

-----

Future plans, all subject to change:

    (3)  Oppie-3 will add a stack pointer and stack.
    (4)  Oppie-4 will add integer support and interrupts.
    (5)  Oppie-5 will add multiple tasks / processes.
    (6)  Oppie-6 will add multiple cores.

The ultimate target of i386-x40 is a four-core design, with each core
able to address 1 TB of memory connected to that core, and with all
cores able to share an additional 1 TB of common memory.

Best regards,
Rick C. Hodgin
RE: i386 x40
by RickCHodgin on Nov 27, 2014
Much has happened in development. You can track progress on what I am now calling LibSF 386-x40 at a new sub-directory below.

The general goal for the Oppie-1 thru Oppie-7 cores has been defined at the link below (including simple diagrams in PNG or DIA Diagram Editor format). On the advice of someone from comp.arch, I have switched from Verilog to C/C++ for developing the basic core designs and getting the logic debugged. Only Oppie-1 has been translated into Verilog, and it has not yet been debugged (as of Nov 27, 2014).

Work proceeds on re-defining the LibSF 386-x40's opcodes. In general it will use the same environment, with some modifications to the FPU (no stack) and the addition of a SIMD unit, which is a 4-wide 32-bit floating point / 32-bit integer processor with limited 64-bit floating point and integer support.

Top-level sub-directory:
        LibSF 386-x40 on GitHub

Current development on the Oppie cores:
        Oppie-1 thru Oppie-7 (planned)

Current development on the Oppie-1 debugger (called "Debo-1"):
        Oppie-1 C Simulation

Best regards,
Rick C. Hodgin
RE: i386 x40
by ultro on Jan 15, 2015
ultro
Posts: 1
Joined: Oct 21, 2012
Last seen: May 14, 2024
Hi,
Interesting idea. Are you going for a flat address mode?
Typically such a large memory space goes along with paging support via an MMU.

About the MMU:
Did you know that on x86-32 you can physically address 64 GB with PAE or PSE-36?
Still, OSes limit each process to a 32-bit / 4 GB logical address space, and Linux uses only PAE.

Dealing with 64 bits, even with a subset of x86-64 functionality, is probably also worth considering.

Good luck,
Regards


RE: i386 x40
by RickCHodgin on Jan 15, 2015
> Hi,
> Interesting idea. Are you going for a flat address mode?
> Typically such a large memory space goes along with paging support via an MMU.

It is a flat address mode, and I do not support paging in the traditional sense as it exists today on the i386/AMD64 models. However, I've created something like paging called a memory range monitor, which allows the same effect as paging to be achieved in software.

There are optional interrupts on read and write, based on a bitmap flag for each block at the granularity of the memory range, allowing for "page" sizes of 512 bytes, 4KB, 1MB, and 4MB. If the bit is set for the read or write monitor, it will signal the interrupt before the read or write takes place, allowing the OS to populate that block with its missing data, or to save a 'before' copy for later comparison, etc.
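
A minimal Python sketch of the idea, assuming one read bit and one write bit per block and modelling the interrupt as a callback that runs before the access completes. The class `MonitoredMemory`, the `on_interrupt` callback, and the `demand_fill` handler are invented for this example; the actual bitmap layout is in the GitHub diagrams, not reproduced here.

```python
class MonitoredMemory:
    """Toy memory with per-block read/write monitor bits."""

    SIZES = (512, 4096, 1 << 20, 4 << 20)  # block sizes named in the post

    def __init__(self, size, granule, on_interrupt):
        assert granule in self.SIZES
        self.granule = granule
        self.ram = bytearray(size)
        blocks = -(-size // granule)            # ceiling division
        self.read_mon = [False] * blocks
        self.write_mon = [False] * blocks
        self.on_interrupt = on_interrupt        # stands in for the interrupt

    def _check(self, addr, bitmap, is_write):
        block = addr // self.granule
        if bitmap[block]:
            # Signalled BEFORE the access, so the handler can fix up the block.
            self.on_interrupt(self, block, is_write)

    def read(self, addr):
        self._check(addr, self.read_mon, False)
        return self.ram[addr]

    def write(self, addr, value):
        self._check(addr, self.write_mon, True)
        self.ram[addr] = value


def demand_fill(mem, block, is_write):
    """Example OS handler: populate the block, then stop monitoring it."""
    start = block * mem.granule
    mem.ram[start:start + mem.granule] = b"\xAB" * mem.granule
    mem.read_mon[block] = mem.write_mon[block] = False


mem = MonitoredMemory(8192, 512, demand_fill)
mem.read_mon[3] = True                          # block 3 is "missing"
print(hex(mem.read(3 * 512 + 5)), mem.read_mon[3])
```

A handler that copies the block aside instead of filling it would give the "before copy" use case with the same mechanism.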

See the images on GitHub, and specifically the paging_cr0_cr4.png file.

> About the MMU:
> Did you know that on x86-32 you can physically address 64 GB with PAE or PSE-36?
> Still, OSes limit each process to a 32-bit / 4 GB logical address space, and Linux uses only PAE.

Yes. PAE allows more memory per machine, but still only 4GB per process/task. The 386-x40 allows for true 40-bit compute and 40-bit addressing when enabled. By default it maintains 32-bit compute and 32-bit addressing. It does not support traditional paging in any regard; it always uses the new paging model.

> Dealing with 64 bits, even with a subset of x86-64 functionality, is probably also worth considering.

I've tried to figure out what would be required from x86-64 that I do not have in the LibSF 386-x40 design. Apart from virtualization, I think the simplicity of the redesign is greatly appealing. We'll see, though. It is a lot of work. I am currently working on an Altera Cyclone V-based product. Oppie-1 works in simulation, though it is still buggy. I've devoted my time to other projects lately, but will come back to LibSF 386-x40 from time to time.

> Good luck,
> Regards

Thank you! :-)

Best regards,
Rick C. Hodgin
© copyright 1999-2025 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.