OpenCores

Details

Name: taar
Created: Dec 16, 2017
Updated: Nov 29, 2021
SVN: No files checked in
Bugs: 1 reported / 0 solved

Other project properties

Category: System on Chip
Language: Other
Development status: Planning
Additional info:
WishBone compliant: No
WishBone version: n/a
License: GPL

Project Spice - General overview ( project formerly known as Taar )

This project comprises an SoC-type microprocessor and a microkernel-based operating system, both of which are open source and both of which are called Spice. The Spice processor and OS are being designed to complement each other.

The projects are still being designed and I welcome anyone to contribute. This document is a design specification based on which further processor and OS architecture will be defined.

In this document the processor's details are given first, followed by the OS's.

The name of the project comes from the material that is one of the center points in the sublime Dune book series :

Spice Melange

melange (me'-lange also ma,lanj) n-s, origin uncertain (thought to derive from ancient Terran Franzh): a. mixture of spices; b. spice of Arrakis (Dune) with geriatric properties first noted by Yanshuph Ashkoko, royal chemist in reign of Shakkad the Wise; Arrakeen melange, found only in deepest desert sands of Arrakis, linked to prophetic visions of Paul Muad'Dib (Atreides), first Fremen Mahdi; also employed by Spacing Guild Navigators and the Bene Gesserit.

General processor overview

The Spice processor is styled as a System-on-Chip ( SoC ) whose main processor unit is meant to be a simplified single core with a clockless design. The other subsystems are a RAM controller, a multimedia processing unit, USB, LED display interfaces, camera interfaces, a WiFi transceiver, a general purpose timer, a system reset timer and a real time clock. The SoC is meant to be used in a wearable computer and in embedded and server systems. The main processor's design, beyond the commonalities shared by all processor designs, is not related to any specific processor.

To accept analog phenomena into the system, a camera is to be used instead of an ADC chip. The camera is placed close to an analog device and the analog data is taken in visually. This opens up possibilities such as receiving and processing data from analog meters and taking input from multiple analog streams ( for example multiple pulsing LEDs placed in front of the camera, where the LEDs can be connected to different microphones ). For the system's digital data to induce external analog phenomena, a separate LED display is to be used instead of a DAC chip, for example with multiple LDRs placed in front of the display ( for example with the LDRs connected to different surround sound speakers ). Connected to this concept, data that cannot come in as an image directly to the camera, such as data from a parking radar sensor or an engine temperature sensor in a vehicle, can be carried over fiber optics : the sensor digitizes and serializes the data and sends it through the fiber, the fiber's end point is placed near the camera, and the software reading the camera deserializes the pulsed-light data and processes it.
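
The deserialization step at the camera end of the fiber could look roughly like the sketch below. It is only an illustration under stated assumptions : one bit per camera frame, a fixed brightness threshold, and bits sent most significant first. None of these details, nor the function names, are defined by this specification.

```python
# Sketch: turning pulsed-light samples seen by the camera back into bytes.
# ASSUMPTIONS (not part of the Spice specification): one bit per frame,
# brightness values in 0..255, a fixed threshold, MSB-first bit order.


def frames_to_bits(brightness_samples, threshold=128):
    """Map each per-frame brightness sample to a 1 (light on) or 0 (light off)."""
    return [1 if sample >= threshold else 0 for sample in brightness_samples]


def bits_to_bytes(bits):
    """Group bits into bytes, MSB first; a trailing partial byte is dropped."""
    usable = len(bits) - len(bits) % 8
    out = bytearray()
    for start in range(0, usable, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

A real link would also need framing and clock recovery, which are left out here.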

The wearable computer's main side will be styled as a spectacles-based display which will also have a user input subsystem. User input will come through a camera attached to the spectacles, which will sense touches and movement through intelligent processing and allow three modes of use : (a) button selection, menu selection and regular typing, (b) free-hand colored drawing with a finger or stylus, (c) colored drawing with geometric instruments such as a scale. The input will be made either in the space just in front of the camera or on a flat surface for drawing. Multiple input point capture would also be possible with short range radar, but a visual camera additionally makes short range augmented reality possible with the same unit, giving two functions in one.

The bus system between the main processor and most of the other subsystems will be USB.

At the moment there are fewer than 30 instructions, but as the CPU core design, the multimedia unit design and the kernel design progress, more instructions will be added. After a comfortable round of design the processor can be ready for an FPGA implementation.

Project contributors

There is one contributor to the processor project : Vishal Zuluk.

Register set

The main processor unit contains eleven registers, listed below along with their bit lengths :

Current Process ---------------------------- 32 bits ( system register )

Current Page Directory Page Number - 32 bits ( system register )

Process Status ----------------------------- 32 bits ( system register )

Generated Physical Page Address ----- 32 bits ( system register )

Instruction ---------------------------------- 128 bits ( system register - to contain the entire instruction so that it can be decoded )

Data ----------------------------------------- 32 bits ( system register - to contain the read data so that it can be decoded )

R1 -------------------------------------------- 32 bits ( general register - use explained later )

R2 -------------------------------------------- 32 bits ( general register - use explained later )

R3 -------------------------------------------- 32 bits ( general register - use explained later )

Instruction Pointer ------------------------ 32 bits ( general register - to hold the next instruction's address )

Loop Counter ------------------------------ 32 bits ( general register )

The first three registers will appear at three addresses in kernel space which means that the normal load and store instructions can be used to operate on them and no special instructions are required to achieve this.

Memory Management

The MMU is paging-based. A page directory in the Spice processor is 4 KB in size and is flexible in the sense that each of its entries can point either to a 4 KB page table or directly to a 4 KB data page.

Structure of a page directory entry :
-------> Status ( 8 bits )
-------> Page number ( 24 bits )

The Status bits in the page directory will consist of :
-------> Bit 0 - Entry present
-------> Bit 1 - Entry dirty for call communications page ( for cache mechanism )
-------> Bits 2 to 7 - Unused

Structure of a page table entry :
-------> Status ( 8 bits )
-------> Page number ( 24 bits )

The Status bits in the page table will consist of :
-------> Bit 0 - Entry present
-------> Bit 1 - Entry dirty for pages not from among those in the page directory ( for cache mechanism )
-------> Bits 2 to 7 - Unused
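
The entry layouts above can be sketched as follows. The specification gives only the field widths, so placing Status in the low 8 bits and the page number in the upper 24 bits is an assumption, as are all function and constant names.

```python
# Sketch of a 32-bit page directory / page table entry.
# ASSUMPTION: Status occupies bits 0-7 and the page number bits 8-31;
# the specification states only the field widths (8 and 24 bits).

STATUS_PRESENT = 1 << 0  # Bit 0 - Entry present
STATUS_DIRTY = 1 << 1    # Bit 1 - Entry dirty ( for cache mechanism )


def pack_entry(page_number, status):
    """Combine a 24-bit page number and an 8-bit status into one 32-bit word."""
    if not 0 <= page_number < (1 << 24):
        raise ValueError("page number must fit in 24 bits")
    if not 0 <= status < (1 << 8):
        raise ValueError("status must fit in 8 bits")
    return (page_number << 8) | status


def unpack_entry(entry):
    """Split a 32-bit entry into (page_number, status)."""
    return (entry >> 8) & 0xFFFFFF, entry & 0xFF


def is_present(entry):
    """True when the 'Entry present' status bit is set."""
    return bool(entry & STATUS_PRESENT)
```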

Memory map of a process ( the arrangement of entries in the page directory ) :
-------> Kernel Page tables from 0 address to 32 MB
-------> Call communications page next
-------> Client process' I/O buffer's page table next
-------> Process code page tables from 500 MB to 1 GB
-------> Process private data page tables from 1 GB to 1.5 GB
-------> Process common data page tables from 1.5 GB to 4 GB
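
As a worked check of the memory map, the sketch below computes which page directory entry covers a given virtual address. It assumes that the 4 KB directory holds 1024 four-byte entries and that these divide the 32-bit address space evenly, so each entry spans 4 MB; the specification does not state this explicitly, and the names are illustrative.

```python
# Sketch: which page directory entry covers a virtual address.
# ASSUMPTION: 1024 entries of 4 bytes each divide the 4 GB address space
# evenly, so one directory entry spans 4 MB.

PAGE_SIZE = 4 * 1024
ENTRY_BYTES = 4                                # 8-bit status + 24-bit page number
ENTRIES = PAGE_SIZE // ENTRY_BYTES             # entries per page directory
ADDRESS_SPACE = 1 << 32
SPAN_PER_ENTRY = ADDRESS_SPACE // ENTRIES      # bytes covered by one entry

MB = 1 << 20
GB = 1 << 30


def directory_index(virtual_address):
    """Index of the page directory entry that covers virtual_address."""
    return virtual_address // SPAN_PER_ENTRY
```

Under these assumptions the kernel page tables ( 0 to 32 MB ) take entries 0 to 7, process code starts at entry 125, private data at entry 256 and common data at entry 384.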

Though some cache settings are mentioned above, it would be very good if caching were discarded altogether, as this would further simplify the SoC design.

A Spice OS process can modify its own code through Self-modifying Code ( SMC ), using the store-r1 instruction described below. In the assembly code the target instruction is identified by a text label, which is used with the store-r1 instruction. At compile time the compiler converts the label into an absolute address. At run time the store-r1 instruction writes the new opcode or operand to the address of the target instruction. However, this technique cannot be used with concurrently running code sections of the same process ( child processes using the same code at the same time, and service functions ), because one concurrent instance modifying the code will corrupt it for another concurrent instance. Any usage of SMC will therefore need code serialization.

[ To be done ]

Instructions format

At present there are 21 instructions as given below as a single list :

mov-r1-r2, loop-forever, load-loop-counter, loop-next, load-r1-imm, load-r1-mem, store-r1, resolve-ptr, copyword, add, sub, and, or, xor, lsh, rsh, countofonebits, appendonebit, switchprocesses, kenter, kexit

Each instruction is described in detail further below. Every Spice processor instruction is 128 bits long. All instructions share this fixed length and a fixed format so that the processor logic can read instructions at a determinate rate and keep things simple.

The instruction format is listed below vertically. The first line below is the first field of the instruction format from the right. The operands are presented without names here because their usage differs between Operation Codes ( henceforth called opcodes ). The numbers within the brackets are the bit lengths :

Operation code ------- ( 32 bits )
Operand1 -------------- ( 32 bits )
Operand2 -------------- ( 32 bits )
Operand3 -------------- ( 32 bits )

Only the first byte of the opcode contains the instruction number. The rest of this field should be filled with zeroes.
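
The fixed format can be sketched as a pack / unpack pair. Opcode numbers are not assigned anywhere in this document, so any number passed in is a placeholder, and reading "first byte" as the least significant byte is an assumption.

```python
# Sketch: packing one 128-bit Spice instruction into an integer.
# ASSUMPTIONS: the opcode field is the least significant 32 bits ( the first
# field from the right ) and the instruction number sits in its lowest byte.

MASK32 = 0xFFFFFFFF


def encode(opcode, operand1=0, operand2=0, operand3=0):
    """Return the instruction as a 128-bit integer, opcode in the lowest word."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("instruction number must fit in the first byte")
    for operand in (operand1, operand2, operand3):
        if not 0 <= operand <= MASK32:
            raise ValueError("operands are 32 bits wide")
    return opcode | (operand1 << 32) | (operand2 << 64) | (operand3 << 96)


def decode(word):
    """Split a 128-bit instruction into (opcode, operand1, operand2, operand3)."""
    return tuple((word >> shift) & MASK32 for shift in (0, 32, 64, 96))
```

For example, encode(0x05, 0xFEEDF00D) would represent a load-r1-imm of 0xfeedf00d if that instruction were given the hypothetical number 5.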

The memory load instruction and its counterpart, the store instruction, exist because the mathematic and logic instructions do not access memory directly. The mathematic and logic instructions do not write the result back to memory after execution, so code will have to use the store instruction if a memory write-back is needed. Having a separate write-back means that code which only compares, and does not need the result value in memory, does not spend time on a write-back. Such an arrangement also keeps the ISA simple and clean.

mov-r1-r2 instruction

Copy content of R1 register into R2

mov-r1-r2 :

Usage ( lsb on top ) :
-------> opcode ( mov-r1-r2 )
-------> zero
-------> zero
-------> zero

Example :
---> mov-r1-r2 0, 0, 0

Loop instructions

These instructions allow for program loops.

loop-forever : This is a simple unconditional jump to the instruction which is the start of the loop. Its equivalent in C language is the 'do while(1)' loop.

Usage for loop-forever ( lsb on top ) :
-------> opcode ( loop-forever )
-------> loop start address
-------> zero
-------> zero

load-loop-counter : Loads the count into the Loop Counter register. The count should be a positive number and be present in R1. If the count is zero then the instruction effectively resets the counter :

Usage for load-loop-counter ( lsb on top ) :
-------> opcode ( load-loop-counter )
-------> zero
-------> zero
-------> zero

loop-next : Subtracts one from the Loop Counter register and, if the result is non-zero, goes back to the loop start; otherwise goes to the next instruction after the loop. Its equivalent in C language is the 'do while( count > zero )' loop :

Usage for loop-next ( lsb on top ) :
-------> opcode ( loop-next )
-------> loop start address
-------> zero
-------> zero
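
The counted-loop behaviour described above can be modelled as below. This is a software sketch only : the register state is a plain dictionary, the addresses are arbitrary, and 32-bit wrap-around of the counter is not modelled.

```python
# Sketch of load-loop-counter / loop-next semantics.
# ASSUMPTIONS: registers are modelled as a dict; addresses are arbitrary.

LOOP_START = 0x100   # hypothetical address of the first instruction in the loop
AFTER_LOOP = 0x120   # hypothetical address of the instruction after loop-next


def load_loop_counter(state):
    """Copy the count held in R1 into the Loop Counter register."""
    state["loop_counter"] = state["r1"]


def loop_next(state, loop_start, next_address):
    """Subtract one from the counter and return the address to execute next."""
    state["loop_counter"] -= 1
    if state["loop_counter"] != 0:
        return loop_start
    return next_address


def run_counted_loop(count):
    """Return how many times the loop body runs for a positive initial count."""
    state = {"r1": count, "loop_counter": 0}
    load_loop_counter(state)
    body_runs = 0
    address = LOOP_START
    while address == LOOP_START:
        body_runs += 1                      # the loop body would execute here
        address = loop_next(state, LOOP_START, AFTER_LOOP)
    return body_runs
```

Because the test comes after the body, the body always runs at least once, matching the 'do while' comparison in the text.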

load-r1-imm instruction

Copy into register R1 the immediate value from Operand1.

load-r1-imm :

Usage ( lsb on top ) :
-------> opcode ( load-r1-imm )
-------> immediate value
-------> zero
-------> zero

Example :
---> load-r1-imm 0xfeedf00d, 0, 0

load-r1-mem instruction

Copy into register R1 the value from the memory address pointed to via Operand1 + Operand2.

load-r1-mem :

Usage ( lsb on top ) :
-------> opcode ( load-r1-mem )
-------> the starting address
-------> the offset
-------> zero

Example :
---> load-r1-mem [0xf00df00d], 20, 0

store-r1 instruction

Copy the word held in register R1 into the memory address given by Operand1 + Operand2.

store-r1 :

Usage ( lsb on top ) :
-------> opcode ( store-r1 )
-------> the starting address
-------> the offset
-------> zero

Example :
---> store-r1 [0xdead0000], 10, 0
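
The addressing used by load-r1-mem and store-r1 can be sketched as follows. Memory is modelled as a dictionary keyed by byte address, unwritten locations read as zero, and the address sum wraps at 32 bits; all three are assumptions of this sketch rather than statements of the specification.

```python
# Sketch of load-r1-mem and store-r1 addressing ( Operand1 + Operand2 ).
# ASSUMPTIONS: memory is a dict from byte address to 32-bit word, unwritten
# locations read as zero, and the address sum wraps at 32 bits.

MASK32 = 0xFFFFFFFF


def effective_address(start, offset):
    """Starting address plus offset, wrapped to 32 bits."""
    return (start + offset) & MASK32


def load_r1_mem(state, memory, start, offset):
    """Copy the word at start + offset into R1."""
    state["r1"] = memory.get(effective_address(start, offset), 0)


def store_r1(state, memory, start, offset):
    """Copy the word held in R1 to the address start + offset."""
    memory[effective_address(start, offset)] = state["r1"] & MASK32
```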

resolve-ptr instruction

Copy into R1 the address pointed-to via Operand1 + Operand2.

resolve-ptr :

Usage ( lsb on top ) :
-------> opcode ( resolve-ptr )
-------> the pointer address
-------> the offset
-------> zero

Example :
---> resolve-ptr [0xfeedfade], 20, 0

copyword instruction

Copies a word from the address pointed to by R1 into the address pointed to by R2.

copyword :

Usage ( lsb on top ) :
-------> opcode ( copyword )
-------> if increment source
-------> if increment destination
-------> zero

Example :
---> copyword 0, 1, 0

Once the word is copied, R1 and R2 are incremented if Operand1 and Operand2 respectively request it. The increments are by four, the byte length of a word. When asked to do increments, this instruction has to be used in a loop.
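
A block copy built from copyword with both increments enabled might behave like this sketch. The dictionary-based memory and the helper names are assumptions; on the processor the repetition would come from the loop instructions.

```python
# Sketch of copyword with optional post-increment of R1 and R2.
# ASSUMPTIONS: memory is a dict from byte address to word; unwritten
# locations read as zero.

WORD_BYTES = 4  # the increments happen by four, the byte length of a word


def copyword(state, memory, increment_source, increment_destination):
    """Copy the word at [R1] to [R2], then bump the pointers if requested."""
    memory[state["r2"]] = memory.get(state["r1"], 0)
    if increment_source:
        state["r1"] += WORD_BYTES
    if increment_destination:
        state["r2"] += WORD_BYTES


def copy_block(state, memory, words):
    """Copy 'words' consecutive words, as a counted loop around copyword would."""
    for _ in range(words):
        copyword(state, memory, 1, 1)
```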

The nine mathematic and logic instructions : add, sub, and, or, xor, lsh, rsh, countofonebits, appendonebit

The inputs for these instructions are taken from the registers R1 and R2. The result of the operation is copied from the R3 register to the R1 register, which allows a sequence of mathematic or logic instructions to continue after a branch.

add == addition of two numbers
sub == subtraction of two numbers
and == logical and'ing of two numbers
or == logical or'ing of two numbers
xor == logical exclusive or'ing of two numbers
lsh == left shifting of a number by the number of places given in R2
rsh == right shifting of a number by the number of places given in R2
countofonebits == count the number of 1 bits in the word given in R1. The bits are counted from the LSB
appendonebit == append a 1 bit to the word given in R1. The appending is done from the LSB

Usage ( lsb on top ) :
-------> opcode ( the mathematic or logic instruction )
-------> equal compare value
-------> address for jump on less-than
-------> address for jump on greater-than

After the operation, the instruction will act as below :

a. If the result equals the equal compare value, execution continues with the instruction immediately following the current instruction.

b. If the result is less than the equal compare value, the 'address for jump on less-than' field is used to jump automatically to the relevant instruction's address.

c. Otherwise, the result is considered greater than the equal compare value and the 'address for jump on greater-than' field is used to jump automatically to the relevant instruction's address.

This system is called “Conditional Jumps”. The automatically jumped-to addresses are absolute addresses.
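
The three-way branch can be sketched as below for the two-operand operations. The sketch assumes that results wrap to 32 bits and are compared as unsigned values, neither of which the specification states; countofonebits and appendonebit are left out because their exact semantics are not yet defined in enough detail.

```python
# Sketch of a mathematic / logic instruction followed by its Conditional Jump.
# ASSUMPTIONS: results wrap to 32 bits and are compared as unsigned numbers.

MASK32 = 0xFFFFFFFF
INSTRUCTION_BYTES = 16  # every instruction is 128 bits long

OPERATIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "lsh": lambda a, b: a << b,
    "rsh": lambda a, b: a >> b,
}


def execute(state, name, current_address, equal_value, less_address, greater_address):
    """Run one operation on R1 and R2 and return the address to execute next."""
    result = OPERATIONS[name](state["r1"], state["r2"]) & MASK32
    state["r3"] = result
    state["r1"] = result          # the result is copied from R3 to R1
    if result == equal_value:
        return current_address + INSTRUCTION_BYTES
    if result < equal_value:
        return less_address
    return greater_address
```

Under the unsigned assumption a subtraction that goes below zero wraps to a large value and takes the greater-than path.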

kenter and kexit instructions

The kenter instruction, when called by a process, changes the process' execution path from user mode to kernel mode so that the kernel can perform various actions according to the provided arguments, which should be at the top of the process' call communications page. The kexit instruction changes the process' execution path back to user mode.

Usage for kenter ( lsb on top ) :
-------> opcode ( kenter )
-------> zero
-------> zero
-------> zero

Usage for kexit ( lsb on top ) :
-------> opcode ( kexit )
-------> zero
-------> zero
-------> zero

Example :
---> kenter 0, 0, 0
---> kexit 0, 0, 0

Instructions for critical sections

[ To be done ]

Process context instruction

The switchprocesses instruction normally switches between two processes. It saves the process context ( Process Status register, R1, R2 and R3 registers, Instruction Pointer register and Loop Counter register ) in the process descriptor pointed to by the Current Process register, then loads the context of the new process from the process descriptor provided as the second parameter and updates the Current Process register, the Current Page Directory register and the Process Status register. If the first parameter is provided as '1' then the instruction can be used in two situations : (a) when the OS starts and the very first process is to be started, which means that no context saving will be done and the second parameter will be used directly, (b) when calling a service, where the instruction reloads the current process' registers after the kernel has changed the page directory of the current process to the copy of the service's directory and the "Instruction Pointer" field in the current process' context to the service function ( more info on this in the OS' synchronous IPC calls section ). In all other situations a '0' is to be provided as the first parameter.

Usage for switchprocesses ( lsb on top ) :
-------> opcode ( switchprocesses )
-------> 1 if at system start or if a service is to be run, otherwise 0
-------> address of process descriptor of process to load
-------> zero
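
The context switch can be sketched in software as below. The layout of a process descriptor is not defined in this document, so the dictionary fields, including "page_directory", and the helper make_descriptor are assumptions made purely for illustration. The sketch skips the save step whenever the first parameter is 1.

```python
# Sketch of switchprocesses as a save / load of the process context.
# ASSUMPTIONS: descriptors are dicts keyed by their address; the field
# names and the "page_directory" field are hypothetical.

CONTEXT_FIELDS = ("process_status", "r1", "r2", "r3",
                  "instruction_pointer", "loop_counter")


def make_descriptor(page_directory, **registers):
    """Build a zeroed process descriptor, optionally presetting some registers."""
    descriptor = {field: 0 for field in CONTEXT_FIELDS}
    descriptor.update(registers)
    descriptor["page_directory"] = page_directory
    return descriptor


def switchprocesses(cpu, descriptors, first_parameter, new_address):
    """Save the running context (when first_parameter is 0) and load the new one."""
    if first_parameter == 0:
        outgoing = descriptors[cpu["current_process"]]
        for field in CONTEXT_FIELDS:
            outgoing[field] = cpu[field]
    incoming = descriptors[new_address]
    for field in CONTEXT_FIELDS:
        cpu[field] = incoming[field]
    cpu["current_process"] = new_address
    cpu["current_page_directory"] = incoming["page_directory"]
```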

Interrupt handling

[ To be done ]

List of instructions that can be executed only in kernel mode

switchprocesses
kexit

[ To be done ]


The Spice operating system overview

The OS, as said earlier, is based on a microkernel architecture, which means the kernel has just a few facilities : process creation and scheduling, synchronous IPC, interrupt redirection, timers and critical section synchronization. The rest of the services are provided by user-mode server processes. This simple division of work makes the software system more reliable.

Below are the syscalls and other OS elements whose shape may be modified as per development in processor ISA and OS design :

Process scheduling and process management calls

Below are the 16 process priorities possible :

  1. Highest priority processes == priority 0
  2. Interrupt handler processes == priority 1
  3. Normal priority processes == priorities 2 to 15

Minimum number of pages allotted to a process : 7 pages = 28 KB : process descriptor ( 4 KB ), page directory ( 4 KB ), code page table ( 4 KB ), data page table ( 4 KB ), code page ( 4 KB ), regular data page ( 4 KB ), call communications page ( 4 KB ).

Minimum number of pages allotted to child processes : 2 pages = 8 KB : process descriptor ( 4 KB ), call communications page ( 4 KB ).

Structure of the call communications page : At address zero will be the syscall number, followed by the arguments. At the half-way point will be the error code, followed by the return values. So unlike in other OSes such as Linux, in Spice a syscall can return a number of values through this page.
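
The page layout can be sketched with a 4 KB byte buffer. Only the two anchor points ( syscall number at address zero, error code at the half-way point ) come from the text; treating every value as a 32-bit little-endian word packed back to back, and having the caller know how many return values to read, are assumptions of this sketch.

```python
# Sketch of the call communications page layout.
# ASSUMPTIONS: all values are 32-bit little-endian words stored back to back,
# and the reader knows how many return values to expect.

import struct

PAGE_SIZE = 4096
HALF_WAY = PAGE_SIZE // 2
WORD = 4


def write_call(page, syscall_number, arguments):
    """Place the syscall number at address zero, followed by the arguments."""
    struct.pack_into("<I", page, 0, syscall_number)
    for index, argument in enumerate(arguments):
        struct.pack_into("<I", page, WORD * (index + 1), argument)


def write_result(page, error_code, return_values):
    """Place the error code at the half-way point, followed by return values."""
    struct.pack_into("<I", page, HALF_WAY, error_code)
    for index, value in enumerate(return_values):
        struct.pack_into("<I", page, HALF_WAY + WORD * (index + 1), value)


def read_result(page, count):
    """Read back the error code and 'count' return values."""
    error_code = struct.unpack_from("<I", page, HALF_WAY)[0]
    values = [struct.unpack_from("<I", page, HALF_WAY + WORD * (i + 1))[0]
              for i in range(count)]
    return error_code, values
```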

[ To be done ]

The synchronous IPC calls

Every server process can have 10 named services. A service call will work mostly within the memory context of the server process, except that the client's I/O buffer and the client's call communications page are simply and directly attached to it. This removes the need for the kernel to buffer the message and the need for the kernel to copy the message between the client process and the server process. The service call will work within the process context of the client. The service call's invocation will jump directly to the service function's address and will run in this part-server, part-client memory and process context until either the service call returns to the client voluntarily or the call times out. This removes the need for a server to have threads / co-processes to service each client. A service may invoke at most one more server's service, but the process context will remain that of the original client. Ideally the service call should return as fast as possible, but a service's resource may be in contention between the service calls of different client processes that request from the same service. Such resources can be things like the filesystem cache and device port memory addresses. This contention is resolved by the service call using the Mutex resource synchronization calls, which are described in the next section.

All in all, the OS' IPC calls are like those in monolithic kernel systems because a service is automatically multi-threaded without explicitly creating separate threads / processes, but they are also like those in microkernel systems because a service call is performed within the protected memory bounds of a microkernel-based service.

serviceAdd(Service Name, Service Function)

Service Number = serviceConnect(Service Name)

The above function will also create a copy of the service's page directory and associate it with the service's entry in the client process, so that when the service is called the calling client process is attached to the service's page directory instead of its own. The caller's original page directory association is saved, and is restored when the service call returns to the client. While the service's page directory is being assigned to the client process, the client's I/O buffer's page table and the process' call communications page are attached to the relevant address entries in this particular service page directory. Any new service connection in the process system will create a copy of the service's page directory for each process in the process system.
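
The page directory handling around a service connection can be sketched as below. Everything here is illustrative : the directory is modelled as a list of entries, the two slot indices are hypothetical placeholders for "the relevant address entries", and the client record fields are invented for the sketch.

```python
# Sketch of the page directory copy made on connecting to a service, and of
# the swap / restore around an invocation.
# ASSUMPTIONS: a directory is a list of entries; the slot indices and the
# client record fields are hypothetical.

CALL_COMM_SLOT = 8   # hypothetical entry for the call communications page
IO_BUFFER_SLOT = 9   # hypothetical entry for the client's I/O buffer page table


def service_connect(service_directory, call_comm_page, io_buffer_table):
    """Return a per-client copy of the service's directory with client pages attached."""
    directory_copy = list(service_directory)
    directory_copy[CALL_COMM_SLOT] = call_comm_page
    directory_copy[IO_BUFFER_SLOT] = io_buffer_table
    return directory_copy


def service_invoke(client, service_number):
    """Save the client's own directory and attach the service's copy."""
    client["saved_directory"] = client["page_directory"]
    client["page_directory"] = client["services"][service_number]


def service_return(client):
    """Restore the client's original page directory association."""
    client["page_directory"] = client["saved_directory"]
    client["saved_directory"] = None
```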

serviceInvoke(Service Number, Buffer Address, Buffer Size, Timeout Duration)

The above call will use the switchprocesses processor instruction.

serviceReturnToClient()

[ To be done ]

Mutual exclusion calls

mutexTake(Mutex Number)

mutexRelease(Mutex Number)

There is no need to specially create a mutex. Every process, when created, is allocated 10 mutexes which can be shared with all of the process' child processes.

Mutexes within a non-service process environment are accessed within the process context of the parent process and its child processes. Mutexes in a service environment though called from service invoker client processes are accessed within the process context of the service process.

A non-service process and its child processes will have the same fixed priority. But in the case of service invocations the same section of service code can run at different priorities concurrently, because the code runs within the process context of the calling client process, which may have a different priority from another client process that has invoked the same service. In such a situation, if an in-service process wants to take a mutex and a lower priority in-service process has already taken that mutex, then the current priority of the current owner is raised to that of the higher priority in-service process that wants to wait for the mutex, i.e. the lower priority process inherits the priority of the higher priority process. The current owner's priority is returned to its original value when it releases the mutex. This temporary raising of priorities is called Priority Inheritance and it avoids the Priority Inversion problem. A good explanation of Priority Inversion and Priority Inheritance is given on the GeeksforGeeks website.
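
Priority Inheritance as described above can be sketched as follows. The classes, the waiter queue and the rule of handing the mutex to the highest priority waiter on release are assumptions of this sketch; only the boost-on-wait and restore-on-release behaviour comes from the text. A lower number means a higher priority, as in the priority list earlier in this document.

```python
# Sketch of mutexTake / mutexRelease with Priority Inheritance.
# ASSUMPTIONS: class names, the waiter queue and the hand-off rule are
# hypothetical; lower numbers mean higher priority ( priority 0 is highest ).


class Process:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority
        self.current_priority = priority


class Mutex:
    def __init__(self):
        self.owner = None
        self.waiters = []


def mutex_take(mutex, process):
    """Return True if taken; otherwise queue the caller and boost the owner."""
    if mutex.owner is None:
        mutex.owner = process
        return True
    mutex.waiters.append(process)
    if process.current_priority < mutex.owner.current_priority:
        # The owner inherits the priority of the higher priority waiter.
        mutex.owner.current_priority = process.current_priority
    return False


def mutex_release(mutex):
    """Restore the owner's original priority and hand the mutex on."""
    owner = mutex.owner
    owner.current_priority = owner.base_priority
    if mutex.waiters:
        mutex.waiters.sort(key=lambda waiter: waiter.current_priority)
        mutex.owner = mutex.waiters.pop(0)
    else:
        mutex.owner = None
    return mutex.owner
```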

Program executable file format

The simple A.out format - https://wiki.osdev.org/A.out

[ To be done ]