Project maintainers


Name: taar
Created: Dec 16, 2017
Updated: Sep 28, 2021
SVN: No files checked in
Bugs: 1 reported / 0 solved

Other project properties

Category: System on Chip
Development status: Planning
Additional info:
WishBone compliant: No
WishBone version: n/a
License: GPL

Project Sapphire - General overview ( project formerly known as Taar )

This project comprises a SoC-type microprocessor and a microkernel-based operating system, both of which are open source. Both are called Sapphire.

The projects are still being designed and I welcome anyone to contribute. This document is a design specification on the basis of which the further processor and OS architecture will be defined.

The Sapphire processor is the first processor and the Sapphire OS is the second OS project to be locally designed in India. The first OS was a very simple one written by me a few years ago. The Sapphire processor and OS are being designed to complement each other.

In this document the processor's details are given first and then the OS's.

General processor overview

The Sapphire processor is meant to be a simplified, single-core, clock-less CPU design. There will be a 3D graphics processing unit as part of the SoC. The processor is presently meant to be used in wearable computers, embedded systems and servers. The design, beyond commonalities with all designs, is not related to any specific processor.

The chip is styled as a System-on-Chip ( SoC ) whose I/O subsystems are two LED display interfaces, USB, WiFi transceiver, general purpose timer, system reset timer, general purpose 0-to-1 output pins and general purpose 0-to-1 input pins.

To accept analog phenomena into the system, a USB-connected camera is to be used instead of an ADC chip. The camera is placed in proximity to an analog device and the external data is taken in visually. This opens up possibilities such as receiving and processing data from analog meters, or taking input from multiple analog streams ( for example multiple pulsing LEDs affixed to the camera, with the LEDs connected to different microphones ). For the system's digital data to induce external analog phenomena, the second LED display is to be used instead of a DAC chip, for example with multiple LDRs affixed to the display ( the LDRs connected to different surround-sound speakers ).

At the moment there are fewer than 30 instructions, but more will be added as the CPU core design, the GPU design, the enabling of DSP applications and the kernel design progress. After a comfortable round of design the processor can be ready for an FPGA implementation.

Project contributors

There used to be one contributor to the processor project, Vishal Zuluk, and his contributions were :

  1. The USB subsystem is the result of general discussions with him.

Register set

The ALU contains nine types of registers, as listed below along with their bit-lengths :

ALU busy status -------- 1 bit

Process Status ------ 32 bits

Instruction ------------ 128 bits ( to contain the entire instruction so that it can be decoded )

Data --------------------- 40 bits ( to contain the read data so that it can be decoded )

R1 --------------------- 32 bits ( use explained later )

R2 --------------------- 32 bits ( use explained later )

R3 --------------------- 32 bits ( use explained later )

Instruction pointer ---- 32 bits ( to hold the next instruction's address )

Loop counter ---- 32 bits
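
For simulation or emulator work, the register set above could be modeled as a small register file. The names and bit-lengths follow the list above; the Python model itself is only a sketch and an assumption, not part of the hardware design.

```python
# Hypothetical software model of the Sapphire register set.
# Register names and widths are taken from the list above; the
# class shape and method names are assumptions for illustration.
REGISTER_WIDTHS = {
    "alu_busy": 1,
    "process_status": 32,
    "instruction": 128,   # holds the entire instruction for decoding
    "data": 40,           # holds the read data for decoding
    "r1": 32,
    "r2": 32,
    "r3": 32,
    "instruction_pointer": 32,
    "loop_counter": 32,
}

class RegisterFile:
    """Holds one value per register, masked to its declared bit-length."""
    def __init__(self):
        self.values = {name: 0 for name in REGISTER_WIDTHS}

    def write(self, name, value):
        mask = (1 << REGISTER_WIDTHS[name]) - 1
        self.values[name] = value & mask     # truncate to register width

    def read(self, name):
        return self.values[name]
```

Writing an over-wide value simply truncates it to the register's width, which mirrors what fixed-width hardware registers do.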

Memory Management

The MMU is paging-based. A page directory in the Sapphire processor is 5 KB in size and is flexible in the sense that it can be connected to a 5 KB page table, a 4 KB data page or a 4 MB data page.

Structure of a page directory entry and a page table entry :
-------> Status ( 8 bits )
-------> Base address ( 32 bits )

The Status bits in the page directory will consist of :
-------> Bit 0 - Entry present
-------> Bit 1 - Kernel page table
-------> Bit 2 - Syscall communications page
-------> Bit 3 - 4 KB client I/O buffer page
-------> Bit 4 - 4 MB client I/O buffer page
-------> Bit 5 - Process code page table
-------> Bit 6 - Process data page table
-------> Bit 7 - Reserved

The Status bits in the page table will consist of :
-------> Bit 0 - Entry present
-------> Bits 1 to 7 - Reserved

Each page directory and page table will consist of 1024 entries of 40 bits ( 5 bytes ) each, hence their size will be 5120 bytes ( 5 KB ).
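
The 5 KB figure follows directly from the entry layout given above, as this small check shows ( the variable names are illustrative only ) :

```python
# Arithmetic behind the 5 KB page directory / page table size:
# each entry is 8 status bits plus a 32-bit base address.
STATUS_BITS = 8
BASE_ADDRESS_BITS = 32
ENTRIES = 1024

entry_bytes = (STATUS_BITS + BASE_ADDRESS_BITS) // 8   # 40 bits = 5 bytes
table_bytes = ENTRIES * entry_bytes                    # 5120 bytes = 5 KB
```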

Memory map of a process :
-------> Kernel Page tables from 0 address to 24 MB
-------> Syscall communications page next
-------> Client I/O buffer page next
-------> Process code page tables from 1 GB to 1.5 GB
-------> Process data page tables from 2 GB to 4 GB

The memory allocator function in the kernel should maintain two memory pools :

(a). Blocks of 1024 bytes for page directories, page tables and 4 KB data pages ( including for 4 KB client buffers ).

(b). Blocks of 4 MB for 4 MB client buffer data pages.
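
A minimal sketch of such a two-pool allocator, assuming each pool is a simple free list of block start addresses; the block counts, class name and method names are assumptions, not part of the kernel design :

```python
# Hypothetical model of the kernel's two memory pools described above.
SMALL_BLOCK = 1024                 # for paging structures and 4 KB pages
LARGE_BLOCK = 4 * 1024 * 1024     # for 4 MB client buffer data pages

class KernelPools:
    def __init__(self, small_count, large_count):
        # Model each pool as a free list of block start addresses.
        self.small = [i * SMALL_BLOCK for i in range(small_count)]
        self.large = [i * LARGE_BLOCK for i in range(large_count)]

    def alloc(self, size):
        """Hand out a block from the matching pool, or None if exhausted."""
        pool = self.small if size <= SMALL_BLOCK else self.large
        return pool.pop() if pool else None

    def free(self, address, size):
        """Return a block to the pool it came from."""
        pool = self.small if size <= SMALL_BLOCK else self.large
        pool.append(address)
```

Keeping only two fixed block sizes avoids general-purpose heap bookkeeping, which suits a small kernel allocator.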

The Status word is the first word in any Process Descriptor, and the first byte of the word is a bit-field which will indicate the memory bank that the process's page directory is located in. Eight memory banks are possible, each of which can be as large as 4 GB. Therefore 8 banks x 4 GB each gives 32 GB of total possible memory in the system.
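
The bank bit-field could be decoded as sketched below. The text does not fix the encoding, so the one-bit-per-bank ( one-hot ) layout and the helper name used here are purely assumptions :

```python
# Hypothetical decoding of the bank bit-field in a Process Descriptor's
# Status word, assuming one bit per bank in the first byte ( one-hot ).
BANKS = 8
BANK_SIZE = 4 * 1024**3            # each bank can be as large as 4 GB

def page_directory_bank(status_word):
    """Return the index of the memory bank flagged in the first byte."""
    bank_bits = status_word & 0xFF     # first byte of the Status word
    if bank_bits == 0:
        raise ValueError("no bank flagged in the Status word")
    return bank_bits.bit_length() - 1  # index of the set bit

total_memory = BANKS * BANK_SIZE       # 32 GB possible in the system
```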

[ To be done ]

Instructions format

At present there are 25 instructions, given below as a single list :

load-imm-r1, load-imm-r2, load-mem-r1, load-mem-r2, load-mem-byte-r1, load-ptr-r1, load-ptr-r2, store-r1, store-r2, store-byte-r1, store-ptr-r1, store-ptr-r2, add, sub, and, or, xor, lsh, rsh, inv, loop-forever, load-loop-counter, loop-next, sysenter, sysexit

Further below, each instruction is described in detail. Every Sapphire processor instruction is 128 bits long. The instructions are all of this fixed length and fixed format to allow the processor logic to read instructions at a determinate rate and keep things simple.

The instruction format is listed below in a vertical manner. The first line below is the first field of the instruction format from the right. The operands are presented without names here because their usages differ between Operation Codes ( henceforth called opcodes ). The numbers within the brackets are the bit-lengths :

Operation code -------- ( 32 bits )
Operand1 -------------- ( 32 bits )
Operand2 -------------- ( 32 bits )
Operand3 -------------- ( 32 bits )

The opcode field contains the precise instruction number to be executed by the execution core. Any user-mode program filling in a wrong operation code will be trapped at this point and terminated. A code section in the Main Control Program ( the kernel ) could also fill this field with a wrong operation code; in that case the processor should halt entirely and an appropriate external pin should become active or low. Only the first byte of the opcode actually contains the instruction number; the rest of this field should be filled with zeroes by software ( the compiler ).
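
The fixed format can be sketched as a simple encoder / decoder. Reading "lsb on top" as four little-endian 32-bit words with the opcode word first, and numbering the opcodes in list order, are both assumptions; the spec fixes neither :

```python
import struct

# Hedged sketch of the 128-bit instruction format: four 32-bit words,
# opcode first. The opcode numbering below is an assumption made by
# enumerating the instruction list in order.
OPCODES = {name: i for i, name in enumerate([
    "load-imm-r1", "load-imm-r2", "load-mem-r1", "load-mem-r2",
    "load-mem-byte-r1", "load-ptr-r1", "load-ptr-r2", "store-r1",
    "store-r2", "store-byte-r1", "store-ptr-r1", "store-ptr-r2",
    "add", "sub", "and", "or", "xor", "lsh", "rsh", "inv",
    "loop-forever", "load-loop-counter", "loop-next",
    "sysenter", "sysexit"])}

def encode(mnemonic, op1=0, op2=0, op3=0):
    """Pack one instruction into its 16-byte ( 128-bit ) form."""
    return struct.pack("<4I", OPCODES[mnemonic], op1, op2, op3)

def decode(raw):
    """Unpack a 16-byte instruction back into its four fields."""
    opcode, op1, op2, op3 = struct.unpack("<4I", raw)
    # Only the first byte of the opcode word carries the number;
    # the rest must be zero-filled by the compiler.
    return opcode & 0xFF, op1, op2, op3
```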

load-imm-r1, load-imm-r2 instructions

Copy into register R1 or R2 the immediate value from Operand1.

load-imm-r1 and load-imm-r2 :

Usage ( lsb on top ) :
-------> opcode ( load-imm-r1 or load-imm-r2)
-------> immediate value
-------> zero
-------> zero

Example :
---> load-imm-r1 0xfeedf00d 0, 0

load-mem-r1, load-mem-r2 instructions

Copy into register R1 or R2 the word value from the memory address pointed to via Operand1 + Operand2.

load-mem-r1 and load-mem-r2 :

Usage ( lsb on top ) :
-------> opcode ( load-mem-r1 or load-mem-r2 )
-------> the starting address
-------> the offset
-------> zero

Example :
---> load-mem-r1 [0xf00df00d], 20, 0

load-mem-byte-r1 instruction

Copy into register R1 the byte value from the memory address pointed to via Operand1 + Operand2.

load-mem-byte-r1 :

Usage ( lsb on top ) :
-------> opcode ( load-mem-byte-r1 )
-------> the starting address
-------> the offset
-------> zero

Example :
---> load-mem-byte-r1 [0xf00df0ed], 20, 0

load-ptr-r1, load-ptr-r2 instructions

Copy into register R1 or R2 the value from the memory address second-level-pointed-to via Operand1 + Operand2.

load-ptr-r1 and load-ptr-r2 :

Usage ( lsb on top ) :
-------> opcode ( load-ptr-r1 or load-ptr-r2 )
-------> the pointer address
-------> the offset into the second-level address
-------> zero

Example :
---> load-ptr-r1 [0xfeedfade], 20, 0
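
The double indirection these instructions perform can be sketched with a Python dict standing in for memory; the helper name and word-granular memory model are assumptions :

```python
# Minimal model of the load-ptr double dereference:
# R1 <- memory[ memory[pointer_address] + offset ]
def load_ptr(memory, pointer_address, offset):
    second_level = memory[pointer_address]   # first dereference
    return memory[second_level + offset]     # second dereference + offset

# Mirror of the load-ptr-r1 example above: the pointer at 0xfeedfade
# leads to a second-level address, offset by 20.
memory = {0xFEEDFADE: 0x1000, 0x1000 + 20: 0xCAFE}
value = load_ptr(memory, 0xFEEDFADE, 20)
```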

The load instructions and their counterpart, the store instructions ( explained next ), exist because the mathematic and logic instructions do not access memory directly. Such partitioning allows keeping the ISA simple and clean. It also allows faster mathematic and logic instruction execution in one core versus relatively slower memory access in another core.

store-r1, store-r2 instructions

Copy the word value from register R1 or R2 into a memory address.

The mathematic and logic instructions don’t write back the result to memory after execution, and therefore the code will have to use this instruction if a memory write-back is needed. Having a separate write-back will allow most instructions to access memory without too much queuing, thus increasing hardware-level parallelism. A second effect is that code which only compares, and does not need the result value in memory, needn't take time for a write-back.

Usage ( lsb on top ) :
-------> opcode ( store-r1 or store-r2 )
-------> the starting address
-------> the offset
-------> zero

Example :
---> store-r1 [0xdead0000], 10, 0

store-byte-r1 instruction

Copy the byte value from register R1 into a memory address.

Usage ( lsb on top ) :
-------> opcode ( store-byte-r1 )
-------> the starting address
-------> the offset
-------> zero

Example :
---> store-byte-r1 [0xdead000d], 10, 0

store-ptr-r1, store-ptr-r2 instructions

Copy the value from register R1 or R2 into the memory address second-level-pointed-to via Operand1 + Operand2.

store-ptr-r1 and store-ptr-r2 :

Usage ( lsb on top ) :
-------> opcode ( store-ptr-r1 or store-ptr-r2 )
-------> the pointer address
-------> the offset into the second-level address
-------> zero

Example :
---> store-ptr-r1 [0xfeedfade], 20, 0

The eight mathematic and logic instructions : add, sub, and, or, xor, lsh, rsh, inv

The inputs for these instructions are taken from the registers R1 and R2. The result of the operation is copied from the R3 register to the R1 register, which allows the chain of mathematic or logic instructions to continue after a branch out.

add == addition of two numbers
sub == subtraction of two numbers
and == logical and'ing of two numbers
or == logical or'ing of two numbers
xor == logical exclusive or'ing of two numbers
lsh == left shifting of a number by so many places given in R2
rsh == right shifting a number by so many places given in R2
inv == invert a number

Usage ( lsb on top ) :
-------> opcode ( the mathematic or logic instruction )
-------> equal compare value
-------> address for jump on less-than
-------> address for jump on greater-than

After the operation, the instruction will act as below :

a. If the result equals the equal compare value, the automatically jumped-to address is that of the instruction immediately after the current instruction.

b. If the result is less than the equal compare value, the 'address for jump-on-less-than' field is used to automatically jump to the relevant instruction's address.

c. Otherwise, the result is considered greater than the equal compare value and the 'address for jump-on-greater-than' field is used to automatically jump to the relevant instruction's address.

This system is called “Conditional Jumps”. The automatically jumped-to addresses are absolute addresses.
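
The three-way outcome above can be sketched as a single step of an interpreter. The operator set and fall-through convention follow the text; the 32-bit wrap-around of results and the function shape are assumptions :

```python
# Sketch of one ALU instruction with its built-in Conditional Jumps.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "lsh": lambda a, b: a << b,   # shift count taken from R2
    "rsh": lambda a, b: a >> b,
    "inv": lambda a, b: ~a,       # single-operand invert
}
MASK = 0xFFFFFFFF                 # assume results wrap to 32 bits

def execute_alu(op, r1, r2, equal_value, lt_address, gt_address, next_address):
    """Run the operation, then pick the next ( absolute ) address."""
    result = OPS[op](r1, r2) & MASK   # result lands in R3, copied to R1
    if result == equal_value:
        return result, next_address   # equal: fall through
    if result < equal_value:
        return result, lt_address     # jump on less-than
    return result, gt_address         # jump on greater-than
```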

The design of these eight instructions could have included another word element to provide the R2 value to the instruction, but adding a fifth word to the four already in the instruction would add more latency, which of course is not desired. If a change is desired in the R2 register, the load-imm-r2 or load-mem-r2 instructions can be used. This keeps the system simple.

Loop instructions

These instructions allow for program loops.

loop-forever : This is a simple unconditional jump to the instruction which is the start of the loop. Its equivalent in the C language is the 'do ... while(1)' loop.

Usage for loop-forever ( lsb on top ) :
-------> opcode ( loop-forever )
-------> loop start address
-------> zero
-------> zero

load-loop-counter : Loads count into the loop counter register. The count should be a positive number :

Usage for load-loop-counter ( lsb on top ) :
-------> opcode ( load-loop-counter )
-------> count
-------> zero
-------> zero

loop-next : Subtracts one from the counter register; if the result is non-zero, goes back to the loop start, else goes to the next instruction after the loop. Its equivalent in the C language is the 'do ... while( count > 0 )' loop :

Usage for loop-next ( lsb on top ) :
-------> opcode ( loop-next )
-------> loop start address
-------> zero
-------> zero
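
The load-loop-counter / loop-next pair can be sketched as below; the tiny interpreter shape is an assumption, and as the text says the count must be positive ( a zero count would wrap past zero here ) :

```python
# Model of a counted loop: load-loop-counter sets the counter, the body
# runs, and loop-next decrements and jumps back while non-zero.
def run_counted_loop(count, body):
    loop_counter = count          # load-loop-counter ( must be positive )
    while True:                   # loop start
        body()
        loop_counter -= 1         # loop-next subtracts one ...
        if loop_counter == 0:     # ... and falls through at zero
            break                 # else it jumps back to the loop start

iterations = []
run_counted_loop(3, lambda: iterations.append(1))
```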

Process context instructions

[ To be done ]

sysenter and sysexit instructions

The sysenter instruction changes the process' execution path from user mode to kernel mode so that the kernel can perform various actions according to the provided arguments, which should be at the top of the process' syscall arguments page. The sysexit instruction changes the process' execution path back to user mode.

Usage for sysenter ( lsb on top ) :
-------> opcode ( sysenter )
-------> zero
-------> zero
-------> zero

Usage for sysexit ( lsb on top ) :
-------> opcode ( sysexit )
-------> zero
-------> zero
-------> zero

Example :
---> sysenter 0, 0, 0
---> sysexit 0, 0, 0

Interrupt handling

[ To be done ]

Process scheduling

[ To be done ]

List of instructions that can be executed only in kernel mode


[ To be done ]

The Sapphire operating system overview

The OS, as said earlier, is based on a microkernel architecture, which means the kernel has just a few facilities: process creation and scheduling, synchronous IPC, interrupt redirection, timers and critical-section synchronization. The remaining facilities are provided by user-mode server processes.

Below are the syscalls and other OS elements, whose shape may be modified as the processor ISA and OS design develop :

Process management calls

Minimum number of pages allotted to a process : 6 pages = 26 KB : Page directory ( 5 KB ), page table ( 5 KB ), regular data page ( 4 KB ), code page ( 4 KB ), process descriptor ( 4 KB ), syscall communications page ( 4 KB ).

Minimum number of pages allotted to a child process : 2 pages = 8 KB : process descriptor ( 4 KB ), syscall communications page ( 4 KB ).

Structure of the syscall communications page : At the zero address will be the syscall number, then the arguments. At the half-way point will be the error code, then the return values. So unlike in other OSes such as Linux, in Sapphire a syscall can return a number of values through this page.
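
The page layout above could be modeled as follows; the 32-bit word size, byte order and helper names are assumptions on top of the stated "zero address / half-way point" layout :

```python
# Sketch of the syscall communications page: request data from offset 0,
# error code and return values from the half-way point of the 4 KB page.
PAGE_SIZE = 4096
RETURN_AREA = PAGE_SIZE // 2     # error code starts here
WORD = 4                         # assumed 32-bit words, little-endian

def write_request(page, syscall_number, args):
    """Client side: syscall number at offset 0, then the arguments."""
    page[0:WORD] = syscall_number.to_bytes(WORD, "little")
    for i, arg in enumerate(args):
        offset = WORD * (i + 1)
        page[offset:offset + WORD] = arg.to_bytes(WORD, "little")

def read_reply(page, value_count):
    """Client side: error code at the half-way point, then return values."""
    error = int.from_bytes(page[RETURN_AREA:RETURN_AREA + WORD], "little")
    values = [int.from_bytes(
        page[RETURN_AREA + WORD * (i + 1):RETURN_AREA + WORD * (i + 2)],
        "little") for i in range(value_count)]
    return error, values
```

Because request and reply occupy separate halves of one shared page, a syscall can hand back several values without extra copying.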

[ To be done ]

The synchronous IPC calls

Every server process can have 20 named services. Once a client connects to a service, it can ask for five general sub-services ( Open, Send, Receive, Control, Close ) and many server-specific sub-sub-services when invoking the service.

The service call will work mostly within the memory context of the server process, except for the client's I/O buffer and the client's syscall communications page, which are simply and directly attached to it. This negates the need for kernel-buffering of the message itself, and there is no need for the kernel to copy the message between the client process and the server process. The service call will work within the process context of the client. The service call's invocation will jump directly to the service function's address and will run in this part-server, part-client memory and process context until either the service call returns to the client voluntarily or the call times out. This negates the need for a server to have threads / co-processes to service each client. A service may invoke at most one more server's service, but the process context will remain the original client's.

Ideally the service call should return as fast as possible, but a service's resources may be in contention with the service calls of other client processes requesting the same service. The resources can be such things as filesystem caches and device port memory addresses. This contention is resolved by the service call using the Mutex resource synchronization calls, which are described in the next section.

All in all, the OS' IPC calls are like those in monolithic kernel systems, because a service is automatically multi-threaded without explicitly creating separate threads / processes, but they are also like those in microkernel systems, because a service call is performed within the protected memory bounds of a microkernel-based service.

serviceAdd(Service Name Word Count, Service Name, Service Function, Expected Client Buffer Size)

Service Number, Expected Client Buffer Size = serviceConnect(Service Name)

serviceInvoke(Service Number, Service Type, Buffer Size, Buffer Address, Timeout)

The Buffer Size parameter takes either a 4 KB setting or a 4 MB setting to allow for two situations :

-------> a. Where the entire server is set up to service its clients with 4 KB I/O buffers, because such clients can make do with the smaller 4 KB buffers as appropriate to the application type.

-------> b. Where the server will service its clients at each service level with either a 4 KB I/O buffer or a 4 MB I/O buffer depending on the service type.

The two points above remove the need for all clients and all services to use uniform 4 MB I/O buffers, thereby saving system memory.


[ To be done ]

Mutual exclusion calls

mutexTake(Mutex Number)

mutexRelease(Mutex Number)

There is no need to specially create a mutex. Every process, when created, is allocated 10 mutexes, which can be shared with all the process's child processes.

Mutexes within a non-service process environment are accessed within the process context of the parent process and its child processes. Mutexes in a service environment, though called from service-invoking client processes, are accessed within the process context of the service process.

A non-service process and its child processes will have the same fixed priority. But in the case of service invocations, the same section of service code can run at different priorities concurrently, because the code runs within the process context of the calling client process, which may have a different priority from another client process that has invoked the same service. In such a situation, if an in-service process wants to take a mutex and a lower-priority in-service process has already taken it, then the current priority of the current owner is increased to that of the higher-priority in-service process waiting for the mutex, i.e. the lower-priority process inherits the priority of the higher-priority process. The current owner's priority is adjusted back to its original value when it releases the mutex. This temporary raising of priorities is called Priority Inheritance and ensures avoidance of the Priority Inversion problem. A good explanation of Priority Inversion and Priority Inheritance is given on the Geeks for Geeks website.
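
The inheritance rule above can be sketched in a few lines. This single-step model ( no real blocking or scheduling ) and all class / method names are assumptions; it only shows the boost-on-contention and restore-on-release behaviour :

```python
# Sketch of priority inheritance: a contending higher-priority waiter
# boosts the current owner; release restores the owner's base priority.
class Process:
    def __init__(self, base_priority):
        self.base_priority = base_priority
        self.priority = base_priority      # current ( possibly boosted )

class Mutex:
    def __init__(self):
        self.owner = None

    def take(self, process):
        """Return True if taken; on contention, boost the owner."""
        if self.owner is None:
            self.owner = process
            return True
        if process.priority > self.owner.priority:
            self.owner.priority = process.priority   # inherit waiter's priority
        return False                                 # caller must wait

    def release(self, process):
        process.priority = process.base_priority     # restore original
        self.owner = None
```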

Program executable file format

The simple A.out format -

[ To be done ]