OpenCores

Project maintainers

Details

Name: taar
Created: Dec 16, 2017
Updated: Mar 22, 2023
SVN: No files checked in
Bugs: 1 reported / 0 solved

Other project properties

Category:System on Chip
Language:Other
Development status:Planning
Additional info:
WishBone compliant: No
WishBone version: n/a
License: GPL

Project Kosmos - Introduction ( project formerly known as Taar )

This project comprises a SoC ( System-on-Chip ) type microprocessor and a microkernel-based operating system which are being designed to complement each other. A key requirement is to keep the processor and OS architectures as simple as possible, general purpose, reliability-capable and elegant, and this has led the processor architecture to depart from tradition in several respects. In most respects the project is not related to any other architecture.

This document is a design specification on which the further definition of the processor and OS architecture will be based. The processor's details are given first and then the OS's.

The name of the project comes from two sources :

  1. The Russian word Kosmos meaning Space, what is beyond the limits of Earth or any world, to indicate that this project is also intended to be used in spacecraft, spacesuits and vehicles and habitats meant to work on other worlds - all in due time, but also to indicate the large universality of the application of the project.
  2. The Greek word Kosmos, one meaning of which I find appropriate : "1. Kosmos is found in Greek writings from Homer down with the basic meaning of "an apt and harmonious arrangement or constitution". A condition of orderliness, orderly arrangement, order. It denotes what is well assembled or constructed from its individual parts. " The derived meaning indicates that in this project the processor hardware is meant to work in harmony with the operating system software written specifically for the processor - the OS software complementing the processor hardware.

The humanist, simplifying, thoughtful and Nature-abiding ideology of the Kosmos project will be enhanced by having as its anthem the song Shob lokey koy, a beautiful mixing in Coke Studio Bangla of the thoughts of two South Asian philosophers of older times - Lalon Fakir and Kabir Das. As one of the comments on the video page puts it, this song should really be adopted as the anthem of the United Nations Organization.

Project participants other than me

  1. Apoorva JR.
  2. Vikram - for the logo.
  3. The website Freepik for the logo.

Kosmos processor version 1 overview

The Kosmos version 1 SoC processor has two broad parts : (a). Digital data-processing part, (b). I/O.

In the digital data-processing part the processor is arranged as a vector processor with an Instruction Dispatch Unit and multiple Vector Units. The Instruction Dispatch Unit doesn't operate on a clock. All Vector Units have a common design and execute the same instruction in lock step, each on different data. The number of Vector Units can vary in chip production depending on the application of the SoC. Instruction execution uses 32-bit addressing and 32-bit data processing and, depending on the instruction, the data can be integer or real. Integers will be used only for bitwise logic operations like 'and' and 'xor'. For mathematical operations like 'add' only real numbers will be used, computed through the recent Posit Number System, which I am still getting to know and will use either as is or in a simplified variation. Integer-based mathematical operations aren't present because I think the system will be simpler with consistent circuitry and just one programming data type, Posit, for mathematical operations. Further, multimedia operations will be done through special generic instructions which I don't think need fixed integer-based mathematical instructions ( add, sub, mul ); these generic multimedia instructions will be vectorized to enable fast, software-based decoding of multimedia formats. More instructions will be added as the kernel design progresses. There is no subroutine stack in the processor i.e. Kosmos is a stackless processor.

The I/O part of the SoC is the Molecule host subsystem. As an SoC subsystem it enables the addresses and data generated within the SoC to be exchanged with the devices outside the SoC and also routes data between the devices outside the SoC. So the Molecule host circuitry is also a central routing system.

Overall the SoC is a simple processor and can be used in modular form : a main SoC runs regular programs and can be connected to one or more Kosmos co-processors implemented on the same board. The co-processors are identical to the main processor and are just configured for a separate role.

After a comfortable round of design the SoC can be ready for an FPGA implementation and the source code presented through the opencores.org channels.

Kosmos processor keywords

Clock-less instruction execution, 32-bit addressing, 32-bit data processing, vector processor, Posit number system, stack-less programming, simple, few instructions, new serial-based I/O system with in-built interrupt controller, paging-based 4 GB memory model, meant for microkernel-based OS, modular.

Kosmos processor and operating system memory management

Memory management is paging-based and the total addressable space is 4 GB. A page directory in the Kosmos processor is sized 4 KB and has 1024 four-byte-long entries as is traditional, but it is flexible in the sense that an entry can point to a 4 KB page table, to a 4 KB data page or to a data "page" smaller than 4 KB. The paging system views memory primarily as a set of 4 KB pages throughout. The traditional data cache and code cache have been discarded altogether; in their place the Kosmos processor will be stacked with the DRAM memory module via a stacking technique like HBM. This simplifies the memory system and eliminates the traditional cache management circuitry. The addressing is 32-bit and the addressable space is thus limited to 4 GB because the idea is to have small modules of stacked processor + memory, which will allow a small, parallel-processing network of Kosmos vector processors that can take part in triple redundancy.

Structure of a page directory entry :
-------> Status ( 8 bits )
-------> Page number ( 24 bits )

The Status bits in the page directory will consist of :
-------> Bit 0 - Kernel page table
-------> Bit 1 - Native process' process descriptor "page"
-------> Bit 2 - Native process' call communications page
-------> Bit 3 - Client process' process descriptor "page"
-------> Bit 4 - Client process' call communications page
-------> Bit 5 - Client process' I/O buffer's page table
-------> Bit 6 - Native process' code page table
-------> Bit 7 - Native process' data page table

Structure of a page table entry :
-------> Status ( 8 bits )
-------> Page number ( 24 bits )

The Status bits in the page table will consist of :
-------> Bit 0 - Entry present
-------> Bits 1 to 7 - Unused
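As a rough illustration of the two entry layouts above, the Python sketch below splits a 32-bit entry into its Status and Page number fields. It assumes that the Status byte occupies the least significant 8 bits and the Page number the upper 24 bits; the specification lists the fields but does not fix their bit positions, so that ordering, and the names `make_entry`, `split_entry` and `describe_pde`, are hypothetical.

```python
# Sketch only: field placement (Status in bits 0-7, Page number in bits 8-31)
# is an assumption; the specification does not state the bit order.

PDE_STATUS = {  # page directory Status bit -> meaning, as listed above
    0: "Kernel page table",
    1: "Native process' process descriptor page",
    2: "Native process' call communications page",
    3: "Client process' process descriptor page",
    4: "Client process' call communications page",
    5: "Client process' I/O buffer's page table",
    6: "Native process' code page table",
    7: "Native process' data page table",
}


def make_entry(status: int, page_number: int) -> int:
    """Pack an 8-bit Status and a 24-bit Page number into one 32-bit entry."""
    if not 0 <= status < (1 << 8):
        raise ValueError("Status must fit in 8 bits")
    if not 0 <= page_number < (1 << 24):
        raise ValueError("Page number must fit in 24 bits")
    return (page_number << 8) | status


def split_entry(entry: int) -> tuple[int, int]:
    """Return (status, page_number) for a 32-bit entry."""
    return entry & 0xFF, (entry >> 8) & 0xFFFFFF


def describe_pde(entry: int) -> list[str]:
    """List the meanings of the Status bits set in a page directory entry."""
    status, _ = split_entry(entry)
    return [name for bit, name in PDE_STATUS.items() if status & (1 << bit)]


entry = make_entry(status=1 << 6, page_number=0x000123)
print(split_entry(entry), describe_pde(entry))
```

The same `split_entry` helper applies to page table entries, where only bit 0 ( Entry present ) of the Status byte is currently defined.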

Memory map of a process ( the arrangement of entries in the page directory ) :
-------> Kernel page tables ( 0th entry to 7th entry - 32 MB space - accessible only by kernel )
-------> Native process' process descriptor "page" ( 8th entry - accessible only by kernel )
-------> Native process' call communications page ( 9th entry - accessible by native calling process, kernel )
-------> Client process' process descriptor "page" ( 10th entry - accessible only by kernel )
-------> Client process' call communications page ( 11th entry - accessible by native calling process, kernel, service process )
-------> Client process' I/O buffer's page table ( 12th entry - accessible by native calling process, kernel, service process )
-------> Process code page tables ( 13th entry - 500 MB space - accessible by kernel, native any process )
-------> Process private data page tables ( entry position to be written - 4 MB space - accessible by kernel, native subprocess )
-------> Process common data page tables ( entry position to be written - 2.5 GB space - accessible by kernel, native any process )
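The region sizes in the memory map can be cross-checked against the page directory size. The sketch below assumes that a directory entry pointing to a page table spans 1024 pages of 4 KB ( 4 MB ) and that MB and GB are binary units; the region names and the helper `entries_needed` are hypothetical, and entry positions still marked "to be written" are not assumed.

```python
PAGE_SIZE = 4 * 1024                 # 4 KB pages
ENTRIES_PER_TABLE = 1024             # four-byte entries in a 4 KB table
SPAN_PER_ENTRY = PAGE_SIZE * ENTRIES_PER_TABLE   # one directory entry -> 4 MB
MB = 1024 * 1024


def entries_needed(region_bytes: int) -> int:
    """Number of page directory entries needed to cover a region."""
    return -(-region_bytes // SPAN_PER_ENTRY)    # ceiling division


regions = {
    "kernel page tables": 32 * MB,
    "process code page tables": 500 * MB,
    "process private data page tables": 4 * MB,
    "process common data page tables": 2560 * MB,   # 2.5 GB
}
single_entry_slots = 5   # entries 8 to 12 ( descriptors, call pages, I/O buffer )

used = single_entry_slots + sum(entries_needed(size) for size in regions.values())
for name, size in regions.items():
    print(f"{name}: {entries_needed(size)} entries")
print(f"total: {used} of {ENTRIES_PER_TABLE} entries")
```

Under these assumptions the listed regions need 8, 125, 1 and 640 entries respectively, so together with the five single-entry slots the map fits within the 1024-entry directory.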

As one measure of processor-level support for secure user-mode code execution, when the In Kernel Mode bit in the Process Status register ( described in a section below ) is not set, every instruction has to come from within a process code page ( a page whose page directory entry has the code status bit, i.e. bit 6, set ). If a program's instruction does not come from a code page the processor will generate an appropriate error, which will call the relevant process in the desktop process group to handle it. Handling essentially means terminating the offending user process and generating a dump record of the process for the system administrator to see.

[ To be done ]

Kosmos processor register set ( under draft )

In the Kosmos processor, except in one instruction, there are no registers exposed to the programs ( the kernel or a user-mode process ). Most of the time the programs provide the instructions with bit-based settings and / or data addresses and / or immediate values and / or branch instruction addresses, or nothing at all, and the instructions do the work silently or show the result by automatically branching to the provided code address, as appropriate. This keeps the processor design simple. The Instruction Dispatch Unit, which sends out the instructions to be executed, is clock-less. I think the speed bottleneck will currently be the clocked main DRAM memory module, but the Kosmos processor's architecture is future-proof for the soon-coming time when main RAM will combine the speed of SRAM, the density of DRAM and the non-volatility of Flash. There is no legacy architectural complexity in the Kosmos processor and vector processing is built in from the start, unlike processors like ARM and RISC-V where there is an older, base scalar instruction set and vector instructions had to be brought in separately as extensions to the base.

The IDU ( Instruction Dispatch Unit ) contains x number of registers as listed below along with their bit length :

Current Process' Descriptor Address --------------- 32 bits

Current VU Set Pointer ---------------------------------- 32 bits

Current Page Directory Page Number -------------- 32 bits

Process Status --------------------------------------------- 32 bits ( From LSB : 1 bit for In Kernel Mode, 1 bit for In Interrupt, 1 bit for Breakpoint Enabled For Instruction Type, 1 bit for Breakpoint Enabled For Instruction Address, 1 bit for Breakpoint Enabled For Datum Address Access, 1 bit for Using Eight Vector Units, 2 bits for Vector Address Increment Method, 1 bit for If Blocked In Channel I/O, 9 bits Unused, 14 bits for Loop Counter )
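The Process Status layout can be checked with a small packing sketch. The field order and widths follow the description above, starting at the LSB; the snake_case field names and the helpers `pack_status` and `unpack_status` are hypothetical labels introduced only for this illustration.

```python
# Field order and widths follow the Process Status description, from the LSB.
# The snake_case names are hypothetical labels for those fields.
FIELDS = [
    ("in_kernel_mode", 1),
    ("in_interrupt", 1),
    ("bp_instruction_type", 1),
    ("bp_instruction_address", 1),
    ("bp_datum_address_access", 1),
    ("using_eight_vector_units", 1),
    ("vector_address_increment_method", 2),
    ("blocked_in_channel_io", 1),
    ("unused", 9),
    ("loop_counter", 14),
]


def field_layout():
    """Map each field name to (bit offset, width)."""
    layout, offset = {}, 0
    for name, width in FIELDS:
        layout[name] = (offset, width)
        offset += width
    return layout


LAYOUT = field_layout()


def pack_status(**values):
    """Build a 32-bit Process Status word from named field values."""
    word = 0
    for name, value in values.items():
        offset, width = LAYOUT[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word |= value << offset
    return word


def unpack_status(word):
    """Split a 32-bit Process Status word into its named fields."""
    return {name: (word >> off) & ((1 << w) - 1) for name, (off, w) in LAYOUT.items()}


status = pack_status(in_kernel_mode=1, using_eight_vector_units=1, loop_counter=10)
print(hex(status), unpack_status(status)["loop_counter"])
```

The widths add up to exactly 32 bits, which places the 14-bit Loop Counter in bits 18 to 31.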

Current Instruction Content --------------------------- 128 bits ( For instruction decoding, sufficient space to contain the largest instruction in the ISA which is setinstcfg )

Next-Instruction Address ------------------------------ 32 bits ( To hold the next instruction's address of the process )

IDU Scratchpad 1 ---------------------------------------- 32 bits

IDU Scratchpad 2 ---------------------------------------- 32 bits

IDU Scratchpad 3 ---------------------------------------- 32 bits

IDU Scratchpad 4 ---------------------------------------- 32 bits

IDU Scratchpad 5 ---------------------------------------- 32 bits

Source 1 Start Virtual Address ---------------------- 32 bits

Source 2 Start Virtual Address ---------------------- 32 bits

Destination Virtual Address -------------------------- 32 bits

Debugger Channel Address -------------------------- 32 bits

Each of the eight VUs ( Vector Units - computation units ) operates on the three registers below, which are enclosed within every process descriptor ( described in a further section ) :

VU Scratchpad 1 -- 32 bits

VU Scratchpad 2 -- 32 bits

VU Scratchpad 3 -- 32 bits

Kosmos processor instructions

At present there are 25 instructions as given below :

setinstcfg, copyimm, copyword, jmp, jmptbl, load, store, add, sub, mul, and, or, xor, lsh, rsh, getcountofonebits, getcountofzerobits, setfromzerobit, unsetbitsat, jmpresult, takeklock, saveprocess, loadprocess, kcall, kexit.

Each instruction is described in detail further below. Apart from the Operation Code ( henceforth called opcode ), which has a fixed length, the Kosmos processor instructions are of variable length. To allow the processor logic to read instructions at a determinate rate and keep things as simple as possible, every instruction memory read will read 128 bits, which is the length of the setinstcfg instruction - the longest instruction in the ISA. Once the opcode is known and it is not that of setinstcfg, the bits beyond the instruction's own length are disregarded.

Instruction format :
Opcode ------ ( 8 bits )
Operands --- ( variable number of bits according to opcode )

setinstcfg instruction

setinstcfg : Sets the configuration for the instruction stream in a process to be used by subsequent vector and scalar instructions ( like looping, arithmetic, logic and data copy ).

Usage ( lsb on top ) :
-------> opcode ( setinstcfg )
-------> number of vector units to use -- 1 bit
-------> update loop counter -- 1 bit
-------> update addresses -- 1 bit
-------> loop count -- 14 bits
-------> address increment method -- 2 bits
-------> unused -- 5 bits
-------> source address 1 -- 32 bits
-------> source address 2 -- 32 bits
-------> destination address -- 32 bits

"number of vector units to use" : 0b for 1 unit, 1b for 8 units.

"update loop counter" : If set then the Loop Counter field in the Process Status register will be updated from the relevant operand in this instruction. This is because sometimes the already-configured looping should not be affected by changed values in the other operands of the instruction.

"update addresses" : If set then the source and destination addresses already configured are updated.

"address increment method" : 00b for no increment, 01b for increment sources only, 10b for increment destination only, 11b for increment both sources and destination.

Example :
-------> setinstcfg 1b, 1b, 1b, 10, 11b, 00000b, 20, 40, 100
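To make the 128-bit layout concrete, the sketch below encodes the example above into one instruction word, placing the first-listed field in the least significant bits as the "lsb on top" note indicates. The opcode value `0x01` is hypothetical because opcode numbers have not been assigned in this document, and the little-endian byte order used for the final 16 bytes is also an assumption.

```python
SETINSTCFG_OPCODE = 0x01  # hypothetical value; opcode numbers are not yet assigned


def encode_setinstcfg(use_eight_vus, update_loop_counter, update_addresses,
                      loop_count, increment_method, src1, src2, dst):
    """Pack the setinstcfg operands into a 128-bit integer, LSB first."""
    fields = [  # (value, width); the first entry lands in the lowest bits
        (SETINSTCFG_OPCODE, 8),
        (use_eight_vus, 1),
        (update_loop_counter, 1),
        (update_addresses, 1),
        (loop_count, 14),
        (increment_method, 2),
        (0, 5),                 # unused
        (src1, 32),
        (src2, 32),
        (dst, 32),
    ]
    word, offset = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("operand does not fit its field")
        word |= value << offset
        offset += width
    assert offset == 128        # the fields fill the instruction exactly
    return word


# setinstcfg 1b, 1b, 1b, 10, 11b, 00000b, 20, 40, 100
word = encode_setinstcfg(1, 1, 1, 10, 0b11, 20, 40, 100)
raw = word.to_bytes(16, "little")   # byte order is an assumption
print(raw.hex())
```

A decoder would read the low 8 bits first to learn the opcode and then interpret only as many of the remaining bits as that instruction defines.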

copyimm instruction

copyimm : Copy an immediate value to the specified VU register within the specified VU set.

"VU set number" : Value from 0 to 7 representing one of the eight VU register sets. A check is done for whether the process is in kernel mode or user mode because in kernel mode there is only one VU set being used.

"VU register x enable copying" : If a 1 then enables the instruction to copy the "VU register x immediate value" to the appropriate VU register.

Usage ( lsb on top ) :
-------> opcode ( copyimm )
-------> VU set number -- 3 bits
-------> VU register 1 enable copying -- 1 bit
-------> VU register 2 enable copying -- 1 bit
-------> VU register 3 enable copying -- 1 bit
-------> unused -- 2 bits
-------> VU register 1 immediate value -- 32 bits
-------> VU register 2 immediate value -- 32 bits
-------> VU register 3 immediate value -- 32 bits

Example :
-------> copyimm 0 1b 0b 0b 00b 198000 0 0

copyword instruction

copyword : Copy a memory word from the address in the first source to the destination address as per the configuration.

Usage ( lsb on top ) :
-------> opcode ( copyword )

Example :
-------> copyword

load instruction

load : Load data from memory into the first and second VU registers in the process descriptor as per the configuration.

store instruction

store : Copy data from the third VU register in the process descriptor to memory as per the configuration and also update the addresses in the IDU as per the configuration.

jmp instruction

jmp : If the first operand is zero the instruction performs an unconditional jump to the provided address. If the first operand is a one the instruction implements a counted loop : it subtracts one from the Loop Counter bits in the Process Status register and, if the result is non-zero, goes to the instruction pointed to by the second operand; if the Loop Counter is zero, execution continues with the instruction after this one.

Usage ( lsb on top ) :
-------> opcode ( jmp )
-------> if a counted jump -- 8 bits
-------> address to jump to

"if a counted jump" : Either a one or a zero.

Example :
-------> jmp 0 _kadi_tay_hass_bol_vay
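The counted-jump behaviour can be modelled in a few lines. This is a behavioural sketch, not the hardware design : the function signature, the addresses and the 14-bit wrap-around when an already-zero counter is decremented are assumptions that the text above does not specify.

```python
LOOP_COUNTER_MASK = (1 << 14) - 1   # Loop Counter is a 14-bit field


def jmp(counted, target, next_address, loop_counter):
    """Return (address of the next instruction to run, updated Loop Counter)."""
    if counted == 0:
        return target, loop_counter                 # unconditional jump
    # Wrap-around on decrementing zero is an assumption, not from the spec.
    loop_counter = (loop_counter - 1) & LOOP_COUNTER_MASK
    if loop_counter != 0:
        return target, loop_counter                 # keep looping
    return next_address, loop_counter               # fall through


def run_counted_loop(loop_count):
    """Count how often a loop body runs when it ends in a counted jmp."""
    LOOP_TOP, AFTER_LOOP = 0x100, 0x200             # hypothetical addresses
    pc, counter, body_runs = LOOP_TOP, loop_count, 0
    while pc == LOOP_TOP:
        body_runs += 1                              # the loop body executes
        pc, counter = jmp(1, LOOP_TOP, AFTER_LOOP, counter)
    return body_runs


print(run_counted_loop(10))
```

In this model a Loop Counter of n, configured beforehand through setinstcfg, makes the loop body run n times, because the body executes once before the first decrement.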

jmptbl instruction

jmptbl : This instruction provides a fast and convenient way of executing functions based on a numerical function identifier. For example, in most OSes user-level processes make kernel calls by providing the call identifier number in a register. The kernel can check this register with individual 'if' statements to arrive at the kernel call associated with the identifier and execute it, but kernel calls are many, so a long chain of 'if' statements takes a lot of processor time and repetitive code space. A call jump table avoids both problems. In the case of the Kosmos processor and Kosmos OS the user process fills the kernel call number into the 0th word ( byte 0 ) of the call communications page and executes the kcall instruction, whereupon the relevant kernel code is jumped to, which :

  1. Fills the 1st word in the call communications page with the number that defines the number limit of the kernel calls.
  2. Fills the 2nd word in the call communications page with the address of the table of pointers to the kernel calls where the table has to exist in kernel space.

The kernel then executes the jmptbl instruction, which reads the call communications page's 0th word into IDU Scratchpad 1 and checks whether it is greater than the kernel call number limit ( stored in the call communications page's 1st word ). If so, the very next instruction is executed, which has to take appropriate action, essentially returning to the user process with an error code in the error code word in the call communications page ( halfway down the page ). If however the 0th word holds a number within the zero to limit-number range, the instruction goes to the jump table ( addressed from the call communications page's 2nd word ), which has pointers to the kernel calls. The content of IDU Scratchpad 1 is multiplied by four ( the word size of instruction addresses ) to arrive at the correct offset into the jump table, and the content found there, i.e. the kernel call address, is retrieved into another IDU scratchpad and prepared for execution.

This procedure is similarly followed for jump tables used in user processes where the table exists in the process' code space. About the instruction addresses stored in the table for user-mode processes see note in the Memory Management section above.

Usage ( lsb on top ) :
-------> opcode ( jmptbl )

Example :
-------> jmptbl
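The dispatch described for jmptbl can be sketched as a bounds check followed by a table lookup. The model below represents memory as a Python dict keyed by byte address; the function signature and all addresses are hypothetical and serve only to show the control flow.

```python
WORD = 4  # bytes per word and per instruction address


def jmptbl(memory, ccp_base, next_address):
    """Return the address of the instruction that runs after jmptbl."""
    call_number = memory[ccp_base + 0 * WORD]     # 0th word: call number
    limit = memory[ccp_base + 1 * WORD]           # 1st word: call number limit
    table_address = memory[ccp_base + 2 * WORD]   # 2nd word: jump table address
    if call_number > limit:
        return next_address                       # out of range: fall through
    return memory[table_address + call_number * WORD]


# Hypothetical addresses for the call communications page, table and handlers.
CCP, TABLE, NEXT = 0x9000, 0x4000, 0x1234
memory = {
    CCP: 2, CCP + 4: 3, CCP + 8: TABLE,
    TABLE: 0x5000, TABLE + 4: 0x5100, TABLE + 8: 0x5200, TABLE + 12: 0x5300,
}
print(hex(jmptbl(memory, CCP, NEXT)))
```

The instruction placed right after jmptbl is only reached for an out-of-range call number, so that is where the error path belongs.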

The mathematic, logic and bitwise instructions

The inputs are taken from the source VU scratchpad registers and the result of the operation goes into the destination VU scratchpad ( the third scratchpad ) as well as the first scratchpad.

add : Addition possibly of a matrix element.

sub : Subtraction possibly of a matrix element.

mul : Simple multiplication possibly of a matrix element.

and : Logical and'ing possibly of a matrix element.

or : Logical or'ing possibly of a matrix element.

xor : Logical exclusive or'ing possibly of a matrix element.

lsh : Left shifting of the number given in the first VU register by the number of places given in the second VU register.

rsh : Right shifting of the number given in the first VU register by the number of places given in the second VU register.

Usage ( lsb on top ) :
-------> opcode ( the mathematical or logic or bitwise instruction )

Example :
-------> add

Other instructions for counting, setting and unsetting bits

getcountofonebits : Count the number of 1 bits in the word given in first VU scratchpad register. The bits are counted from LSB. Return the count in third VU scratchpad register.

Usage ( lsb on top ) :
-------> opcode ( getcountofonebits )

Example :
-------> getcountofonebits

getcountofzerobits : Count the number of 0 bits in the word given in first VU scratchpad register. The bits are counted from LSB. Return the count in third VU scratchpad register.

Usage ( lsb on top ) :
-------> opcode ( getcountofzerobits )

Example :
-------> getcountofzerobits
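As a reference model for the two counting instructions, the sketch below works on a 32-bit word; the Python function names simply mirror the instruction names, and the scratchpad registers are replaced by plain arguments and return values.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def getcountofonebits(word):
    """Count the 1 bits in a 32-bit word."""
    return bin(word & WORD_MASK).count("1")


def getcountofzerobits(word):
    """Count the 0 bits in a 32-bit word."""
    return WORD_BITS - getcountofonebits(word)


print(getcountofonebits(0xF0), getcountofzerobits(0xF0))
```

The two counts always add up to 32, so a hardware implementation could derive one from the other.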

setfromzerobit : In the 32-bit number given in the first VU scratchpad register, find the first valid zero bit position, set the required number of bits starting there and copy the resulting number to the third VU scratchpad register.

Usage ( lsb on top ) :
-------> opcode ( setfromzerobit )
-------> bit position to ignore -- 8 bits
-------> number of contiguous bits to set -- 8 bits

"bit position to ignore" : The search for a zero starts from the LSB ( the LSB is bit zero ). If this position is found to have a zero it is ignored and the search carries on. If no zero bit has been found by the end of the word, no error is raised because "ones" are anyway set in the word. But if a zero has been found after the skip, the required number of bits is set as per the "number of contiguous bits to set" operand, and if these contiguous bits overshoot the remaining bits in the word no error is raised.

Example :
-------> setfromzerobit 25, 3

unsetbitsat : Generally to be used after using setfromzerobit. The word to operate on is in the first VU scratchpad register. If the bits to unset are already zeroes then they are left as is. The result is copied to the third VU scratchpad register.

Usage ( lsb on top ) :
-------> opcode ( unsetbitsat )
-------> beginning bit position to unset from -- 8 bits
-------> number of contiguous bits to unset -- 8 bits

Example :
-------> unsetbitsat 8, 3
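The pairing of setfromzerobit and unsetbitsat can be modelled as below. This is a sketch under assumptions : the contiguous bits are set starting at the zero bit that was found, bits that overshoot bit 31 are silently dropped, and the scratchpad registers are replaced by plain arguments and return values.

```python
WORD_MASK = 0xFFFFFFFF


def setfromzerobit(word, ignore_position, count):
    """Find the first zero bit from the LSB ( skipping one position ) and set
    `count` contiguous bits starting there; overshoot is truncated."""
    for position in range(32):
        if position == ignore_position:
            continue                       # this position is never chosen
        if not (word >> position) & 1:
            mask = ((1 << count) - 1) << position
            return (word | mask) & WORD_MASK
    return word & WORD_MASK                # no usable zero bit: no error


def unsetbitsat(word, start_position, count):
    """Clear `count` contiguous bits starting at `start_position`."""
    mask = ((1 << count) - 1) << start_position
    return word & ~mask & WORD_MASK


taken = setfromzerobit(0x0000000F, 25, 3)   # first zero is bit 4
released = unsetbitsat(taken, 4, 3)
print(hex(taken), hex(released))
```

Used this way the two instructions behave like an allocate and release pair on a bitmap word, which may be how they are intended to be combined.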

Comparison and branching instruction for the two instruction sections above

jmpresult : This instruction will work only with the first VU register set where the first operand will be compared with the third VU register and the instruction jumps or doesn't jump as per the result of the comparison.

Usage ( lsb on top ) :
-------> opcode ( jmpresult )
-------> equal compare value -- 32 bits
-------> address for jump on less-than
-------> address for jump on greater-than

Example :
-------> jmpresult 2909, _yay_mera_deewaanapan_hai, _bulbuli

Instruction for kernel critical sections

takeklock : "Take" a 32-bit in-kernel lock memory word if possible. There are two rules for the instruction : (a) Before any kernel code calls this instruction it should fill into the process descriptor's 0th address the address of the word to be locked, (b) If the lock word's value is 0x00000000 it is an unlocked lock and if the value is 0xFFFFFFFF it is a taken lock.

The instruction operates as follows :

  1. Reads into IDU Scratchpad 1 the value in the IDU register Current Process' Descriptor Address. That value is also offset 0 into the process descriptor and is the descriptor field 'Address Of Lock Word'.
  2. Copies IDU Scratchpad 1 to IDU Scratchpad 2 which now also has the lock word's address.
  3. Reads into IDU Scratchpad 3 the value from the address found in IDU Scratchpad 2. IDU Scratchpad 3 now has the "old value" of the lock.
  4. If "old value" in IDU Scratchpad 3 is 0x00000000 then IDU Scratchpad 2 is set to 0xFFFFFFFF and copied to the lock's address found in IDU Scratchpad 1. The lock is now considered taken. The instruction jumps to the very next instruction address which starts the code for the lock-taken situation.
  5. However, if IDU Scratchpad 3 ( "old value" ) is 0xFFFFFFFF ( lock already taken by some other code ) then operand 1 is picked up as the address of the instruction to execute which will start the code for the lock-not-taken situation.

Usage ( lsb on top ) :
-------> opcode ( takeklock )
-------> address of code for if lock not taken

Example :

_che_vuole_questa_musica_stasera : // try to take lock
takeklock _jab_bhi_yay_dil_udaas_hota_hai
_aaj_jaanay_ki_zidd_na_karo // lock taken
...
...
_jab_bhi_yay_dil_udaas_hota_hai : // lock not taken
jmp 0 _che_vuole_questa_musica_stasera // try to take lock again
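The lock-taking steps amount to a test-and-set on the lock word. The sketch below models memory as a dict keyed by address and assumes that the lock word's address is read from offset 0 of the process descriptor, as rule (a) states; treating every non-zero value as "taken", the function signature and the addresses are assumptions for illustration.

```python
UNLOCKED = 0x00000000
TAKEN = 0xFFFFFFFF


def takeklock(memory, descriptor_address, not_taken_address, next_address):
    """Try to take the lock; return the address of the next instruction."""
    lock_address = memory[descriptor_address]   # 'Address Of Lock Word', offset 0
    old_value = memory[lock_address]
    if old_value == UNLOCKED:
        memory[lock_address] = TAKEN            # lock is now taken
        return next_address                     # continue with lock-taken code
    # Assumption: any non-zero value is treated as an already-taken lock.
    return not_taken_address


# Hypothetical addresses for the descriptor, the lock word and the two paths.
DESCRIPTOR, LOCK, NOT_TAKEN, NEXT = 0x100, 0x2000, 0x300, 0x304
memory = {DESCRIPTOR: LOCK, LOCK: UNLOCKED}

first = takeklock(memory, DESCRIPTOR, NOT_TAKEN, NEXT)
second = takeklock(memory, DESCRIPTOR, NOT_TAKEN, NEXT)
print(hex(first), hex(second), hex(memory[LOCK]))
```

In hardware the read and the conditional write would have to happen as one indivisible operation for the lock to be safe between processors; the model above does not show that aspect.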

Usage and structure of the Call Communications Page

This page is mainly used to transfer call arguments and return values between user processes, kernel calls and user-mode service calls. At byte 0 will be the kernel call number and then the parameters. At the halfway point in the page will be the error code number, then the return values. So unlike other OSes like Linux, which use processor registers to transfer call values, in Kosmos OS such an external call can transfer a larger number of values through this page. The page can also be used to store temporary values.
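One possible reading of this layout is sketched below with a 4 KB byte buffer. The exact offsets are hypothetical : the text fixes only byte 0 for the call number and the halfway point for the error code. Because the jmptbl description has the kernel fill the 1st and 2nd words, this sketch starts the parameters at the 3rd word, which is an assumption, as are the little-endian byte order and the example call number.

```python
import struct

PAGE_SIZE = 4096
CALL_NUMBER_OFFSET = 0                    # byte 0, from the text
PARAMS_OFFSET = 3 * 4                     # assumption: after words 0 to 2
ERROR_CODE_OFFSET = PAGE_SIZE // 2        # halfway point, from the text
RETURN_VALUES_OFFSET = ERROR_CODE_OFFSET + 4   # assumption: right after it


def write_call(page, call_number, params):
    """Store a call number and its 32-bit parameters in the page."""
    struct.pack_into("<I", page, CALL_NUMBER_OFFSET, call_number)
    for index, value in enumerate(params):
        struct.pack_into("<I", page, PARAMS_OFFSET + 4 * index, value)


def write_result(page, error_code, returns):
    """Store an error code and 32-bit return values in the page."""
    struct.pack_into("<I", page, ERROR_CODE_OFFSET, error_code)
    for index, value in enumerate(returns):
        struct.pack_into("<I", page, RETURN_VALUES_OFFSET + 4 * index, value)


def read_result(page, return_count):
    """Read back the error code and `return_count` return values."""
    (error_code,) = struct.unpack_from("<I", page, ERROR_CODE_OFFSET)
    returns = [struct.unpack_from("<I", page, RETURN_VALUES_OFFSET + 4 * i)[0]
               for i in range(return_count)]
    return error_code, returns


page = bytearray(PAGE_SIZE)
write_call(page, call_number=7, params=[10, 20, 30])    # 7 is a made-up call
write_result(page, error_code=0, returns=[42])
print(read_result(page, 1))
```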

Instructions for process context operations

saveprocess : When called by the kernel this instruction will save the changing context of the current process via the "native" process descriptor address available at a fixed address in the current page directory. The changing context consists of the Process Status register, the user-mode VU set and the Next-Instruction Address register.

Usage ( lsb on top ) :
-------> opcode ( saveprocess )

Example :
-------> saveprocess

loadprocess : When called by the kernel this will load a new process by taking the new process' descriptor's address from a fixed memory address ( not yet decided ) in kernel space and updating the Current Page Directory Page Number register, the changing context and the breakpoint information.

Usage ( lsb on top ) :
-------> opcode ( loadprocess )

Example :
-------> loadprocess

kcall : When called by a user-level process this will save the user-mode Process Status register and the user-mode Next-Instruction Address into the native process descriptor, write a one to the In Kernel Mode bit in the Process Status register, point the Current VU Set Pointer register appropriately and change the process' execution path from user mode to kernel mode by jumping to the kernel call handler function. The kernel can then perform various actions according to the provided arguments, which should be in the call communications page starting from byte 0.

Usage ( lsb on top ) :
-------> opcode ( kcall )

Example :
-------> kcall

kexit : Called by the kernel to return a process from a kernel call to user mode. The instruction will read the current native process' descriptor to reload the user-mode Process Status, setting the In Kernel Mode bit to zero, and will then read the user-mode Next-Instruction Address word and use its content as the address of the instruction in user space at which the process resumes in user mode.

Usage ( lsb on top ) :
-------> opcode ( kexit )

Example :
-------> kexit
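The kcall and kexit pair can be modelled as a save and restore of two values around a mode switch. The sketch below is a simplified behavioural model : the `Idu` class, the dict-based descriptor, its key names and the handler address are hypothetical, and the Current VU Set Pointer update is left out.

```python
from dataclasses import dataclass

IN_KERNEL_MODE = 1 << 0   # bit 0 of Process Status


@dataclass
class Idu:
    """Only the two IDU registers that this sketch needs."""
    process_status: int = 0
    next_instruction_address: int = 0


def kcall(idu, descriptor, handler_address):
    """Save the user-mode context and enter the kernel call handler."""
    descriptor["user_mode_process_status"] = idu.process_status
    descriptor["user_mode_next_instruction_address"] = idu.next_instruction_address
    idu.process_status |= IN_KERNEL_MODE
    idu.next_instruction_address = handler_address


def kexit(idu, descriptor):
    """Restore the user-mode context saved by kcall."""
    idu.process_status = descriptor["user_mode_process_status"] & ~IN_KERNEL_MODE
    idu.next_instruction_address = descriptor["user_mode_next_instruction_address"]


KERNEL_CALL_HANDLER = 0x00001000           # hypothetical address
idu = Idu(process_status=0b100000, next_instruction_address=0x08000040)
descriptor = {}

kcall(idu, descriptor, KERNEL_CALL_HANDLER)
in_kernel = bool(idu.process_status & IN_KERNEL_MODE)
kexit(idu, descriptor)
print(in_kernel, hex(idu.next_instruction_address))
```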

[ To be done ]

Interrupt handling

[ To be done ]

List of instructions that can operate on a vector

add
sub
mul
and
or
xor
lsh
rsh
getcountofonebits
getcountofzerobits
setfromzerobit
unsetbitsat

List of instructions that can be executed only in kernel mode

saveprocess
loadprocess
kexit
takeklock

[ To be done ]

Molecule I/O system

Outside the SoC, at the physical layer, the Molecule system will electrically include four pairs of differential-signaling-based single-lane bi-directional serial interfaces connecting the SoC to the devices or to a peer Kosmos SoC.

At the data link layer each of the serial interfaces can transceive data prefixed with an eight-byte common protocol header which is as below :

-------> Packet Type ( 3 bits )
-------> Packet Direction ( 1 bit )
-------> Packet Priority ( 1 bit )
-------> Source Device Type ( 5 bits )
-------> Destination Device Type ( 5 bits )
-------> Unused ( 1 bit )
-------> Count Of Message Blocks Being Transferred ( 8 bits )
-------> Number Of The Message Block Being Currently Transferred ( 8 bits )
-------> Start Address ( 32 bits )

After the header comes a data block which is 4 bytes, 16 bytes, 256 bytes or 4096 bytes long, as per the "Packet Type". The data block is specific to the device and can carry formatted or unformatted data. The data block is called a Message.

In the header :

-------> "Packet Type" can be one of : Get Device Information = 000b; Configure Device = 001b; 4ByteMessage = 010b; 16ByteMessage = 011b; 256ByteMessage = 100b; 4096ByteMessage = 101b; Acknowledgement = 110b; Device Error = 111b.
-------> "Packet Direction" : Device To Host = 1b; Otherwise = 0.
-------> "Packet Priority" : 1b = High priority. Perhaps when a 4 kilobyte page is being transferred from memory this field can indicate to the host SoC that this transfer is more important than other packet types.
-------> About "Device Type" : Currently there are 19 device types that the Molecule system connects to : Main SoC; Co SoC; Electronic RAM for data / program; System power control and status; Button; General purpose timer; Watchcat timer; Flash storage; System environment interface; Camera; LiDAR; Imagery output unit; WiFi transceiver; Optical communication transceiver; Microphone; Speaker; Aerial / satellite based locator; Lateral locator; String search.
-------> Examples of "Device Type" settings are : Button = 00001b, WiFi = 00100b.
-------> About "Count Of Message Blocks Being Transferred" : A zero count means one message block, which can be of 4 bytes, 16 bytes, 256 bytes or 4096 bytes as per the "Packet Type". 4-byte, 16-byte and 4096-byte messages are transferred singly. For 256-byte transfers, since this field is 8 bits long, a 256-byte message can be part of a transfer session of up to 64 kilobytes cut into 256-byte chunks. The 16-byte chunks can be Kosmos processor instructions because the biggest instruction in the Kosmos ISA is 128 bits long i.e. 16 bytes. A total transferred 64-kilobyte datum can be for imagery transfer either from the camera or to the output device. A 4096-byte transfer can be for page transfers from RAM to the main SoC.
-------> About "Start Address" : This is the device-specific section or memory address to start the transfer from. For example, a section in the imagery output area.
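The header fields add up to exactly 64 bits, which the sketch below packs into eight bytes. It assumes that the first-listed field sits in the least significant bits and that the bytes go out in little-endian order; neither is stated for the Molecule header. Only the two Device Type codes given as examples are used, and the function names are hypothetical.

```python
PACKET_TYPES = {
    "Get Device Information": 0b000, "Configure Device": 0b001,
    "4ByteMessage": 0b010, "16ByteMessage": 0b011,
    "256ByteMessage": 0b100, "4096ByteMessage": 0b101,
    "Acknowledgement": 0b110, "Device Error": 0b111,
}
BUTTON, WIFI = 0b00001, 0b00100   # the two Device Type codes given as examples

HEADER_FIELDS = [  # (name, width in bits), in the order listed in the header
    ("packet_type", 3), ("direction", 1), ("priority", 1),
    ("source", 5), ("destination", 5), ("unused", 1),
    ("block_count", 8), ("block_number", 8), ("start_address", 32),
]


def pack_header(**values):
    """Pack named header fields into 8 bytes ( bit and byte order assumed )."""
    word, offset = 0, 0
    for name, width in HEADER_FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word |= value << offset
        offset += width
    return word.to_bytes(8, "little")


def unpack_header(raw):
    """Split 8 header bytes back into named fields."""
    word, offset, fields = int.from_bytes(raw, "little"), 0, {}
    for name, width in HEADER_FIELDS:
        fields[name] = (word >> offset) & ((1 << width) - 1)
        offset += width
    return fields


def session_bytes(block_count_field, block_size):
    """A count field of zero means one block, so the field holds blocks - 1."""
    return (block_count_field + 1) * block_size


header = pack_header(packet_type=PACKET_TYPES["256ByteMessage"], direction=1,
                     source=WIFI, destination=BUTTON, block_count=255,
                     block_number=0, start_address=0x1000)
print(header.hex(), session_bytes(255, 256))
```

With the count field at its maximum of 255, a session of 256-byte messages carries 256 x 256 = 65536 bytes, which is the 64-kilobyte figure mentioned above.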

The four serial lanes will be for : Inter-SoC communication; Data / Program RAM; Streaming devices ( imagery output, WiFi etc ); Non-streaming devices ( LiDAR, locator etc ).

Molecule interrupts will be generated within the SoC as per the transfer session.

The device type "Button" has high priority. The device type "System environment interface" refers to sensors for temperature and electrical radiation.


Kosmos operating system version 1 overview

The OS as said earlier is based on microkernel architecture which means the kernel has just a few facilities which are :

  1. Process creation, control and deletion.
  2. Synchronous IPC ( Event Waiting / Posting and Buffered I/O ).
  3. Interrupt redirection.
  4. Timers.
  5. System control.

The rest of the services are facilitated by user-mode server processes. This simple division of work makes the software system more reliable.

Below are the kernel calls, system error codes and other OS elements whose shape may be modified as per development in processor ISA and OS design :

Process management

Below are the 16 process priorities possible :

  1. Highest priority processes == priority 0
  2. Normal priority processes == priorities 1 to 15

Minimum amount of memory allotted to a main process : 6 pages ( 24 KB ) + x bytes : process descriptor ( x bytes ), page directory, code page table, data page table, code page, regular data page, call communications page.

Minimum amount of memory allotted to every sub process : x bytes + 4 KB : process descriptor ( x bytes ), call communications page.

The "x bytes" for a process descriptor are "x" because as of now the process descriptor is not yet complete.

Process identifier method : .

[ To be done ]

Process descriptor fields ( under draft )

The process descriptor will be stored in special SRAM within the SoC to enable fast access to the process' vector scratchpad registers, which are stored within the process descriptor. These scratchpads are stored this way because, had they been stored in main RAM, saving and restoring them during process context switches would have slowed operations down considerably.

Address Of Lock Word ---------------------------------------------------------------------------- 32 bits
Lock Word -------------------------------------------------------------------------------------------- 32 bits
Page Directory Page Number ------------------------------------------------------------------ 32 bits
User Mode Process Status ---------------------------------------------------------------------- 32 bits
User Mode Next-Instruction Address -------------------------------------------------------- 32 bits
Kernel Mode Process Status -------------------------------------------------------------------- 32 bits
Kernel Mode Next-Instruction Address ------------------------------------------------------ 32 bits
Original Priority ------------------------------------------------------------------------------------- 32 bits
Partition Address ----------------------------------------------------------------------------------- 32 bits
Main Process' Descriptor Address ------------------------------------------------------------- 32 bits
Opcode Of Instruction For Breakpoint ------------------------------------------------------- 32 bits
Address Of Instruction For Breakpoint ------------------------------------------------------- 32 bits
Address Of Datum For Breakpoint ------------------------------------------------------------ 32 bits
Address For Debug Target Register and Process Descriptor State Dump -------- 32 bits
Serialized Code 1 Instruction Address ------------------------------------------------------- 32 bits
Serialized Code 2 Instruction Address ------------------------------------------------------- 32 bits
Serialized Code 3 Instruction Address ------------------------------------------------------- 32 bits
Serialized Code 4 Instruction Address ------------------------------------------------------- 32 bits
Serialized Code Slot Address ------------------------------------------------------------------- 32 bits
User Mode Vector Scratchpad Registers ---------------------------------------------------- 768 bits
Kernel Mode Vector Scratchpad Registers -------------------------------------------------- 96 bits
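As a rough illustration only, the draft field list above could be viewed from C as below. The struct name and field names are invented for this sketch, and only the field widths and their order come from the list. The descriptor is still under draft, so the size computed here applies only to the fields listed so far.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C view of the draft process descriptor.
   Names are assumptions; widths and order follow the draft field list. */
typedef struct {
    uint32_t lock_word_address;
    uint32_t lock_word;
    uint32_t page_directory_page_number;
    uint32_t user_status;
    uint32_t user_next_instruction;
    uint32_t kernel_status;
    uint32_t kernel_next_instruction;
    uint32_t original_priority;
    uint32_t partition_address;
    uint32_t main_descriptor_address;
    uint32_t breakpoint_opcode;
    uint32_t breakpoint_instruction_address;
    uint32_t breakpoint_datum_address;
    uint32_t debug_dump_address;
    uint32_t scode_instruction_address[4];  /* Serialized Code 1 to 4 */
    uint32_t scode_slot_address;
    uint32_t user_vector_scratchpad[24];    /* 768 bits */
    uint32_t kernel_vector_scratchpad[3];   /* 96 bits */
} ProcessDescriptorDraft;

/* Size in bits of the fields listed so far. */
uint32_t process_descriptor_draft_bits(void) {
    return (uint32_t)(sizeof(ProcessDescriptorDraft) * 8u);
}

/* Usage sketch: 19 words of 32 bits, plus 768 bits, plus 96 bits. */
void process_descriptor_draft_check(void) {
    assert(process_descriptor_draft_bits() == 19u * 32u + 768u + 96u);
}
```

With the fields listed so far, this layout comes to 1472 bits ( 184 bytes ), so "x" would be at least that much unless fields are removed or narrowed.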

[ To be done ]

Process partitioning

Partitioning is mainly used to ensure three things about untrustworthy process groups ( main process and sub processes ), unresponsive processes, misbehaving interrupt handler processes, or a user overusing the resources :

  1. That they do not take up all the processor's execution time and leave the other processes starved of execution. In Kosmos OS a partition execution quantum is 2 seconds; the administrator can assign each process a percentage of that quantum as its execution time limit, and within every 2 seconds each process executes for only its assigned time.
  2. That they are prevented from over-allocating memory to themselves. This is done by allotting a memory quota to the partition, which leaves trustworthy processes sufficient memory to allocate to themselves.
  3. That they are prevented from accessing kernel services except those they are permitted to use. This is done through the "Capabilities" system.
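The first two bounds come down to simple arithmetic, sketched below. The function names, the use of milliseconds and the counting of memory in pages are assumptions made for this example; the source only fixes the 2 second quantum and the idea of a percentage and a memory quota.

```c
#include <assert.h>
#include <stdint.h>

#define PARTITION_QUANTUM_MS 2000u  /* 2 second partition execution quantum */

/* Execution time allowed within each quantum for a given percentage (0..100). */
uint32_t partition_budget_ms(uint32_t percent) {
    assert(percent <= 100u);
    return PARTITION_QUANTUM_MS * percent / 100u;
}

/* Returns 1 if a request for more pages still fits within the memory quota.
   Counting the quota in pages is an assumption of this sketch. */
int partition_may_allocate(uint32_t used_pages, uint32_t quota_pages,
                           uint32_t request_pages) {
    if (used_pages > quota_pages) {
        return 0;
    }
    return request_pages <= quota_pages - used_pages;
}
```

For example, a 25 percent limit gives 500 ms of execution in every 2 second quantum.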

List of partition "Capabilities" ( under draft )

CAP_PROCESS_DEBUGGER ( be able to set instruction and / or data breakpoint info on a target process, get process descriptor and register state of the process being debugged, receive notifications for processes failing some Capability settings )
CAP_PROCESS_CREATE
CAP_PROCESS_PRIORITY_CONTROL ( increase or decrease process priority )
CAP_PROCESS_STOP_AND_RESUME
CAP_PROCESS_DELETE
CAP_PROCESS_HW_INTERRUPT_CONNECT ( connect a process to a hardware device interrupt )
CAP_PROCESS_PARTITION_CONTROL ( create a partition, edit it, remove it )
CAP_PROCESS_RANDOM_GENERATE ( be able to generate random number )
CAP_PROCESS_MALLOC
CAP_PROCESS_TIMEOUTS
CAP_SERVICE_GET_LIST
CAP_SERVICE_ADD
CAP_SERVICE_RECEIVE
CAP_SERVICE_SEND
CAP_SERVICE_OBJECT_CREATE ( request a service to create a file, network connection, maybe a graphical element etc )
CAP_SERVICE_OBJECT_GET_INFO
CAP_SERVICE_OBJECT_SHRED ( primarily to shred a file but maybe there are other things to shred )
CAP_SERVICE_OBJECT_DELETE
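Since the draft list has 18 entries, the capabilities could fit as bits in a single 32-bit word. The sketch below shows one possible encoding. The bit positions and the helper partition_has_capability are assumptions for illustration, not part of the specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignment; only the names come from the draft list. */
enum {
    CAP_PROCESS_DEBUGGER             = 1 << 0,
    CAP_PROCESS_CREATE               = 1 << 1,
    CAP_PROCESS_PRIORITY_CONTROL     = 1 << 2,
    CAP_PROCESS_STOP_AND_RESUME      = 1 << 3,
    CAP_PROCESS_DELETE               = 1 << 4,
    CAP_PROCESS_HW_INTERRUPT_CONNECT = 1 << 5,
    CAP_PROCESS_PARTITION_CONTROL    = 1 << 6,
    CAP_PROCESS_RANDOM_GENERATE      = 1 << 7,
    CAP_PROCESS_MALLOC               = 1 << 8,
    CAP_PROCESS_TIMEOUTS             = 1 << 9,
    CAP_SERVICE_GET_LIST             = 1 << 10,
    CAP_SERVICE_ADD                  = 1 << 11,
    CAP_SERVICE_RECEIVE              = 1 << 12,
    CAP_SERVICE_SEND                 = 1 << 13,
    CAP_SERVICE_OBJECT_CREATE        = 1 << 14,
    CAP_SERVICE_OBJECT_GET_INFO      = 1 << 15,
    CAP_SERVICE_OBJECT_SHRED         = 1 << 16,
    CAP_SERVICE_OBJECT_DELETE        = 1 << 17
};

/* Returns 1 only if every required capability bit is set in the partition's word. */
int partition_has_capability(uint32_t partition_caps, uint32_t required) {
    return (partition_caps & required) == required;
}

/* Usage sketch: a partition that may create processes and allocate memory only. */
void capability_examples(void) {
    uint32_t caps = CAP_PROCESS_CREATE | CAP_PROCESS_MALLOC;
    assert(partition_has_capability(caps, CAP_PROCESS_CREATE));
    assert(!partition_has_capability(caps, CAP_PROCESS_DEBUGGER));
}
```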

In summary, a partition is created by the administrator and allotted a processor time quota, a memory quota and capabilities; process groups are then assigned to that partition and run within those three bounds. This prevents any potentially or actively misbehaving process group from negatively affecting the rest of the system. Such a process group can either be terminated and then perhaps debugged, or be allowed to execute while preventive analysis is done. The latter case can arise in a totally external situation, like a legitimate network service process being sent too many packets from outside in a denial-of-service attack, or in a semi-external situation, like a system board device malfunctioning ( "semi external" because the device is outside of the SoC ). However, a process can be legitimate and still try to allocate too much memory ( like a communications program wanting to open too many tabs ), so it must be preventatively notified. All in all, partitioning improves system reliability and responsiveness.

[ To be done ]

OS error codes

EBUSY ( kernel resources are busy )

[ To be done ]

Kernel call for setting timeouts and sleep

In Kosmos OS there is no need to explicitly create a timer because every process comes with its own timer, which is used for any timer-using kernel call made by this process, including putting the process to sleep. There is just one common kernel call for this purpose :

timerConfigure(Timer Duration, If Sleep Now)

If the second argument is a 0x00000001 ( a one ) then the first argument is used as the duration to block the calling process right there i.e. the process will sleep for that duration. If however the second argument is not a one then the call becomes a non-blocking call where the first argument is used to configure the process' timer for any subsequent kernel call which uses timeout and the timer will start at an appropriate point in that subsequent kernel call. If both arguments are zeroes then the timer is disabled both for sleep and for the subsequent kernel calls which use timeouts.
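The three cases described above can be summarized as a small decision function. This is only a model of the argument handling as described in the text, not kernel code. The enum TimerAction and the function timer_configure_action are names invented for this sketch, and the unit of the duration is left open because the text does not give it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the three outcomes described in the text. */
typedef enum {
    TIMER_DISABLED,     /* both arguments are zero */
    TIMER_SLEEP_NOW,    /* second argument is one: block the caller now */
    TIMER_ARM_TIMEOUT   /* otherwise: timeout for the next timeout-using call */
} TimerAction;

/* Models how timerConfigure(Timer Duration, If Sleep Now) reads its arguments. */
TimerAction timer_configure_action(uint32_t duration, uint32_t if_sleep_now) {
    if (duration == 0u && if_sleep_now == 0u) {
        return TIMER_DISABLED;
    }
    if (if_sleep_now == 0x00000001u) {
        return TIMER_SLEEP_NOW;
    }
    return TIMER_ARM_TIMEOUT;
}

/* Usage sketch: one call for each case. */
void timer_configure_examples(void) {
    assert(timer_configure_action(500u, 1u) == TIMER_SLEEP_NOW);
    assert(timer_configure_action(500u, 0u) == TIMER_ARM_TIMEOUT);
    assert(timer_configure_action(0u, 0u) == TIMER_DISABLED);
}
```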

User-mode services and the IPC calls

The Kosmos microkernel provides only a few fundamental services, so all other common services in the OS like graphical UI, storage and networking are to be provided by user-mode services. These are nothing but process groups that know how to work on a particular resource and use the kernel's simple IPC mechanisms to provide controlled use of that resource to other process groups. The kernel contains a text-based list of all the main services in a particular Kosmos SoC, and each record in that list is called a Channel, which other process groups can connect to for use. Each service can have multiple channels for different purposes ( the max number of channels in each service process is not yet decided ). Processes can get the list of the channels using the below kernel call :
-------> channelGetList(List Buffer)

For a service to add a channel into the kernel-based channel list the below kernel call is used :
-------> channelAdd(Channel Name, Channel Type, Service-local Channel Info Table Index)
-------> "Channel Type" is either :
------->>> Event
------->>> Regular I/O
------->>> Debug I/O
-------> "Service-local Channel Info Table Index" is a convenient index into a table that the service process maintains within its user-mode data space, containing any service-specific info about the channels it creates. This index is recorded into the channel's kernel structure and is part of the information returned to the service process by the unblocked channelWaitIO call described below. The service should not use the zeroth element of the table because a zero index is returned by the kernel as an error indication.
-------> Channels have no priority.
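A service-side table that keeps index zero free might look like the sketch below. Everything here is illustrative: the type names, the table capacity of 8 ( the real maximum is not yet decided ) and the helper channel_table_reserve are assumptions, and the index it returns is what a service would then pass as the third argument of channelAdd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANNELS 8u  /* assumption: the real maximum is not yet decided */

typedef enum { CHANNEL_EVENT, CHANNEL_REGULAR_IO, CHANNEL_DEBUG_IO } ChannelType;

typedef struct {
    const char *name;
    ChannelType type;
    int in_use;
} ChannelInfo;

/* slot[0] is never handed out because the kernel returns index 0 as an error. */
typedef struct {
    ChannelInfo slot[MAX_CHANNELS + 1u];
} ChannelInfoTable;

/* Reserves a free slot and returns its index (1..MAX_CHANNELS), or 0 if full. */
uint32_t channel_table_reserve(ChannelInfoTable *table, const char *name,
                               ChannelType type) {
    assert(table != NULL);
    for (uint32_t i = 1u; i <= MAX_CHANNELS; i++) {
        if (!table->slot[i].in_use) {
            table->slot[i] = (ChannelInfo){ name, type, 1 };
            return i;
        }
    }
    return 0u;
}
```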

And for a process to connect to a particular channel the below kernel call is used :
-------> Connection ID = channelConnect(Channel Name)

There are two ways that a service provides use of the channel and a client process utilizes the channels :

  1. Event Wait and Event Broadcast based IPC : Usually used to broadcast an event, like a change of state in a data block within a service, to multiple processes interested in that change. The service associates a data structure within it with a channel, and this structure can be specific to the service. Client processes interested in any change to that structure will make a "wait" kernel call on that channel and block. Thus multiple processes will block on that channel. Once the service process makes a change to that data structure it will make a "post" kernel call on that channel, unblocking all the processes waiting on that channel and in effect broadcasting the state change / event to all of them. The waiting client processes can be outside the service's process group or within it. This is a simple system where only two 32-bit words ( the type of the event and the datum pertaining to the event ) are broadcast from the service to all the waiting processes to notify them of a state change or an event. Below are the two kernel calls for this :

    -------> Event Type, Event Datum = channelWaitEvent(Connection ID)
    -------> channelPostEvent(Channel ID, Event Type, Event Datum)

  2. I/O-Buffer based IPC : Here a service process running at the priority given to the service's process group does a general channelWaitIO kernel call within a loop, without specifying any of the I/O type channels it has created and begins waiting for client I/O requests. The service process is removed from the running processes list. A client process requiring substantial I/O work from one of that service's channels, calls the channelDoIO kernel call on that channel. The kernel puts the client process into I/O-blocked mode and brings back the waiting service process into execution mode to process the client's I/O request with the following attributes :

    The service process is notified as to which of its channels has been sent the client I/O request.

    There is no copying of data from the client to the service. Though the service works mostly within the memory context of the service process ( the service's page directory ), the client's call communications page and the client I/O buffer's entire page table are simply and directly attached to the service's page directory at two fixed addresses, which means that the service process can directly access the relevant client memory. This removes the need for the kernel to buffer the message and the need for the kernel to copy the message between the client process and the server process. Since the client buffer's page table is attached to the service's page directory, the client can request at most 4 MB of I/O with the service at a time, so if the client buffer is larger than 4 MB the client process will simply have to pass the part of the buffer after the 4 MB boundary to the service in a further call.

    The service call will work within the process context of the service, except that the client's priority becomes the priority of the service call for the duration of the call. If however the service process currently servicing a job is pre-empted by a higher priority process that wants service from that same channel, this new process is not only attached to the channel's process waiters list as before, but the service process' priority is also raised to the higher priority of the new process. This is the Priority Inheritance implementation and is meant to negate the Priority Inversion issue, which is nicely explained on the Geeks for Geeks website on this page.

    After the service has performed work for the current client and has called channelWaitIO for the next client, the next queued client process will be chosen according to its priority and its FIFO place in that priority's list.

    The service call will run in this part-server part-client memory and process context until either the service call returns to the client voluntarily through the channelReplyIO kernel call or the client call times out.

    The three kernel calls used for this I/O buffer based message passing mechanism are below, the first one and third one to be used by the service / channel handler process and the second by the client :
    -------> Service-local Channel Info Table Index, Client State, Transaction Number, Buffer Start, Buffer Length = channelWaitIO()
    -------> "Client State" is either :
    ------->>> CSTATE_NORMAL_CALL
    ------->>> CSTATE_CALL_TIMEOUT
    ------->>> CSTATE_PROCESS_DELETE
    ------->>> CSTATE_DEBUG_TRIGGER
    -------> channelDoIO(Connection ID, Service Type, I/O Buffer Start, Buffer Length)
    -------> channelReplyIO(Transaction Number)
    If a client process that has not yet been replied to has its channelDoIO call time out, or is deleted, channelWaitIO returns to the service process with the relevant information about that situation. When a transaction has completed successfully, the service process should not keep the transaction number as a permanent record because the kernel can reuse the same transaction number for another transaction.
    The service process that has unblocked from a successful channelWaitIO may in turn do at most one channelDoIO to call the channel of another service to maybe fulfill the original client's channelDoIO.
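Two pieces of arithmetic from the I/O-buffer description can be sketched as follows: how a client buffer splits at 4 MB boundaries, and how the service's priority changes under priority inheritance. The helper names are invented for this sketch. It assumes a page table of 1024 entries of 4 KB pages ( inferred from the 4 MB figure ), that each attached window is aligned on a 4 MB boundary, and that a numerically smaller priority is the higher one, as in the priority list earlier in this document.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES         4096u
#define PAGE_TABLE_ENTRIES 1024u  /* assumption inferred from 4 MB / 4 KB */
#define IO_WINDOW_BYTES    (PAGE_BYTES * PAGE_TABLE_ENTRIES)  /* 4 MB */

/* Length of the next piece of the buffer that stays inside one 4 MB window. */
uint32_t io_next_chunk(uint32_t start, uint32_t remaining) {
    uint32_t room = IO_WINDOW_BYTES - (start % IO_WINDOW_BYTES);
    return remaining < room ? remaining : room;
}

/* Number of channelDoIO calls a client would need for the whole buffer. */
uint32_t io_calls_needed(uint32_t start, uint32_t length) {
    uint32_t calls = 0u;
    while (length > 0u) {
        uint32_t n = io_next_chunk(start, length);
        start += n;
        length -= n;
        calls++;
    }
    return calls;
}

/* Priority inheritance: the service takes the higher of the two priorities.
   Priority 0 is the highest, so "higher" means the numerically smaller value. */
uint32_t inherited_priority(uint32_t service_priority, uint32_t waiter_priority) {
    return waiter_priority < service_priority ? waiter_priority : service_priority;
}

/* Usage sketch: a 6 MB buffer starting on a window boundary needs two calls. */
void io_window_examples(void) {
    assert(io_calls_needed(0u, 6u * 1024u * 1024u) == 2u);
    assert(inherited_priority(5u, 2u) == 2u);
}
```

Note that under the alignment assumption even a small buffer needs two calls if it straddles a 4 MB boundary.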

// Begin section to be edited

Kernel calls for serialized code

These will be scodeEnter() and scodeExit() and will be described soon.

Debugging of user-mode programs

In the Kosmos OS implementation there will be no separate debugger program. That facility will be built into the desktop process, which can provide it because its main process has the capability bit CAP_PROCESS_DEBUGGER set. Programs can be debugged in two ways :

  1. The desktop process will present a password-protected user interface function to select a program to start in to-be-debugged mode. In an embedded Kosmos implementation the desktop process, which runs in all implementations, will get the debug request from the network. The desktop function will start the target program using the processCreate kernel call, which fills the target's process descriptor with appropriate data in its Instruction Breakpoint Base Address field and Data Breakpoint Base Address field and sets the Break Point bit in the Process Status field. The target program will halt at the appropriate breakpoints, and the relevant desktop window will show the register state of the program and its process descriptor state, or the info will be sent to the network.
  2. An already running program / process can also be set for debugging from either the Process Manager desktop function or through the network. In this case either of the two means will use the processControl kernel call which, similarly to the above, sets the instruction and data breakpoint fields in the target process' process descriptor and the Process Setting register's Break Point bit. This too will be password-protected.

The Instruction Breakpoint Base Address and Data Breakpoint Base Address are called "Base Address" because each is the beginning of a range. For instruction breakpointing, any instruction about to be executed that falls within the range will trigger execution of the breakpoint handling process within the desktop process group; for data breakpointing, any read or write address falling within the range will trigger execution of the same breakpoint process. The breakpointing address range for both instruction and data will be within 1024 bytes from the base address. Breakpointing is checked as part of the logic that prepares an instruction for execution in user mode, and instruction breakpointing is checked before data breakpointing.
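The range test and the checking order can be modelled in software as below. In the design this check belongs to the processor's instruction preparation logic, so the C code is only a model. The names are invented for this sketch, and treating the range as the half-open interval from the base address up to, but not including, base + 1024 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define BREAKPOINT_RANGE_BYTES 1024u

typedef enum { BREAK_NONE, BREAK_INSTRUCTION, BREAK_DATA } BreakKind;

/* Returns 1 if address lies in [base, base + 1024).
   Unsigned subtraction wraps, so addresses below base are rejected too. */
int in_breakpoint_range(uint32_t base, uint32_t address) {
    return (uint32_t)(address - base) < BREAKPOINT_RANGE_BYTES;
}

/* Instruction breakpointing is checked first, then data breakpointing. */
BreakKind check_breakpoints(uint32_t instruction_base, uint32_t data_base,
                            uint32_t next_instruction_address,
                            int has_data_access, uint32_t data_address) {
    if (in_breakpoint_range(instruction_base, next_instruction_address)) {
        return BREAK_INSTRUCTION;
    }
    if (has_data_access && in_breakpoint_range(data_base, data_address)) {
        return BREAK_DATA;
    }
    return BREAK_NONE;
}

/* Usage sketch: when both ranges match, the instruction breakpoint wins. */
void breakpoint_examples(void) {
    assert(check_breakpoints(0x1000u, 0x8000u, 0x1010u, 1, 0x8004u) == BREAK_INSTRUCTION);
    assert(check_breakpoints(0x1000u, 0x8000u, 0x2000u, 1, 0x8004u) == BREAK_DATA);
}
```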

// End section to be edited

List of all kernel calls ( under draft )

processCreate
processControl
timerConfigure(Timer Duration, If Sleep Now)
channelGetList
channelAdd
channelConnect
channelWaitEvent
channelPostEvent
channelWaitIO
channelDoIO
channelReplyIO
scodeEnter
scodeExit
systemControl

Program executable file format

The format will be a simplified derivation from the simple A.out format that is described here.

[ To be done ]