OpenCores

Details

Name: taar
Created: Dec 16, 2017
Updated: Jul 10, 2021
SVN: No files checked in
Bugs: 1 reported / 0 solved

Other project properties

Category: System on Chip
Language: Other
Development status: Planning
Additional info:
WishBone compliant: No
WishBone version: n/a
License: GPL

Project 'Ad Astra' - General overview ( project formerly known as Taar )

This project comprises an SoC-type microprocessor and a microkernel-based operating system, both of which are open source and both of which are called Ad Astra. The name is the Latin phrase "Ad Astra", meaning "To the stars", and reflects my desire for the project to be used in space flight systems.

The projects are still being designed and I welcome anyone to contribute. This document is a design specification based on which further processor and OS architecture will be defined.

The Ad Astra processor is the first processor and the Ad Astra OS is the second OS project to be locally designed in India. The first OS was a very simple one written by me a few years ago. The Ad Astra processor and OS are being designed to complement each other.

In this document the processor's details are given first, followed by those of the OS.

General processor overview

The Ad Astra processor is meant to be a simplified, single-core, clockless CPU design. There will be a 3D graphics processing unit as part of the SoC. The processor is presently meant to be used in wearable computers, embedded systems and servers. Beyond the commonalities shared by all processor designs, the design is not related to any specific processor.

The chip is styled as a System-on-Chip ( SoC ) whose I/O subsystems are an LED display interface, USB, a WiFi transceiver, a timer, general purpose 0-to-1 output pins, general purpose 0-to-1 input pins, an ADC channel and a DAC channel.

Other than these, there will be a System Reset Timer which will reset the system in case of system hangs.

At the moment there are fewer than 30 instructions, but more will be added as the CPU core design, the GPU design and the kernel design progress. Of course, the processor will be ready for an FPGA implementation only after the first round of design.

Project contributors

There used to be one contributor to the processor project, Vishal Zuluk, and his contribution was :

  1. The USB subsystem is the result of general discussions with him.

Register set for each ALU

The ALU contains eight registers, listed below along with their bit-lengths :

ALU busy status -------- 1 bit

Instruction ------------ 128 bits ( to contain the entire instruction so that it can be decoded )

Data --------------------- 40 bits ( to contain the read data so that it can be decoded )

R1 --------------------- 32 bits ( use explained later )

R2 --------------------- 32 bits ( use explained later )

R3 --------------------- 32 bits ( use explained later )

Instruction pointer ---- 32 bits ( to hold the next instruction's address )

Loop counter ---- 32 bits

Memory Management

The MMU is paging-based. A page directory entry in the Ad Astra processor is flexible in the sense that it can point either to a 4 MB data page or to a 5 KB page table.

Structure of a page directory entry and a page table entry :
------->Status ( 8 bits )
------->Base address ( 32 bits )

The Status bits in the page directory will consist of :

-------> Bit 0 - Entry present
-------> Bit 1 - Kernel-only access
-------> Bit 2 - Connected to 4 MB page
-------> Bit 3 to Bit 7 - Reserved

Each entry occupies 40 bits ( 5 bytes ). Each page directory and page table will consist of 1024 entries, hence their size will be 1024 x 5 = 5120 bytes ( 5 KB ).
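
The entry layout and the 5 KB size can be checked with a minimal sketch. It assumes the Status byte is stored first and the base address follows in little-endian order; neither the byte order nor the names used here ( pack_entry, unpack_entry and the STATUS_ constants ) are fixed by this specification.

```python
import struct

ENTRY_BYTES = (8 + 32) // 8      # Status ( 8 bits ) + Base address ( 32 bits )
ENTRIES_PER_TABLE = 1024
TABLE_BYTES = ENTRIES_PER_TABLE * ENTRY_BYTES

# Status bit positions as listed for the page directory.
STATUS_PRESENT = 1 << 0
STATUS_KERNEL_ONLY = 1 << 1
STATUS_4MB_PAGE = 1 << 2


def pack_entry(status, base_address):
    """Pack one entry as a status byte followed by a 32-bit base address.

    Assumption: status byte first, little-endian base address, no padding.
    """
    if status & ~0x07:
        raise ValueError("bits 3 to 7 of Status are reserved")
    return struct.pack("<BI", status, base_address)


def unpack_entry(raw):
    """Return ( status, base_address ) from a 5-byte entry."""
    return struct.unpack("<BI", raw)


entry = pack_entry(STATUS_PRESENT | STATUS_4MB_PAGE, 0x00400000)
```

Here the entry is 5 bytes long and TABLE_BYTES evaluates to 5120, matching the figure given above.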

The memory allocator function in the kernel should maintain two memory pools :

(a). Blocks of 1024 bytes for page directories, page tables and 4 KB data pages.

(b). Blocks of 4 MB for service client-buffer data pages.

The Status word is the first word in any Thread Descriptor, and the first byte of that word is a bit-field which indicates the memory bank in which the process's page directory is located. Up to eight memory banks are possible, each of which can be as large as 4 GB. Therefore the maximum total memory in the system is 8 banks x 4 GB = 32 GB.

[ To be done ]

Instructions format

At present there are 25 instructions, given below as a single list :

load-imm-r1, load-imm-r2, load-mem-r1, load-mem-r2, load-mem-byte-r1, load-ptr-r1, load-ptr-r2, store-r1, store-r2, store-byte-r1, store-ptr-r1, store-ptr-r2, add, sub, and, or, xor, lsh, rsh, inv, loop-forever, load-loop-counter, loop-next, sysenter, sysexit

Each instruction is described in detail below. Every Ad Astra processor instruction is 128 bits long. All instructions share this fixed length and a fixed format so that the processor logic can read instructions at a determinate rate, which keeps things simple.

The instruction format is listed below vertically. The first line is the rightmost ( least significant ) field of the instruction. The operands are presented here without names because their usage differs from one Operation Code ( henceforth called opcode ) to another. The numbers within the brackets are the bit-lengths :

Operation code -------- ( 32 bits )
Operand1 -------------- ( 32 bits )
Operand2 -------------- ( 32 bits )
Operand3 -------------- ( 32 bits )

The opcode field contains the number of the instruction that the execution core has to execute. Only the first byte of the opcode field contains the instruction number; the rest of the field should be filled with zeroes by software ( the compiler ). Any user-mode program that supplies an invalid opcode will be trapped at this point and terminated. A code section in the Main Control Program ( the kernel ) could also supply an invalid opcode. In that case, the entire processor should halt and an appropriate external pin should become active ( or low ).
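
The 128-bit format can be sketched as a single integer with the opcode word in the least significant 32 bits. The opcode numbers in OPCODES are hypothetical, since this specification does not assign numbers yet, and the function names encode and decode are only for illustration.

```python
WORD_MASK = 0xFFFFFFFF

# Hypothetical opcode numbers; the specification does not assign them yet.
OPCODES = {"load-imm-r1": 0x01, "load-imm-r2": 0x02}


def encode(opcode_number, operand1=0, operand2=0, operand3=0):
    """Build a 128-bit instruction with the opcode in the lowest word."""
    if not 0 <= opcode_number <= 0xFF:
        raise ValueError("the instruction number must fit in the first byte")
    for operand in (operand1, operand2, operand3):
        if not 0 <= operand <= WORD_MASK:
            raise ValueError("each operand must fit in 32 bits")
    return opcode_number | operand1 << 32 | operand2 << 64 | operand3 << 96


def decode(instruction):
    """Split a 128-bit instruction into ( opcode, operand1, operand2, operand3 )."""
    opcode_word, operand1, operand2, operand3 = (
        (instruction >> shift) & WORD_MASK for shift in (0, 32, 64, 96)
    )
    if opcode_word > 0xFF:
        raise ValueError("the upper three bytes of the opcode field must be zero")
    return opcode_word, operand1, operand2, operand3


word = encode(OPCODES["load-imm-r1"], 0xFEEDF00D)
```

Decoding word gives back the opcode number and the immediate value, with the two unused operands as zero. A decoder that rejects non-zero upper opcode bytes is one possible way to model the invalid-opcode trap.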

load-imm-r1, load-imm-r2 instructions

Copy into register R1 or R2 the immediate value from Operand1.

load-imm-r1 and load-imm-r2 :

Usage ( lsb on top ) :
-------> opcode ( load-imm-r1 or load-imm-r2 )
-------> immediate value
-------> zero
-------> zero

Example :
---> load-imm-r1 0xfeedf00d, 0, 0

load-mem-r1, load-mem-r2 instructions

Copy into register R1 or R2 the word value from the memory address pointed to via Operand1 + Operand2.

load-mem-r1 and load-mem-r2 :

Usage ( lsb on top ) :
-------> opcode ( load-mem-r1 or load-mem-r2 )
-------> the starting address
-------> the offset
-------> zero

Example :
---> load-mem-r1 [0xf00df00d], 20, 0

load-mem-byte-r1 instruction

Copy into register R1 the byte value from the memory address pointed to via Operand1 + Operand2.

load-mem-byte-r1 :

Usage ( lsb on top ) :
-------> opcode ( load-mem-byte-r1 )
-------> the starting address
-------> the offset
-------> zero

Example :
---> load-mem-byte-r1 [0xf00df0ed], 20, 0

load-ptr-r1, load-ptr-r2 instructions

Copy into register R1 or R2 the value from the second-level memory address: the word stored at the address in Operand1 is read as a pointer, and Operand2 is added to that pointer as an offset.

load-ptr-r1 and load-ptr-r2 :

Usage ( lsb on top ) :
-------> opcode ( load-ptr-r1 or load-ptr-r2 )
-------> the pointer address
-------> the offset into the second-level address
-------> zero

Example :
---> load-ptr-r1 [0xfeedfade], 20, 0
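
The difference between load-mem and load-ptr addressing can be sketched as below. This is my reading of the operand descriptions above, with memory modelled as a dictionary from address to word; the function names load_mem and load_ptr and the memory contents are made up for illustration.

```python
WORD_MASK = 0xFFFFFFFF


def load_mem(memory, start_address, offset):
    """load-mem: read the word at start_address + offset."""
    return memory[(start_address + offset) & WORD_MASK]


def load_ptr(memory, pointer_address, offset):
    """load-ptr: read the pointer at pointer_address, then the word at pointer + offset."""
    second_level_address = memory[pointer_address]
    return memory[(second_level_address + offset) & WORD_MASK]


# Hypothetical memory contents: address -> 32-bit word.
memory = {
    0xF00DF00D + 20: 7,      # target of the load-mem example
    0xFEEDFADE: 0x1000,      # pointer read by the load-ptr example
    0x1000 + 20: 42,         # word at the second-level address plus offset
}

direct_value = load_mem(memory, 0xF00DF00D, 20)
indirect_value = load_ptr(memory, 0xFEEDFADE, 20)
```

With these contents the load-mem example reads 7 directly, while the load-ptr example first fetches the pointer 0x1000 and then reads 42 from 0x1000 + 20.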

The load instructions and their counterpart, the store instructions ( explained next ), exist because the mathematic and logic instructions do not access memory directly. Such partitioning allows keeping the ISA simple and clean. It also allows faster mathematic and logic instruction execution in one core versus relatively slower memory access in another core.

store-r1, store-r2 instructions

Copy the word value from register R1 or R2 into the memory address given by Operand1 + Operand2.

The mathematic and logic instructions don't write the result back to memory after execution, and therefore the code will have to use this instruction if a memory write-back is needed. Having a separate write-back will allow most instructions to access memory without too much queuing, thus increasing hardware-level parallelism. A second effect is that code which only compares, and does not need the result value in memory, needn't spend time on a write-back.

Usage ( lsb on top ) :
-------> opcode ( store-r1 or store-r2 )
-------> the starting address
-------> the offset
-------> zero

Example :
---> store-r1 [0xdead0000], 10, 0

store-byte-r1 instruction

Copy the byte value from register R1 into the memory address given by Operand1 + Operand2.

Usage ( lsb on top ) :
-------> opcode ( store-byte-r1 )
-------> the starting address
-------> the offset
-------> zero

Example :
---> store-byte-r1 [0xdead000d], 10, 0

store-ptr-r1, store-ptr-r2 instructions

Copy the value from register R1 or R2 into the second-level memory address: the word stored at the address in Operand1 is read as a pointer, and Operand2 is added to that pointer as an offset.

store-ptr-r1 and store-ptr-r2 :

Usage ( lsb on top ) :
-------> opcode ( store-ptr-r1 or store-ptr-r2 )
-------> the pointer address
-------> the offset into the second-level address
-------> zero

Example :
---> store-ptr-r1 [0xfeedfade], 20, 0

The eight mathematic and logic instructions : add, sub, and, or, xor, lsh, rsh, inv

The inputs for these instructions are taken from the registers R1 and R2. The result of the operation is produced in register R3 and then copied to register R1, which allows a chain of mathematic or logic instructions to continue after a branch.

add == addition of two numbers
sub == subtraction of two numbers
and == logical and'ing of two numbers
or == logical or'ing of two numbers
xor == logical exclusive or'ing of two numbers
lsh == left shifting of a number by the number of places given in R2
rsh == right shifting of a number by the number of places given in R2
inv == inversion of a number

Usage ( lsb on top ) :
-------> opcode ( the mathematic or logic instruction )
-------> equal compare value
-------> jump on less-than
-------> jump on greater-than

After the operation, the instruction will act as below :

a. If the result is equal to the equal compare value, execution continues with the instruction immediately following the current instruction.

b. If the result is less than the equal compare value, the jump-on-less-than field is used to automatically jump to the relevant instruction's address.

c. Otherwise, the result is considered greater than the equal compare value and the jump-on-greater-than field is used to automatically jump to the relevant instruction's address.

This system is called “Conditional Jumps”. The automatically jumped-to addresses are absolute addresses.
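
The three-way behaviour can be sketched as below. Several points are assumptions, not part of this specification: results wrap to 32 bits, the comparison is unsigned, instructions are byte-addressed so the next instruction is 16 bytes further on, and a shift by 32 or more places gives zero. The name execute is only for illustration.

```python
WORD_MASK = 0xFFFFFFFF
INSTRUCTION_BYTES = 16  # every instruction is 128 bits long

# Inputs come from R1 and R2; inv ignores R2.
OPERATIONS = {
    "add": lambda r1, r2: r1 + r2,
    "sub": lambda r1, r2: r1 - r2,
    "and": lambda r1, r2: r1 & r2,
    "or": lambda r1, r2: r1 | r2,
    "xor": lambda r1, r2: r1 ^ r2,
    "lsh": lambda r1, r2: r1 << min(r2, 32),
    "rsh": lambda r1, r2: r1 >> r2,
    "inv": lambda r1, r2: ~r1,
}


def execute(name, r1, r2, ip, equal_value, jump_less, jump_greater):
    """Run one mathematic or logic instruction; return ( new R1, next instruction address )."""
    result = OPERATIONS[name](r1, r2) & WORD_MASK   # R3, then copied to R1
    if result == equal_value:
        next_ip = ip + INSTRUCTION_BYTES            # case a: fall through
    elif result < equal_value:
        next_ip = jump_less                         # case b: absolute address
    else:
        next_ip = jump_greater                      # case c: absolute address
    return result, next_ip


equal_case = execute("sub", 5, 5, 0x100, 0, 0x200, 0x300)
less_case = execute("add", 1, 2, 0x100, 10, 0x200, 0x300)
greater_case = execute("sub", 3, 5, 0x100, 0, 0x200, 0x300)
```

Under the unsigned assumption, subtracting 5 from 3 wraps to 0xFFFFFFFE and therefore takes the greater-than jump. A signed comparison would take the less-than jump instead, so this choice needs to be settled in the design.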

The design of these eight instructions could have included another word to supply the R2 value, but adding a fifth word to the four already in the instruction would probably add latency, which of course is not desired. If a change is desired in the R2 register then the load-imm-r2 or load-mem-r2 instructions can be used. This keeps the system simple.

Loop instructions

These instructions allow for program loops.

loop-forever : This is a simple unconditional jump to the instruction which is the start of the loop. Its equivalent in the C language is the 'do { ... } while ( 1 )' loop.

Usage for loop-forever ( lsb on top ) :
-------> opcode ( loop-forever )
-------> loop start address
-------> zero
-------> zero

load-loop-counter : Loads count into the loop counter register. The count should be a positive number :

Usage for load-loop-counter ( lsb on top ) :
-------> opcode ( load-loop-counter )
-------> count
-------> zero
-------> zero

loop-next : Subtracts one from the loop counter register and, if the result is non-zero, goes back to the loop start; otherwise execution goes to the next instruction after the loop. Its equivalent in the C language is the 'do { ... } while ( --count > 0 )' loop :

Usage for loop-next ( lsb on top ) :
-------> opcode ( loop-next )
-------> loop start address
-------> zero
-------> zero
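
The pairing of load-loop-counter and loop-next can be modelled with a minimal sketch. The function name run_counted_loop and the callable body, which stands for the instructions between the loop start and loop-next, are only for illustration.

```python
def run_counted_loop(count, body):
    """Model load-loop-counter followed by a loop body that ends in loop-next."""
    if count <= 0:
        raise ValueError("the count should be a positive number")
    loop_counter = count              # load-loop-counter
    while True:
        body()                        # instructions from loop start up to loop-next
        loop_counter -= 1             # loop-next: subtract one from the counter
        if loop_counter == 0:         # zero: continue after the loop
            break
    return loop_counter


iterations = []
final_counter = run_counted_loop(3, lambda: iterations.append(len(iterations)))
```

Because the test comes after the body, the body always runs at least once, which is why the count has to be positive.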

Thread context instructions

[ To be done ]

sysenter and sysexit instructions

The sysenter instruction changes the thread's execution path from user mode to kernel mode so that the kernel can perform various actions according to the provided arguments, which should be at the top of the thread's syscall arguments page. The sysexit instruction changes the thread's execution path back to user mode.

Usage for sysenter ( lsb on top ) :
-------> opcode ( sysenter )
-------> zero
-------> zero
-------> zero

Usage for sysexit ( lsb on top ) :
-------> opcode ( sysexit )
-------> zero
-------> zero
-------> zero

Example :
---> sysenter 0, 0, 0
---> sysexit 0, 0, 0

Interrupt handling

[ To be done ]

Thread scheduling

[ To be done ]

List of instructions that can be executed only in kernel mode

sysexit

[ To be done ]


The Ad Astra operating system overview

As said earlier, the OS is based on a microkernel architecture, which means the kernel has just a few facilities: process / thread creation and scheduling, synchronous IPC, asynchronous notification for a few things, interrupt redirection, timers and critical section synchronization. Everything else is provided by user-mode server processes.

Below are the syscalls and other OS elements, whose shape may change as the processor ISA and the OS design develop :

Process / Thread management calls

Minimum number of pages allotted to the first thread of a process : 6 "pages" = 26 KB : page directory ( 5 KB ), page table ( 5 KB ), regular data page ( 4 KB ), code page ( 4 KB ), thread descriptor ( 4 KB ), syscall communications page ( 4 KB ).

Minimum number of pages allotted to subsequent threads : 2 "pages" = 8 KB : thread descriptor ( 4 KB ), syscall communications page ( 4 KB ).

Structure of the syscall communications page : The syscall number will be at address zero, followed by the arguments. The error code will be at address 1024, followed by the return values. So unlike in other OSes such as Linux, in Ad Astra a syscall can return a number of values through this page.
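
The page layout can be sketched as below. It assumes that the addresses are byte offsets into the 4 KB page, that every field is a 4-byte word, and that words are little-endian; the syscall number 7 and the function names are hypothetical.

```python
import struct

PAGE_BYTES = 4096
WORD_BYTES = 4
REQUEST_OFFSET = 0       # syscall number, then arguments
REPLY_OFFSET = 1024      # error code, then return values


def write_request(page, syscall_number, arguments):
    """Place the syscall number and its arguments at the start of the page."""
    words = [syscall_number, *arguments]
    if len(words) * WORD_BYTES > REPLY_OFFSET:
        raise ValueError("the request would overlap the reply area")
    struct.pack_into(f"<{len(words)}I", page, REQUEST_OFFSET, *words)


def write_reply(page, error_code, return_values):
    """Place the error code and the return values at offset 1024."""
    words = [error_code, *return_values]
    if REPLY_OFFSET + len(words) * WORD_BYTES > PAGE_BYTES:
        raise ValueError("the reply would not fit in the page")
    struct.pack_into(f"<{len(words)}I", page, REPLY_OFFSET, *words)


def read_reply(page, value_count):
    """Return ( error code, list of return values ) read from offset 1024."""
    words = struct.unpack_from(f"<{1 + value_count}I", page, REPLY_OFFSET)
    return words[0], list(words[1:])


page = bytearray(PAGE_BYTES)
write_request(page, 7, [1, 2])        # hypothetical syscall number 7
write_reply(page, 0, [10, 20, 30])    # what a kernel might write back
reply = read_reply(page, 3)
```

Under these assumptions the request area holds at most 256 words, and the caller has to know how many return values to read, since the layout above does not store a count.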

[ To be done ]

The Synchronous IPC calls

Every server process can have eight services ( Send, Receive, Control etc ) and each service can have eight channels, each of which is associated with a separate client buffer attachment address. This lets the main server thread loop assign a separate thread to each message-sending client, thus enabling a multi-threaded server. Each serving thread can do a msgReply() to unblock the requesting client thread.

For example, a web browser limited to eight tabs could be served by the multi-threaded network server.

msgAddService(Service Name Word Count, Service Name, Client Buffer Attachment Address 1)

The above system call needs only the first Client Buffer Attachment Address and will assume that the remaining seven addresses are the next seven entries in the page directory.

Service Number = msgConnectToService(Service Name, Buffer Size, Buffer Address, Send Timeout Duration)

msgSend(Service Number)

Buffer Address and Transaction Number = msgWait()

msgReply(Transaction Number)

The message passing mechanism here does not need kernel-buffering of the message nor does the kernel copy the message between the client process and the server process. Instead the physical address of the client's buffer ( the 4 MB page ) is attached to one of the attachment addresses in the server process so that the server process can directly access the client buffer page. This is a faster form of IPC / message passing.
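
The copy-free idea can be illustrated with a toy model in which the server holds a reference to the client's buffer instead of a copy. This is a simplification and not the kernel interface: the class ToyService and its method names are hypothetical, one object stands for a single service, and blocking is represented only by a set of outstanding transactions.

```python
class ToyService:
    """Toy model of one service whose channels attach client buffers by reference."""

    CHANNELS = 8  # each service can have eight channels

    def __init__(self):
        self.attachments = [None] * self.CHANNELS   # attached client buffers
        self.pending = []                           # ( transaction, channel ) awaiting msg_wait
        self.blocked = set()                        # transactions whose client is still blocked
        self.next_transaction = 0

    def msg_connect(self, client_buffer):
        """Attach the client's buffer to a free channel; nothing is copied."""
        channel = self.attachments.index(None)      # ValueError when all channels are taken
        self.attachments[channel] = client_buffer
        return channel

    def msg_send(self, channel):
        """Queue a request; the client stays blocked until msg_reply."""
        transaction = self.next_transaction
        self.next_transaction += 1
        self.pending.append((transaction, channel))
        self.blocked.add(transaction)
        return transaction

    def msg_wait(self):
        """Return ( buffer, transaction ) for the oldest pending request."""
        transaction, channel = self.pending.pop(0)
        return self.attachments[channel], transaction

    def msg_reply(self, transaction):
        """Unblock the client that issued this transaction."""
        self.blocked.remove(transaction)


client_buffer = bytearray(b"ping")
service = ToyService()
channel = service.msg_connect(client_buffer)
service.msg_send(channel)
server_view, transaction = service.msg_wait()
server_view[:] = b"pong"              # the server writes straight into the client's buffer
service.msg_reply(transaction)
```

After the reply, client_buffer itself contains the server's answer, because server_view is the same object and not a copy. In the real design the sharing would come from attaching the physical 4 MB page in the server's page directory.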

[ To be done ]

Mutual exclusion calls

mutexTake(Mutex Number)

mutexRelease(Mutex Number)

There is no need to specially create a mutex. Every process, when created, is allocated eight mutexes which are shared by all the threads of the process.

Program executable file format

A.OUT - https://wiki.osdev.org/A.out

The three data types supported by Ad Astra OS are :

  1. Word [ Four bytes ]

  2. RegularBuffer[ Number Of Words ]

  3. ClientBuffer[ 4 MB page ]

[ To be done ]