Description
This project is concerned with developing an object-oriented processor, but what does that actually mean?
An "object-oriented processor" must have the following generic properties, in order to qualify as such:
- However an "object" is constructed, its data and operations must be tightly bound. That is to say, they exist and can be handled as a single entity.
- More than one object can exist on the system, as a whole, at any one time.
- Any object may have zero or more instances, where instances are independent in data and identical in functionality.
- One object can inherit data and operations from another object.
OK, so far, so good. But this isn't how conventional processors work at all. At least, not at this scale. Whilst it is certainly true that the distinction between software and hardware is largely superficial, that does not mean that producing high-level processors is practical, or even a sensible approach. Plenty of "high-level" processors have been developed in the past, yet none survive. The most successful design, in terms of usability, has been the RISC architecture, which is as low-level as you can get.
This leads to the following specific design, for this project:
- Each processor core will hold one object class.
- Each instruction will hold one method (loaded in as microcode).
- Each Virtual Data Register will hold one stored variable.
- Each Virtual Address Register will hold a pair of values - the address of a given Virtual Data Register, and the address of the next instance's copy of that Virtual Address Register.
- Each Physical Address Register will hold an address of the Virtual Address Register of the first instance.
- "Virtual Registers" are held in on-board RAM, and constitute the only directly-reachable registers on the processor.
- There is no "stack" or "instruction register" on this processor. It is driven by external signals, and NOT by a fetch/execute cycle. All operations are atomic.
- "Inheritance" (the key to OO programming) is handled by message passing. The instruction is passed to the outermost processor, which decides whether the instruction is addressed to it. If not, the instruction is forwarded to the processor it "inherits" from. This continues until the instruction is either accepted by a processor or rejected at all levels.
- An "inheritance" register contains the identifier of whatever processor this processor inherits from. Nothing is copied, or transferred, EXCEPT to mask on the "inheriting" processor those variables and instructions that it has inherited. Without unique identifiers, there would be no way to tell what a caller referred to.
- Networking is handled by a very simplified routable networking protocol. For the purposes of the theoretical description, it is irrelevant as to what this protocol is. However, something akin to IPv6 is envisaged.
- There is no need for "main memory", or other external components. Since the entire program and data space has now been loaded into a network of processors, there is nothing for a "main memory" to do. External connections to I/O devices are best made via a traditional PC emulating a processor on the network.
- Everything is event-driven, as has been mentioned. To run a program, you generate a signal on the network, which spawns other signals, and so on. Because of this, processors are pictured as monitoring for signals of interest to them, rather than pulling out of some central store the instructions necessary.
- Because everything is event-driven, a processor may receive more than one instruction at the same time, over different network connections. (It's pictured that something similar to the Transputer, where four connections are available, would be used.)
- The way this would be handled is for instructions to be pushed onto a queue. These instructions would be filtered via the microcode, and pushed onto further queues.
- The underlying hardware would pluck the data off the relevant queues as and when it became free to do so. In consequence, the processor could genuinely multi-task, for any tasks not involving the same low-level hardware.
- Where resources do not exist to execute two instructions in parallel, one is executed and the other shall be left on the queue until processor capacity becomes available.
- This low-level structure of, effectively, one-instruction processors gives you a set-up in which a synchronous design would make no sense. Each "processor" needs to handle its own timings, and the outer processor, therefore, will have no real timings at all. Instructions may be executed in order, or not in order, depending on what hardware is required and how long that instruction takes to execute.
- Each "one-instruction processor" has two built-in queues - one for incoming data, one for outgoing data. Each queue entry contains the raw data, plus the tags for the instance and method which generated it, so that the outgoing data can be sent to the right place. By using queues, you avoid the problems of using the same hardware for two different tasks at the same time, without having to rely on locks.
- Once the method has been completed, a network "packet" is generated, containing the data to be sent, plus the destination. The destination may or may not be the same as the original caller.
- Since the network protocol is pictured as being similar to IPv6, it can also be pictured as being capable of multicasting. Thus, a call may be to more than one processor, as may a response. (The two are effectively the same, anyway!)
- Each "object-level" processor will be minimalistic, containing ONLY the basic elements necessary to execute the most common parts of the microcode for a given method. Everything else is farmed out, using the inheritance mechanism. The idea behind this is to ensure that most of the computing elements in the system are used most of the time, with minimal redundancy.
- This is the complete opposite of a lot of modern designs, where speed is gained by having a large amount of redundancy. At first, cutting out the duplication would seem to slow things down, as there is a lot more communication involved. This would be true, if many methods wished to access the same computing elements at the same time. However, it is anticipated that this will NOT be the case, simply because different methods generally do different things. Given that, the communications overhead should not be excessive. This would have to be verified by direct measurement, though.
- In order to handle multiple instances, a memory manager must be devised, for each heap, such that each instance is kept separate from the others in that heap. This memory manager is envisaged as simply allocating each instance a fixed amount of space, where that fixed amount is decided at the time the object is loaded onto the processor.
- As can be seen from the above notes, some of the core functionality of a typical Operating System is being moved into hardware, in order to support the management of the FROOP network. These components are going to be used frequently, by all the processors, which is going to mean that they will need to be duplicated. One of the big deciding factors in the practicality of this design is whether the overhead of these components will completely swamp any potential benefit of this design.
- Last, but by no means least, this design calls for a VERY large-scale network of processors, to be able to do anything of any substance. This poses the question that this project MUST answer, at some point - is it possible to build such a large-scale network, in such a way as to produce a general-purpose computational device?
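The inheritance-by-message-passing idea in the list above can be sketched in a few lines of Python. This is only an illustration, under assumptions: the names `Core`, `inherits_from` and `dispatch` are hypothetical, each core's methods are modelled as plain Python callables rather than microcode, and the network is a simple dictionary rather than a routed protocol.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Core:
    """Hypothetical stand-in for one object-level processor."""
    core_id: int
    methods: Dict[str, Callable[..., object]] = field(default_factory=dict)
    inherits_from: Optional[int] = None  # models the "inheritance" register


def dispatch(network: Dict[int, Core], target: int, method: str, *args):
    """Forward a call up the inheritance chain until some core accepts it.

    Returns (id of the accepting core, result). Raises LookupError if the
    instruction is rejected at all levels.
    """
    current: Optional[int] = target
    visited = set()  # guards against an accidental inheritance cycle
    while current is not None and current not in visited:
        visited.add(current)
        core = network[current]
        if method in core.methods:
            return current, core.methods[method](*args)
        current = core.inherits_from  # not ours: pass it to the parent
    raise LookupError(f"method {method!r} rejected at all levels")


# Core 2 "inherits" from core 1; nothing is copied between them.
network = {
    1: Core(1, {"area": lambda w, h: w * h}),
    2: Core(2, {"perimeter": lambda w, h: 2 * (w + h)}, inherits_from=1),
}
```

Calling `dispatch(network, 2, "area", 3, 4)` is answered by core 1, while `dispatch(network, 2, "perimeter", 3, 4)` is answered by core 2 itself.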
The outline of each processor is therefore as follows:
- Each processor will have four input queues. Method invocations are pulled from each queue in a round-robin fashion. If a queue is empty, it is skipped.
- The method identifier is used to identify the set of instructions relating to that method. The instance identifier is used to identify the set of registers that represent the variables for that instance.
- The instructions are executed in sequence, by calling the relevant computational elements within the processor.
- If a return value is indicated, the processor will generate a packet containing the relevant variable, and send it to the originating processor.
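The outline above can be modelled roughly as follows. This is a sketch under assumptions, not the intended hardware: the invocation layout `(source, instance_id, method_id, return_var)`, the class name `ObjectCore`, and the use of Python callables for instructions are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

# Hypothetical invocation layout:
# (originating processor, instance id, method id, variable to return or None)
Invocation = Tuple[int, int, str, Optional[str]]
Instruction = Callable[[Dict[str, int]], None]


class ObjectCore:
    def __init__(self, methods: Dict[str, List[Instruction]], num_queues: int = 4):
        self.queues: List[Deque[Invocation]] = [deque() for _ in range(num_queues)]
        self.methods = methods                          # method id -> instructions
        self.registers: Dict[int, Dict[str, int]] = {}  # instance id -> variables
        self.outbox: List[Tuple[int, int]] = []         # (destination, value) packets
        self._next = 0                                  # queue to try first

    def step(self) -> bool:
        """Service one invocation, round-robin; return False if all queues are empty."""
        for offset in range(len(self.queues)):
            idx = (self._next + offset) % len(self.queues)
            if self.queues[idx]:                        # empty queues are skipped
                self._next = (idx + 1) % len(self.queues)
                self._execute(self.queues[idx].popleft())
                return True
        return False

    def _execute(self, invocation: Invocation) -> None:
        source, instance_id, method_id, return_var = invocation
        regs = self.registers.setdefault(instance_id, {})  # per-instance variables
        for instruction in self.methods[method_id]:         # executed in sequence
            instruction(regs)
        if return_var is not None:                          # return value indicated
            self.outbox.append((source, regs[return_var]))


def init_count(regs: Dict[str, int]) -> None:
    regs.setdefault("count", 0)


def increment(regs: Dict[str, int]) -> None:
    regs["count"] += 1


core = ObjectCore({"inc": [init_count, increment]})
core.queues[0].append((7, 1, "inc", "count"))
core.queues[2].append((8, 2, "inc", None))
core.queues[0].append((7, 1, "inc", "count"))
while core.step():
    pass
```

After the loop, instance 1 has been incremented twice and instance 2 once, and `core.outbox` holds two return packets addressed to processor 7.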
Routing will be performed as follows:
- A processor pulls the method invocation.
- If it is for this processor:
  - It is executed locally, as above, by injecting it into the queue.
- If it is for a "parent" object:
  - The object ID is replaced with the parent object ID.
  - The packet continues on to the general routing algorithm.
- If it is for any other processor:
  - The routing table is queried for the next hop to the destination.
  - The packet is routed to the next hop identified.
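The routing decision above might be sketched as below. The packet layout, the function name `route` and the action labels are hypothetical. One assumption needs stating loudly: a packet is treated as being "for a parent object" when it is addressed to this processor but names a method that this processor does not hold locally.

```python
from typing import Dict, Optional, Set, Tuple

Packet = Dict[str, object]  # hypothetical layout: {"object_id": int, "method": str}


def route(
    packet: Packet,
    local_id: int,
    local_methods: Set[str],
    parent_id: Optional[int],
    routing_table: Dict[int, int],
) -> Tuple[str, Packet, Optional[int]]:
    """Return (action, packet, next_hop) for one pulled method invocation."""
    target = packet["object_id"]
    if target == local_id:
        if packet["method"] in local_methods:
            return ("execute", packet, None)      # for this processor
        if parent_id is None:
            return ("reject", packet, None)       # nothing left to inherit from
        # For a "parent" object: swap in the parent's ID, then fall through
        # to the general routing step below.
        packet = {**packet, "object_id": parent_id}
        target = parent_id
    next_hop = routing_table.get(target)          # general routing
    if next_hop is None:
        return ("unroutable", packet, None)
    return ("forward", packet, next_hop)


table = {1: 10, 3: 11}  # destination object ID -> next-hop link (made-up values)
inherited = route({"object_id": 2, "method": "area"}, 2, {"perimeter"}, 1, table)
local = route({"object_id": 2, "method": "perimeter"}, 2, {"perimeter"}, 1, table)
other = route({"object_id": 3, "method": "area"}, 2, {"perimeter"}, 1, table)
```

Here `inherited` is forwarded towards object 1 with its object ID rewritten, `local` is executed in place, and `other` is forwarded unchanged to the next hop for object 3.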
There is nothing to prevent multiple instances being executed at the same time, and therefore all internal operations have to include some state information. Internal communication is expected to look something like this:
- Request ID
- Instance ID
- Method ID
- Low-level Instruction
- Operand(s) required for instruction
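As an illustration, the fields listed above could be carried in a record like the following. The field types, the `InternalMessage` name and the `state_key` helper are assumptions for the sake of the sketch; the original does not fix widths or encodings.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InternalMessage:
    request_id: int
    instance_id: int
    method_id: int
    instruction: str                 # low-level instruction (mnemonic is made up)
    operands: Tuple[int, ...] = ()   # operand(s) required for the instruction


def state_key(msg: InternalMessage) -> Tuple[int, int, int]:
    """State that lets interleaved work from different instances be told apart."""
    return (msg.request_id, msg.instance_id, msg.method_id)


# Two instances running the same method at once, with interleaved messages.
stream = [
    InternalMessage(100, 1, 5, "ADD", (2, 3)),
    InternalMessage(101, 2, 5, "ADD", (7, 8)),
    InternalMessage(100, 1, 5, "STORE", (0,)),
    InternalMessage(101, 2, 5, "STORE", (0,)),
]

by_state: Dict[Tuple[int, int, int], List[str]] = defaultdict(list)
for message in stream:
    by_state[state_key(message)].append(message.instruction)
```

Because every message carries its own request, instance and method identifiers, the two interleaved sequences can be separated again without any shared lock.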
Intermediate results are placed in the internal variable/register space on the chip. Returned values are delivered in the same way as any other packet. Values returned to an object class that is inherited by the calling class are delivered by way of the calling class. Values returned to an object class that performs the call but was inherited by another object at that time are returned to the class that made the call. All return values go to the instance/object that made the call, regardless of the structure or dynamics of the code.
At this point, one thing has not been addressed - data that is inherited. When data is inherited, but is accessed via operations that are not, then return values will go to the object that performed the call, NOT the object storing the data.
In this case, the object performing the call will need to re-route the data, as per the routing to a parent object, described in the second of the routing methods. No attempt will be made to circumvent this, as that would introduce excessive complexity for overridden and polymorphic data.
Features
- The FROOP design will allow the direct execution of OO software
- The distinction between hardware, OS and application is removed
- The design borrows hardware concepts from the Transputer, iAPX 432 and ARM
- The design borrows software concepts from OpenMOSIX, Globus, OpenMP and MPI
- The basic computing element is an underlying Turing-complete core
- The core shall be based around the Processor-In-Memory (PIM) architecture
- The core shall support at least four built-in interconnects on the chip
- Support for multiple instances of a given object on a given core
- Support for uploading an object onto a core, via microcode
- Support for complex topologies of FROOP processors
- Support for one core to inherit the capabilities of another
- No external chipsets, memory or devices are "necessary"
- Cores are event-driven, rather than running a fetch/execute cycle
- Programs are inherently executed in parallel over the system as a whole
- Instances on a single core are not guaranteed to be run in parallel
- The architecture performs part of the parallelization of the program
- Clusters would be "infinitely" scalable and may have any topology
- The design would permit an asynchronous, "clockless" system
- Unrelated activities can spawn multiple threads which are executed in parallel
- The level of sequential coherence is controllable, rather than imposed