OpenCores

PDP-11/70 CPU core and SoC :: Performance

Performance

Microarchitecture

Because the w11a microarchitecture is very similar to that of the original 11/70 processor, the KB11-C CPU, the instruction timing in clock cycles is also very similar. A register-register operation takes two clock cycles, while a more involved case such as "add @r1,a(r2)" takes 12 cycles. Notable exceptions are the MUL (5 instead of 22 cycles) and DIV (23 instead of 46 cycles) instructions.
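As a small worked example, the MUL and DIV cycle counts quoted above give the following per-instruction savings. This is only arithmetic on the numbers in the text. The names CYCLES and cycle_ratio are illustrative and are not part of the w11a sources.

```python
# Cycle counts quoted above: (w11a cycles, KB11-C cycles).
CYCLES = {
    "MUL": (5, 22),
    "DIV": (23, 46),
}


def cycle_ratio(instr: str) -> float:
    """How many times fewer cycles the w11a needs than the KB11-C."""
    w11a, kb11c = CYCLES[instr]
    return kb11c / w11a


for name in CYCLES:
    print(f"{name}: {cycle_ratio(name):.1f}x fewer cycles on the w11a")
```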

Clock Rate

On Spartan-class FPGAs the w11a systems run at a clock frequency of at least 50 MHz. Specifically:
  FPGA          board     system     clock   comment
  xc3s1000-4    S3BOARD   w11a_s3   50 MHz   no DCM
  xc3s1200e-4   Nexys2    w11a_n2   58 MHz   uses DCM
  xc6slx16-2    Nexys3    w11a_n3   80 MHz   uses DCM
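To put these clock rates in perspective, the sketch below converts the cycle counts quoted for the microarchitecture into execution times on each board. It assumes the ideal case with no cache misses or other memory stalls, and the names used are hypothetical.

```python
# Board clock rates in MHz, from the table above.
CLOCK_MHZ = {"w11a_s3": 50, "w11a_n2": 58, "w11a_n3": 80}

# Cycle counts quoted in the Microarchitecture section.
INSTR_CYCLES = {"register-register": 2, "add @r1,a(r2)": 12}


def exec_time_ns(cycles: int, clock_mhz: float) -> float:
    """Ideal execution time: cycles times the clock period of 1000/f ns (f in MHz)."""
    return cycles * 1000.0 / clock_mhz


for system, mhz in CLOCK_MHZ.items():
    times = ", ".join(
        f"{instr}: {exec_time_ns(c, mhz):.1f} ns" for instr, c in INSTR_CYCLES.items()
    )
    print(f"{system} ({mhz} MHz) -> {times}")
```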

Expected Performance

Compared to KB11-C CPU: The KB11-C CPU had a 150 ns microcycle time, corresponding to a cycle rate of about 6.7 MHz. Both the w11a and the 11/70 have a cache, which greatly reduces the impact of memory latencies, and both need a similar number of cycles per instruction. So one expects the w11a at 50 MHz to be about a factor 50/6.7, or 7.5, faster than the original PDP-11/70.
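The KB11-C estimate can be reproduced as follows. Like the text, it assumes that both CPUs need about the same number of cycles per instruction, so the speedup is just the ratio of cycle rates. The constant names are illustrative.

```python
KB11C_MICROCYCLE_NS = 150.0  # KB11-C microcycle time, from the text
W11A_CLOCK_MHZ = 50.0        # slowest w11a board clock

# A 150 ns cycle corresponds to 1000/150, or about 6.7 MHz.
kb11c_rate_mhz = 1000.0 / KB11C_MICROCYCLE_NS

# Assumes equal cycles per instruction on both CPUs,
# so the speedup is simply the ratio of cycle rates.
speedup = W11A_CLOCK_MHZ / kb11c_rate_mhz

print(f"KB11-C cycle rate: {kb11c_rate_mhz:.1f} MHz")
print(f"expected w11a speedup: {speedup:.1f}x")
```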

Compared to J11 CPU: This later ASIC implementation of the 11/70 ran at clock rates of up to 20 MHz. It needed 4 clocks per microcycle, resulting in a 200 ns microcycle time, while the w11a executes one microcycle per clock. However, the J11 had a significantly improved microarchitecture, yielding a CPI (cycles per instruction) up to a factor two better. So one expects the w11a to be at least a factor (50/20)*(4/1)*(1/2), or 5, faster than the fastest J11-based system, the PDP-11/93.
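The J11 estimate can be written out step by step. The one-clock-per-microcycle figure for the w11a is the assumption implied by the (4/1) term. Because the J11's CPI advantage is an upper bound ("up to a factor two"), dividing by it gives a lower bound on the speedup. The constant names are illustrative.

```python
W11A_CLOCK_MHZ = 50.0
W11A_CLOCKS_PER_MICROCYCLE = 1   # assumption implied by the (4/1) term
J11_CLOCK_MHZ = 20.0
J11_CLOCKS_PER_MICROCYCLE = 4
J11_CPI_ADVANTAGE = 2.0          # "up to a factor two better" CPI

w11a_microcycle_mhz = W11A_CLOCK_MHZ / W11A_CLOCKS_PER_MICROCYCLE  # 50 MHz
j11_microcycle_mhz = J11_CLOCK_MHZ / J11_CLOCKS_PER_MICROCYCLE     # 5 MHz, i.e. 200 ns

# Dividing by the J11's best-case CPI advantage makes this a lower bound.
min_speedup = (w11a_microcycle_mhz / j11_microcycle_mhz) / J11_CPI_ADVANTAGE

print(f"J11 microcycle time: {1000.0 / j11_microcycle_mhz:.0f} ns")
print(f"w11a vs. fastest J11 system: at least {min_speedup:.1f}x")
```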

Benchmarks

The Dhrystone 2 and Tower of Hanoi benchmark codes from the 'BYTE UNIX Benchmark' were used to compare the w11a with real PDP-11s and other processors. The w11a values were measured on all three boards; the comparison values were obtained from Michael Schneider's benchmark collection:
  Type             OS        CPU        (MHz)   Dhry2   Hanoi   Dhry Hanoi  Dhry
                                                [lps]   [lps]   /MHz  /MHz  /Han

  w11a_s3 V0.5     BSD 2.11  w11a        (50)   11510   160.8    230   3.2  71.6
  w11a_n2 V0.5     BSD 2.11  w11a        (50)   11519   160.4    230   3.2  71.8
  w11a_n2 V0.51    BSD 2.11  w11a        (58)   13218   186.1    228   3.2  71.0
  w11a_n3 V0.54    BSD 2.11  w11a        (80)   17797   252.2    222   3.1  70.6

  PDP-11/53+       BSD 2.11  KDJ11-SD   (4.5)*    828    12.2    184   2.7  67.8
  Mac SE/30        A/UX      68030       (16)    3042    81.8    190   5.1  37.2
  SUN 3/60         NetBSD    68020       (20)    6934   121.3    346   6.1  57.3
  DECstation 2100  NetBSD    R2000       (12)   13206   155.5   1100  13.0  85.2
  NeXT N1100       NetBSD    68040       (25)   26882   386.1   1075  15.4  69.6
  HP 9000/433t     NetBSD    68040       (40)   55763   960.3   1394  24.0  58.1
  NCR system 3230  NetBSD    i486DX/2    (66)   63464   993.1    961  15.0  63.9
  NCR system 3230  NetBSD    i486DX/4   (100)   75010  1022.3    750  10.2  73.4

  Power Mac G4     Gentoo    PPC7455   (1400)    3713k   46.6k  2652  33.3  79.6
  Lenovo TS S10    Gentoo    i686 E8400 (3000)  16464k  262.6k  5488  87.5  62.7
Note that the J11 system (marked *) is listed with its effective microcycle rate of 4.5 MHz (18 MHz divided by 4 clocks per microcycle) rather than the chip clock rate of 18 MHz. This is consistent with Bob Supnik's notes on the J11, where the J11 is classified as '4.5 MHz', and gives more meaningful values in the Dhry/MHz ('Dhrystones per MHz') column. For a fair comparison it is also important to note that the PDP-11/53+ systems didn't have a cache and were therefore about a factor 2.3 slower than a PDP-11/93 with cache (see comparison). This explains the large factor between the w11a_s3 and the 11/53 benchmark results.
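The derived columns can be recomputed from the raw Dhry2 and Hanoi values. For the w11a and J11 rows, the results below agree with the table to within one unit of the last printed digit. The last line shows what the J11's Dhry/MHz would be if it were normalized by the 18 MHz chip clock instead. The helper name derived is hypothetical.

```python
# (system, MHz used for normalization, Dhry2 [lps], Hanoi [lps]) from the table above.
ROWS = [
    ("w11a_s3 V0.5", 50, 11510, 160.8),
    ("w11a_n2 V0.5", 50, 11519, 160.4),
    ("w11a_n2 V0.51", 58, 13218, 186.1),
    ("w11a_n3 V0.54", 80, 17797, 252.2),
    ("PDP-11/53+", 4.5, 828, 12.2),  # J11: effective microcycle rate, not the 18 MHz chip clock
]


def derived(mhz, dhry, hanoi):
    """Return (Dhry/MHz, Hanoi/MHz, Dhry/Han) as defined by the table header."""
    return dhry / mhz, hanoi / mhz, dhry / hanoi


for name, mhz, dhry, hanoi in ROWS:
    d_mhz, h_mhz, ratio = derived(mhz, dhry, hanoi)
    print(f"{name:14s} {d_mhz:7.1f} {h_mhz:5.2f} {ratio:6.1f}")

# Normalizing the J11 by its 18 MHz chip clock instead would give a much lower value.
j11_dhry_per_chip_mhz = 828 / 18
print(f"J11 Dhry/MHz at 18 MHz: {j11_dhry_per_chip_mhz:.1f}")
```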

The Dhrystone, Tower of Hanoi and 'syscall' benchmarks were also run on a PDP-11 simulated with simh V3.8-1 and natively on Linux. In both cases a Kubuntu 10.4 system with an Intel Core 2 Duo E8400 CPU was used, with the CPU frequency fixed at 3 GHz via cpufreq.

  System       Platform              (MHz)  Dhry2    Hanoi  syscall
                                            [lps]    [lps]    [lps]

  2.11BSD      w11a_s3 V0.5          (50)   11510    160.8     7080
  2.11BSD      w11a_n2 V0.5          (50)   11519    160.4     6888
  2.11BSD      w11a_n2 V0.51         (58)   13218    186.1     7616
  2.11BSD      w11a_n3 V0.54         (80)   17797    252.2    10375

  2.11BSD      simh on Intel E8400   (--)   17174    250.0    10713
  Ubuntu 10.4  Intel E8400         (3000)   10785k    74.1k    1020k
Some observations are:
  • The Nexys2 and Nexys3 boards have a significantly larger main memory latency than the S3BOARD. Because Dhry2 and Hanoi run almost completely in cache and rarely write to memory, they run at nearly the same speed per clock cycle on all boards. The syscall benchmark is more sensitive to memory latency, due either to more cache misses or to delays from write-throughs.
  • The simulated PDP-11 on a modern 3 GHz PC is about as fast as the current FPGA implementation on a Spartan-6. However, there is certainly room for improvement on the FPGA side, either with a faster device (e.g. a Virtex-6 instead of a Spartan-6) and/or an improved microarchitecture (like that of the J11, or better).
  • Comparing the arithmetic benchmarks (Dhry2 and Hanoi) with the 'syscall' benchmark on simh-simulated 2.11BSD and native Linux suggests that the system call overhead, normalized by processor speed, is larger for contemporary Linux than for 2.11BSD.
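These observations can be checked against the numbers in the two tables with a few ratios. This is a sketch that uses only the published values; the dictionary and function names are illustrative.

```python
# Benchmark results in loops per second, from the simh/Linux table above.
W11A_N3 = {"dhry2": 17797, "hanoi": 252.2, "syscall": 10375}
SIMH = {"dhry2": 17174, "hanoi": 250.0, "syscall": 10713}
LINUX = {"dhry2": 10785e3, "hanoi": 74.1e3, "syscall": 1020e3}


def ratios(a: dict, b: dict) -> dict:
    """Per-benchmark ratio a/b."""
    return {k: a[k] / b[k] for k in a}


# simh on the 3 GHz E8400 vs. the w11a on the Nexys3: all ratios close to 1.
simh_vs_fpga = ratios(SIMH, W11A_N3)

# Native Linux vs. simh-simulated 2.11BSD on the same PC: the speedup on
# syscall is much smaller than on Dhry2 and Hanoi.
linux_vs_simh = ratios(LINUX, SIMH)

# S3BOARD vs. Nexys2, both w11a V0.5 at 50 MHz: Dhry2 is equal,
# syscall is a few percent slower on the board with slower memory.
s3 = {"dhry2": 11510, "syscall": 7080}
n2 = {"dhry2": 11519, "syscall": 6888}
n2_vs_s3 = ratios(n2, s3)

for label, r in [("simh / w11a_n3", simh_vs_fpga),
                 ("Linux / simh", linux_vs_simh),
                 ("n2 / s3 (50 MHz)", n2_vs_s3)]:
    print(label, {k: round(v, 2) for k, v in r.items()})
```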

© copyright 1999-2012 OpenCores.org, equivalent to ORSoC AB, all rights reserved. OpenCores®, registered trademark.