OpenCores
URL https://opencores.org/ocsvn/simpcon/simpcon/trunk

Subversion Repositories simpcon

Compare Revisions

  • This comparison shows the changes necessary to convert path
    /
    from Rev 18 to Rev 19
    Reverse comparison

Rev 18 → Rev 19

/trunk/doc/figures/sc_pipe_level.xar Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream
trunk/doc/figures/sc_pipe_level.xar Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_basic_rd.pdf =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_basic_rd.pdf =================================================================== --- trunk/doc/figures/sc_basic_rd.pdf (revision 18) +++ trunk/doc/figures/sc_basic_rd.pdf (nonexistent)
trunk/doc/figures/sc_basic_rd.pdf Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_basic_rd.xar =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_basic_rd.xar =================================================================== --- trunk/doc/figures/sc_basic_rd.xar (revision 18) +++ trunk/doc/figures/sc_basic_rd.xar (nonexistent)
trunk/doc/figures/sc_basic_rd.xar Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_sram_prd.pdf =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_sram_prd.pdf =================================================================== --- trunk/doc/figures/sc_sram_prd.pdf (revision 18) +++ trunk/doc/figures/sc_sram_prd.pdf (nonexistent)
trunk/doc/figures/sc_sram_prd.pdf Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_timing.xar =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_timing.xar =================================================================== --- trunk/doc/figures/sc_timing.xar (revision 18) +++ trunk/doc/figures/sc_timing.xar (nonexistent)
trunk/doc/figures/sc_timing.xar Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_sram_prd.xar =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_sram_prd.xar =================================================================== --- trunk/doc/figures/sc_sram_prd.xar (revision 18) +++ trunk/doc/figures/sc_sram_prd.xar (nonexistent)
trunk/doc/figures/sc_sram_prd.xar Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_wait_states.xar =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_wait_states.xar =================================================================== --- trunk/doc/figures/sc_wait_states.xar (revision 18) +++ trunk/doc/figures/sc_wait_states.xar (nonexistent)
trunk/doc/figures/sc_wait_states.xar Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_rd_ws.pdf =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_rd_ws.pdf =================================================================== --- trunk/doc/figures/sc_rd_ws.pdf (revision 18) +++ trunk/doc/figures/sc_rd_ws.pdf (nonexistent)
trunk/doc/figures/sc_rd_ws.pdf Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_rd_ws.xar =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_rd_ws.xar =================================================================== --- trunk/doc/figures/sc_rd_ws.xar (revision 18) +++ trunk/doc/figures/sc_rd_ws.xar (nonexistent)
trunk/doc/figures/sc_rd_ws.xar Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_wr_ws.pdf =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_wr_ws.pdf =================================================================== --- trunk/doc/figures/sc_wr_ws.pdf (revision 18) +++ trunk/doc/figures/sc_wr_ws.pdf (nonexistent)
trunk/doc/figures/sc_wr_ws.pdf Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_sram.pdf =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_sram.pdf =================================================================== --- trunk/doc/figures/sc_sram.pdf (revision 18) +++ trunk/doc/figures/sc_sram.pdf (nonexistent)
trunk/doc/figures/sc_sram.pdf Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_wr_ws.xar =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_wr_ws.xar =================================================================== --- trunk/doc/figures/sc_wr_ws.xar (revision 18) +++ trunk/doc/figures/sc_wr_ws.xar (nonexistent)
trunk/doc/figures/sc_wr_ws.xar Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_sram.xar =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_sram.xar =================================================================== --- trunk/doc/figures/sc_sram.xar (revision 18) +++ trunk/doc/figures/sc_sram.xar (nonexistent)
trunk/doc/figures/sc_sram.xar Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/figures/sc_pipe_level.pdf =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: trunk/doc/figures/sc_pipe_level.pdf =================================================================== --- trunk/doc/figures/sc_pipe_level.pdf (revision 18) +++ trunk/doc/figures/sc_pipe_level.pdf (nonexistent)
trunk/doc/figures/sc_pipe_level.pdf Property changes : Deleted: svn:mime-type ## -1 +0,0 ## -application/octet-stream \ No newline at end of property Index: trunk/doc/simpcon.tex =================================================================== --- trunk/doc/simpcon.tex (revision 18) +++ trunk/doc/simpcon.tex (nonexistent) @@ -1,875 +0,0 @@ -\documentclass[a4paper,12pt]{scrartcl} -\usepackage{pslatex} % -- times instead of computer modern - -\usepackage[colorlinks=true,linkcolor=black,citecolor=black]{hyperref} -\usepackage{booktabs} -\usepackage{graphicx} - -\usepackage[latin1]{inputenc} - -\newcommand{\code}[1]{{\textsf{#1}}} -\newcommand{\sign}[1]{{\texttt{#1}}} - - -\begin{document} - -\title{SimpCon -- a Simple SoC Interconnect\\Draft} -\author{Martin Schoeberl\\ martin@jopdesign.com} -\maketitle \thispagestyle{empty} - -\begin{abstract} - -This document proposes a simple interconnection standard for -system-on-chip (SoC) components. It is intended to provide pipelined -access to devices such on-chip peripherals and on-chip memory -controller with minimum hardware resources. - - -\end{abstract} - -\section{Introduction} - -The intention of the following SoC interconnect standard is to be -simple and efficient with respect to implementation resources and -transaction latency. - -SimpCon is a fully synchronous standard for on-chip -interconnections. It is a point-to-point connection between a master -and a slave. The master starts either a read or write transaction. -Master commands are single cycle to free the master to continue on -internal operations during an outstanding transaction. The slave has -to register the address when needed for more than one cycle. The -slave also registers the data on a read and provides it to the -master for more than a single cycle. This property allows the master -to delay the actual read if it is busy with internal operations. - -The slave signals the end of the transaction through a novel -\emph{ready counter} to provide an early notification. This early -notification simplifies the integration of peripherals into -pipelined masters. - -Slaves can also provide several levels of pipelining. This feature -is announced by two static output ports (one for read and one write -pipeline levels). - -Off-chip connections (e.g.\ main memory) are device specific and -need a slave to perform the translation. Peripheral interrupts are -not covered by this specification. - -\subsection{Feature} - -\begin{itemize} - \item Master/slave point-to-point connection - \item Synchronous operation - \item Read and write transactions - \item Early pipeline release for the master - \item Pipelined transactions - \item Open-source specification - \item Low implementation overheads -\end{itemize} - -\subsection{Basic Read Transaction} - -Figure~\ref{fig:sc:basic:rd} shows a basic read transaction for a -slave with one cycle latency. The acknowledge signals are omitted -from the figure. In the first cycle, the address phase, the -\sign{rd} signals the slave to start the read transaction. The -address is registered by the slave. During the following cycle, the -read phase, the slave performs the read and registers the data. Due -to the register in the slave the data is available in the third -cycle, the result phase. To simplify the master, the read data stays -valid till the next read request response. - -\begin{figure} - \centering - \includegraphics{figures/sc_basic_rd} - \caption{Basic read transaction} - \label{fig:sc:basic:rd} -\end{figure} - -\subsection{Basic Write Transaction} - -A write transaction consists of a single cycle address/command phase -started by assertion of \sign{wr} where the address and the write -data are valid. \sign{address} and \sign{wr\_data} are usually -registered by the slave. The end of the write cycle is signalled to -the master by the slave with \sign{rdy\_cnt}. See section -\ref{sec:ack} and an example in Figure~\ref{fig:sc:wr:ws}. - -\section{SimpCon Signals} - -This sections defines the signals used by the SimpCon connection. -Some of the signals are optional and may not be present on a -peripheral device. - -All signals are a single direction point-to-point connection between -a master and a slave. The signal details are described by the device -that drives the signal. Table~\ref{tab:sc:signals} lists the signals -that define the SimpCon interface. The column Direction indicates -wether the signal is driven by the master or the slave. - -\begin{table} - \centering - - \begin{tabular}{lrlll} - \toprule - Signal & Width & Direction & Required & Description \\ - \midrule - \sign{address} & 1--32 & Master & No & Address lines from the - master\\ - & & & & to the slave port\\ - \sign{wr\_data} & 32 & Master & No & Data lines from the - master\\ - & & & & to the slave port\\ - \sign{rd} & 1 & Master & No & Start of a read transaction \\ - \sign{wr} & 1 & Master & No & Start of a write transaction \\ - \sign{rd\_data} & 32 & Slave & No & Data lines from the - slave\\ - & & & & to the master port\\ - \sign{rdy\_cnt} & 2 & Slave & Yes & Transaction end signalling \\ - \sign{rd\_pipeline\_level} & 2 & Slave & No & Maximum pipeline - level\\ - & & & & for read transactions \\ - \sign{wr\_pipeline\_level} & 2 & Slave & No & Maximum pipeline - level\\ - & & & & for write transactions \\ - \bottomrule - - \end{tabular} - \caption{SimpCon port signals} - \label{tab:sc:signals} - -\end{table} - - - -\subsection{Master Signal Details} - -This section describes the signals that are driven by the master to -initiate a transaction. - -\subsubsection{address} - -Master addresses represent word addresses as offsets in the slaves -address range. \sign{address} is valid a single cycle either with -\sign{rd} for a read transaction or with \sign{wr} and -\sign{wr\_data} for a write transaction. - -The number of bits for \sign{address} depend on the slaves address -range. For a single port slave \sign{address} can be omitted. - -\subsubsection{wr\_data} - -The \sign{wr\_data} signals carry the data for a write transaction. -It is valid for a single cycle together with \sign{address} and -\sign{wr}. The signal is typically 32 bits wide. Slaves can ignore -upper bits when the slave port is less than 32 bits. - -\subsubsection{rd} - -The \sign{rd} signal is asserted a single clock cycle to start a -read transaction. \sign{address} has to be valid in the same cycle. - -\subsubsection{wr} - -The \sign{wr} signal is asserted a single clock cycle to start a -write transaction. \sign{address} and \sign{wr\_data} have to be -valid in the same cycle. - -\subsubsection{sel\_byte} - -The \sign{sel\_byte} signal is reserved for future versions of the -SimpCon specification to add individual byte enables. - -\subsection{Slave Signal Details} - -This section describes the signals that are driven by the slave as a -response to transaction initiated by the master. - -\subsubsection{rd\_data} - -The \sign{wr\_data} signals carry the result for a read transaction. -The data is valid when \sign{rdy\_cnt} reaches 0 and stays valid -till a new read result is available. The signal is typically 32 bits -wide. Slaves that provide less than 32 bits should pad the upper -bits with 0. - -\subsubsection{rdy\_cnt} - -The \sign{rdy\_cnt} signal provides the number of cycles till the -pending transaction will finish. A 0 means that either read data is -available or a write transaction has been finished. Values of 1 and -2 mean the the transaction will finish in at least 1 or 2 cycles. -The maximum value is 3 and means the the transaction will finish in -3 or \emph{more} cycles. Note that not all values have to be used in -a transaction. Each monotonic sequence of \sign{rdy\_cnt} values is -legal. - -\subsubsection{rd\_pipeline\_level} - -The static \sign{rd\_pipeline\_level} provides the master with the -read pipeline level of the slave. The signal has to be constant to -enable the synthesizer to optimize the pipeline level dependent -state machine in the master. - - -\subsubsection{wr\_pipeline\_level} - -The static \sign{wr\_pipeline\_level} provides the master with the -write pipeline level of the slave. The signal has to be constant to -enable the synthesizer to optimize the pipeline level dependent -state machine in the master. - -\section{Slave Acknowledge} -\label{sec:ack} - -Flow control between the slave and the master is usually done by a -single signal in the form of \emph{wait} or \emph{acknowledge}. The -\sign{ack} signal, e.g.\ in the Wishbone specification, is set when -the data is available or the write operation has finished. However, -for a pipelined master it can be of interest to know it -\emph{earlier} when a transaction will finish. - -For a lot of slaves, e.g.\ a SRAM interface with fixed wait states, -this information is available inside the slave. In the SimpCon -interface this information is communicated to the master through the -two bit signal \sign{rdy\_cnt}. \sign{rdy\_cnt} signals the number -of cycles till the read data will be available or the write -transaction will be finished. Value 0 is equivalent to an \emph{ack} -signal and 1, 2, and 3 are equivalent to a wait request with the -distinction that the master knows how long the wait request will -last. - -To avoid too many signals at the interconnect \sign{rdy\_cnt} has a -width of two bits. Therefore, the maximum value of 3 has the special -meaning that the transaction will finish in 3 or \emph{more} cycles. -As a result the master can only use the values 0, 1, and 2 to -release actions in it's pipeline. - -Idle slaves will keep the former value of 0 for \sign{rdy\_cnt}. -Slaves, that don't know in advance how many wait states are need for -the transaction can produce sequences that omit any of the numbers -3, 2, and 1. The master has to handle this situations. - -Figure~\ref{fig:sc:rd:ws} shows an example of a slave that needs -three cycles for the read to be processed. In cycle 1 the read -command and the address are set by the master. The slave registers -the address and sets \sign{rdy\_cnt} to 3 in cycle 2. The read takes -three cycles (2--4) during which \sign{rdy\_cnt} is decremented. In -cycle 4 the data is available inside the slave and gets registered. -It is available in cycle 5 for the master and \sign{rdy\_cnt} is -finally 0. Both, the \sign{rd\_data} and \sign{rdy\_cnt} will keep -their value till a new transaction is requested. - -\begin{figure} - \centering - \includegraphics{figures/sc_rd_ws} - \caption{Read transaction with wait states} - \label{fig:sc:rd:ws} -\end{figure} - - -Figure~\ref{fig:sc:wr:ws} shows an example of a slave that needs -three cycles for the write to be processed. The address, the data to -be written and the write command are valid during cycle 1. The slave -registers the address and write data during cycle 1 and performs the -write operation during cycles 2--4. The \sign{rdy\_cnt} is -decremented and a non-pipelined slave can accept a new command after -cycle 4. - -\begin{figure} - \centering - \includegraphics{figures/sc_wr_ws} - \caption{Write transaction with wait states} - \label{fig:sc:wr:ws} -\end{figure} - - - -\section{Pipelining} - -Figure~\ref{fig:sc:pipe:level} shows a read transaction for a slave -with four cycles latency. Without any pipelining the next read -transaction will start in cycle 7 after the data from the former -read transaction is read by the master. The three bottom lines show -when new read transactions will be started for different pipeline -levels. With pipeline level 1 a new transaction can start in the -same cycle when the former read data is available (in this example -in cycle 6). Higher levels mean that the next read will start -earlier as shown for level 2 and 3. - -\begin{figure} - \centering - \includegraphics[width=\textwidth]{figures/sc_pipe_level} - \caption{Different pipeline levels for a read transaction} - \label{fig:sc:pipe:level} -\end{figure} - -Implementation of level 1 in the slave is trivial (just two more -transitions in the state machine). It is recommended to provide -level 1 at least for read transactions. Level 2 is a little bit more -complex but usually no additional address or data registers are -needed. - -To implement level 3 pipelining in the slave at least an additional -address register is needed. However, to use level 3 the master has -to issue the request in the same cycle as \sign{rdy\_cnt} goes to 2. -That means this transition is combinatorial. We see in -Figure~\ref{fig:sc:pipe:level} that \sign{rdy\_cnt} value of 3 means -three or more cycles till the data is available and can therefore -not be used to trigger a new transaction. - -\section{Multiple Master} - -SimpCon defines no signals for the communication between a master -and an arbiter. However, it is possible to build a multi master -system with SimpCon. The SimpCon interface can be used as -interconnect between the masters and the arbiter and the arbiter and -the slaves. In this case the arbiter acts as slave for the master -and as master for the peripheral devices. - -The missing arbitration protocol in SimpCon results in the need to -queue $n-1$ requests in an arbiter for $n$ masters. However, for -this additional HW we get zero overheads for the bus request. The -master, which gets the bus will will start the slave transaction in -the same cycle. -\\ -\\ -TODO: add a timing diagram to explain this concept. - - -\section{Examples} - -This section provides some examples for the application of the -SimpCon definition. - -\subsection{IO Port} - -TODO: Show how simple an IO port can be with SimpCon. We need no -addresses and can tie \sign{bsy\_cnt} to 0. We only need the -\sign{rd} or \sign{wr} signal to enable the port. - -\subsection{SRAM interface} - -The following example is taken from an implementation of SimpCon for -a Java processor. The processor is clocked with 100MHz and the main -memory consists of 15ns static RAMs. Therefore the minimum access -time for the RAM is two cycles. The slack time of 5ns forces us to -use output registers for the RAM address and write data and input -registers for the read data in the IO cells of the FPGA. These -registers fit nice with the intention of SimpCon to use registers -inside the slave. - -Figure~\ref{fig:sc:sram} shows the interface for a non-pipelined -read access followed by a write access. Four signals are driven by -the master and two signal by the slave. The lower half of the figure -shows the signals at the FPGA pins where the RAM is connected. - -\begin{figure} - \centering - \includegraphics[width=\textwidth]{figures/sc_sram} - \caption{Static RAM interface without pipelining} - \label{fig:sc:sram} -\end{figure} - -In cycle 1 the read transaction is started by the master and the -slave registers the address. The slave also sets the registered -control signals \sign{ncs} and \sign{noe} during cycle1. Due to the -IO cell registers, the address and control signals are valid at the -FPGA pins very early in cycle 2. At the end of cycle 3 (15ns after -\sign{address}, \sign{ncs} and \sign{noe} are stable) the data from -the RAM is available and can be sampled with the rising edge for -cycle 4. - -The master reads the data in cycle 4 and starts a write transaction -in cycle 5. Address and data are again registered from the slave and -are available for the RAM at the beginning of cycle 6. To perform a -write in two cycles the nwr signal is registered by a negative -triggered flip-flop. - -In figure~\ref{fig:sc:sram:prd} we see a pipelined read from the RAM -with pipeline level 2. With this pipeline level and the two cycles -read access time of the RAM we get the maximum bandwidth possible. - -\begin{figure} - \centering - \includegraphics[width=\textwidth]{figures/sc_sram_prd} - \caption{Pipelined read from a static RAM} - \label{fig:sc:sram:prd} -\end{figure} - -We can see the start of the second read transaction in cycle 3 -during the read of the first data from the RAM. The new address is -registered in the same cycle and available for the RAM in the -following cycle 4. Although we have a pipeline level of 2 we need no -additional address or data register. The read data is available for -two cycles (\sign{rdy\_cnt} 2 or 1 for the next read) and the master -is free to select one of the two cycles to read the data. - -\subsection{Master Multiplexing} - -To add several slaves to a single master the \sign{rd\_data} and -\sign{bsy\_cnt} have to be multiplexed. Due to the fact that all -\sign{rd\_data} signals are registered by the slaves a single -pipeline stage will be enough for a large multiplexer. The selection -of the multiplexer is also known at the transaction start but needed -at most in the next cycle. Therefore it can be registered to further -speed up the multiplexer. -\\ -\\ -TODO: add a schematic for the master \sign{rd\_data} multiplexer. - -\section{Status} - -\begin{itemize} - \item First timing diagrams drawn - \item SimpCon SRAM interface for JOP on Cyclone and Spartan-3 is - available - \item Project at opencores.org accepted - \item Simple UART as SimpCon example - \item IO in JOP changed to SimpCon (uart, cnt, usb) -\end{itemize} -% -Next steps: -% -\begin{itemize} - \item Continue this document - \item Provide Wishbone bridges -\end{itemize} -% -to clarify: -\begin{itemize} - \item Use transaction or transfer in this document? - \item Use address phase or better command cycle? -\end{itemize} - -%\end{document} - - -\section{Notes} - -\subsection{Group comment} -\begin{verbatim} - -After implementing the Wishbone interface for main memory access -from JOP I see several issues with the Wishbone specification that -makes it not the best choice for SoC interconnect. - -The Wishbone interface specification is still in the tradition of -microcomputer or backplane busses. However, for a SoC interconnect, -which is usually point-to-point, this is not the best approach. - -The master is requested to hold the address and data valid through -the whole read or write cycle. This complicates the connection to a -master that has the data valid only for one cycle. In this case the -address and data have to be registered \emph{before} the Wishbone -connect or an expensive (time and resources) MUX has to be used. A -register results in one additional cycle latency. A better approach -would be to register the address and data in the slave. Than there -is also time to perform address decoding in the slave (before the -address register). - -There is a similar issue for the output data from the slave: As it -is only valid for a single cycle it has to be registered by the -master when the processor is not reading it immediately. Therefore, -the slave should keep the last valid data at it's output even when -\emph{wb.stb} is not assigned anymore (which is no issue from the -hardware complexity). - -The Wishbone connection for JOP resulted in an unregistered Wishbone -memory interface and registers for the address and data in the -Wishbone master. However, for fast address and control output -($t_{co}$) and short setup time ($t_{su}$) we want to place the -registers in the IO-pads of the FPGA. With the registers are buried -in the WB master it takes some effort to set the right constraints -for the Synthesizer to implement such IO-registers. - -The same issue is true for the control signals. The translation from -the \emph{wb.cyc}, \emph{wb.stb} and \emph{wb.we} signals to -\emph{ncs}, \emph{noe} and \emph{nwe} for the SRAM are on the -critical path. - -The \emph{ack} signal is too late for a pipelined master. We would -need to know it *earlier* when the next data will be available --- -and this is possible, as we know in the slave when the data from the -SRAM will arrive. A work around solution is a non-WB-conforming -early ack signal. - -Due to the fact that the data registers not inside the WB interface -we need an extra WB interface for the Flash/NAND interface (on the -Cyclone board). We cannot afford the address decoding and a MUX in -the data read path without registers. This would result in an extra -cycle for the memory read due to the combinational delay. - -In the WB specification (AFAIK) there is no way to perform pipelined -read or write. However, for blocked memory transfers (e.g. cache -load) this is the usual way to get a good performance. - -Conclusion -- I would prefer: - - * Address and data (in/out) register in the slave - * A way to know earlier when data will be available (or - a write has finished) - * Pipelining in the slave - -As a result from this experience I'm working on a new SoC -interconnect (working name SimpCon) definition that should avoid the -mentioned issues and should be still easy to implement the master -and slave. - -As there are so many projects available that implement the WB -interface I will provide bridges between SimpCon and WB. For IO -devices the former arguments do not apply to that extent as the -pressure for low latency access and pipelining is not high. -Therefore, a bridge to WB IO devices can be a practical solution for -design reuse. -\end{verbatim} - -\subsubsection{additional comments} -\begin{verbatim} - -The idea for (some) pipeline support is twofold: - -1.) The slave will provide more information than a single \emph{ack} -or wait states. It will (if it is capable to do) signal the number -of clock cycles remaining till the read data is available (or the -write has finished) to the master. This feature allows the pipelined -master to prepare for the upcoming read. - -2.) If the slave can provide pipelining the master can use -overlapped wr or rd requests. The slave has a static output port -that tells how many pipeline stages are available. I call this -'pipeline level': - 0 means non overlapping - 1 a new rd/wr request can be issued in the same cycle - when the former data is read. - 2 one earlier and - 3 is the maximum level where you get full pipelining - on the basic read cycle with one wait state - (command - read - read - result). - - -The draft of the spec at the moment are few sketches on real paper - -takes some time to draw all diagrams for a document. - -I have a first implementation of SimpCon on JOP to test the ideas: A -master in JOP and a slave for SRAM access. -\end{verbatim} - -\subsection{e-mail from Robert Finch} - -\begin{verbatim} - - -Hi Martin, I read your comments. I've thought some about the -WISHBONE spec myself. - - -"Martin Schoeberl" wrote in message -news:<4384f0b3$0$11610$3b214f66@tunews.univie.ac.at>... -> After implementing the Wishbone interface for main memory access -> from JOP I see several issues with the Wishbone specification that -> makes it not the best choice for SoC interconnect. - -> The master is requested to hold the address and data valid through -> the whole read or write cycle. This complicates the connection to a -> master that has the data valid only for one cycle. In this case the -> address and data have to be registered *before* the Wishbone connect -> or an expensive (time and resources) MUX has to be used. A register -> results in one additional cycle latency. A better approach would be -> to register the address and data in the slave. Than there is also -> time to perform address decoding in the slave (before the address -> register). - -I've of the opinion that all outputs of masters should be -registered. Registering the outputs hides the timing of the master's -internal signals from the rest of the system and helps turn it into -a 'black box'. However, in my designs I provide both registered and -unregistered versions of outputs, as it is quite handy to have -unregistered signals sometimes. It would have been nice if the -WISHBONE bus spec'd unregistered signals as well as registered ones. -I've just been naming the unregistered signals by including '_nxt' -in the signal name as in 'adr_nxt_o'. '_nxt' standing for the signal -value that will be 'next'. - -Why is the MUX needed ? - -I've found that a register may indeed result in an additional cycle -of latency, depending on the how the system is put together. -However, I've also found that it doesn't really make any difference -to the performance of the system. Registering the output often -allows the cycle time to be decreased, and the 'lost' cycle of -latency is made up for by better timing. I've also found that the -INTERCON (address decoding, bus muxing logic, and arbitration) -typically requires a full cycle by itself and it's best to have the -signals feeding into the INTERCON already registered. Unless the -system is really small (single master / slave). - -By 'address decoding in slaves' I'm assuming you mean partial -address decoding for only register selection. Full address decoding -shouldn't be done in slaves as it wastes a lot of resources. The -address decoding (device/slave selection) should be done by the -INTERCON, and is a function of the system. - -Almost always masters are designed to hold address and data valid -until the external system acknowledges the request. - -> -> There is a similar issue for the output data from the slave: As it -> is only valid for a single cycle it has to be registered by the -> master when the processor is not reading it immediately. Therefore, -> the slave should keep the last valid data at it's output even when -> wb.stb is not assigned anymore (which is no issue from the hardware -> complexity). - -I'm not sure I understand the 'single cycle' timing. Slave devices -I've worked on present valid data as long as the signals coming from -the INTERCON indicate that it should do so. Otherwise the output -data from the slave is allowed to flip around according to whatever -register is addressed as it doesn't affect the system since it's not -muxed to the master's inputs unless it's the addressed device. - -Generally, during a read request the master will always be ready to -read data immediately. If it wasn't ready to read the data it -shouldn't have requested it, as this wastes bus bandwidth. - -> -> The Wishbone connection for JOP resulted in an unregistered Wishbone -> memory interface and registers for the address and data in the -> Wishbone master. However, for fast address and control output (tco) -> and short setup time (tsu) we want the registers in the IO-pads of -> the FPGA. With the registers buried in the WB master it takes some -> effort to set the right constraints for the Synthesizer to implement -> such IO-registers. -> -> The same issue is true for the control signals. The translation from -> the wb.cyc, wb.stb and wb.we signals to ncs, noe and nwe for the -> SRAM are on the critical path. - -I've come to the conclusion that it's unrealistic to expect that -external memory can be accessed at a high rate using only a single -clock cycle. There is naturally a multi-cycle latency when dealing -with an external device operating a high clock rate. The registered -outputs of a WISHBONE master typically wouldn't need to be -registered at the IO-pads. - -> The ack signal is too late for a pipelined master. We would need to -> know it *earlier* when the next data will be available --- and this -> is possible, as we know in the slave when the data from the SRAM -> will arrive. A work around solution is a non-WB-conforming early ack -> signal. - -I ran into this too. I built a system similar to this and it worked -okay. But, I decided not to build newer systems this way. A problem -is that the latency of external device may vary. This makes it -difficult to pipeline the master. SRAM may have a latency of three -cycles, BRAM two cycles, and IO-devices a single cycle. My (current) -master already has an internal three stage pipeline, adding three -more pipeline stages for memory would turn it into a six stage -monster. - -> -> Due to the fact that the data registers not inside the WB interface -> we need an extra WB interface for the Flash/NAND interface (on the -> Cyclone board). We cannot afford the address decoding and a MUX in -> the data read path without registers. This would result in an extra -> cycle for the memory read due to the combinational delay. -> -Yes. Can the delay be hidden using mult-masters (later) ? - -> In the WB specification (AFAIK) there is no way to perform pipelined -> read or write. - -This is something I've thought was missing from the spec as well. -However, doing pipelined access across a system bus could be quite a -feat. - - -However, for blocked memory transfers (e.g. cache -> load) this is the usual way to get a good performance. -> -> Conclusion -- I would prefer: -> -> * Address and data (in/out) register in the slave -> * A way to know earlier when data will be available (or -> a write has finished) -> * Pipelining in the slave -> -> As a result from this experience I'm working on a new SoC -> interconnect (working name SimpCon) definition that should avoid the -> mentioned issues and should be still easy to implement the master -> and slave. -> -> As there are so many projects available that implement the WB -> interface I will provide bridges between SimpCon and WB. For IO -> devices the former arguments do not apply to that extent as the -> pressure for low latency access and pipelining is not high. -> Therefore, a bridge to WB IO devices can be a practical solution for -> design reuse. -> -> A question to the group: What SoC interconnect are you using? -> A standard one for the peripheral devices and a 'home-brewed' for -> more demanding connections (e.g. external RAM access)? -> -> Martin -> - -I'm using an 'enhanced' WISHBONE bus (I added one or two signals, -and renamed a couple). - -I found that for my systems it wasn't necessary to pipeline the -memory system to get good performance. The reason being that there -are multiple bus masters, and all the memory bandwidth is consumed -anyway. (CPU, VIDEO, AUDIO, SPRITE, DISK, CPU2). I ended up building -a shared memory controller with an arbitrater that allows each -device access only every third cycle. This effectively hides a three -cycle latency though the memory. The external memory can service a -request every single clock cycle (at 40MHz!). (Just not from the -same master) Every cycle one of the masters is selected to be -allowed a memory access. Three cycles later, read data is available -for that master. From the master's perspective it looks like a -normal WISHBONE bus. - -Even though the system isn't pipelined, it's using the maximum -amount of performance it can get out of the memory. As a result, -it's turned out that the WISHBONE bus serves as a suitable bus -system to use. - -I'm not sure what's included in JOP system (I'm a news-subscriber), -but it may be easier to get better performance by using multiple -CPU's. For example, one cpu could be handling network communcations -while a second is running Java code (JVM). If there is any kind of -VIDEO or audio (eg MP3) that could be handled by another master as -well. - - -Good Luck with you're bus design. - -Robert - -\end{verbatim} - -\subsection{comp.arch.fpga} - -\begin{verbatim} ->> The last days I played around with the Quartus SOPC builder [1]. ->> Although I'm more a batch/make guy, I'm impressed by the easy to use ->> tool. In order to scratch a little bit on the dominance of the NIOS II ->> in the SOPC world I wrapped JOP [2] into an Avalon component ;-) -> -> Kudos, that is excellent. Any lessons/gotchas about turning JOP into an -> SOPC components should someone else fancy a similar undertaking? - -The Avalon bus is very flexible. Therefore, writing a slave or -master (SOPC component) is not that hard. The magic is in the Avalon -switch fabric generated by the builder. However, an example would -have helped (Altera listening?). I didn't find anything on Altera's -website or with Google. Now a very simple slave can be found at [1]. - -One thing to take care: When you (like me) like to avoid VHDL files -in the Quartus directory you can easily end up with three copies of -your design files. Can get confusing which one to edit. When you -edit your VHDL file in the component directory (the source for the -SOPC builder) don't forget to rebuild your system. The build process -copies it to your Quartus project directory. - -When you want to start over with a clean project the only files -needed for the project are: .qpf, .qsf, .ptf - -The master is also ease: just address, read and write data, -read/write and you have to react to waitrequest. See as example the -SimpCon/Avalon bridge at [2]. The Avalon interconnect fabric handles -all bus multiplexing, bus resizing, and control signal translation. - ->> However, of course there is some drawback. The performance of the ->> Avalon system is lower than a 'native' connection (or in my case ->> via SimpCon [5]) of the main memory to the CPU. I can provide some ->> numbers if there is interest... -> -> Care to elaborate? I'd expect going over Avalon could add latency, but -> if you can exploit multiple outstanding transactions (aka "posted -> reads") and/or bust transfers, the bandwidth should be the same as -> "native". - -Yes, the latency is the issue for JOP. JOP does not trigger several -read or write transactions. However, it can trigger one transaction -and than continue to execute microcode. When the (read) result is -needed, the JOP pipeline is stopped till the result is available. -What helps is to know in advance (one or two cycles) when the result -will be available. That's the trick with the SimpCon interface. -There is not a single ack or waitrequest signal, but a counter that -will say how many cycles it will take to provide the result. In this -case I can restart the pipeline earlier. - -Another point is, in my opinion, the wrong role who has to hold data -for more than one cycle. This is true for several busses (e.g. also -Wishbone). For these busses the master has to hold address and write -data till the slave is ready. This is a result from the backplane -bus thinking. In an SoC the slave can easily register those signals -when needed longer and the master can continue. On the other hand, -as JOP continues to execute and it is not so clear when the result -is read, the slave should hold the data when available. That is easy -to implement, but Wishbone and Avalon specify just a single cycle -data valid. - ->> BTW: The Cyclone II FPGA cannot be clocked really faster than the ->> Cyclone (just a few %). I hoped to get some speed-up for free due ->> to a new generation FPGA :-( -> -> I was surprised too when I saw that. I gather the only way the Cyclone -> II can gain you speed over Cyclone I is when you can use the embedded -> multipliers. Makes me wonder about the upcoming Cyclone III. - -Are there any other data available on that. I did not find many -comments in this group on experiences with Cyclone I and II. Looks -like the CII was more optimized for cost than speed. Yes, waiting -for III ;-) - -Martin - -[1] -http://www.opencores.org/cvsweb.cgi/~checkout~/jop/sopc/components/avalon_test_slave/hdl/avalon_test_slave.vhd - -[2] -http://www.opencores.org/cvsweb.cgi/~checkout~/jop/vhdl/scio/sc2avalon.vhd - -Hi Antti, - -> most of the SOPC magin happens in the perl package "Europe" ASFAIK. -> dont expect a lot of information about the internals of the package. - -That's fine for me. When the connection magic happens and I don't -have to care it's fine. OK, one exception: Perhaps I would like to -know more details on the latency. The switch fabric is 'plain' VHdL -or Verilog. However, generated code is very hard to read. - -> as very simple example for avalon master-slave type of peripherals there -> is on free avalon IP core for SD-card support the core can be found -> at some russian forum and later it was also added to the user ip -> section of the microtronix forums. - -Any link handy for this example? - -> the avalon master is really as simple as the slave. - -Almost, you have to hold address, data and read/write active as long -as waitrequest is pending. I don't like this, see above. - -In my case e.g. the address from JOP (= top of stack) is valid only -for a single cycle. To avoid one more cycle latency I present in the -first cycle the TOS and register it. For additional wait cycles a -MUX switches from TOS to the address register. I know this is a -slight violation of the Avalon specification. There can be some -glitches on the MUX switch. For synchronous on-chip peripherals this -is absolute not issue. However, this signals are also used for -off-chip asynchronous peripherals (SRAM). However, I assume that -this possible switching glitches are not really seen on the output -pins (or at the SRAM input). - -Martin - - -\end{verbatim} - - -\end{document}

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.