# OpenCores G.729A Codec

# Version 1.0

January 2015

# Contents

| Introduction                                      | 3  |
|---------------------------------------------------|----|
| License                                           | 3  |
| Conventions                                       |    |
| General description                               | 4  |
| Source code language                              | 4  |
| Core architecture                                 | 4  |
| Core interface                                    | 5  |
| Flow of operations                                | 6  |
| Control signals                                   | 6  |
| Single-channel operations, full-duplex mode       |    |
| Single-channel operations, half-duplex mode       |    |
| Multiple-channel operations, full-duplex mode     |    |
| Multiple-channel operations, half-duplex mode     | 9  |
| Performance                                       | 9  |
| Sample timing diagrams                            | 10 |
| Core reset                                        |    |
| Initialization                                    |    |
| Encoding                                          |    |
| Channel state saving operation                    | 14 |
| Channel state restoring operation                 |    |
| Appendix A: Altera© Quartus II 9.1 synthesis test |    |
| Appendix B: Xilinx© ISE 14.7 synthesis test       |    |
| Appendix C: Simulation & Implementation Hints     |    |
| ** 1                                              |    |

### Introduction

This document describes OpenCores G.729A codec core available at <u>http://opencores.org/project,g729a\_codec</u>. The core performs multi-channel 8kbps voice compression based on ITU-T G.729A standard, in both half-duplex and full-duplex modes.

Hopefully this work can be of use to somebody.

## License

In accordance with the existing version of the OpenCores G.729A codec core. This work is licensed under the GNU Lesser General Public License. The license can be obtained at <u>http://www.gnu.org/licenses/lgpl.html</u>. As such, the following applies to all source files added to the OpenCores G.729A codec core.

Copyright (C) 2014 Stefano Tonello

This source file may be used and distributed without restriction provided that this copyright statement is not removed from the file and that any derivative work contains

the original copyright notice and the associated disclaimer This source file is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser GeneralPublic License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This source is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this source; if not, download it from <u>http://www.gnu.org/licenses/lgpl.html</u>.

<u>G.729 includes patents from several companies and is licensed by Sipro Lab Telecom. Sipro Lab</u> <u>Telecom is the authorized Intellectual Property Licensing Administrator for G.729 technology and</u> patent pool. In a number of countries, the use of G.729 may require a license fee and/or royalty fee.

### Conventions

The following conventions are used in this document:

- Channel: a sequence of data packet related to the same audio stream.
- Coding (or encoding): the process of compressing audio samples.
- Core: the G.729A codec described in this document.
- Data packet: any sequence of 16-bit word transferred between the core and outside world (hereinafter called user logic), data packets can carry un-encoded, coded (or encoded) or state data.
- Decoding: the process of un-compressing audio samples.
- User logic: any circuitry connected to the core, and used to control it and/or exchanging data with it.

### **General description**

This G.729A codec core performs coding and decoding of 16-bit LPCM audio samples according to ITU-T G.729A standard.

When coding, the core takes as input data packets composed of 80 16-bit LPCM audio samples and outputs compressed data packets consisting of 5 16-bit words.

When decoding, the core takes as compressed data packets consisting of 5 16-bit words and outputs data packets composed of 80 16-bit LPCM audio samples.

The core can operate in both half-duplex (coding-only or decoding-only) and full-duplex (coding + decoding) modes.

The core can handle multiple channels by intermixing processing of their data packets, corruption of state information related to each channel is avoided by saving them (to external memory) after processing a packet belonging to a channel, and restoring them (from external memory) before processing the next packet belonging to the same channel.

The maximum number of channels which can be intermixed is limited only by the core operating frequency. Channel state save/restore operations are not needed for single-channel operations.

### Source code language

All source code files are written in synthesizable VHDL language.

Although the source code doesn't include vendor-specific features, it has been written having FPGA's in mind and therefore is more suitable to implementation on FPGA's.

FPGA-oriented features include, among others, the use of inferred dual-port RAM memories with one read/write and one read-only port, and the mixing of logic and memory instances in VHDL code.

#### **Core architecture**

The core has been designed following an ASIP-style (Application Specific Instruction Processor) approach, and it consists of a simple processor, which RISC-like instruction set has been augmented with instructions dedicated to G.729A coding/decoding operations, plus instruction and data memories and an interface module managing the processor itself and the data transfers between the core and user logic.

Every data transfer between the core and the user logic is actually implemented as a Direct Memory Access (DMA) operation on the processor data memory.

Data memory stores three types of data: constant data (in a read-only portion of it), channel state data (to be retained across decoding/encoding runs) and scratchpad data (to be overwritten at each run).

# **Core interface**

The core interface consists of the following signals:

- CLK i: clock input.
- RST i: synchronous reset input (reset completes in one cycle).
- STRT\_i: start input.
- OPS i[3-1:0]: operation selecting input.
- RE\_i: read-enable input.
- WE i: write-enable input.
- DI\_i[16-1:0]: data input.
- BSY\_0: core busy output.
- DMAE o: DMA-enabled output
- STS 0[3-1:0]: status output.
- DV o: data valid output
- DO\_o[IO\_WIDTH-1:0]: data output.

 $STRT_i$  starts a core operation and must be asserted by user logic for one cycle.  $STRT_i$  signal level is ignored when signal  $BSY_o$  is asserted.

OPS\_i[3-1:0] selects the operation to be performed by the core and must asserted by user logic on the same cycle where STRT i is asserted. Valid core operations are listed below.

RE\_i and WE\_i signals control data transfers between user logic and the core. User logic asserts WE\_i for one cycle when a 16-bit data word needs to be written to the core. User logic asserts RE\_i for one cycle when a 16-bit data word needs to be read from the core.

 $DI_i[16-1:0]$  is used to write 16-bit data words to the core, data loaded by user logic on  $DI_i$  are valid, and can be latched by the core, <u>on the same cycle</u> where WE i signal is asserted.

BSY\_o acts as "core busy" flag and is asserted by the core while processing a data packet.

 $DMAE_{\circ}$  acts as "DMA-enabled" flag and is asserted by the core when it's in DMA mode (i.e. when data can be read/written using RE\_i, WE\_i, DI\_i, and DO\_ $\circ$ ). This signal is currently redundant and can be ignored (left open).

 $STS_0[3-1:0]$  provides user logic with core status information specifying the type of data (decoded data packets, encoded data packet and state data packet) the core expects to be transferred and the direction of the transfer. Valid status values are listed below.

 $DV_{\circ}$  flags presence of valid data on  $DO_{\circ}$ . When  $DV_{\circ}$  is asserted, data on  $DO_{\circ}$  can be latched by user logic. This output is in some way redundant, because the information it provides can be obtained from RE i too.

 $DO_0[16-1:0]$  is used to read 16-bit data words from the core, data loaded by the core on  $DO_0$  are valid, and can be latched by user logic, <u>one cycle</u>, or two cycles, after RE\_i signal has been asserted (two cycles delay occurs when optional output registers are present, one cycle delay occurs when they're not).

All control signals are active-high: asserting a signal means driving it high, and de-asserting a signal means driving it low.

| Value Mnemonic | Purpose                                                                |
|----------------|------------------------------------------------------------------------|
| 000 RUNF       | Run full-duplex (coding + decoding) operation                          |
| 001 INIT       | Initialize core (required before processing first packet of a channel) |

#### Valid OPS i values

| 010 | RSTS | Restore channel state data              |
|-----|------|-----------------------------------------|
| 011 | RUNC | Run half-duplex (coding-only) operation |
| 100 | RUND | Run half-duplex (decoding-only)         |
| 101 | SAVS | Save channel state data                 |
| 110 | -    | Reserved                                |
| 111 | -    | Reserved                                |

#### Valid STS o values

| Value | Mnemonic | Purpose                                              |
|-------|----------|------------------------------------------------------|
| 000   | IDLE     | Core idle (no operation in progress)                 |
| 001   | COD_DIN  | Writing un-encoded data to core (encoding input)     |
| 010   | COD_DOUT | Reading coded data from core (encoding output)       |
| 011   | DEC_DIN  | Writing coded data to core (decoding input)          |
| 100   | DEC_DOUT | Reading en-encoded data from core (decoding output)  |
| 101   | STT_DIN  | Writing channel state data to core (restoring state) |
| 110   | STT_DOUT | Reading channel state data from core (dumping state) |
| 111   | RUN      | Processing (no data transfer possible)               |

Core top-level module is named G729A\_CODEC\_SDP and is located in source file G729A\_codec\_sdp.vhd.

Top-level module provides the following configuration parameters (generics):

- REGISTER\_INPUTS: when set to '1', input signals STRT\_i, OPS\_i, RE\_i, WE\_i, and DI\_i are registered by the core. Default value is '0' (no registering).
- REGISTER\_OUTPUTS: when set to '1', output signal DO\_o is registered by the core. Default value is '0' (no registering).
- SIMULATION\_ONLY: used for verification purposes only, ignore it or set to default value of '0'.
- ST FILE: used for verification purposes only, ignore it.
- WB FILE: used for verification purposes only, ignore it.
- USE\_ROM\_MIF: when set to '1', ROM memories data content is specified using a MIF (Memory Initialization File) format file (this option is suitable for synthesis with Altera tools). When set to '0', ROM memories data content is specified as a VHDL constant value (this option is suitable for simulation and for synthesis with Xilinx tools). Default value is '0' (no MIF file).

### **Flow of operations**

#### **Control signals**

Interaction between the core and user logic mainly occurs by means of  $OPS\_i$  input and  $STS\_o$  output:  $OPS\_i$  is used, by user logic, to send commands to the core (to be qualified by  $STRT\_i$  signal), which responds by providing status information on  $STS\_o$ . Status information is needed by user logic to detect when the core is ready to exchange data (either coded, un-encoded or state data) with it. Additional status information is provided by  $BSY\_o$  output, this signal informs user logic that an

#### Single-channel operations, full-duplex mode

Before starting to perform encoding/decoding operations, the core must be initialized by issuing command INIT on OPS\_i input and asserting STRT\_i signal. The core responds by asserting signal BSY\_ $\circ$  and (with some cycle of delay) driving RUN on STS\_ $\circ$  output for the whole duration of the operation.

When core de-asserts BSY\_o, actual encoding and decoding of next data packet can begin, this is accomplished by issuing command RUNF on OPS\_i input and asserting STRT\_i signal. The core responds by asserting signal BSY\_o and (with some cycle of delay) driving DEC\_DIN on STS\_o output, thus informing user logic than encoded data are expected by the core.

User logic detects change in STS\_ $\circ$  value and writes an encoded data packet (5 16-bit words) to core using WE\_i and DI\_i . The core counts incoming data words and, when whole packet has been transferred, responds by driving RUN on STS\_ $\circ$  output, thus informing user logic than decoding is in progress.

When decoding is complete, the core drives DEC\_DOUT on STS\_0 output, thus informing user logic than decoded data are ready to be read from the core.

User logic detects change in  $STS_o$  value and read a decoded data packet (80 16-bit words) from core using RE\_i and DO\_o. The core counts out coming data words and, when whole packet has been transferred, responds by driving COD\_DIN on  $STS_o$  output, thus informing user logic than unencoded data are expected by core.

User logic detects change in STS\_o value and (if encoding is requested) writes an un-encoded data packet (80 16-bit words) to core using WE\_i and DI\_i. The core counts incoming data words and, when whole packet has been transferred, responds by driving RUN on STS\_o output, thus informing user logic than encoding is in progress.

When encoding is complete, the core drives COD\_DOUT on STS o output.

User logic detects change in  $STS_{\circ}$  value and read an encoded data packet (5 16-bit words) from core using RE\_i and DO\_ $\circ$ . The core counts out coming data words and, when whole packet has been transferred, responds by de-asserting signal BSY\_ $\circ$  and driving IDLE on  $STS_{\circ}$  output, thus informing user logic than the core is back to idle state and can perform a new operation.

#### Single-channel operations, half-duplex mode

Before starting to perform encoding/decoding operations, the core must be initialized by issuing command INIT on OPS\_i input and asserting STRT\_i signal. The core responds by asserting signal BSY\_0 and (with some cycle of delay) driving RUN on STS\_0 output for the whole duration of the operation.

When core de-asserts BSY\_o, actual encoding/decoding of next data packet can begin, this is accomplished by issuing command RUNC (encoding-only), or RUND (decoding-only), on OPS\_i input and asserting STRT\_i signal. The core responds by asserting signal BSY\_o and (with some cycle of delay) driving DEC\_DIN, or COD\_DIN, on STS\_o output, thus informing user logic than encoded, or un-encoded, data are expected by the core.

User logic detects change in STS\_o value and writes an encoded (5 16-bit words), or an unencoded (80 16-bit words), data packet to core using WE\_i and DI\_i. The core counts incoming data words and, when whole packet has been transferred, responds by driving RUN on STS\_o output, thus informing user logic than decoding is in progress.

When decoding is complete, the core drives DEC\_DOUT, or COD\_DOUT, on STS\_0 output.

User logic detects change in STS\_ $\circ$  value and read a decoded (80 16-bit words), or coded (5 16-bit words), data packet from core using RE i and DO  $\circ$ .

The core counts out-coming data words and, when whole packet has been transferred, responds by de-asserting signal  $BSY_{0}$  and driving IDLE on  $STS_{0}$  output, thus informing user logic than the core is back to idle state and can perform a new operation.

#### Multiple-channel operations, full-duplex mode

Multiple-channel operations differ from single-channel ones because the core must be initialized for every channel and state data must be saved after every encoding/decoding operation and restored before every encoding/decoding operation on the same channel.

Before starting to perform encoding/decoding operations, the core must be initialized for first channel, by issuing command INIT on OPS\_i input and asserting STRT\_i signal. The core responds by asserting signal BSY\_0 and (with some cycle of delay) driving RUN on STS\_0 output for the whole duration of the operation.

When core de-asserts BSY\_o, actual encoding and decoding of next data packet can begin, this is accomplished by issuing command RUNF on OPS\_i input and asserting STRT\_i signal. The core responds by asserting signal BSY\_o and (with some cycle of delay) driving DEC\_DIN on STS\_o output, thus informing user logic than encoded data are expected by the core.

User logic detects change in  $STS_{0}$  value and writes an encoded data packet (5 16-bit words) to core using WE\_i and DI\_i. The core counts incoming data words and, when whole packet has been transferred, responds by driving RUN on  $STS_{0}$  output, thus informing user logic than decoding is in progress.

When decoding is complete, the core drives DEC\_DOUT on STS\_0 output, thus informing user logic than decoded data are ready to be read from the core.

User logic detects change in  $STS_o$  value and read a decoded data packet (80 16-bit words) from core using RE\_i and DO\_o. The core counts out coming data words and, when whole packet has been transferred, responds by driving COD\_DIN on  $STS_o$  output, thus informing user logic than unencoded data are expected by core.

User logic detects change in STS\_o value and (if encoding is requested) writes an un-encoded data packet (80 16-bit words) to core using WE\_i and DI\_i. The core counts incoming data words and, when whole packet has been transferred, responds by driving RUN on STS\_o output, thus informing user logic than encoding is in progress.

When encoding is complete, the core drives COD\_DOUT on STS o output.

User logic detects change in  $STS_{\circ}$  value and read an encoded data packet (5 16-bit words) from core using RE\_i and DO\_ $\circ$ . The core counts out-coming data words and, when whole packet has been transferred, responds by de-asserting signal BSY\_ $\circ$  and driving IDLE on  $STS_{\circ}$  output, thus informing user logic than the core is back to idle state and can perform a new operation.

User logic must now save state data for the current channel, this is accomplished by issuing command SAVS on OPS\_i input and asserting STRT\_i signal.

The core responds by asserting signal  $BSY_{0}$  and (with some cycle of delay) driving STT\_DOUT on  $STS_{0}$  output, thus informing user logic than state data are expected to be read from the core (to be stored in some external memory).

User logic detects change in  $STS_o$  value and reads state data packet (1679 16-bit words) from core using RE\_i and DO\_o. The core counts out-coming data words and, when whole packet has been

transferred, responds by driving IDLE on STS\_0 output, thus informing user logic that the core is back to idle state and can perform a new operation.

The same sequence of operations (initialization, coding/decoding and state saving) must then be repeated for each one of the remaining channels.

When such sequence has been performed for every channel (i.e., when first data packet from every channel has been processed), state data for first channel must be restored, this is accomplished by issuing command RSTS on OPS i input and asserting STRT i signal.

The core responds by asserting signal  $BSY_{0}$  and (with some cycle of delay) driving STT\_DIN on  $STS_{0}$  output, thus informing user logic than state data are expected to be written to the core (from some external memory).

User logic detects change in  $STS_{\circ}$  value and writes a state data packet (1679 16-bit words) to core using WE\_i and DI\_i. The core counts incoming data words and, when whole packet has been transferred, responds by driving IDLE on  $STS_{\circ}$  output, thus informing user logic that the core is back to idle state and can perform a new operation.

A new data packet for first channel can now be processed in the same way described above for single-channel operations. When core completes this operation and returns to IDLE state, channel state has to be saved again.

This restore-run-save operation sequence must then be repeated for each of the remaining channels.

#### Multiple-channel operations, half-duplex mode

Multiple-channel operations in half-duplex mode are not described in details, differences between them and multiple-channel operations in full-duplex mode being the same existent between single-channel half- and full-duplex operations.

### Performance

Minimum clock frequency for single-channel full-duplex operations is 27.5 MHz (actual minimum frequency is a bit lower, but the number of instructions executed in each encoding + decoding run varies from run to run, so it's safer to add some margin).

Rule-of-thumb for multiple-channel full-duplex operations is to add 27.5 MHz for each channel (55 MHz for two channels, 82.5 MHz for three channels, and so on).

Rule-of-thumb for half-duplex operations is to budget ~5.5 MHz/channel for decoding operations and ~22 MHz/channel for encoding operations.

# Sample timing diagrams

The following timing diagrams assume optional input/output registers are not present.

### Core reset



Fig. 1: reset timing diagram.

### Initialization



Fig. 2: Initialization timing diagram.

Note: for simplicity, BSY\_0 and STS\_0 are shown to change state on the same cycle after STRT\_i assertion, in reality STS\_0 changes state some cycle later than BSY\_0.

### Encoding



Fig. 3a: Encoding timing diagram (part I). Note: for simplicity, BSY\_0 and STS\_0 are shown to change state on the same cycle after STRT\_i assertion, in reality STS\_0 changes state some cycle later than BSY\_0.

Decoding operation timing diagram differs from encoding one only for the command that is issued (RUND, instead of RUNC) and for input data type (encoded data vs. un-encoded data).



Fig. 3b: Encoding timing diagram (part II).

Decoding operation timing diagram differs from encoding one only for the output data type (un-encoded data vs. encoded data).

## Channel state saving operation



Fig. 4: Channel state saving operation timing diagram. Note: for simplicity, BSY\_0 and STS\_0 are shown to change state on the same cycle after STRT\_i assertion, in reality STS\_0 changes state some cycle later than BSY\_0.

### Channel state restoring operation



Fig. 5: Channel state restoring operation timing diagram. Note: for simplicity, BSY\_0 and STS\_0 are shown to change state on the same cycle after STRT\_i assertion, in reality STS\_0 changes state some cycle later than BSY\_0.

# Appendix A: Altera© Quartus II 9.1 synthesis test

Top-level module: G729A\_CODEC\_SDP\_SYN (synthesis test-bench) from file G729A\_codec\_sdp\_SYN.vhd.

Configuration parameters:

- REGISTER\_INPUTS = '0'
- REGISTER\_OUTPUTS = '0'
- USE ROM MIF = '1'.

Target device: EP3C25F324C8 (the one used by NEEK development board) Target Fmax: 100 MHz (default synthesis options). Results:

- Total logic elements: 6175 (24%)
- Total registers: 1728 (7%)
- Total memory bits: 319488 (53%)
- Embedded multipliers: 6 (5%)
- Fmax (worst case): 87.5 MHz

# Appendix B: Xilinx© ISE 14.1 synthesis test

Top-level module: G729A\_CODEC\_SDP\_SYN (synthesis test-bench) from file G729A\_codec\_sdp\_SYN.vhd.

Configuration parameters:

- REGISTER\_INPUTS = '0'
- REGISTER OUTPUTS = '0'
- USE ROM  $\overline{MIF} = '0'$ .

Target device: xc6vlx75t-2ff484.

Target Fmax: 143MHz (default synthesis options).. Results:

- Number of slice registers: 1574 (1%)
- Number of slice LUTs: 4284 (9%)
- Number of RAMB36: 18 (11%)
- Number of DSP48E1s: 4 (1%)

# **Appendix C: Simulation & Implementation Hints**

- The suggested starting point, after downloading the project from OpenCores site, is to run the self-test module simulation (related VHDL files are located in VHDL/SELF\_TEST folder). This simulation allows to verify that all design files are available and can be compiled correctly, and provides also an example of codec top-level module instantiation and interfacing to user logic.
- Once self-test module simulation runs successfully, a simple test on HW can be performed by
  implementing the self-test module itself on an FPGA board (so far the module has been
  successfully implemented on Cyclone III, Virtex-5, Virtex-6, Spartan-6 and Artix-7 FPGA's).
  The simplest approach is to connect self-test module clock and reset inputs to the board FPGA
  clock and cpu/user reset pins and to connect self-test module pass and done outputs to board
  LED's (if available). Pay attention to polarity of input/output signals (self-test module signals are
  active-high, while some board signal may be active-low). A more sophisticated test may consist

in adding a clock generator core (a PLL) and derive codec reset input from the clock generator lock/stable signal.

• When targeting Virtex-5 FPGA family (or older ones), add "-use\_new\_parser yes" to synthesis other options (the default parser used for such FPGA families doesn't synthesize some portion of VHDL code correctly). Check synthesis report file to verify option has been properly specified.