Est. read time: 7 minutes | Last updated: January 17, 2025 by John Gentile


Contents

Open In Colab

import numpy as np
import matplotlib.pyplot as plt

from rfproto import measurements, nco, plot

NCO Basics and Characteristics

A Numerically Controlled Oscillator (NCO) is a way to digitally- and efficiently- generate sinusoidal signals. It has a basic structure consisting of:

  • Input phase increment value, Δθ\Delta\theta, or Frequency Control Word (FCW), which controls how quickly the phase rotates per clock cycle. Like a phasor, the speed at which phase rotates is related to the frequency of the resulting sinusoid.
  • A phase accumulator stage which adds the PCW each cycle. Again similar to a phasor, this acts as the integrator of phase (as frequency is the integral of phase). The phase accumulator is a fixed-point integer that is expected to roll-over back to 0 beyond its terminal value, similar to phase being bounded from 02π0\rightarrow 2\pi. In this way, an NN bit-long phase accumulator represents 0 radians when equal to 0, and 2π2\pi radians when equal to 2N12^{N}-1.
  • A phase truncation (or quantizer) stage which takes the PP most-significant-bits from the phase accumulator to form the lookup table (LUT) index. PP is equal to the number of address bits in the associated wave LUT (or read-only memory (ROM)). PP is often much smaller than the phase accumulator length NN, such that the associated LUT size is not prohibitively large. However, the phase trunctation process introduces periodic phase errors in the output, causing a form of spurious output noise.
  • A LUT which takes an index/address and outputs an associated output sample value. This LUT is nominally filled with the full-cycle of a sinusoid, such that the input address represents the associated phase for a given output value. In this way, the LUT is also known as a phase-to-amplitude converter. The LUT samples have a quantized value of MM bits, representing the signed, fixed-point integer values of the sinusoid’s amplitude at the associated phase value/index. The quantization of a fixed point value to MM bits also causes a form of output noise, compared to ideal sinusoidal value.

image.png

f  = 440.5 # desired output frequency
n  = 2**16 # number of output points to compute (power of 2 for FFTs)
fs = 48000 # sampling frequency
N  = 32    # phase accumulator length (num bits)
P  = 11    # LUT table address length (total depth = 2^P)
M  = 16    # quantized word length (num bits)

The estimated Spurius-Free Dynamic Range (SFDR) is based on the amount of Sine samples in the pre-determined look-up table (LUT). This LUT depth can be expressed in how many bits (PP) it takes to address all samples in a LUT (e.g. total samples equals 2N2^N address bits):

SFDR (dB)=6.02P3.92\text{SFDR (dB)} = 6.02P - 3.92

The esimated Noise Floor is mainly driven on the amount of quantization of our Sine samples in the LUT; the higher the number of bits of our samples (MM), the less quantization noise:

Noise Floor (dBFS)=6.02M+1.76\text{Noise Floor (dBFS)} = 6.02M + 1.76
plt.subplot(1,2,1)
SFDR_range = [(6.02*P)-3.92 for P in range(5,17)]
plt.plot(list(range(5,17)),SFDR_range)
plt.xlabel("Sine LUT Address Width (bits)")
plt.ylabel("Est. SFDR (dB)")
plt.subplot(1,2,2)
noise_range = [-measurements.ideal_SNR(M) for M in range(8,33)]
plt.plot(list(range(8,33)),noise_range)
plt.xlabel("Sine LUT Word Length (bits)")
plt.ylabel("Est. Noise Floor (dBFS)")
plt.tight_layout()

png

The frequency resolution of DDS’s are excellent given the ease of using a phase accumulator with a nominal word length (e.x. 32 bits) to keep track of the phase that the DDS should transform into sinusoid samples. The frequency resolution of a given phase accumulator can be given by the number of bits for the accumulator (NN) and the sampling (or DDS synchronous) frequency (FsF_{s}):

Frequency Resolution (Hz)=Fs2N\text{Frequency Resolution (Hz)} = \frac{F_{s}}{2^N}
print("Frequency Resolution: {:.6f} Hz".format(fs/(2**N)))

Frequency Resolution: 0.000011 Hz

The amount to increment the phase accumulator, Δθ\Delta\theta, by each clock cycle, to get a specific output frequency (fof_{o}) can be found by:

Δθ=fo2NFs\Delta\theta = f_{o}\frac{2^N}{F_{s}}

This is also known as the tuning of Frequency Control Word (FCW).

The Sine Look-Up Table (LUT) can be generated given a LUT address length (PP) and the desired output signed-integer bitwidth (MM); essentially, one full cycle of a sine wave is generated and quantized to the full-scale sample vale.

test_NCO = nco.Nco(N, M, P, fs)
plot.samples(test_NCO.LUT.imag, "LUT Samples")
plt.show()

png

The DDS output can now be created at each clock cycle by:

  • Incrementing the phase accumulator by our previously determined ΔPhase\Delta\text{Phase} value
  • Using the upper PP bits of the phase accumulator (right-shift phase accumulator by NPN-P bits) to directly address the pre-generated Sine LUT, which gives the the output value
y = np.zeros(n)    # initialize output sample vector (statically sized)
test_NCO.SetOutputFreq(f)
for i in range (n):
    # just take imag part (starts at 0) for this
    y[i] = test_NCO.Step().imag

The sample outputs of the DDS can be shown below:

plot.samples(y[0:int(2*fs/f)], "DDS Output")
plt.show()

png

The DDS Output Spectrum can be shown as

plot.spec_an(y, fs, "DDS Output Spectrum", scale_noise=True, norm=True)
plt.show()

png

Quarter-Wave LUT NCO

Since the full period of a sine wave is symmetrical, only 1/4 of the total sinusoid’s sample values need to be stored in the LUT.

image.png

In this case, the associated phase can determine how to index into the LUT table and/or invert the LUT output to create a full period sine wave:

yNCO(θ)={ LUT[θ],0θ<π/4 LUT[πθ],π/4θ<π/2LUT[θπ],π/2θ<3π/4LUT[2πθ],3π/4θ<2πy_{NCO}(\theta)= \begin{cases} \text{ LUT}[\theta],& 0 \leq \theta < \pi/4 \\ \text{ LUT}[\pi - \theta],& \pi/4 \leq \theta < \pi/2 \\ -\text{LUT}[\theta - \pi],& \pi/2 \leq \theta < 3\pi/4 \\ -\text{LUT}[2\pi - \theta],& 3\pi/4 \leq \theta < 2\pi \\ \end{cases}
quart_NCO = nco.Nco(N, M, P, fs, quarter_wave=True)
plot.samples(quart_NCO.LUT.imag, "LUT Samples")
plt.show()

png

y = np.zeros(n)    # initialize output sample vector (statically sized)
quart_NCO.SetOutputFreq(f)
for i in range (n):
    # just take imag part (starts at 0) for this
    y[i] = quart_NCO.Step().imag

plot.samples(y[0:int(2*fs/f)], "DDS Output")
plt.show()

png

plot.spec_an(y, fs, "1/4 Wave DDS Output Spectrum", scale_noise=True, norm=True)
plt.show()

png

Phase Dithering

Periodic phase truncation/quantization effects cause spurs in the total spectrum. Near-in phase noise can be traded for lower far-out spurs and total SFDR by dithering (adding some random bits of noise) the phase accumulator.

dither_SFDR = []
y_dith = np.zeros(n)
max_SFDR = 0
max_SFDR_bits = 0

for dither_bits in range(2, N - 4):
    dith_NCO = nco.Nco(N, M, P, fs, quarter_wave=True, dither=True, dither_bits=dither_bits)
    dith_NCO.SetOutputFreq(f)
    for i in range (n):
        y_dith[i] = np.imag(dith_NCO.Step())
    dSFDR = measurements.SFDR(y_dith, fs, norm=True)
    meas_SFDR = dSFDR["SFDR"]
    dither_SFDR.append(meas_SFDR)
    if meas_SFDR > max_SFDR:
        max_SFDR = meas_SFDR
        max_SFDR_bits = dither_bits

plt.plot(np.linspace(2, N-4, len(dither_SFDR)), dither_SFDR, '--bo')
plt.xlabel("Number of dither bits + phase accumulator")
plt.ylabel("SFDR (dB)")
plt.show()

dith_NCO = nco.Nco(N, M, P, fs, quarter_wave=True, dither=True, dither_bits=max_SFDR_bits)
dith_NCO.SetOutputFreq(f)
for i in range (n):
    y_dith[i] = np.imag(dith_NCO.Step())
plot.spec_an(y_dith, fs, "1/4 Wave DDS Output Spectrum (w/phase dithering)", scale_noise=True, norm=True)
plt.show()

print(f"Best SFDR at {max_SFDR_bits} bits of added dither, given phase acc. length of {N} bits and {P} LUT address bits")

png

png

Best SFDR at 18 bits of added dither, given phase acc. length of 32 bits and 11 LUT address bits

Taylor Series

The Taylor Series expansion is a powerful method to decompose a function into an infinite sum of terms that are experessed in terms of a the function’s derivatives at a single point. As such, the Taylor series of trigonometric functions can be used to approximate a sin()\sin() function as:

sin(θ)=n=0(1)n(2n+1)!θ2n+1θθ33!+θ55!θ77!,for all θ (Radians)\sin(\theta) = \sum_{n=0}^{\infty}\frac{(-1)^{n}}{(2n + 1)!}\theta^{2n + 1} \rightarrow \theta - \frac{\theta^{3}}{3!} + \frac{\theta^{5}}{5!} - \frac{\theta^{7}}{7!}\dots,\quad \text{for all }\theta \text{ (Radians)} cos(θ)=n=0(1)n(2n)!θ2n1θ22!+θ44!θ66!,for all θ (Radians)\cos(\theta) = \sum_{n=0}^{\infty}\frac{(-1)^{n}}{(2n)!}\theta^{2n} \rightarrow 1 - \frac{\theta^{2}}{2!} + \frac{\theta^{4}}{4!} - \frac{\theta^{6}}{6!}\dots,\quad \text{for all }\theta \text{ (Radians)}
def factorial(x):
    if x > 1:
        return x * factorial(x-1)
    else:
        return 1

def sin_ts(x):
    p3 = (x**3)/(factorial(3))
    p5 = (x**5)/(factorial(5))
    p7 = (x**7)/(factorial(7))
    # 11th order gives diminishing results in output precision...
    #p11 = (x**11)/(factorial(11))
    return x - p3 + p5 - p7

def sin_ts_compress(x):
    if x < np.pi / 2:
        return sin_ts(x)
    elif x < np.pi:
        return sin_ts(np.pi - x)
    elif x < 3 * np.pi / 2:
        return -sin_ts(x - np.pi)
    else: # < 2*pi
        return -sin_ts(2*np.pi - x)

num_compares = 1000
sin_err = np.zeros(num_compares)
comp_sin_err = np.zeros(num_compares)
for i in range(num_compares):
    theta = 2 * np.pi * (i/num_compares)
    ref = np.sin(theta)
    taylor = sin_ts(theta)
    comp_taylor = sin_ts_compress(theta)
    if ref != 0.0:
        sin_err[i] = abs(100.0 * (ref - taylor)/ref)
        comp_sin_err[i] = abs(100.0 * (ref - comp_taylor)/ref)
    else:
        sin_err[i] = 0.0
        comp_sin_err[i] = 0.0
    if comp_sin_err[i] > 0.1:
        print(f"Annomaly at [theta = {theta} rads] ref = {ref} | comp = {comp_taylor}")

plt.plot(np.linspace(0, 2*np.pi, num_compares), sin_err, label="Taylor series")
plt.plot(np.linspace(0, 2*np.pi, num_compares), comp_sin_err, label="Taylor series (folded)")
plt.xlabel("x (Radians)")
plt.ylabel("% Error")
plt.ylim((0, 0.1))
plt.legend()
plt.show()

Annomaly at [theta = 3.141592653589793 rads] ref = 1.2246467991473532e-16 | comp = -0.0

png

It can be seen that Taylor Series is very accurate for small angles/phase, but starts to be very innacurate as phase approaches (and exceeds) π/2\pi/2 Radians. However, the use of a couple multiplies to find a very accurate sin()\sin() and cos()\cos() value at small phase angles can be exploited for a Taylor-Series Corrected NCO.

As seen from the phase-dithered NCO, the main source of spurs is from the phase truncation process of keeping only the NPN - P MSBs of the phase accumulator to index the LUT. If the phase accumulator matched the LUT address size completely, there would be no spurs and only quantization noise from the discrete sinusoid samples stored in the LUT. However, this is often impractical for many applications as it would require either:

  • An impractically large LUT size for a phase accumulator with enough precision for fine frequency output control or
  • An impractically small output frequency resolution if a small enough phase accumulator width was chosen to match a practical LUT address size. For example, a system where Fs=100MHzF_{s}=100\text{MHz} and N=P=10N = P = 10 bits, would have a very coarse output frequency resolution of almost 100kHz100\text{kHz}. However, for some systems this may be OK!

For NCOs used in applications like Digital Down Converters (DDCs) though, having high sprectral purity and high frequency precision is critical, as each spur acts like a separate mixer in this case! A fundamental way to think of the issue is to notice that the phase accumulator represents a given phase angle θ\theta at any time with a range of 02π0 \rightarrow 2\pi. When truncated, the “dropped” LSBs of the phase accumulator represent the unaccounted phase error, θerr\theta_{err}, as the LUT is indexed with the rounded/truncated MSBs representing θLUT\theta_{LUT}. Since θ=θerr+θLUT\theta = \theta_{err} + \theta_{LUT}, we can exploit the trigonometric identities of angle sums to use the dropped phase error θerr\theta_{err} in creating a full-precision sinusoidal output (and for both real and imaginary outputs of the NCO):

sin(θLUT+θerr)=sin(θLUT)cos(θerr)+cos(θLUT)sin(θerr)\sin(\theta_{LUT} + \theta_{err}) = \sin(\theta_{LUT})\cos(\theta_{err}) + \cos(\theta_{LUT})\sin(\theta_{err}) cos(θLUT+θerr)=cos(θLUT)cos(θerr)sin(θLUT)sin(θerr)\cos(\theta_{LUT} + \theta_{err}) = \cos(\theta_{LUT})\cos(\theta_{err}) - \sin(\theta_{LUT})\sin(\theta_{err})

While it can be seen that applying the correction is rather trivial, even for digital hardware implementations (it’s a couple multiplies and two addition/subtraction operations), now the issue is how to compute the values sin(θerr)\sin(\theta_{err}) and cos(θerr)\cos(\theta_{err}).

One way that has been discussed before is to use the small-angle approximation (e.g. sin(θ)θ\sin(\theta) \approx \theta) to simply feed-forward the truncated LSBs to the angle sum and difference output, as θerr\theta_{err} is always a relatively small angle. However, in simulation, this still does not result in the best output spectrum.

Another method is to compute sin(θerr)\sin(\theta_{err}) and cos(θerr)\cos(\theta_{err}) using a second (and sometimes third) look-up table indexed by the truncated bits. This method uses more memory (for the additional LUTs), but can still be prohibitive if the truncated phase is still large. However in this case, a third LUT is usually added, for example to break up a 32 bit phase accumulator into: a primary 12 bit LUT, 10 bit secondary and 10 bit final LUTs. This approach still requires multipliers and addition circuits to implement the angle sum and difference though.

Given modern devices with many multiplier circuits, and the fact that Taylor-Series Approximation is very accurate for small phase angles, we can use a 2nd Order sin()\sin() and cos()\cos() approximation to get and apply the sin(θerr)\sin(\theta_{err}) and cos(θerr)\cos(\theta_{err}) values.

# TODO: figure out why Taylor Series correction- even floating-point model- has issues in 1/4 wave mode, likely due to 
#  LUT index being different than straight phase_acc truncation
taylor_NCO = nco.Nco(N, M, P, fs, quarter_wave=False, dither=False, taylor_corr=True)
taylor_NCO.SetOutputFreq(12200)
#n = 100
y_taylor = np.zeros(n) + 1j*np.zeros(n)
for i in range (n):
    y_taylor[i] = taylor_NCO.Step()
plot.spec_an(y_taylor, fs, "Taylor Series NCO", scale_noise=True, norm=True)
plt.show()

png

Now there are no obvious spurs in the output spectrum and the noise floor is at the limit of the given quantization level of the LUT samples!

Note that phase accumulator dithering is also now moot and can be seen to have no effect on the output spectrum. The phase noise spreading effect of dithering is not necessary with Taylor Series correction of truncated phase.

References