Design and Implementation of a Multiplier free FPGA based OFDM Transmitter

Orthogonal Frequency Division Multiplexing (OFDM) is an efficient multi-carrier technique. The core operation in OFDM systems is the FFT/IFFT unit, which requires a large amount of hardware resources and introduces processing delay. Developments in implementation technologies such as Field Programmable Gate Arrays (FPGAs) have made OFDM a feasible option. The goal of this paper is to design and implement an OFDM transmitter on an Altera FPGA using the Quartus software. The proposed transmitter simplifies the Fourier transform calculation by using a decoder instead of multipliers. After programming the Altera DE2 FPGA kit with the implemented project, several practical tests were carried out, starting with monitoring the results of all implemented blocks (VHDL code) and comparing them with the corresponding results from a simulation system implemented in MATLAB 2010a. The results of these practical tests show that the suggested approach significantly reduces complexity and processing delay (45 nsec) in comparison with conventional implementations of the OFDM transmitter.

The core block of the OFDM transmitter is the transform block (IDFT or IFFT), because it takes the largest share of the hardware resources used to implement the OFDM system. Furthermore, this block requires a relatively large processing time. To minimize the processing time as well as the complexity of the OFDM transmitter, the transform calculation must be improved.

A. DFT and FFT
A complete direct calculation of an N-point DFT requires $(N-1)^2$ complex multiplications and $N(N-1)$ complex additions. It can be seen that the computational complexity is of the order of $N^2$.
For N > 8, direct calculation of the IDFT is too computationally intensive and not practical for implementation in hardware, as listed in Table 1.
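The growth of these counts can be tabulated with a short sketch. The function names are illustrative only, and the radix-2 FFT counts, $(N/2)\log_2 N$ multiplications and $N\log_2 N$ additions, are included purely for comparison and assume N is a power of two.

```python
def direct_dft_ops(n):
    """Complex operation counts for direct N-point DFT evaluation."""
    return {"mults": (n - 1) ** 2, "adds": n * (n - 1)}


def radix2_fft_ops(n):
    """Usual radix-2 FFT counts; n is assumed to be a power of two."""
    stages = n.bit_length() - 1          # exact log2 for powers of two
    if n < 2 or 2 ** stages != n:
        raise ValueError("n must be a power of two")
    return {"mults": (n // 2) * stages, "adds": n * stages}


# Print both counts side by side for a few transform lengths.
for n in (8, 16, 64, 256):
    print(n, direct_dft_ops(n), radix2_fft_ops(n))
```

Either count still grows with N, which is the motivation for removing the multipliers altogether rather than only reducing their number.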
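So the idea of the FFT (and likewise the IFFT) is brought forward. The total number of complex multiplications is reduced to $(N/2)\log_2 N$ in the FFT and the total number of complex additions is reduced to $N\log_2 N$, but for large N (i.e. N > 32) the FFT is also not efficient [Ludman, 1987].

B. DFT and IDFT in Matrix Form
It is instructive to view the DFT and IDFT as linear transformations on the sequences $\{x(n)\}$ and $\{X(k)\}$, respectively [Ludman, 1987]. Let us define an N-point vector $\mathbf{x}_N$ of the signal sequence $x(n)$, $n = 0, 1, \dots, N-1$, an N-point vector $\mathbf{X}_N$ of frequency samples, and an $N \times N$ matrix $\mathbf{W}_N$ as
$$\mathbf{x}_N = \begin{bmatrix} x(0) \\ x(1) \\ \vdots \\ x(N-1) \end{bmatrix}, \quad \mathbf{X}_N = \begin{bmatrix} X(0) \\ X(1) \\ \vdots \\ X(N-1) \end{bmatrix}, \quad \mathbf{W}_N = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N & W_N^{2} & \cdots & W_N^{N-1} \\ 1 & W_N^{2} & W_N^{4} & \cdots & W_N^{2(N-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)(N-1)} \end{bmatrix}$$
where $W_N = e^{-i2\pi/N}$. The matrix $\mathbf{W}_N$ can be simplified by writing the powers of all its elements in modulo-N notation instead of the previous form, i.e. $W_N^{kn} = W_N^{(kn) \bmod N}$.
With these definitions, the N-point DFT may be expressed in matrix form as:
$$\mathbf{X}_N = \mathbf{W}_N \mathbf{x}_N \qquad \text{eq. (3)}$$
where $\mathbf{W}_N$ is the matrix of the linear transformation. We observe that $\mathbf{W}_N$ is a symmetric matrix. If it is assumed that the inverse of $\mathbf{W}_N$ exists, then eq. (3) can be inverted by premultiplying both sides by $\mathbf{W}_N^{-1}$. Thus we obtain:
$$\mathbf{x}_N = \mathbf{W}_N^{-1} \mathbf{X}_N \qquad \text{eq. (4)}$$
But this is just an expression for the IDFT. In fact, the IDFT can be expressed in matrix form as:
$$\mathbf{x}_N = \frac{1}{N} \mathbf{W}_N^{*} \mathbf{X}_N \qquad \text{eq. (5)}$$
where $\mathbf{W}_N^{*}$ denotes the complex conjugate of the matrix $\mathbf{W}_N$. Comparison of eq. (4) with eq. (5) leads to the conclusion that:
$$\mathbf{W}_N^{-1} = \frac{1}{N} \mathbf{W}_N^{*} \qquad \text{eq. (6)}$$
which, in turn, implies that:
$$\mathbf{W}_N \mathbf{W}_N^{*} = N \mathbf{I}_N \qquad \text{eq. (7)}$$
where $\mathbf{I}_N$ is an $N \times N$ identity matrix. Therefore, the matrix $\mathbf{W}_N$ in the transformation is an orthogonal (unitary) matrix. Furthermore, its inverse exists and is given as $\mathbf{W}_N^{*}/N$.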
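The matrix relations in eqs. (3), (5) and (7) can be checked numerically with a small sketch. It assumes the usual convention $W_N = e^{-i2\pi/N}$; all helper names are illustrative and not taken from the paper.

```python
import cmath


def dft_matrix(n):
    """N x N matrix with entries W_N^(k*m); exponents are reduced modulo N."""
    w = [cmath.exp(-2j * cmath.pi * p / n) for p in range(n)]
    return [[w[(k * m) % n] for m in range(n)] for k in range(n)]


def conjugate(a):
    """Element-wise complex conjugate of a matrix."""
    return [[z.conjugate() for z in row] for row in a]


def matvec(a, v):
    """Matrix-vector product."""
    return [sum(r * c for r, c in zip(row, v)) for row in a]


def matmul(a, b):
    """Matrix-matrix product."""
    cols = list(zip(*b))
    return [[sum(r * c for r, c in zip(row, col)) for col in cols] for row in a]


def is_scaled_identity(p, scale, tol=1e-9):
    """True if p equals scale * I within the tolerance."""
    size = len(p)
    return all(abs(p[i][j] - (scale if i == j else 0)) < tol
               for i in range(size) for j in range(size))


N = 4
W = dft_matrix(N)
x = [1, 2, 3, 4]
X = matvec(W, x)                                   # eq. (3): X = W x
x_back = [z / N for z in matvec(conjugate(W), X)]  # eq. (5): x = (1/N) W* X
P = matmul(W, conjugate(W))                        # eq. (7): W W* = N I
print(is_scaled_identity(P, N))
```

Because every entry of $\mathbf{W}_N$ is one of only N distinct powers, the whole matrix can be held in a small lookup table, which is what makes a memory-based IDFT attractive.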

II. PROPOSED METHOD TO IMPLEMENT IDFT FOR OFDM TRANSMITTER [Esttaifan, 2011]
For the OFDM transmitter shown in Fig. 1, the input sequence (data) must be either '0' or '1'; the mapper converts the input data to complex symbols according to the modulation type. The inputs to the IDFT block depend on the mapper type (modulation type, i.e. BPSK, QPSK or QAM). So the circuit used to calculate the IDFT can be simplified according to the limited set of possible values of the mapper output, as seen in Table 2.
The proposed method can be summarized as using a decoder instead of all multipliers to choose between cases according to the input data. For example, when a twiddle factor $(a + ib)$ is multiplied by the 16-QAM constellation point $(-3 + 3i)$, the answer must be:
$$(-3 + 3i)(a + ib) = (-3a - 3b) + i(3a - 3b) \qquad \text{eq. (8)}$$
According to eq. (8), 4 multipliers and 2 adders must be used to find the result of multiplying the two given complex numbers. But if the results (3a) and (3b) are saved in memory ((3a) and (3b) represent the scaled twiddle factor matrix $XW_N$), they only need to be read from memory and added to find the answer, so the complex multiplication reduces to just two adders.
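A minimal sketch of this idea, taking the 16-QAM point (-3 + 3i) as the symbol and an arbitrary twiddle factor as the example value. The function names and the chosen twiddle factor are assumptions for illustration. The scaled values 3a and 3b are computed once up front and stand in for the memory contents.

```python
import cmath


def multiply_direct(x, w):
    """Conventional complex product: 4 real multiplications, 2 real additions."""
    a, b = w.real, w.imag
    return complex(x.real * a - x.imag * b, x.real * b + x.imag * a)


def multiply_minus3_plus3i(three_a, three_b):
    """(-3 + 3i)(a + ib) from the stored values 3a and 3b.

    Real part: -3a - 3b, imaginary part: 3a - 3b.
    Only negation, addition and subtraction are used at run time.
    """
    return complex(-three_a - three_b, three_a - three_b)


# Example twiddle factor (illustrative choice, not from the paper).
w = cmath.exp(-2j * cmath.pi * 3 / 16)

# Scaled twiddle factor, precomputed offline and stored in memory.
three_a, three_b = 3 * w.real, 3 * w.imag

print(multiply_direct(-3 + 3j, w))
print(multiply_minus3_plus3i(three_a, three_b))
```

The two printed values agree to floating-point precision, while the second path needs no multiplier once the scaled values are in memory.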
For the above example (16-QAM modulator) it is important to save (a) and (b) in memory as well as the (3a) and (3b) matrices ((a) and (b) represent the conventional twiddle factor matrix $W_N$), and to use a decoder for mapping (or selecting) between cases to enable the adders to complete the calculations. Fig. 2 represents the proposed OFDM transmitter (including: S/P, mapper, IDFT, cyclic prefix adder, P/S).
• The S/P is represented by D-FFs; the decoder represents the mapper and controls the adder circuit, which acts as the IDFT block.
• The cyclic prefix adder and P/S are built into the control unit block, which is also responsible for clock generation, managing the start of transmission, giving the control signals (read, write and addresses) to all memories, and synchronizing all blocks.
• Each adder block is responsible for finding only one IDFT output element; this element is the result of adding or subtracting N different twiddle factors according to the IDFT matrix. It is therefore more suitable to call the adder block a Cumulative Adder Block (CAB). Each CAB has two adders working concurrently, one to calculate the real part and the other the imaginary part.
• Every Cumulative Adder Block has two registers to save the cumulative results, one for the real part and the other for the imaginary part. The length of each register is equal to the precision of the twiddle factors (p). Initially, the contents of the two registers are zero; the CAB then reads these contents, adds (or subtracts) the new data, and writes the results back to the registers.
• The processing algorithm in the proposed method has two aspects: first, pipeline processing, which is done internally by each cumulative adder block to find an individual IDFT output and is repeated N times; second, concurrent processing, which is done by the N cumulative adder blocks working in parallel.
• The D-FFs and decoder work serially, and the control unit and memory also work serially, but the latter pair operates concurrently with the D-FFs and decoder. The processing times of the two groups must therefore be calculated and only the larger one taken into account; it is then added to the delay of one cumulative adder block (all adders are parallel to each other) to obtain the total processing time of the OFDM transmitter.
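The behaviour of the cumulative adder blocks described above can be modelled in software. The sketch below is a floating-point model, not the VHDL design. It assumes a BPSK mapper with bit 0 mapped to +1 and bit 1 mapped to -1, and ROM contents equal to the rows of the IDFT matrix with the 1/N factor folded in. The class and function names are hypothetical.

```python
import cmath


class CumulativeAdderBlock:
    """One CAB: two registers that accumulate a single IDFT output."""

    def __init__(self, rom_real, rom_imag):
        self.rom_real = rom_real    # real parts of one IDFT-matrix row
        self.rom_imag = rom_imag    # imaginary parts of the same row
        self.reg_real = 0.0         # registers start at zero
        self.reg_imag = 0.0
        self.step = 0               # ROM address of the next entry

    def clock(self, subtract):
        """Add or subtract the next ROM entry; the two adders act together."""
        a = self.rom_real[self.step]
        b = self.rom_imag[self.step]
        if subtract:
            self.reg_real -= a
            self.reg_imag -= b
        else:
            self.reg_real += a
            self.reg_imag += b
        self.step += 1


def build_cabs(n):
    """Create N CABs; CAB m holds row m of the IDFT matrix (assumed layout)."""
    cabs = []
    for m in range(n):
        row = [cmath.exp(2j * cmath.pi * ((m * k) % n) / n) / n for k in range(n)]
        cabs.append(CumulativeAdderBlock([z.real for z in row],
                                         [z.imag for z in row]))
    return cabs


def bpsk_idft(bits):
    """Multiplier-free IDFT of a BPSK-mapped bit sequence."""
    cabs = build_cabs(len(bits))
    for bit in bits:                 # N sequential steps inside each CAB
        for cab in cabs:             # the N CABs run in parallel in hardware
            cab.clock(subtract=(bit == 1))
    return [complex(c.reg_real, c.reg_imag) for c in cabs]


def direct_idft(symbols):
    """Reference IDFT using complex multiplications, for comparison only."""
    n = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * k * m / n)
                for k, s in enumerate(symbols)) / n for m in range(n)]


bits = [0, 1, 1, 0, 1, 0, 0, 1]
print(bpsk_idft(bits))
```

In this model the inner loop over the CABs stands in for the concurrent hardware, so the number of sequential steps is N, one per input symbol.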

III. COMPLEXITY AND TIME CALCULATIONS FOR PROPOSED OFDM TRANSMITTER [Esttaifan, 2011]
The internal construction of the D-FF, decoder, memory, CAB and control unit (Fig. 2) will be used to calculate the processing time.

Bashar Adel Esttaifan, Oday AbdulLateef, Waleed Ameen Mahmoud

Table 3 summarizes the number of blocks and the processing time for a specific number of IDFT points (N), in addition to the number of points in the constellation (mapper type) (M), the gate propagation delay (G) and the precision of the twiddle factors (P). The total processing time can be written:
$$P_T = \max\left[(P_{D\text{-}FFs} + P_{decoder}),\ (P_{Memory} + P_{control})\right] + P_{CAB} \qquad \text{eq. (9)}$$
where:
$P_{decoder}$ = processing time due to the decoder = 2G.
$P_{Memory}$ = processing time due to the memory = 3G.
$P_{control}$ = processing time due to the control unit = 3G.
$P_{CAB}$ = processing time due to the Cumulative Adder Block = 3G.
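Eq. (9) is easy to evaluate for different mapper types. The sketch below uses the D-FF term given with the paper's symbol definitions, $P_{D\text{-}FFs} = 2G(\log_2 M - 1)$. It assumes that M is a power of two and that the brackets around $\log_2 M$ are plain parentheses. The function name is illustrative.

```python
def total_processing_time(m, g):
    """Total processing time P_T from eq. (9).

    m: number of constellation points (assumed to be a power of two)
    g: gate propagation delay, in any time unit
    """
    bits_per_symbol = m.bit_length() - 1       # exact log2 for powers of two
    if m < 2 or 2 ** bits_per_symbol != m:
        raise ValueError("m must be a power of two")
    p_dffs = 2 * g * (bits_per_symbol - 1)     # D-FFs
    p_decoder = 2 * g                          # decoder
    p_memory = 3 * g                           # memory
    p_control = 3 * g                          # control unit
    p_cab = 3 * g                              # cumulative adder block
    return max(p_dffs + p_decoder, p_memory + p_control) + p_cab


# BPSK, QPSK and 16-QAM with a 5 nsec gate delay.
for m in (2, 4, 16):
    print(m, total_processing_time(m, 5))
```

The IDFT length N does not appear anywhere in the function, which reflects the claim that the processing time depends on the mapper type only.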
The complexity of the DAC and ADC is reduced because no multiplier is used: when multiplying data (16-bit length) by a twiddle factor (16-bit length), the result must be of 30-bit length, so it would not be efficient to use converters with 30-bit complexity.

IV. IMPLEMENTED OFDM TRANSMITTER [Esttaifan, 2011]
To become familiar with the proposed method, an example is used to demonstrate its efficiency against other methods used to implement the OFDM transmitter. A QPSK (M = 4) modulator is assumed as the mapper, the IFFT length (N) is 16 and the added cyclic prefix is 4. So in Fig. 3, according to Table 3, one D-FF is used to store data, and a 2:4 decoder selects one of the four cases according to the D-FF output (Q) and the other input before the D-FF (Q+1), so the decoder must be enabled once every two clocks.
After the first clock to the D-FF, in1 and in2 (the two inputs of the decoder in Fig. 3) will be 1 and 1 respectively, so that Op1, Op2 and Op3 must be inactive, i.e. '0', and Op4 will be active, '1', which represents (-1+0i) in the QPSK constellation (from the decoder truth table, Table 4). The cumulative adder blocks are modified to represent the four cases of the mapper, e.g. $(-1 + 0i)(a + bi) = -a - bi$, so that only the conventional twiddle factor matrices (a and b) are needed in the calculations (in QPSK). The memories (ROM) attached to the first cumulative adder block must contain the first row of the IDFT matrix, in the same manner the memories attached to the second cumulative adder block must contain the second row of the IDFT matrix, and so on. For the given x(n) in this example, Table 5 shows the contents of the two registers attached to the first CAB to save the cumulative results; the contents of the registers must be updated after each input (2 bits for QPSK), where $a_1$ to $a_{16}$ are the real parts of the twiddle factors in the first row of the IDFT matrix and $b_1$ to $b_{16}$ are the corresponding imaginary parts.
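The QPSK example can be followed end to end in a small software model. This is a sketch under stated assumptions, not the implemented VHDL. Only the mapping of inputs (1, 1) to (-1+0i) is stated in the text, so the other three rows of the bit-pair table are assumed. The ROM layout with the 1/N factor folded in is also an assumption, as are all names.

```python
import cmath

N, CP = 16, 4   # IDFT length and cyclic prefix length from the example

# ROM[n][k] is the IDFT-matrix entry for output n and input k (assumed layout);
# a is its real part and b its imaginary part.
ROM = [[cmath.exp(2j * cmath.pi * ((n * k) % N) / N) / N for k in range(N)]
       for n in range(N)]

# Assumed bit-pair table; only (1, 1) -> -1+0i is given in the text.
QPSK = {(0, 0): 1 + 0j, (0, 1): 0 + 1j, (1, 0): 0 - 1j, (1, 1): -1 + 0j}


def decode(in1, in2):
    """2:4 decoder returning the one-hot outputs (Op1, Op2, Op3, Op4)."""
    index = 2 * in1 + in2
    return tuple(1 if i == index else 0 for i in range(4))


def cab_update(reg_re, reg_im, a, b, ops):
    """One CAB step: add the selected symbol times (a + bi), adders only."""
    op1, op2, op3, _ = ops
    if op1:                                  # (1 + 0i)(a + bi) = a + bi
        return reg_re + a, reg_im + b
    if op2:                                  # (0 + 1i)(a + bi) = -b + ai
        return reg_re - b, reg_im + a
    if op3:                                  # (0 - 1i)(a + bi) = b - ai
        return reg_re + b, reg_im - a
    return reg_re - a, reg_im - b            # (-1 + 0i)(a + bi) = -a - bi


def qpsk_ofdm_symbol(bits):
    """Map 2*N bits, accumulate the IDFT in N CABs, prepend the cyclic prefix."""
    if len(bits) != 2 * N:
        raise ValueError("expected 2*N input bits")
    regs = [(0.0, 0.0)] * N
    for k in range(N):
        ops = decode(bits[2 * k], bits[2 * k + 1])
        regs = [cab_update(re, im, ROM[n][k].real, ROM[n][k].imag, ops)
                for n, (re, im) in enumerate(regs)]
    x = [complex(re, im) for re, im in regs]
    return x[-CP:] + x                       # last CP samples copied to the front


def map_bits(bits):
    """Reference mapper using the same assumed table."""
    return [QPSK[(bits[2 * k], bits[2 * k + 1])] for k in range(len(bits) // 2)]


def direct_idft(symbols):
    """Reference IDFT with complex multiplications, for comparison only."""
    n = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * k * m / n)
                for k, s in enumerate(symbols)) / n for m in range(n)]


bits = [1, 1, 0, 0, 0, 1, 1, 0] * 4
print(qpsk_ofdm_symbol(bits))
```

Each of the four decoder cases only swaps or negates a and b, which is why the QPSK design needs the conventional twiddle factors alone and no scaled copies.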
The difference (error) between the MATLAB results and the proposed system did not exceed 0.001 in the worst case, as clearly shown in Table 6.
A monitoring program was implemented inside the FPGA alongside the OFDM transmitter to show the IDFT results for any input sequence. This program uses the red LEDs numbered LEDR0 to LEDR14 to view any element; an element is shown by selecting its index using the switches numbered SW0 to SW4 (from 0000 to 1111). Using the monitoring program, the result x(n) can be shown as in Fig. 5.

VI. COMPARISON BETWEEN THE PROPOSED METHOD AND OTHERS
We implemented an OFDM transmitter with the following parameters to compare the results with other official and standard cores: N = 256 with a BPSK mapper (M = 2). According to eq. (9) and the obtained practical results, the total propagation delay is $P_T = 9G$. If the gate propagation delay (G) is typically assumed to be 5 nsec, then $P_T = 9 \times 5$ nsec = 45 nsec. This 45 nsec will be the processing time ($P_T$) for any IDFT length (N) after receiving the final bit of data for the BPSK mapper (M = 2), because $P_T$ does not depend on N but is related to the mapper type (M), as shown in Table 3 and eq. (9).
A comparison between several IDFT (or IFFT) cores with respect to complexity, based on practical implementations, is given in Table 7.

VII. CONCLUSIONS
1. The number of complex multipliers used in the transmitter in the proposed method is zero, so the processing time is significantly decreased.
2. The number of complex adders in the transmitter is reduced because of the pipelining in the calculations.

Table column headings: Seq input; Corresponding constellation; Contents of Result1_imag register.
The twiddle factors ($W_N$ = Re + i Im) must be saved in memory for use in the IDFT calculations of the proposed OFDM transmitter. Two ROMs are used to save the twiddle factors $W_N$ (one for the real parts (Re) and the other for the imaginary parts (Im)), but the proposed method needs extra memory cells to save the scaled twiddle factors ($XW_N$). The scaled twiddle factors ($XW_N$) can be explained by multiplying a twiddle factor (a + i b) by (-3+3i), a point in the 16-QAM constellation.
In eq. (9), $P_T$ is the total processing time in G, and $P_{D\text{-}FFs}$ = processing time due to the D-FFs = delay due to one D-FF × number of D-FFs = $2G\,([\log_2(M)] - 1)$.

Figure 4. Schematic diagram of the implemented OFDM transmitter generated by Quartus 10.0.