0%

FPGA数字信号处理之加法器与乘法器的设计

本节主要介绍了常见的加法器与乘法器的加速设计方法

加法器

1.Ripple Carry Adder(串行逐个进位加法器)

  • Ripple Carry Adder就是串联加法器,前一个加法器的进位会连接到下一个加法器的进位输入

    image-20231031150011522
  • Ripple Carry Adder的关键路径延迟为:
    $$
    T_{RCA}=(N-1)T_{FA}+T_m
    $$

    • $T_{FA}$是全加器的延时,$T_{m}$是全加器进位产生的延迟
  • ripple_carry_adder.v

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    module ripple_carry_adder #(parameter W=16) (
    input clk,
    input [W-1:0] a, b,
    input cin,
    output reg [W-1:0] s_r,
    output reg cout_r
    );

    wire [W-1:0] s;
    wire cout;
    reg [W-1:0] a_r, b_r;
    reg cin_r;

    assign {cout,s} = a_r + b_r + cin_r;

    always@(posedge clk) begin
    a_r<=a;
    b_r<=b;
    cin_r<=cin;
    s_r<=s;
    cout_r<=cout;
    end
    endmodule
    • 原理图如下:

      image-20231031150916675

2.Carry Look-ahead Adder(超前进位加法器)

  • 对于Ripple Carry Adder加法器,就4位加法器算法而言,在最坏情况下,做一次加法运算需要经过4个全加器的传输时间才能得到最终运算结果,运算的延时主要是由于进位的延迟引起的,为了提高运算速度,必须尽量减小由于进位信号逐级传递所耗费的时间

  • Carry Look-ahead Adder(CLA)可以有效的解决这一问题

    • 进位输出位方程:
      $$
      c_{i+1}=a_i\cdot b_i+c_i\cdot(a_i\bigoplus b_i)
      $$

      • $\bigoplus $代表异或,$\cdot$代表与门
    • 其中,进位产生项$g_i$:
      $$
      g_i=a_i\cdot b_i
      $$

    • 进位传播项$p_i$:
      $$
      p_i=a_i \bigoplus b_i
      $$

    • 则$c_{i+1}$:
      $$
      c_{i+1}=g_i+c_i\cdot p_i
      $$

    • 而输出结果位$s_i$:
      $$
      s_i=p_i\bigoplus c_i
      $$

  • 故CLA的主要思想是先对每位计算$g_i$和$p_i$的值,然后运用它们来计算进位$c_{i+1}$和$s_i$,这避免了进位串行传播通过进位链,从而实现超前进位的目的

  • 一般是构建4bit的CLA,否则资源占用会过大:

    image-20231031152635985

  • 这里给出位数可变的通用CLA的设计文件BinaryCarryLookaheadAdder.v

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    module BinaryCarryLookaheadAdder # (parameter N=16)(
    input [N-1:0] a,b,
    input c_in,
    output reg [N-1:0] sum,
    output reg c_out
    );

    reg [N-1:0] p, g, P, G;
    reg [N:0] c;
    integer i;

    always @(*) begin
    for (i=0;i<N;i=i+1) begin // Generate all ps and gs
    p[i] = a[i] ^ b[i];
    g[i] = a[i] & b[i];
    end
    end

    always @(*) begin
    // Linearly apply dot operators
    P[0] = p[0];
    G[0] = g[0];
    for (i=1; i<N; i=i+1) begin
    P[i] = p[i] & P[i-1];
    G[i] = g[i] | (p[i] & G[i-1]) ;
    end
    end

    always@(*) begin //Generate all carries and sum
    c[0] = c_in;
    for(i=0;i<N;i=i+1) begin
    c[i+1] = G[i] | (P[i] & c[0]);
    sum[i] = p[i] ^ c[i];
    end
    c_out = c[N];
    end

    endmodule
    • 原理图如下:

      Snipaste_2023-10-30_19-46-30

      3.Carry Select Adder(进位选择加法器)

  • Carry Select Adder采用面积换速度的思想,由于进位不确定导致下一级加法器只能等待,反正进位不是0就是1,那么可以提前算出1或者0的情况,最后再由进位去选择算好的结果

    image-20231031154457762
    • 图中所说的第二级其实只是做了MUX的判断,具体需要结合代码分析理解
  • HierarchicalCSA.v

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    module HierarchicalCSA(a, b, cin, sum, c_out);
    input [15:0] a,b;
    input cin;
    output c_out;
    output [15:0] sum;

    wire c4, c8, c8_0, c8_1, c12_0, c12_1, c16_0, c16_1, c16L2_0, c16L2_1;
    wire [15:4] sumL1_0, sumL1_1;
    wire [15:12] sumL2_0, sumL2_1;

    // Level one of hierarchical CSA
    assign {c4,sum[3:0]} = a[3:0] + b[3:0] + cin;
    assign {c8_0, sumL1_0[7:4]} = a[7:4] + b[7:4] + 1'b0;
    assign {c8_1, sumL1_1[7:4]} = a[7:4] + b[7:4] + 1'b1;

    assign {c12_0,sumL1_0[11:8]} = a[11:8] + b[11:8] + 1'b0;
    assign {c12_1,sumL1_1[11:8]} = a[11:8] + b[11:8] + 1'b1;

    assign {c16_0, sumL1_0[15:12]} = a[15:12] + b[15:12] + 1'b0;
    assign {c16_1, sumL1_1[15:12]} = a[15:12] + b[15:12] + 1'b1;

    // Level two of hierarchical CSA
    assign c8 = c4 ? c8_1 : c8_0;
    assign sum[7:4] = c4 ? sumL1_1[7:4]: sumL1_0[7:4];
    // Selecting sum and carry within a group
    assign c16L2_0 = c12_0 ? c16_1 : c16_0;
    assign sumL2_0[15:12] = c12_0 ? sumL1_1[15:12] : sumL1_0[15:12];
    assign c16L2_1 = c12_1 ? c16_1 : c16_0;
    assign sumL2_1[15:12] = c12_1 ? sumL1_1[15:12]: sumL1_0[15:12];

    // Level three selecting the final outputs
    assign c_out = c8 ? c16L2_1 : c16L2_0;
    assign sum[15:8] = c8 ? {sumL2_1[15:12], sumL1_1[11:8]} : {sumL2_0[15:12], sumL1_0[11:8]};

    endmodule
    • 原理图如下:

      Snipaste_2023-10-30_22-23-02

4. Carry Save Adder(进位保留加法器)


乘法器

1.Wallace Tree乘法器

2.Booth乘法器

2.1 Radix2 Booth

  • Booth算法的目的是解决有符号乘法运算中复杂的符号修正问题,同时也能减少部分乘积项(PP)的数量

  • 设n位乘法B的补码形式为:
    $$
    \begin{aligned}
    B &= -B_{n-1}\cdot 2^{n-1}+B_{n-2}\cdot 2^{n-2}+\dots+B_1\cdot 2^1+B_0\cdot 2^0\
    &= -B_{n-1}2^{n-1}+(\sum_{i=0}^{n-2}B_i\cdot2^i)
    \end{aligned}
    $$

  • 对上式做一些变换后可得到:
    $$
    B=\sum_{i=0,B_{-1}=0}^{n-1}(-B_i+B_{i-1})\cdot 2^i=\sum_{i=0}^{n-1}E_i\cdot 2^i
    $$

  • 则有两个有符号乘积结果$P$:
    $$
    P=A\times B=A\times(\sum_{i=0}^{n-1}E_i\cdot 2^i) = \sum_{i=0}^{n-1}(A\times E_i)\cdot 2^i
    $$

    • 其中:$E_i=-B_i+B_{i-1},0\le i \le n-1,B_{-1}=0$
    • $E_i$称为编码值,$A\times E_i$称为部分积
    image-20231031162452338
  • 缺点:对于101010101010这样的乘数,需要计算的部分积个数没有变,反而多了编码的步骤,计算延迟增加,得不偿失,故实际中很少使用,而采用其优化而来的Radix4 Booth算法

2.2 Radix4 Booth

  • 对于有符号数补码而言,扩展其最高位对数值并无影响。因此任何奇数位宽的有符号数补码都可以看成是偶数位宽

  • 设一个二进制乘数B,位宽为n,且n为偶数,则B可以表示为:
    $$
    B=\sum_{i=0,B_{-1}=0}^{\frac n2-1}(-2B_{2i+1}+B_{2i}+B_{2i-1})\cdot 2^{2i}=\sum_{i=0}^{\frac n2 -1}E_i\cdot 4^i
    $$

  • 则有两个有符号乘积结果$P$:
    $$
    P=A\times B=A\times \sum_{i=0}^{\frac n2 -1}E_i\cdot 4^i =\sum_{i=0}^{\frac n2 -1}(A\times E_i)\cdot 4^i
    $$

    • 其中,$E_i=-2B_{2i+1}+B_{2i}+B_{2i-1},0\le i\le \frac n2 -1,B_{-1}=0$
    • 且$\frac n2$为部分积的个数
  • Radix4 Booth算法每次编码的位数为3bit,使得部分积的数目减少为原始Booth算法的一半

    image-20231031163518501
  • 而Radix4 Booth的编码结果如下:

    image-20231031163635108
  • 对除0外的操作都先对被乘数A的最高位添上一个符号位,再进行操作,+2A是左移,-2A代表取补码后左移(或者左移后再取补码也可)

  • Radix4 Booth编码实现的两个8bit有符号数相乘,输出为16bit有符号数

    image-20231031164644373
    • 相关代码:

    • Booth_mul_top.v

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      28
      29
      30
      31
      32
      33
      34
      35
      36
      37
      38
      39
      40
      41
      42
      43
      44
      45
      46
      47
      48
      49
      50
      51
      52
      module Booth_mul_top(
      a_i,
      b_i,
      mul_o
      );
      parameter length = 8;
      input [length-1 : 0] a_i,b_i;
      output [length*2-1:0] mul_o;

      wire [length : 0] booth_o0 ;
      wire [length : 0] booth_o1 ;
      wire [length : 0] booth_o2 ;
      wire [length : 0] booth_o3 ;

      wire [length*2-1: 0] pp0 ;
      wire [length*2-1: 0] pp1 ;
      wire [length*2-1: 0] pp2 ;
      wire [length*2-1: 0] pp3 ;


      //module booth4code (a_i,b_i,booth_o);
      booth4code booth4_0 (a_i , {b_i[1:0],1'b0} , booth_o0 );
      booth4code booth4_1 (a_i , b_i[3:1] , booth_o1 );
      booth4code booth4_2 (a_i , b_i[5:3] , booth_o2 );
      booth4code booth4_3 (a_i , b_i[7:5] , booth_o3 );


      assign pp0 = {{ 7{booth_o0 [length]}} , {booth_o0 }}; //2*0
      assign pp1 = {{ 5{booth_o1 [length]}} , {booth_o1 } , { 2{1'b0}}}; //<< 2
      assign pp2 = {{ 3{booth_o2 [length]}} , {booth_o2 } , { 4{1'b0}}}; //<< 4
      assign pp3 = {{ 1{booth_o3 [length]}} , {booth_o3 } , { 6{1'b0}}}; //<< 6

      wire cout_0 ;

      wire [length*2 : 0] cpr_o_0 ;
      wire [length*2 : 0] cpr_o_1 ;


      //wallace tree
      //
      //first level of wallace tree: 42compressor
      //module compressor42 (in1,in2,in3,in4,cin,out1,out2,cout);
      //out1 needs to be multiplied by two (out1<<1)
      //module compressor42 (in1 ,in2 ,in3 ,in4 , cin ,out1 ,out2 ,cout );
      compressor42 compressor42_1_0 (pp0 ,pp1 ,pp2 ,pp3 , 1'b0 ,cpr_o_0 ,cpr_o_1 ,cout_0 );
      //本例中输入数据位宽是8,刚好一个42压缩即可,如果位宽较大,则需要进行多级位宽压缩
      wire cout;
      //carry lookahead adder
      //module cla (op1,op2,sum,cout);
      cla cla_0 (cpr_o_0[length*2-1:0]<<1 ,cpr_o_1[length*2-1:0] ,mul_o ,cout);

      endmodule
    • booth4code.v

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      module booth4code(
      a_i,
      b_i,
      booth_o
      );

      parameter length = 8;
      input [length-1 : 0] a_i; //full 8-bit input
      input [2:0] b_i; //3 of 8 bit input
      output reg [length : 0] booth_o; //booth output

      always @(*) begin
      case(b_i)
      3'b000 : booth_o <= 0;
      3'b001 : booth_o <= { a_i[length-1], a_i};
      3'b010 : booth_o <= { a_i[length-1], a_i};
      3'b011 : booth_o <= a_i<<1;
      //3'b100 : booth_o <= -(a_i<<1);
      3'b100 : booth_o <= (-{ a_i[length-1], a_i})<<1;
      3'b101 : booth_o <= -{a_i[length-1],a_i};
      3'b110 : booth_o <= -{a_i[length-1],a_i};
      3'b111 : booth_o <= 0;
      default: booth_o <= 0;
      endcase
      end

      endmodule
    • compressor42.v

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      module compressor42(
      in1,in2,in3,in4,cin,out1,out2,cout
      );

      parameter length = 8;
      input [length*2-1 : 0] in1,in2,in3,in4;
      input cin;
      output [length*2 : 0] out1,out2;
      output cout;
      wire [length*2-1 : 0] w1,w2,w3;

      assign w1 = in1 ^ in2 ^ in3 ^ in4;
      assign w2 = (in1 & in2) | (in3 & in4);
      assign w3 = (in1 | in2) & (in3 | in4);

      assign out2 = { w1[length*2-1] , w1} ^ {w3 , cin};
      assign cout = w3[length*2-1];
      assign out1 = ({ w1[length*2-1] , w1} & {w3 , cin}) | (( ~{w1[length*2-1] , w1}) & { w2[length*2-1] , w2});

      endmodule
    • cla.v

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      module cla(
      op1,
      op2,
      sum,
      cout
      );

      parameter width = 16 ;
      input [width-1:0] op1;
      input [width-1:0] op2;
      output [width-1:0] sum;
      output cout;

      wire [width>>2:0] c; //16是4位宽,右移2位是2
      assign c[0] = 1'b0;
      assign cout = c[width>>2];

      cla_4bit u_cla_4bit_0 (.op1( op1[ 0*4+3: 0*4] ),.op2( op2[ 0*4+3: 0*4] ),.cin( c[0 ] ),.sum( sum[ 0*4+3: 0*4] ),.cout( c[0 +1]));
      cla_4bit u_cla_4bit_1 (.op1( op1[ 1*4+3: 1*4] ),.op2( op2[ 1*4+3: 1*4] ),.cin( c[1 ] ),.sum( sum[ 1*4+3: 1*4] ),.cout( c[1 +1]));
      cla_4bit u_cla_4bit_2 (.op1( op1[ 2*4+3: 2*4] ),.op2( op2[ 2*4+3: 2*4] ),.cin( c[2 ] ),.sum( sum[ 2*4+3: 2*4] ),.cout( c[2 +1]));
      cla_4bit u_cla_4bit_3 (.op1( op1[ 3*4+3: 3*4] ),.op2( op2[ 3*4+3: 3*4] ),.cin( c[3 ] ),.sum( sum[ 3*4+3: 3*4] ),.cout( c[3 +1]));

      endmodule
    • cla_4bit.v

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      28
      29
      30
      31
      32
      33
      34
      35
      36
      37
      module cla_4bit(
      op1,op2,cin,sum,cout
      );

      parameter width = 4;

      input [width-1:0] op1;
      input [width-1:0] op2;
      input cin;
      output [width-1:0] sum;
      output cout;

      wire [width-1:0] g;
      wire [width-1:0] p;
      wire [width:0] c;

      //gp generator
      pg_gen u_pg_gen_0 (.a( op1[0]),.b( op2[0]),.g( g[0] ),.p( p[0] ));
      pg_gen u_pg_gen_1 (.a( op1[1]),.b( op2[1]),.g( g[1] ),.p( p[1] ));
      pg_gen u_pg_gen_2 (.a( op1[2]),.b( op2[2]),.g( g[2] ),.p( p[2] ));
      pg_gen u_pg_gen_3 (.a( op1[3]),.b( op2[3]),.g( g[3] ),.p( p[3] ));

      //carry generator
      assign c[0] = cin;
      assign c[1] = g[0] + ( c[0] & p[0] );
      assign c[2] = g[1] + ( (g[0] + ( c[0] & p[0]) ) & p[1] );
      assign c[3] = g[2] + ( (g[1] + ( (g[0] + (c[0] & p[0]) ) & p[1])) & p[2] );
      assign c[4] = g[3] + ( (g[2] + ( (g[1] + ( (g[0] + (c[0] & p[0]) ) & p[1])) & p[2] )) & p[3]);
      assign cout = c[width];

      //sum generator
      assign sum[0] = p[0] ^ c[0];
      assign sum[1] = p[1] ^ c[1];
      assign sum[2] = p[2] ^ c[2];
      assign sum[3] = p[3] ^ c[3];

      endmodule
    • pg_gen.v

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      module pg_gen(
      a,b,g,p
      );

      input a;
      input b;
      output g;
      output p;

      assign g = a & b;
      assign p = a ^ b;

      endmodule
    • Testbench

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      28
      29
      30
      31
      32
      33
      34
      35
      36
      37
      38
      39
      40
      41
      42
      43
      44
      45
      46
      47
      48
      49
      50
      51
      52
      53
      54
      55
      56
      57
      58
      59
      60
      61
      62
      63
      module Booth_mul_tb;
      reg [7 : 0] a_i,b_i;
      wire [15 : 0] mul_o;


      Booth_mul_top u0(
      .a_i(a_i),
      .b_i(b_i),
      .mul_o(mul_o)
      );


      integer i;

      initial begin
      a_i = 8'b1111_1111;
      b_i = 8'b1111_1111;

      #10;
      a_i = 8'b1000_1111;
      b_i = 8'b0000_0010;

      #10;
      a_i = 8'b0000_0010;
      b_i = 8'b0000_0001;

      #10;
      a_i = 8'b1000_0010;
      b_i = 8'b0000_0001;

      #10;
      a_i = 8'b1000_0010;
      b_i = 8'b1000_0001;

      #10;
      b_i[3:1] = 3'b001;
      a_i = 8'b1111_1111;
      b_i = 8'b1111_1111;

      #10;
      a_i = 8'b1111_1111;
      b_i = 8'b0000_0010;

      #10;
      a_i = 8'b1111_1111;
      b_i = 8'b1111_1111;

      #10;
      a_i = 8'b1111_1111;
      b_i = 8'b0000_0000;

      for (i = 0; i<10; i=i+1) begin
      a_i = a_i + 1;
      b_i = b_i + 1;
      #10;
      end
      end

      initial begin
      #1000 $stop;
      end

      endmodule
      • 仿真结果如下:

        image-20231031165217224


Reference

欢迎来到ssy的世界