    bpf, riscv: Use compressed instructions in the rv64 JIT · 18a4d8c9
    Luke Nelson authored
    
    
    This patch uses the RVC support and encodings from bpf_jit.h to optimize
    the rv64 JIT.
    
    The optimizations work by replacing emit(rv_X(...)) with a call to a
    helper function emit_X, which emits a compressed version of the
    instruction when one is available and RVC is enabled.
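
A minimal sketch of this helper pattern, for a single "load immediate" case. The names used here (struct jit_ctx, emit16, emit32, is_6b_int, rv_addi, rvc_li, emit_li) are illustrative stand-ins and may not match the helpers in bpf_jit.h; only the instruction encodings are taken from the RISC-V ISA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative JIT context: output is a sequence of 16-bit parcels. */
struct jit_ctx {
	uint16_t buf[64];
	size_t nparcels;
	bool rvc_enabled;
};

static void emit16(uint16_t insn, struct jit_ctx *ctx)
{
	assert(ctx->nparcels < 64);
	ctx->buf[ctx->nparcels++] = insn;
}

/* A 32-bit instruction occupies two parcels, low half first. */
static void emit32(uint32_t insn, struct jit_ctx *ctx)
{
	emit16((uint16_t)(insn & 0xffff), ctx);
	emit16((uint16_t)(insn >> 16), ctx);
}

/* True if val fits in a signed 6-bit immediate. */
static bool is_6b_int(long val)
{
	return val >= -32 && val <= 31;
}

/* addi rd, rs1, imm: I-type, opcode 0x13, funct3 0. */
static uint32_t rv_addi(unsigned rd, unsigned rs1, long imm)
{
	assert(imm >= -2048 && imm <= 2047);
	return ((uint32_t)(imm & 0xfff) << 20) | (rs1 << 15) | (rd << 7) | 0x13;
}

/* c.li rd, imm: imm[5] in bit 12, rd in bits 11:7, imm[4:0] in bits 6:2. */
static uint16_t rvc_li(unsigned rd, long imm)
{
	return (uint16_t)(0x4000 | ((imm & 0x20) << 7) | (rd << 7) |
			  ((imm & 0x1f) << 2) | 0x1);
}

/* Pick the compressed form when it is legal, else the 32-bit form. */
static void emit_li(unsigned rd, long imm, struct jit_ctx *ctx)
{
	if (ctx->rvc_enabled && rd != 0 && is_6b_int(imm))
		emit16(rvc_li(rd, imm), ctx);
	else
		emit32(rv_addi(rd, 0, imm), ctx);
}
```

With rvc_enabled set, emit_li(6, 9, &ctx) produces the single parcel 0x4325, the encoding of "c.li t1,9"; with it clear, the same call produces 0x00900313, the encoding of "addi t1,zero,9".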
    
    The JIT continues to pass all tests in lib/test_bpf.c, and introduces
    no new failures in test_verifier, both with and without RVC enabled.
    
    Most changes are straightforward replacements of emit(rv_X(...), ctx)
    with emit_X(..., ctx), with the following exceptions worth mentioning:
    
    * Change emit_imm to sign-extend the value in "lower", since the
    checks for RVC (and the instructions themselves) treat the value as
    signed. Otherwise, small negative immediates will not be recognized as
    encodable using an RVC instruction. For example, without this change,
    emit_imm(rd, -1, ctx) would cause lower to become 4095, which does not
    fit in a 6-bit signed immediate even though a "c.li rd, -1" instruction
    suffices.
    
    * For {BPF_MOV,BPF_ADD} BPF_X, stop using addiw and addw in the 32-bit
    cases, since the values are zero-extended into the upper 32 bits by
    the following instructions anyway, and addition commutes with
    zero-extension. (BPF_SUB BPF_X must still use subw since subtraction
    does not commute with zero-extension.)
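
The two points above can be checked with a small standalone sketch. It is not the JIT code: the helper names (fits_6b_signed, lower12_unsigned, lower12_signed, zext32, addw, check_claims) are hypothetical, and addw here only models the RISC-V instruction (a 32-bit add whose result is sign-extended to 64 bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if val fits in a signed 6-bit immediate, the range c.li can encode. */
static bool fits_6b_signed(int64_t val)
{
	return val >= -32 && val <= 31;
}

/* Low 12 bits taken as an unsigned field: -1 becomes 4095. */
static int64_t lower12_unsigned(int64_t val)
{
	return val & 0xfff;
}

/* Low 12 bits sign-extended from bit 11: -1 stays -1. */
static int64_t lower12_signed(int64_t val)
{
	int64_t low = val & 0xfff;

	return (low ^ 0x800) - 0x800;
}

/* Zero-extend the low 32 bits, as the slli/srli-by-32 pair does. */
static uint64_t zext32(uint64_t v)
{
	return v & 0xffffffffu;
}

/* Model of addw: 32-bit add, result sign-extended to 64 bits. */
static uint64_t addw(uint64_t a, uint64_t b)
{
	uint64_t sum32 = (uint32_t)(a + b);

	return (sum32 ^ 0x80000000u) - 0x80000000u;
}

static void check_claims(void)
{
	/* Without sign extension, -1 looks too large for c.li. */
	assert(lower12_unsigned(-1) == 4095);
	assert(!fits_6b_signed(lower12_unsigned(-1)));

	/* With sign extension, -1 is recognized as a 6-bit immediate. */
	assert(lower12_signed(-1) == -1);
	assert(fits_6b_signed(lower12_signed(-1)));

	/* A 64-bit add followed by zext32 matches addw followed by zext32. */
	assert(zext32(0xffffffffULL + 1) == zext32(addw(0xffffffffULL, 1)));
	assert(zext32(0x7fffffffULL + 1) == zext32(addw(0x7fffffffULL, 1)));
}
```

Calling check_claims() from a test program passes all assertions: the first group shows why "lower" has to be sign-extended before the RVC range check, and the second shows that once the result is zero-extended, a plain 64-bit add and addw give the same value.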
    
    This patch avoids optimizing branches and jumps to use RVC instructions
    since surrounding code often makes assumptions about the sizes of
    emitted instructions. Optimizing these will require changing these
    functions (e.g., emit_branch) to dynamically compute jump offsets.
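
To illustrate why fixed-size assumptions break, here is a minimal sketch of computing a branch offset from per-instruction sizes instead of assuming 4 bytes each. The function branch_offset and its size-array representation are hypothetical and are not how emit_branch is implemented.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte offset from the branch at index `from` to the instruction at
 * index `to` (to > from), given the size in bytes of every emitted
 * instruction. The branch's own size is included, since RISC-V branch
 * offsets are relative to the address of the branch itself.
 */
static int branch_offset(const unsigned char *sizes, size_t from, size_t to)
{
	int off = 0;

	assert(from < to);
	for (size_t i = from; i < to; i++)
		off += sizes[i];
	return off;
}
```

For a 4-byte branch that skips one instruction, the sizes {4, 4} give an offset of 8 when the skipped instruction is uncompressed, while {4, 2} give 6 when it is compressed. These match the bgeu in the listings in this message, which goes from 0x6c to 0x74 without RVC and from 0x4a to 0x50 with RVC.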
    
    The following are examples of the JITed code for the verifier selftest
    "direct packet read test#3 for CGROUP_SKB OK", without and with RVC
    enabled, respectively. The former uses 172 bytes and the latter uses 112,
    for a ~35% reduction in code size for this example.
    
    Without RVC:
    
       0: 02000813    addi  a6,zero,32
       4: fd010113    addi  sp,sp,-48
       8: 02813423    sd    s0,40(sp)
       c: 02913023    sd    s1,32(sp)
      10: 01213c23    sd    s2,24(sp)
      14: 01313823    sd    s3,16(sp)
      18: 01413423    sd    s4,8(sp)
      1c: 03010413    addi  s0,sp,48
      20: 03056683    lwu   a3,48(a0)
      24: 02069693    slli  a3,a3,0x20
      28: 0206d693    srli  a3,a3,0x20
      2c: 03456703    lwu   a4,52(a0)
      30: 02071713    slli  a4,a4,0x20
      34: 02075713    srli  a4,a4,0x20
      38: 03856483    lwu   s1,56(a0)
      3c: 02049493    slli  s1,s1,0x20
      40: 0204d493    srli  s1,s1,0x20
      44: 03c56903    lwu   s2,60(a0)
      48: 02091913    slli  s2,s2,0x20
      4c: 02095913    srli  s2,s2,0x20
      50: 04056983    lwu   s3,64(a0)
      54: 02099993    slli  s3,s3,0x20
      58: 0209d993    srli  s3,s3,0x20
      5c: 09056a03    lwu   s4,144(a0)
      60: 020a1a13    slli  s4,s4,0x20
      64: 020a5a13    srli  s4,s4,0x20
      68: 00900313    addi  t1,zero,9
      6c: 006a7463    bgeu  s4,t1,0x74
      70: 00000a13    addi  s4,zero,0
      74: 02d52823    sw    a3,48(a0)
      78: 02e52a23    sw    a4,52(a0)
      7c: 02952c23    sw    s1,56(a0)
      80: 03252e23    sw    s2,60(a0)
      84: 05352023    sw    s3,64(a0)
      88: 00000793    addi  a5,zero,0
      8c: 02813403    ld    s0,40(sp)
      90: 02013483    ld    s1,32(sp)
      94: 01813903    ld    s2,24(sp)
      98: 01013983    ld    s3,16(sp)
      9c: 00813a03    ld    s4,8(sp)
      a0: 03010113    addi  sp,sp,48
      a4: 00078513    addi  a0,a5,0
      a8: 00008067    jalr  zero,0(ra)
    
    With RVC:
    
       0:   02000813    addi    a6,zero,32
       4:   7179        c.addi16sp  sp,-48
       6:   f422        c.sdsp  s0,40(sp)
       8:   f026        c.sdsp  s1,32(sp)
       a:   ec4a        c.sdsp  s2,24(sp)
       c:   e84e        c.sdsp  s3,16(sp)
       e:   e452        c.sdsp  s4,8(sp)
      10:   1800        c.addi4spn  s0,sp,48
      12:   03056683    lwu     a3,48(a0)
      16:   1682        c.slli  a3,0x20
      18:   9281        c.srli  a3,0x20
      1a:   03456703    lwu     a4,52(a0)
      1e:   1702        c.slli  a4,0x20
      20:   9301        c.srli  a4,0x20
      22:   03856483    lwu     s1,56(a0)
      26:   1482        c.slli  s1,0x20
      28:   9081        c.srli  s1,0x20
      2a:   03c56903    lwu     s2,60(a0)
      2e:   1902        c.slli  s2,0x20
      30:   02095913    srli    s2,s2,0x20
      34:   04056983    lwu     s3,64(a0)
      38:   1982        c.slli  s3,0x20
      3a:   0209d993    srli    s3,s3,0x20
      3e:   09056a03    lwu     s4,144(a0)
      42:   1a02        c.slli  s4,0x20
      44:   020a5a13    srli    s4,s4,0x20
      48:   4325        c.li    t1,9
      4a:   006a7363    bgeu    s4,t1,0x50
      4e:   4a01        c.li    s4,0
      50:   d914        c.sw    a3,48(a0)
      52:   d958        c.sw    a4,52(a0)
      54:   dd04        c.sw    s1,56(a0)
      56:   03252e23    sw      s2,60(a0)
      5a:   05352023    sw      s3,64(a0)
      5e:   4781        c.li    a5,0
      60:   7422        c.ldsp  s0,40(sp)
      62:   7482        c.ldsp  s1,32(sp)
      64:   6962        c.ldsp  s2,24(sp)
      66:   69c2        c.ldsp  s3,16(sp)
      68:   6a22        c.ldsp  s4,8(sp)
      6a:   6145        c.addi16sp  sp,48
      6c:   853e        c.mv    a0,a5
      6e:   8082        c.jr    ra
    
    Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Cc: Björn Töpel <bjorn.topel@gmail.com>
    Link: https://lore.kernel.org/bpf/20200721025241.8077-4-luke.r.nels@gmail.com