haikuwebkit/Websites/webkit.org/docs/b3/intermediate-representation...

<html>
<head>
  <title>B3 Intermediate Representation</title>
  <link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>
  <div id="banner">
    <a href="http://www.webkit.org/" class="banner-link">
      <div id="logo" class="site-logo">
        WebKit
        <span class="tagline">Open Source Web Browser Engine</span>
      </div>
    </a>
  </div>
  <div id="contents">
    <h1><a href="index.html">Bare Bones Backend</a> / B3 Intermediate Representation</h1>
    <p>B3 IR is a C-like SSA representation of a procedure.  A procedure has a root block at
      which it starts execution when it is invoked.  A procedure does not have to terminate, but
      if it does, then it can be either due to a Return, which gracefully returns some value, or
      by a side-exit at designated instructions.  B3 gives the client a lot of flexibility to
      implement many different kinds of side-exits.</p>

    <p>B3 is designed to represent procedures for the purpose of transforming them.  Knowing
      what transformations are legal requires knowing what a procedure does.  A transformation
      is valid if it does not change the observable behavior of a procedure.  This document
      tells you what B3 procedures do by telling you what each construct in B3 IR does.</p>

    <h2>Procedure</h2>

    <p>The parent object of all things in B3 is the Procedure.  Every time you want to compile
      something, you start by creating a Procedure.  The lifecycle of a Procedure is
      usually:</p>

    <ol>
      <li>Create the Procedure.</li>
      <li>Populate the Procedure with code.</li>
      <li>Use either the <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/B3Compilation.h">high-level
          Compilation API</a> or the
        <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/B3Generate.h">low-level
          generation API</a>.</li>
    </ol>

    <p>The act of compiling the Procedure changes it in-place, making it unsuitable for
      compiling again.  Always create a new Procedure every time you want to compile
      something.</p>

    <h2>Types</h2>

    <p>B3 has a trivial type system with only five types:</p>

    <dl>
      <dt>Void</dt>
      <dd>Used to say that an instruction does not return a value.</dd>

      <dt>Int32</dt>
      <dd>32-bit integer.  Integers don't have sign, but operations on them do.</dd>

      <dt>Int64</dt>
      <dd>64-bit integer.</dd>

      <dt>Float</dt>
      <dd>32-bit binary floating point number.</dd>

      <dt>Double</dt>
      <dd>64-bit binary floating point number.</dd>
    </dl>

    <p>B3 does not have a pointer type. Instead, the <code>B3::pointerType()</code> function will
      return either Int32 or Int64 depending on which kind of integer can be used to represent a
      pointer on the current platform. It's not a goal of B3 to support hardware targets that require
      pointers and integers to be segregated. It's not a goal of B3 to support GC (garbage
      collection) roots as a separate type, since JSC uses
      <a href="http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-88-2.pdf">Bartlett-style conservative
        root scanning</a>. This doesn't preclude any mainstream garbage collection algorithms,
      including copying, generational, or concurrent collectors, and frees up the compiler to perform
      more optimizations.</p>

    <h2>Values</h2>

    <p>Variables, and the instructions that define them, are represented using the Value object.
      The Value object has a return type, a kind, and zero or more children.  Children are
      references to other Values.  Those values are used as input to the instruction that
      computes this value.</p>

    <p>The value kind is a combination of an opcode and optional flags. The flags allow a single
      opcode to have many variants. For example, Div and Mod may have the Chill flag set to indicate
      that they should not trap on corner cases. Alternatively, Load/Store opcodes may have the
      Traps flag set to indicate that they should trap deterministically.</p>

    <p>Values also have a unique 32-bit index that is used as the name.</p>

    <p>Example:</p>

    <pre><code>Int32 @3 = Add(@1, @2)</code></pre>

    <p>This represents a single Value instance.  Its index is 3.  It is an Int32.  The opcode is
      Add, and its children are @1 and @2.</p>

    <p>Values may also have additional meta-data.  We use special subclasses of the B3::Value
      class for values that need meta-data.  For example, the Load value needs a 32-bit offset
      for the load.  We use the MemoryValue class for memory-accessing values, which all have
      such an offset.</p>

    <h2>Stack Slots</h2>

    <p>B3 exposes the concept of stack-allocated data and gives the client a lot of control.
      By default, stack slots get allocated wherever B3 chooses. It will try to pack them as
      much as possible. After compilation is done, you can retrieve each stack slot's location
      in the form of an offset from the frame pointer.</p>

    <p>You can force stack slots to end up at a particular offset from the frame pointer, though
      this is very dangerous.  For example, B3 assumes that it can use the slots closest to the
      frame pointer for callee-saves, and currently when you force something to a particular
      frame pointer offset, there is no mechanism to notice that this space is also being used
      for callee-saves.  Therefore, we recommend not using the frame pointer offset forcing
      feature unless you know a lot about the ABI and you have no other choice.</p>

    <h2>Variables</h2>

    <p>Sometimes, SSA is inconvenient. For example, it's hard to do path specialization over SSA.
      B3 has the concept of Variables as a fall-back. The backend knows how to handle them and
      will coalesce and copy-propagate them. Inside the B3 optimizer, there is a classic SSA
      builder that eliminates variables and builds SSA in their place.</p>

    <p>You can create Variables by using Procedure::addVariable(), and then you can access them
      using the Get and Set opcodes.</p>

    <p>The fixSSA() phase will convert variables to SSA. If you use a lot of variables in your
      input to B3, it's a good idea to run fixSSA() manually before running the compiler. The
      default optimizer only runs fixSSA() towards the middle of optimizations. Passing non-SSA code
      as input to the optimizer may render the early phases ineffective. Fortunately, B3 phases
      are super easy to run. The following runs SSA fix-up on a Procedure named "proc":</p>

    <pre><code>fixSSA(proc);</code></pre>

    <h2>Control flow</h2>

    <p>B3 represents control flow using basic blocks.  Each basic block may have zero or more
      predecessors.  Each basic block may have zero or more successors.  The meaning of the
      successors is determined by the basic block's last Value, which must be a terminal. A
      value is terminal if:</p>

    <pre><code>value-&gt;effects().terminal</code></pre>

    <p>Some opcodes are definitely terminal, like Jump, Branch, Oops, Return, and Switch. But a
      value with the Patchpoint opcode may or may not be terminal. In general it's necessary to
      check the <code>terminal</code> bit as shown above.</p>

    <p>Each basic block contains a Vector&lt;Value*&gt; as the contents of the block. Control
      flow inside the block is implicit based on the order of Values in the vector.</p>

    <h2>Memory model</h2>

    <p>B3 uses a secondary address space of <i>abstract heaps</i> to reason about aliasing and
      concurrency. This address space is 32-bit and we point at it using the HeapRange type. If you
      never supply HeapRanges to your memory operations (the default), they will have [0, UINT_MAX]
      as their range, indicating that the operation may conservatively write to all 2<sup>32</sup>
      abstract heaps. Two memory operations are said to alias if they access the same abstract heaps.
      A simple example of using HeapRanges is with loads and stores, but you can also supply
      HeapRanges as bounds on the effects of calls and patchpoints.</p>

    <p>Fences are modeled as phantom effects that we call <i>fence ranges</i>. For example, a load
      fence could be represented as a patchpoint that writes to [0, UINT_MAX]. B3 models load fences,
      store fences, store-load fences, acquire fences, and release fences using fence ranges. The
      absence of a fence range means that there is no fence.</p>

    <p>Acquire fences are modeled as phantom writes to memory, while release fences are modeled as
      phantom reads from memory. By itself this does not permit B3 to do all of the reorderings that
      acquire/release allow. B3 is therefore allowed to treat loads/stores with fence ranges more
      precisely. The phantom write effects of a fenced load cannot write to anything that is read or
      written after the fence. The phantom read effects of a fenced store cannot read anything that
      is written after the fence.</p>

    <p>Applying a fence range to a load or store only makes that access a half-fence: load-with-fence
      is an acquire fence that only prevents later operations from being hoisted and store-with-fence
      is a release fence that only prevents earlier operations from being sunk. The canonical way to
      express a fully fenced load (which is fenced in <i>both</i> directions) is to use
      AtomicXchgAdd(0, ptr) with a non-empty fence range. The canonical way to express a fully fenced
      store is to use AtomicXchg(value, ptr) with a non-empty fence range.</p>

    <h2>Opcodes</h2>

    <p>This section describes opcodes in the following format:</p>

    <dl>
      <dt>Int32 Foo(Int64, Double)</dt>
      <dd>This describes an opcode named Foo that uses Int32 as its return type and takes two
        children - one of type Int64 and another of type Double.</dd>
    </dl>

    <p>We sometimes use the wildcard type T to represent polymorphic operations, like "T Add(T,
      T)".  This means that the value must take two children of the same type and returns a
      value of that type.  We use the type IntPtr to mean either Int32, or Int64, depending on
      the platform.</p>

    <p>Some opcodes can have some flags set. A description of flags follows the description of
      opcodes.</p>

    <h3>Opcode descriptions</h3>

    <dl>
      <dt>Void Nop()</dt>
      <dd>The empty value.  Instead of removing Values from basic blocks, most optimizations
        convert them to Nops.  Various phases run fix-up where all Nops are removed in one pass.
        It's common to see Nops in intermediate versions of B3 IR during optimizations.  Nops
        never lead to any code being generated and they do not impede optimizations, so they are
        usually harmless.  You can convert a Value to a Nop by doing convertToNop().</dd>

      <dt>T Identity(T)</dt>
      <dd>Returns the passed value.  May be used for any type except Void.  Instead of replacing
        all uses of a Value with a different Value, most optimizations convert them to Identity.
        Various phases run fix-up where all uses of Identity are replaced with the Identity's
        child (transitively, so Identity(Identity(Identity(@x))) is changed to just @x).  Even
        the instruction selector "sees through" Identity.  You can remove all references to
        Identity in any value by calling Value::performSubstitution().  You can convert a Value
        to an Identity by doing convertToIdentity(otherValue).  If the value is Void,
        convertToIdentity() converts it to a Nop instead.</dd>

      <dt>T Opaque(T)</dt>
      <dd>Returns the passed value.  May be used for any type except Void.  This opcode is provided as a hack
        for avoiding B3 optimizations that the client finds unprofitable.  B3 treats Opaque as an identity
        only during instruction selection.  All prior optimizations (including all B3 IR optimizations) do
        not know what Opaque does, other than Opaque(x) == Opaque(x) and Opaque(Opaque(x)) == Opaque(x).</dd>

      <dt>Int32 Const32(constant)</dt>
      <dd>32-bit integer constant.  Must use the Const32Value class, which has space for the
        int32_t constant.</dd>

      <dt>Int64 Const64(constant)</dt>
      <dd>64-bit integer constant.  Must use the Const64Value class, which has space for the
        int64_t constant.</dd>

      <dt>Float ConstFloat(constant)</dt>
      <dd>Float constant.  Must use the ConstFloatValue class, which has space for the float constant.</dd>

      <dt>Double ConstDouble(constant)</dt>
      <dd>Double constant.  Must use the ConstDoubleValue class, which has space for the double constant.</dd>

      <dt>Void Set(value, variable)</dt>
      <dd>Assigns the given value to the given Variable. Must use the VariableValue class.</dd>

      <dt>T Get(variable)</dt>
      <dd>Returns the current value of the given Variable. Its return type depends on the
        variable. Must use the VariableValue class.</dd>

      <dt>IntPtr SlotBase(stackSlot)</dt>
      <dd>Returns a pointer to the base of the given stack slot.  Must use the SlotBaseValue
        class.</dd>

      <dt>IntPtr|Double ArgumentReg(%register)</dt>
      <dd>Returns the value that the given register had at the prologue of the procedure.  It
        returns IntPtr for general-purpose registers and Double for FPRs.  Must use the
        ArgumentRegValue class.</dd>

      <dt>IntPtr FramePointer()</dt>
      <dd>Returns the value of the frame pointer register.  B3 procedures alway use a frame
        pointer ABI, and the frame pointer is always guaranteed to have this value anywhere
        inside the procedure.</dd>

      <dt>T Add(T, T)</dt>
      <dd>Works with any type except Void.  For integer types, this represents addition with
        wrap-around semantics.  For floating point types, this represents addition according to
        the IEEE 854 spec.  B3 does not have any notion of "fast math".  A transformation over
        floating point code is valid if the new code produces exactly the same value, bit for
        bit.</dd>

      <dt>T Sub(T, T)</dt>
      <dd>Works with any type except Void.  For integer types, this represents subtraction with
        wrap-around semantics.  For floating point types, this represents subtraction according
        to the IEEE 854 spec.</dd>

      <dt>T Mul(T, T)</dt>
      <dd>Works with any type except Void.  For integer types, this represents multiplication
        with wrap-around semantics.  For floating point types, this represents multiplication
        according to the IEEE 854 spec.</dd>

      <dt>T Div(T, T)</dt>
      <dd>Works with any type except Void.  For integer types, this represents signed
        division with round-to-zero.  By default, its behavior is undefined for x/0 or
        -2<sup>31</sup>/-1.  For floating point types, this represents division according
        to the IEEE 854 spec. Integer Div may have the Chill flag set.</dd>

      <dt>T Mod(T, T)</dt>
      <dd>Works with any type except Void.  For integer types, this represents signed
        modulo. By default, its behavior is undefined for x%0 or -2<sup>31</sup>%-1.  For
        floating point types, this represents modulo according to "fmod()". Integer Mod may have the
        Chill flag set.</dd>

      <dt>T Neg(T)</dt>
      <dd>Works with any type except Void.  For integer types, this represents twos-complement
        negation.  For floating point types, this represents negation according to the IEEE
        spec.</dd>

      <dt>T BitAnd(T, T)</dt>
      <dd>Bitwise and.  Valid for any type except Void.</dd>

      <dt>T BitOr(T, T)</dt>
      <dd>Bitwise or.  Valid for any type except Void.</dd>

      <dt>T BitXor(T, T)</dt>
      <dd>Bitwise xor.  Valid for any type except Void.</dd>

      <dt>T Shl(T, Int32)</dt>
      <dd>Shift left.  Valid for Int32 and Int64.  The shift amount is always Int32.  Only the
        low 31 bits of the shift amount are used for Int32.  Only the low 63 bits of the shift
        amount are used for Int64.</dd>

      <dt>T SShr(T, Int32)</dt>
      <dd>Shift right with sign extension.  Valid for Int32 and Int64.  The shift amount is
        always Int32.  Only the low 31 bits of the shift amount are used for Int32.  Only the
        low 63 bits of the shift amount are used for Int64.</dd>

      <dt>T ZShr(T, Int32)</dt>
      <dd>Shift right with zero extension.  Valid for Int32 and Int64.  The shift amount is
        always Int32.  Only the low 31 bits of the shift amount are used for Int32.  Only the
        low 63 bits of the shift amount are used for Int64.</dd>

      <dt>T RotR(T, Int32)</dt>
      <dd>Rotate right. Valid for Int32 and Int64. The shift amount is always Int32. Only the
        low 31 bits of the shift amount are used for Int32.  Only the low 63 bits of the shift
        amount are used for Int64.</dd>

      <dt>T RotL(T, Int32)</dt>
      <dd>Rotate left. Valid for Int32 and Int64. The shift amount is always Int32. Only the
        low 31 bits of the shift amount are used for Int32.  Only the low 63 bits of the shift
        amount are used for Int64.</dd>

      <dt>T Clz(T)</dt>
      <dd>Count leading zeroes.  Valid for Int32 and Int64.</dd>

      <dt>T Abs(T)</dt>
      <dd>Absolute value.  Valid for Float and Double.</dd>

      <dt>T Ceil(T)</dt>
      <dd>Ceiling.  Valid for Float and Double.</dd>

      <dt>T Floor(T)</dt>
      <dd>Flooring.  Valid for Float and Double.</dd>

      <dt>T Sqrt(T)</dt>
      <dd>Square root.  Valid for Float and Double.</dd>

      <dt>U BitwiseCast(T)</dt>
      <dd>If T is Int32 or Int64, it returns the bitwise-corresponding Float or Double,
        respectively.  If T is Float or Double, it returns the bitwise-corresponding Int32 or
        Int64, respectively.</dd>

      <dt>Int32 SExt8(Int32)</dt>
      <dd>Fills the top 24 bits of the integer with the low byte's sign extension.</dd>

      <dt>Int32 SExt16(Int32)</dt>
      <dd>Fills the top 16 bits of the integer with the low short's sign extension.</dd>

      <dt>Int64 SExt32(Int32)</dt>
      <dd>Returns a 64-bit integer such that the low 32 bits are the given Int32 value and the
        high 32 bits are its sign extension.</dd>

      <dt>Int64 ZExt32(Int32)</dt>
      <dd>Returns a 64-bit integer such that the low 32 bits are the given Int32 value and the
        high 32 bits are zero.</dd>

      <dt>U Trunc(T)</dt>
      <dd>Returns the low 32 bits of the 64-bit value. If T is Int64 then U is Int32.
        If T is Double then U is Float.</dd>

      <dt>Double IToD(T)</dt>
      <dd>Converts the given integer into a double.  Value for Int32 or Int64 inputs.</dd>

      <dt>Double FloatToDouble(Float)</dt>
      <dd>Converts the given float into a double.</dd>

      <dt>Float DoubleToFloat(Double)</dt>
      <dd>Converts the given double into a float.</dd>

      <dt>Int32 Equal(T, T)</dt>
      <dd>Compares the two values.  If they are equal, return 1; else return 0.  Valid for all
        types except Void.  Integer comparisons simply compare all bits.  Floating point
        comparisons mostly compare bits, but have some corner cases: positive zero and negative
        zero are considered equal, and they return false when either value is NaN.</dd>

      <dt>Int32 NotEqual(T, T)</dt>
      <dd>The opposite of Equal().  NotEqual(@x, @y) yields the same result as BitXor(Equal(@x,
        @y), 1).</dd>

      <dt>Int32 LessThan(T, T)</dt>
      <dd>Returns 1 if the left value is less than the right one, 0 otherwise.  Does a signed
        comparison for integers.  For floating point comparisons, this has the usual caveats
        with respect to negative zero and NaN.</dd>

      <dt>Int32 GreaterThan(T, T)</dt>
      <dd>Returns 1 if the left value is greater than the right one, 0 otherwise.  Does a signed
        comparison for integers.  For floating point comparisons, this has the usual caveats
        with respect to negative zero and NaN.</dd>

      <dt>Int32 LessEqual(T, T)</dt>
      <dd>Returns 1 if the left value is less than or equal to the right one, 0 otherwise.  Does
        a signed comparison for integers.  For floating point comparisons, this has the usual
        caveats with respect to negative zero and NaN.</dd>

      <dt>Int32 GreaterEqual(T, T)</dt>
      <dd>Returns 1 if the left value is greater than or equal to the right one, 0 otherwise.
        Does a signed comparison for integers.  For floating point comparisons, this has the
        usual caveats with respect to negative zero and NaN.</dd>

      <dt>Int32 Above(T, T)</dt>
      <dd>Unsigned integer comparison, valid for Int32 and Int64 only.  Returns 1 if the left
        value is unsigned-greater-than the right one, 0 otherwise.</dd>

      <dt>Int32 Below(T, T)</dt>
      <dd>Unsigned integer comparison, valid for Int32 and Int64 only.  Returns 1 if the left
        value is unsigned-less-than the right one, 0 otherwise.</dd>

      <dt>Int32 AboveEqual(T, T)</dt>
      <dd>Unsigned integer comparison, valid for Int32 and Int64 only.  Returns 1 if the left
        value is unsigned-greater-than-or-equal the right one, 0 otherwise.</dd>

      <dt>Int32 BelowEqual(T, T)</dt>
      <dd>Unsigned integer comparison, valid for Int32 and Int64 only.  Returns 1 if the left
        value is unsigned-less-than-or-equal the right one, 0 otherwise.</dd>

      <dt>Int32 EqualOrUnordered(T, T)</dt>
      <dd>Floating point comparison, valid for Float and Double only.  Returns 1 if the left
        value is equal to the right one or if either value is NaN.  Returns 0 otherwise.</dd>

      <dt>T Select(U, T, T)</dt>
      <dd>Returns either the second child or the third child.  T can be any type except Void.  U
        can be either Int32 or Int64.  If the first child is non-zero, returns the second child.
        Otherwise returns the third child.</dd>

      <dt>Int32 Load8Z(IntPtr, offset)</dt>
      <dd>Loads a byte from the address, which is computed by adding the compile-time 32-bit
        signed integer offset to the child value.  Zero extends the loaded byte, so the high 24
        bits are all zero.  Must use the MemoryValue class.  May have the Traps flag set. May set
        a fence range to turn this into a load acquire.</dd>

      <dt>Int32 Load8S(IntPtr, offset)</dt>
      <dd>Loads a byte from the address, which is computed by adding the compile-time 32-bit
        signed integer offset to the child value.  Sign extends the loaded byte.  Must use the
        MemoryValue class.  May have the Traps flag set. May set
        a fence range to turn this into a load acquire.</dd>

      <dt>Int32 Load16Z(IntPtr, offset)</dt>
      <dd>Loads a 16-bit integer from the address, which is computed by adding the compile-time
        32-bit signed integer offset to the child value.  Zero extends the loaded 16-bit
        integer, so the high 16 bits are all zero.  Misaligned loads are not penalized.  Must
        use the MemoryValue class.  May have the Traps flag set. May set
        a fence range to turn this into a load acquire.</dd>

      <dt>Int32 Load16S(IntPtr, offset)</dt>
      <dd>Loads a 16-bit integer from the address, which is computed by adding the compile-time
        32-bit signed integer offset to the child value.  Sign extends the loaded 16-bit
        integer.  Misaligned loads are not penalized.  Must use the MemoryValue class.  May have the
        Traps flag set. May set a fence range to turn this into a load acquire.</dd>

      <dt>T Load(IntPtr, offset)</dt>
      <dd>Valid for any type except Void.  Loads a value of that type from the address, which is
        computed by adding the compile-time 32-bit signed integer offset to the child value.
        Misaligned loads are not penalized.  Must use the MemoryValue class.  May have the Traps
        flag set. May set a fence range to turn this into a load acquire.</dd>

      <dt>Void Store8(Int32, IntPtr, offset)</dt>
      <dd>Stores a the low byte of the first child into the address computed by adding the
        compile-time 32-bit signed integer offset to the second child.  Must use the MemoryValue
        class.  May have the Traps flag set. May set a fence range to turn this into a store release.</dd>

      <dt>Void Store16(Int32, IntPtr, offset)</dt>
      <dd>Stores a the low 16 bits of the first child into the address computed by adding the
        compile-time 32-bit signed integer offset to the second child.  Misaligned stores are
        not penalized.  Must use the MemoryValue class.  May have the Traps flag set. May set a
        fence range to turn this into a store release.</dd>

      <dt>Void Store(T, IntPtr, offset)</dt>
      <dd>Stores the value in the first child into the address computed by adding the
        compile-time 32-bit signed integer offset to the second child.  Misaligned stores are
        not penalized.  Must use the MemoryValue class.  May have the Traps flag set. May set a
        fence range to turn this into a store release.</dd>

      <dt>Int32 AtomicWeakCAS(T expectedValue, T newValue, IntPtr ptr)</dt>
      <dd>Performs a weak <a href="https://en.wikipedia.org/wiki/Compare-and-swap">CAS
          (compare-and-swap)</a>. Returns a boolean (0 or 1) to indicate the result (failure or
        success, respectively). May spuriously fail, in which case it will do nothing and return
        0. You can supply a fence range to indicate that the CAS is fenced. Fenced CAS has both
        acquire and release fences, and claims to both read and write the full fence range in
        addition to whatever primary range is supplied for the CAS itself. AtomicWeakCAS only
        has fencing behavior in the case that it succeeds. Atomic operations take a width parameter,
        allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory locations.</dd>

      <dt>T AtomicStrongCAS(T expectedValue, T newValue, IntPtr ptr)</dt>
      <dd>Performs a strong <a href="https://en.wikipedia.org/wiki/Compare-and-swap">CAS
          (compare-and-swap)</a>. Returns the old value before the CAS. The instruction selector
        is smart about Equal(AtomicStrongCAS(expected, ...), expected) patterns, so this opcode
        is also the best way to do a strong CAS that returns a boolean - just use Equal to compare
        the result to expected. You can supply a fence range to indicate that the CAS is fenced.
        Fenced CAS has both acquire and release fences, and claims to both read and write the full
        fence range in addition to whatever primary range is supplied for the CAS itself.
        AtomicStrongCAS has the same fencing on failure and success. Atomic operations take a width
        parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
        locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>

      <dt>T AtomicXchgAdd(T, IntPtr)</dt>
      <dd>Atomically adds a value to the memory location and returns the old value. Is allowed to
        have a fence range, which causes it to have acquire/release fencing. Atomic operations take
        a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
        locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>

      <dt>T AtomicXchgAnd(T, IntPtr)</dt>
      <dd>Atomically bitwise-ands a value to the memory location and returns the old value. Is allowed to
        have a fence range, which causes it to have acquire/release fencing. Atomic operations take
        a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
        locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>

      <dt>T AtomicXchgOr(T, IntPtr)</dt>
      <dd>Atomically bitwise-ors a value to the memory location and returns the old value. Is allowed to
        have a fence range, which causes it to have acquire/release fencing. Atomic operations take
        a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
        locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>

      <dt>T AtomicXchgSub(T, IntPtr)</dt>
      <dd>Atomically subtracts a value to the memory location and returns the old value. Is allowed to
        have a fence range, which causes it to have acquire/release fencing. Atomic operations take
        a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
        locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>

      <dt>T AtomicXchgXor(T, IntPtr)</dt>
      <dd>Atomically bitwise-xors a value to the memory location and returns the old value. Is allowed to
        have a fence range, which causes it to have acquire/release fencing. Atomic operations take
        a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
        locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>

      <dt>T AtomicXchg(T, IntPtr)</dt>
      <dd>Atomically stores a value to the memory location and returns the old value. Is allowed to
        have a fence range, which causes it to have acquire/release fencing. Atomic operations take
        a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
        locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>

      <dt>T Depend(T)</dt>
      <dd>Only valid for integer types. This is guaranteed to return zero by xoring the input with
        itself, and the B3 compiler guarantees that it will not leverage this knowledge for any
        optimization. On x86, this is lowered to a Fence with read=Bottom, write=Top (i.e. a load-load
        fence) and then it's constant-folded to zero. On ARM, this is preserved through the compiler
        pipeline all the way to the MacroAssembler, which turns this into a <code>eor</code>. Using
        Depend is the most efficient way to create load-load dependencies, regarldess of platform. It
        allows B3's CSE to work for surrounding loads and it even supports CSEing the dependency
        itself - so two identical load chains with no interleaving effects can be turned into one. On
        ARM, it also avoids emitting an expensive <code>dmb ish</code> load-load fence. Because of the
        benefits for CSE, it's recommended to use Depend for load-load dependencies even if you're only
        targeting x86.</dd>

      <dt>IntPtr WasmAddress(IntPtr, pinnedGPR)</dt>
      <dd>This is used to compute the address of a wasm memory load from a pinned base GPR.
        Must use the WasmAddressValue class.</dd>

      <dt>Void Fence()</dt>
      <dd>Abstracts standalone data fences on x86 and ARM. Must use the FenceValue class, which has
        two additional members that configure the precise meaning of the fence:
        <code>HeapRange FenceValue::read</code> and <code>HeapRange FenceValue::write</code>. If the
        <code>write</code> range is empty, this is taken to be a store-store fence, which leads to
        no code generation on x86 and the weaker <code>dmb ishst</code> fence on ARM. If the write
        range is non-empty, this produces <code>mfence</code> on x86 and <code>dmb ish</code> on
        ARM. Within B3 IR, the Fence also reports the read/write in its effects. This allows you to
        scope the fence for the purpose of B3's load elimination. For example, you may use a Fence
        to protect a store from being sunk below a specific load. In that case, you can claim to
        read just that store's range and write that load's range.</dd>

      <dt>T1 CCall(IntPtr, [T2, [T3, ...]])</dt>
      <dd>Performs a C function call to the function pointed to by the first child.  The types
        that the function takes and the type that it returns are determined by the types of the
        children and the type of the CCallValue.  Only the first child is mandatory.  Must use
        the CCallValue class.</dd>

      <dt>T1 Patchpoint([T2, [T3, ...]])</dt>
      <dd>
        <p>A Patchpoint is a customizable value.  Patchpoints take zero or more values of any
          type and return any type.  A Patchpoint's behavior is determined by the generator
          object.  The generator is a C++ lambda that gets called during code generation.  It gets
          passed an assembler instance (specifically, CCallHelpers&) and an object describing
          where to find all of the input values and where to put the result.  Here's an example:</p>

        <pre><code>PatchpointValue* patchpoint = block->appendNew&lt;PatchpointValue&gt;(proc, Int32, Origin());
patchpoint->append(ConstrainedValue(arg1, ValueRep::SomeRegister));
patchpoint->append(ConstrainedValue(arg2, ValueRep::SomeRegister));
patchpoint->setGenerator(
    [&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
        jit.add32(params[1].gpr(), params[2].gpr(), params[0].gpr());
    });</code></pre>

        <p>This creates a patchpoint that just adds two numbers. The patchpoint is set to return
          Int32.  The two child values, arg1 and arg2, are passed to the patchpoint with the
          SomeRegister constraint, which just requests that they get put in appropriate
          registers (GPR for integer values, FPR for floating point values).  The generator uses
          the params object to figure out which registers the inputs are in (params[1] and
          params[2]) and which register to put the result in (params[0]).  Many sophisticated
          constraints are possible.  You can request that a child gets placed in a specific
          register.  You can list additional registers that are
          clobbered - either at the top of the patchpoint (i.e. early) so that the clobbered
          registers interfere with the inputs, or at the bottom of the patchpoint (i.e. late) so
          that the clobbered registers interfere with the output.  Patchpoint constraints also
          allow you to force values onto the call argument area of the stack.  Patchpoints are
          powerful enough to implement custom calling conventions, inline caches, and
          side-exits.</p>

        <p>A patchpoint is allowed to "side exit", i.e. abruptly exit from the procedure.  If it
          wants to do so by returning, it can use Procedure's API for getting the callee-save
          register layout, unwinding the callee-saves, and then returning.  More likely, the
          patchpoint will take some exit state as part of its arguments, and it will manipulate
          the call frame in place to make it look like another execution engine's call frame.
          This is called OSR, and JavaScriptCore does it a lot.</p>

        <p>A patchpoint can be used as a terminal. Simply set the <code>terminal</code> bit:</p>

        <pre><code>patchpoint-&gt;effects.terminal = true;</code></pre>

        <p>The generator can determine where to branch by using the StackmapGenerationParams to
          get the set of labels for all successors. You're guaranteed that the number of
          successors of the patchpoint's basic block will be the same as when you created it.
          However, like any value, a patchpoint may be cloned. This means, for example, that if
          you use this to implement a table jump then the jump table must be created inside the
          generator, so that you get one jump table per clone.</p>

        <p>Must use the PatchpointValue class with the Patchpoint opcode.</p>
      </dd>

      <dt>T CheckAdd(T, T, [T2, [T3, ...]])</dt>
      <dd>Checked integer addition.  Works for T = Int32 or T = Int64.  First first two children
        are mandatory.  Additional children are optional.  All of the Check instructions take a
        generator and value constraints like a Patchpoint.  In the case of a CheckAdd, the
        generator runs on the path where the integer addition overflowed.  B3 assumes that
        CheckAdd will side-exit upon overflow, so the generator must do some kind of
        termination.  Usually, this is used to implement OSR exits on integer overflow and the
        optional arguments to CheckAdd will be the OSR exit state.  Must use the CheckValue
        class.</dd>

      <dt>T CheckSub(T, T, [T2, [T3, ...]])</dt>
      <dd>Checked integer subtraction.  Works for T = Int32 or T = Int64.  First first two
        children are mandatory.  Additional children are optional.  All of the Check
        instructions take a generator and value constraints like a Patchpoint.  In the case of a
        CheckSub, the generator runs on the path where the integer subtraction overflowed.  B3
        assumes that CheckSub will side-exit upon overflow, so the generator must do some kind
        of termination.  Usually, this is used to implement OSR exits on integer overflow and
        the optional arguments to CheckSub will be the OSR exit state.  You can use CheckSub for
        checked negation by using zero for the first child.  B3 will select the native negate
        instruction when you do this.  Must use the CheckValue class.</dd>

      <dt>T CheckMul(T, T, [T2, [T3, ...]])</dt>
      <dd>Checked integer multiplication.  Works for T = Int32 or T = Int64.  First first two
        children are mandatory.  Additional children are optional.  All of the Check
        instructions take a generator and value constraints like a Patchpoint.  In the case of a
        CheckMul, the generator runs on the path where the integer multiplication overflowed.
        B3 assumes that CheckMul will side-exit upon overflow, so the generator must do some
        kind of termination.  Usually, this is used to implement OSR exits on integer overflow
        and the optional arguments to CheckMul will be the OSR exit state.  Must use the
        CheckValue class.</dd>

      <dt>Void Check(T, [T2, [T3, ...]])</dt>
      <dd>Exit check.  Works for T = Int32 or T = Int64.  This branches on the first child.  If
        the first child is zero, this just falls through.  If it's non-zero, it goes to the exit
        path generated by the passed generator.  Only the first child is mandatory.  B3 assumes
        that Check will side-exit when the first child is non-zero, so the generator must do
        some kind of termination.  Usually, this is used to implement OSR exit checks and the
        optional arguments to Check will be the OSR exit state.  Check supports efficient
        compare/branch fusion, so you can Check for fairly sophisticated predicates.  For
        example, Check(Equal(LessThan(@a, @b), 0)) where @a and @b are doubles will be generated
        to an instruction that branches to the exit if @a &gt;= @b or if either @a or @b are
        NaN.  Must use the CheckValue class.</dd>

      <dt>Void WasmBoundsCheck(Int32, pinnedGPR, offset)</dt>
      <dd>Special Wasm opcode. This branches on the first child. If the first child plus the offset
        produces a Int64 less than to the pinnedGPR this falls through. Otherwise, it branches to
        the exit path generated by the passed generator. Unlike the Patch/Check family, the
        generator used by WasmBoundsCheck should be set on the Procedure itself. The GRPReg passed in
        pinnedGPR must also be marked as pinned by calling the Procedure's pinning API, or it must be
        InvalidGPR, in which case the out-of-bounds limit is 4GiB. In the later case a maximum parameter
        can be provided, to further constrain the out-of-bounds limit and help generate smaller
        immediates. B3 assumes the WasmBoundsCheck will side-exit when it branches, so the generator
        must do some kind of termination. In Wasm this is used to trap and unwind back to JS. Must use
        the WasmBoundsCheckValue class.</dd>

      <dt>Void Upsilon(T, ^phi)</dt>
      <dd>B3 uses SSA.  SSA requires that each variable gets assigned to only once anywhere in
        the procedure.  But that doesn't work if you have a variable that is supposed to be the
        result of merging two values along control flow.  B3 uses Phi values to represent value
        merges, just like SSA compilers usually do.  But while other SSA compilers represent the
        inputs to the Phi by listing the control flow edges from which the Phi gets its values,
        B3 uses the Upsilon value.  Each Phi behaves as if it has a memory location associated
        with it.  Executing the Phi behaves like a load from that memory location.
        Upsilon(@value, ^phi) behaves like a store of @value into the memory location associated
        with @phi.  We say "^phi" when we mean that we are writing to the memory location
        associated with the Phi.  Must use the UpsilonValue class.</dd>

      <dt>T Phi()</dt>
      <dd>Works for any type except Void.  Represents a local memory location large enough to
        hold T.  Loads from that memory location.  The only way to store to that location is
        with Upsilon.</dd>

      <dt>Void Jump(takenBlock)</dt>
      <dd>Jumps to takenBlock.  This must appear at the end of the basic block. The basic block
        must have exactly one successor.</dd>

      <dt>Void Branch(T, takenBlock, notTakenBlock)</dt>
      <dd>Works for T = Int32 or T = Int64.  Branches on the child.  If the child is non-zero,
        it branches to the takenBlock.  Otherwise it branches to the notTakenBlock.  Must appear
        at the end of the basic block. The block must have exactly two successors.</dd>

      <dt>Void Switch(T, cases...)</dt>
      <dd>Works for T = Int32 or T = Int64.  Switches on the child.  Contains a list of switch
        cases.  Each switch case has an integer constant and a target block.  The switch value
        also contains a fall-through target in case the child has a value that wasn't mentioned
        in the cases list.  Must use the SwitchValue class.  Must appear at the end of the basic
        block. The block must have one successor for each case, plus a successor for the
        fall-through (default) case. You can manage the successors of a block containing a Switch
        using API in SwitchValue, like SwitchValue::appendCase() and
        SwitchValue::setFallThrough().</dd>

      <dt>Void EntrySwitch()</dt>
      <dd>
        <p>B3 supports multiple procedure entrypoints. The way you create multiple entrypoints is
          by placing an EntrySwitch at the end of the root block. The root block must then have a
          successor for each entrypoint. Additionally, you must tell the Procedure how many
          entrypoints you want. For example:</p>
        <pre><code>Procedure proc;
proc.setNumEntrypoints(3);
BasicBlock* root = proc.addBlock();
BasicBlock* firstEntrypoint = proc.addBlock();
BasicBlock* secondEntrypoint = proc.addBlock();
BasicBlock* thirdEntrypoint = proc.addBlock();
root->appendNew&lt;Value&gt;(proc, EntrySwitch, Origin());
root->appendSuccessor(firstEntrypoint);
root->appendSuccessor(secondEntrypoint);
root->appendSuccessor(thirdEntrypoint);</code></pre>
        <p>This is the canonical way to use EntrySwitch, however the semantics are flexible enough
          to allow its use anywhere in the control flow graph. You can have an arbitrary number of
          EntrySwitches. This flexibility ensures that EntrySwitch works even when B3 does
          transformations that move code above the EntrySwitch, duplicate the EntrySwitch itself,
          or do any number of other unexpected things.</p>
        <p>EntrySwitch behaves as if each Procedure has a variable called Entry. Each physical
          entrypoint sets Entry to the index of that entrypoint (so 0, 1, or 2, above) and jumps to
          the root block. EntrySwitch is just a switch on the Entry variable. Transformations over
          code that uses EntrySwitch are valid so long as they don't change the procedure's
          behavior under these semantics.</p>
        <p>EntrySwitch is implemented without any actual variables or switching. Instead, all code
          that lies on some path from the root block to some EntrySwitch is cloned for each
          entrypoint. This lowering is done as a very late phase in Air, so most of the compiler
          does not have to know anything about entrypoints. Most of the compiler treats EntrySwitch
          as an opaque control flow construct. EntrySwitch lowering is allowed to clone an
          arbitrary amount of code. However, normal uses of EntrySwitch will place it at the end of
          an empty root block and B3 will only hoist a handful of things above EntrySwitch. This
          leads to only a small amount of cloned code in practice.</p>
      </dd>

      <dt>Void Return(T <i>(optional)</i>)</dt>
      <dd>
          <p>
            Returns the control flow to the caller and terminates the procedure.
            Must appear at the end of the basic block. The block must have zero successors.
          </p>
          <p>
            If the node has a child, its value is returned. The type of the child can be any type except Void.
          </p>
      </dd>

      <dt>Void Oops()</dt>
      <dd>Indicates unreachable code.  This may be implemented as either a trap or as a bare
        fall-through, since B3 is allowed to assume that this will never be reached.  Must appear
        at the end of the basic block.  The block must have zero successors. Note that we also use
        the Oops opcode to mean "no such opcode" in some B3 transformations.</dd>
    </dl>

    <h2>Flags</h2>

    <p>This section describes flags. These may be set in
      <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/B3Kind.h"><code>Kind</code></a>.</p>

    <dl>
      <dt>Chill</dt>
      <dd><i>Applies to: Div, Mod.</i> You can create a chill Div/Mod by saying
        <code>chill(Div)</code>. This creates a Kind that has the Chill flag set. This can only be
        used with interer types. An operation is said to be chill if it returns a sensible value
        whenever its non-chill form would have had undefined behavior. Chill Div turns x/0 into 0
        and -2<sup>31</sup>/-1 into -2<sup>31</sup>. We recognize this in IR because it's exactly
        the semantics of division on ARM64, and it's also exactly the semantics that JavaScript
        wants for "(x/y)|0". Chill Mod turns x%0 into 0 and -2<sup>31</sup>%-1 into 0. This matches
        the semantics of JavaScript "(x%y)|0".</dd>

      <dt>Traps</dt>
      <dd><i>Applies to: Load8Z, Load8S, Load16Z, Load16S, Load, Store8, Store16, Store.</i> You can
        create a trapping Kind from an opcode by saying <code>trapping(opcode)</code>. For example,
        <code>trapping(Load)</code>. An operation is said to be trapping if it will cause a page
        fault when used on an invalid pointer and this page fault will be observed. This means that
        the compiler cannot fudge when the page fault happens. This is logically equivalent to what
        B3 calls <code>Effects::exitsSideways</code>, but further implies that if any of the B3
        values used to fuse an Air instruction were trapping, then the Air instruction must have its
        <code>Air::Kind::traps</code> flag set. The compiler won't help you identify where you
        trapped. Even if you use the compiler's origin facility to track down the trap location, you
        may get the origin of any B3 value that was incorporated into the fused instruction that
        caused the trap. For example, "Add(Load&lt;Traps&gt;(ptr), $1)" may claim to trap at the Add
        rather than the Load on x86, because this pattern is a perfect candidate for add-load
        fusion. Nevertheless, you are guaranteed to get the trap and the trap will be observed at
        the point you intended. For example, the compiler will not hoist a trapping load past any
        effects, even those outside of its read range, because the trap is presumed to read top. The
        compiler will not attempt to DCE a trapping load. The compiler will not attempt to sink or
        eliminate any trapping stores, even if they are dead because of a guaranteed subsequent
        store to the same address, because we conservatively assume that the store was done for the
        trap effect. This feature is meant to support high throughput memory safety checks in
        WebAssembly.</dd>
    </dl>

  </div>
</body>
</html>