811 lines
48 KiB
HTML
811 lines
48 KiB
HTML
<html>
|
|
<head>
|
|
<title>B3 Intermediate Representation</title>
|
|
<link rel="stylesheet" type="text/css" href="style.css">
|
|
</head>
|
|
<body>
|
|
<div id="banner">
|
|
<a href="http://www.webkit.org/" class="banner-link">
|
|
<div id="logo" class="site-logo">
|
|
WebKit
|
|
<span class="tagline">Open Source Web Browser Engine</span>
|
|
</div>
|
|
</a>
|
|
</div>
|
|
<div id="contents">
|
|
<h1><a href="index.html">Bare Bones Backend</a> / B3 Intermediate Representation</h1>
|
|
<p>B3 IR is a C-like SSA representation of a procedure. A procedure has a root block at
|
|
which it starts execution when it is invoked. A procedure does not have to terminate, but
|
|
if it does, then it can be either due to a Return, which gracefully returns some value, or
|
|
by a side-exit at designated instructions. B3 gives the client a lot of flexibility to
|
|
implement many different kinds of side-exits.</p>
|
|
|
|
<p>B3 is designed to represent procedures for the purpose of transforming them. Knowing
|
|
what transformations are legal requires knowing what a procedure does. A transformation
|
|
is valid if it does not change the observable behavior of a procedure. This document
|
|
tells you what B3 procedures do by telling you what each construct in B3 IR does.</p>
|
|
|
|
<h2>Procedure</h2>
|
|
|
|
<p>The parent object of all things in B3 is the Procedure. Every time you want to compile
|
|
something, you start by creating a Procedure. The lifecycle of a Procedure is
|
|
usually:</p>
|
|
|
|
<ol>
|
|
<li>Create the Procedure.</li>
|
|
<li>Populate the Procedure with code.</li>
|
|
<li>Use either the <a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/B3Compilation.h">high-level
|
|
Compilation API</a> or the
|
|
<a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/B3Generate.h">low-level
|
|
generation API</a>.</li>
|
|
</ol>
|
|
|
|
<p>The act of compiling the Procedure changes it in-place, making it unsuitable for
|
|
compiling again. Always create a new Procedure every time you want to compile
|
|
something.</p>
|
|
|
|
<h2>Types</h2>
|
|
|
|
<p>B3 has a trivial type system with only five types:</p>
|
|
|
|
<dl>
|
|
<dt>Void</dt>
|
|
<dd>Used to say that an instruction does not return a value.</dd>
|
|
|
|
<dt>Int32</dt>
|
|
<dd>32-bit integer. Integers don't have sign, but operations on them do.</dd>
|
|
|
|
<dt>Int64</dt>
|
|
<dd>64-bit integer.</dd>
|
|
|
|
<dt>Float</dt>
|
|
<dd>32-bit binary floating point number.</dd>
|
|
|
|
<dt>Double</dt>
|
|
<dd>64-bit binary floating point number.</dd>
|
|
</dl>
|
|
|
|
<p>B3 does not have a pointer type. Instead, the <code>B3::pointerType()</code> function will
|
|
return either Int32 or Int64 depending on which kind of integer can be used to represent a
|
|
pointer on the current platform. It's not a goal of B3 to support hardware targets that require
|
|
pointers and integers to be segregated. It's not a goal of B3 to support GC (garbage
|
|
collection) roots as a separate type, since JSC uses
|
|
<a href="http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-88-2.pdf">Bartlett-style conservative
|
|
root scanning</a>. This doesn't preclude any mainstream garbage collection algorithms,
|
|
including copying, generational, or concurrent collectors, and frees up the compiler to perform
|
|
more optimizations.</p>
|
|
|
|
<h2>Values</h2>
|
|
|
|
<p>Variables, and the instructions that define them, are represented using the Value object.
|
|
The Value object has a return type, a kind, and zero or more children. Children are
|
|
references to other Values. Those values are used as input to the instruction that
|
|
computes this value.</p>
|
|
|
|
<p>The value kind is a combination of an opcode and optional flags. The flags allow a single
|
|
opcode to have many variants. For example, Div and Mod may have the Chill flag set to indicate
|
|
that they should not trap on corner cases. Alternatively, Load/Store opcodes may have the
|
|
Traps flag set to indicate that they should trap deterministically.</p>
|
|
|
|
<p>Values also have a unique 32-bit index that is used as the name.</p>
|
|
|
|
<p>Example:</p>
|
|
|
|
<pre><code>Int32 @3 = Add(@1, @2)</code></pre>
|
|
|
|
<p>This represents a single Value instance. Its index is 3. It is an Int32. The opcode is
|
|
Add, and its children are @1 and @2.</p>
|
|
|
|
<p>Values may also have additional meta-data. We use special subclasses of the B3::Value
|
|
class for values that need meta-data. For example, the Load value needs a 32-bit offset
|
|
for the load. We use the MemoryValue class for memory-accessing values, which all have
|
|
such an offset.</p>
|
|
|
|
<h2>Stack Slots</h2>
|
|
|
|
<p>B3 exposes the concept of stack-allocated data and gives the client a lot of control.
|
|
By default, stack slots get allocated wherever B3 chooses. It will try to pack them as
|
|
much as possible. After compilation is done, you can retrieve each stack slot's location
|
|
in the form of an offset from the frame pointer.</p>
|
|
|
|
<p>You can force stack slots to end up at a particular offset from the frame pointer, though
|
|
this is very dangerous. For example, B3 assumes that it can use the slots closest to the
|
|
frame pointer for callee-saves, and currently when you force something to a particular
|
|
frame pointer offset, there is no mechanism to notice that this space is also being used
|
|
for callee-saves. Therefore, we recommend not using the frame pointer offset forcing
|
|
feature unless you know a lot about the ABI and you have no other choice.</p>
|
|
|
|
<h2>Variables</h2>
|
|
|
|
<p>Sometimes, SSA is inconvenient. For example, it's hard to do path specialization over SSA.
|
|
B3 has the concept of Variables as a fall-back. The backend knows how to handle them and
|
|
will coalesce and copy-propagate them. Inside the B3 optimizer, there is a classic SSA
|
|
builder that eliminates variables and builds SSA in their place.</p>
|
|
|
|
<p>You can create Variables by using Procedure::addVariable(), and then you can access them
|
|
using the Get and Set opcodes.</p>
|
|
|
|
<p>The fixSSA() phase will convert variables to SSA. If you use a lot of variables in your
|
|
input to B3, it's a good idea to run fixSSA() manually before running the compiler. The
|
|
default optimizer only runs fixSSA() towards the middle of optimizations. Passing non-SSA code
|
|
as input to the optimizer may render the early phases ineffective. Fortunately, B3 phases
|
|
are super easy to run. The following runs SSA fix-up on a Procedure named "proc":</p>
|
|
|
|
<pre><code>fixSSA(proc);</code></pre>
|
|
|
|
<h2>Control flow</h2>
|
|
|
|
<p>B3 represents control flow using basic blocks. Each basic block may have zero or more
|
|
predecessors. Each basic block may have zero or more successors. The meaning of the
|
|
successors is determined by the basic block's last Value, which must be a terminal. A
|
|
value is terminal if:</p>
|
|
|
|
<pre><code>value->effects().terminal</code></pre>
|
|
|
|
<p>Some opcodes are definitely terminal, like Jump, Branch, Oops, Return, and Switch. But a
|
|
value with the Patchpoint opcode may or may not be terminal. In general it's necessary to
|
|
check the <code>terminal</code> bit as shown above.</p>
|
|
|
|
<p>Each basic block contains a Vector<Value*> as the contents of the block. Control
|
|
flow inside the block is implicit based on the order of Values in the vector.</p>
|
|
|
|
<h2>Memory model</h2>
|
|
|
|
<p>B3 uses a secondary address space of <i>abstract heaps</i> to reason about aliasing and
|
|
concurrency. This address space is 32-bit and we point at it using the HeapRange type. If you
|
|
never supply HeapRanges to your memory operations (the default), they will have [0, UINT_MAX]
|
|
as their range, indicating that the operation may conservatively write to all 2<sup>32</sup>
|
|
abstract heaps. Two memory operations are said to alias if they access the same abstract heaps.
|
|
A simple example of using HeapRanges is with loads and stores, but you can also supply
|
|
HeapRanges as bounds on the effects of calls and patchpoints.</p>
|
|
|
|
<p>Fences are modeled as phantom effects that we call <i>fence ranges</i>. For example, a load
|
|
fence could be represented as a patchpoint that writes to [0, UINT_MAX]. B3 models load fences,
|
|
store fences, store-load fences, acquire fences, and release fences using fence ranges. The
|
|
absence of a fence range means that there is no fence.</p>
|
|
|
|
<p>Acquire fences are modeled as phantom writes to memory, while release fences are modeled as
|
|
phantom reads from memory. By itself this does not permit B3 to do all of the reorderings that
|
|
acquire/release allow. B3 is therefore allowed to treat loads/stores with fence ranges more
|
|
precisely. The phantom write effects of a fenced load cannot write to anything that is read or
|
|
written after the fence. The phantom read effects of a fenced store cannot read anything that
|
|
is written after the fence.</p>
|
|
|
|
<p>Applying a fence range to a load or store only makes that access a half-fence: load-with-fence
|
|
is an acquire fence that only prevents later operations from being hoisted and store-with-fence
|
|
is a release fence that only prevents earlier operations from being sunk. The canonical way to
|
|
express a fully fenced load (which is fenced in <i>both</i> directions) is to use
|
|
AtomicXchgAdd(0, ptr) with a non-empty fence range. The canonical way to express a fully fenced
|
|
store is to use AtomicXchg(value, ptr) with a non-empty fence range.</p>
|
|
|
|
<h2>Opcodes</h2>
|
|
|
|
<p>This section describes opcodes in the following format:</p>
|
|
|
|
<dl>
|
|
<dt>Int32 Foo(Int64, Double)</dt>
|
|
<dd>This describes an opcode named Foo that uses Int32 as its return type and takes two
|
|
children - one of type Int64 and another of type Double.</dd>
|
|
</dl>
|
|
|
|
<p>We sometimes use the wildcard type T to represent polymorphic operations, like "T Add(T,
|
|
T)". This means that the value must take two children of the same type and returns a
|
|
value of that type. We use the type IntPtr to mean either Int32, or Int64, depending on
|
|
the platform.</p>
|
|
|
|
<p>Some opcodes can have some flags set. A description of flags follows the description of
|
|
opcodes.</p>
|
|
|
|
<h3>Opcode descriptions</h3>
|
|
|
|
<dl>
|
|
<dt>Void Nop()</dt>
|
|
<dd>The empty value. Instead of removing Values from basic blocks, most optimizations
|
|
convert them to Nops. Various phases run fix-up where all Nops are removed in one pass.
|
|
It's common to see Nops in intermediate versions of B3 IR during optimizations. Nops
|
|
never lead to any code being generated and they do not impede optimizations, so they are
|
|
usually harmless. You can convert a Value to a Nop by doing convertToNop().</dd>
|
|
|
|
<dt>T Identity(T)</dt>
|
|
<dd>Returns the passed value. May be used for any type except Void. Instead of replacing
|
|
all uses of a Value with a different Value, most optimizations convert them to Identity.
|
|
Various phases run fix-up where all uses of Identity are replaced with the Identity's
|
|
child (transitively, so Identity(Identity(Identity(@x))) is changed to just @x). Even
|
|
the instruction selector "sees through" Identity. You can remove all references to
|
|
Identity in any value by calling Value::performSubstitution(). You can convert a Value
|
|
to an Identity by doing convertToIdentity(otherValue). If the value is Void,
|
|
convertToIdentity() converts it to a Nop instead.</dd>
|
|
|
|
<dt>T Opaque(T)</dt>
|
|
<dd>Returns the passed value. May be used for any type except Void. This opcode is provided as a hack
|
|
for avoiding B3 optimizations that the client finds unprofitable. B3 treats Opaque as an identity
|
|
only during instruction selection. All prior optimizations (including all B3 IR optimizations) do
|
|
not know what Opaque does, other than Opaque(x) == Opaque(x) and Opaque(Opaque(x)) == Opaque(x).</dd>
|
|
|
|
<dt>Int32 Const32(constant)</dt>
|
|
<dd>32-bit integer constant. Must use the Const32Value class, which has space for the
|
|
int32_t constant.</dd>
|
|
|
|
<dt>Int64 Const64(constant)</dt>
|
|
<dd>64-bit integer constant. Must use the Const64Value class, which has space for the
|
|
int64_t constant.</dd>
|
|
|
|
<dt>Float ConstFloat(constant)</dt>
|
|
<dd>Float constant. Must use the ConstFloatValue class, which has space for the float constant.</dd>
|
|
|
|
<dt>Double ConstDouble(constant)</dt>
|
|
<dd>Double constant. Must use the ConstDoubleValue class, which has space for the double constant.</dd>
|
|
|
|
<dt>Void Set(value, variable)</dt>
|
|
<dd>Assigns the given value to the given Variable. Must use the VariableValue class.</dd>
|
|
|
|
<dt>T Get(variable)</dt>
|
|
<dd>Returns the current value of the given Variable. Its return type depends on the
|
|
variable. Must use the VariableValue class.</dd>
|
|
|
|
<dt>IntPtr SlotBase(stackSlot)</dt>
|
|
<dd>Returns a pointer to the base of the given stack slot. Must use the SlotBaseValue
|
|
class.</dd>
|
|
|
|
<dt>IntPtr|Double ArgumentReg(%register)</dt>
|
|
<dd>Returns the value that the given register had at the prologue of the procedure. It
|
|
returns IntPtr for general-purpose registers and Double for FPRs. Must use the
|
|
ArgumentRegValue class.</dd>
|
|
|
|
<dt>IntPtr FramePointer()</dt>
|
|
<dd>Returns the value of the frame pointer register. B3 procedures alway use a frame
|
|
pointer ABI, and the frame pointer is always guaranteed to have this value anywhere
|
|
inside the procedure.</dd>
|
|
|
|
<dt>T Add(T, T)</dt>
|
|
<dd>Works with any type except Void. For integer types, this represents addition with
|
|
wrap-around semantics. For floating point types, this represents addition according to
|
|
the IEEE 854 spec. B3 does not have any notion of "fast math". A transformation over
|
|
floating point code is valid if the new code produces exactly the same value, bit for
|
|
bit.</dd>
|
|
|
|
<dt>T Sub(T, T)</dt>
|
|
<dd>Works with any type except Void. For integer types, this represents subtraction with
|
|
wrap-around semantics. For floating point types, this represents subtraction according
|
|
to the IEEE 854 spec.</dd>
|
|
|
|
<dt>T Mul(T, T)</dt>
|
|
<dd>Works with any type except Void. For integer types, this represents multiplication
|
|
with wrap-around semantics. For floating point types, this represents multiplication
|
|
according to the IEEE 854 spec.</dd>
|
|
|
|
<dt>T Div(T, T)</dt>
|
|
<dd>Works with any type except Void. For integer types, this represents signed
|
|
division with round-to-zero. By default, its behavior is undefined for x/0 or
|
|
-2<sup>31</sup>/-1. For floating point types, this represents division according
|
|
to the IEEE 854 spec. Integer Div may have the Chill flag set.</dd>
|
|
|
|
<dt>T Mod(T, T)</dt>
|
|
<dd>Works with any type except Void. For integer types, this represents signed
|
|
modulo. By default, its behavior is undefined for x%0 or -2<sup>31</sup>%-1. For
|
|
floating point types, this represents modulo according to "fmod()". Integer Mod may have the
|
|
Chill flag set.</dd>
|
|
|
|
<dt>T Neg(T)</dt>
|
|
<dd>Works with any type except Void. For integer types, this represents twos-complement
|
|
negation. For floating point types, this represents negation according to the IEEE
|
|
spec.</dd>
|
|
|
|
<dt>T BitAnd(T, T)</dt>
|
|
<dd>Bitwise and. Valid for any type except Void.</dd>
|
|
|
|
<dt>T BitOr(T, T)</dt>
|
|
<dd>Bitwise or. Valid for any type except Void.</dd>
|
|
|
|
<dt>T BitXor(T, T)</dt>
|
|
<dd>Bitwise xor. Valid for any type except Void.</dd>
|
|
|
|
<dt>T Shl(T, Int32)</dt>
|
|
<dd>Shift left. Valid for Int32 and Int64. The shift amount is always Int32. Only the
|
|
low 31 bits of the shift amount are used for Int32. Only the low 63 bits of the shift
|
|
amount are used for Int64.</dd>
|
|
|
|
<dt>T SShr(T, Int32)</dt>
|
|
<dd>Shift right with sign extension. Valid for Int32 and Int64. The shift amount is
|
|
always Int32. Only the low 31 bits of the shift amount are used for Int32. Only the
|
|
low 63 bits of the shift amount are used for Int64.</dd>
|
|
|
|
<dt>T ZShr(T, Int32)</dt>
|
|
<dd>Shift right with zero extension. Valid for Int32 and Int64. The shift amount is
|
|
always Int32. Only the low 31 bits of the shift amount are used for Int32. Only the
|
|
low 63 bits of the shift amount are used for Int64.</dd>
|
|
|
|
<dt>T RotR(T, Int32)</dt>
|
|
<dd>Rotate right. Valid for Int32 and Int64. The shift amount is always Int32. Only the
|
|
low 31 bits of the shift amount are used for Int32. Only the low 63 bits of the shift
|
|
amount are used for Int64.</dd>
|
|
|
|
<dt>T RotL(T, Int32)</dt>
|
|
<dd>Rotate left. Valid for Int32 and Int64. The shift amount is always Int32. Only the
|
|
low 31 bits of the shift amount are used for Int32. Only the low 63 bits of the shift
|
|
amount are used for Int64.</dd>
|
|
|
|
<dt>T Clz(T)</dt>
|
|
<dd>Count leading zeroes. Valid for Int32 and Int64.</dd>
|
|
|
|
<dt>T Abs(T)</dt>
|
|
<dd>Absolute value. Valid for Float and Double.</dd>
|
|
|
|
<dt>T Ceil(T)</dt>
|
|
<dd>Ceiling. Valid for Float and Double.</dd>
|
|
|
|
<dt>T Floor(T)</dt>
|
|
<dd>Flooring. Valid for Float and Double.</dd>
|
|
|
|
<dt>T Sqrt(T)</dt>
|
|
<dd>Square root. Valid for Float and Double.</dd>
|
|
|
|
<dt>U BitwiseCast(T)</dt>
|
|
<dd>If T is Int32 or Int64, it returns the bitwise-corresponding Float or Double,
|
|
respectively. If T is Float or Double, it returns the bitwise-corresponding Int32 or
|
|
Int64, respectively.</dd>
|
|
|
|
<dt>Int32 SExt8(Int32)</dt>
|
|
<dd>Fills the top 24 bits of the integer with the low byte's sign extension.</dd>
|
|
|
|
<dt>Int32 SExt16(Int32)</dt>
|
|
<dd>Fills the top 16 bits of the integer with the low short's sign extension.</dd>
|
|
|
|
<dt>Int64 SExt32(Int32)</dt>
|
|
<dd>Returns a 64-bit integer such that the low 32 bits are the given Int32 value and the
|
|
high 32 bits are its sign extension.</dd>
|
|
|
|
<dt>Int64 ZExt32(Int32)</dt>
|
|
<dd>Returns a 64-bit integer such that the low 32 bits are the given Int32 value and the
|
|
high 32 bits are zero.</dd>
|
|
|
|
<dt>U Trunc(T)</dt>
|
|
<dd>Returns the low 32 bits of the 64-bit value. If T is Int64 then U is Int32.
|
|
If T is Double then U is Float.</dd>
|
|
|
|
<dt>Double IToD(T)</dt>
|
|
<dd>Converts the given integer into a double. Value for Int32 or Int64 inputs.</dd>
|
|
|
|
<dt>Double FloatToDouble(Float)</dt>
|
|
<dd>Converts the given float into a double.</dd>
|
|
|
|
<dt>Float DoubleToFloat(Double)</dt>
|
|
<dd>Converts the given double into a float.</dd>
|
|
|
|
<dt>Int32 Equal(T, T)</dt>
|
|
<dd>Compares the two values. If they are equal, return 1; else return 0. Valid for all
|
|
types except Void. Integer comparisons simply compare all bits. Floating point
|
|
comparisons mostly compare bits, but have some corner cases: positive zero and negative
|
|
zero are considered equal, and they return false when either value is NaN.</dd>
|
|
|
|
<dt>Int32 NotEqual(T, T)</dt>
|
|
<dd>The opposite of Equal(). NotEqual(@x, @y) yields the same result as BitXor(Equal(@x,
|
|
@y), 1).</dd>
|
|
|
|
<dt>Int32 LessThan(T, T)</dt>
|
|
<dd>Returns 1 if the left value is less than the right one, 0 otherwise. Does a signed
|
|
comparison for integers. For floating point comparisons, this has the usual caveats
|
|
with respect to negative zero and NaN.</dd>
|
|
|
|
<dt>Int32 GreaterThan(T, T)</dt>
|
|
<dd>Returns 1 if the left value is greater than the right one, 0 otherwise. Does a signed
|
|
comparison for integers. For floating point comparisons, this has the usual caveats
|
|
with respect to negative zero and NaN.</dd>
|
|
|
|
<dt>Int32 LessEqual(T, T)</dt>
|
|
<dd>Returns 1 if the left value is less than or equal to the right one, 0 otherwise. Does
|
|
a signed comparison for integers. For floating point comparisons, this has the usual
|
|
caveats with respect to negative zero and NaN.</dd>
|
|
|
|
<dt>Int32 GreaterEqual(T, T)</dt>
|
|
<dd>Returns 1 if the left value is greater than or equal to the right one, 0 otherwise.
|
|
Does a signed comparison for integers. For floating point comparisons, this has the
|
|
usual caveats with respect to negative zero and NaN.</dd>
|
|
|
|
<dt>Int32 Above(T, T)</dt>
|
|
<dd>Unsigned integer comparison, valid for Int32 and Int64 only. Returns 1 if the left
|
|
value is unsigned-greater-than the right one, 0 otherwise.</dd>
|
|
|
|
<dt>Int32 Below(T, T)</dt>
|
|
<dd>Unsigned integer comparison, valid for Int32 and Int64 only. Returns 1 if the left
|
|
value is unsigned-less-than the right one, 0 otherwise.</dd>
|
|
|
|
<dt>Int32 AboveEqual(T, T)</dt>
|
|
<dd>Unsigned integer comparison, valid for Int32 and Int64 only. Returns 1 if the left
|
|
value is unsigned-greater-than-or-equal the right one, 0 otherwise.</dd>
|
|
|
|
<dt>Int32 BelowEqual(T, T)</dt>
|
|
<dd>Unsigned integer comparison, valid for Int32 and Int64 only. Returns 1 if the left
|
|
value is unsigned-less-than-or-equal the right one, 0 otherwise.</dd>
|
|
|
|
<dt>Int32 EqualOrUnordered(T, T)</dt>
|
|
<dd>Floating point comparison, valid for Float and Double only. Returns 1 if the left
|
|
value is equal to the right one or if either value is NaN. Returns 0 otherwise.</dd>
|
|
|
|
<dt>T Select(U, T, T)</dt>
|
|
<dd>Returns either the second child or the third child. T can be any type except Void. U
|
|
can be either Int32 or Int64. If the first child is non-zero, returns the second child.
|
|
Otherwise returns the third child.</dd>
|
|
|
|
<dt>Int32 Load8Z(IntPtr, offset)</dt>
|
|
<dd>Loads a byte from the address, which is computed by adding the compile-time 32-bit
|
|
signed integer offset to the child value. Zero extends the loaded byte, so the high 24
|
|
bits are all zero. Must use the MemoryValue class. May have the Traps flag set. May set
|
|
a fence range to turn this into a load acquire.</dd>
|
|
|
|
<dt>Int32 Load8S(IntPtr, offset)</dt>
|
|
<dd>Loads a byte from the address, which is computed by adding the compile-time 32-bit
|
|
signed integer offset to the child value. Sign extends the loaded byte. Must use the
|
|
MemoryValue class. May have the Traps flag set. May set
|
|
a fence range to turn this into a load acquire.</dd>
|
|
|
|
<dt>Int32 Load16Z(IntPtr, offset)</dt>
|
|
<dd>Loads a 16-bit integer from the address, which is computed by adding the compile-time
|
|
32-bit signed integer offset to the child value. Zero extends the loaded 16-bit
|
|
integer, so the high 16 bits are all zero. Misaligned loads are not penalized. Must
|
|
use the MemoryValue class. May have the Traps flag set. May set
|
|
a fence range to turn this into a load acquire.</dd>
|
|
|
|
<dt>Int32 Load16S(IntPtr, offset)</dt>
|
|
<dd>Loads a 16-bit integer from the address, which is computed by adding the compile-time
|
|
32-bit signed integer offset to the child value. Sign extends the loaded 16-bit
|
|
integer. Misaligned loads are not penalized. Must use the MemoryValue class. May have the
|
|
Traps flag set. May set a fence range to turn this into a load acquire.</dd>
|
|
|
|
<dt>T Load(IntPtr, offset)</dt>
|
|
<dd>Valid for any type except Void. Loads a value of that type from the address, which is
|
|
computed by adding the compile-time 32-bit signed integer offset to the child value.
|
|
Misaligned loads are not penalized. Must use the MemoryValue class. May have the Traps
|
|
flag set. May set a fence range to turn this into a load acquire.</dd>
|
|
|
|
<dt>Void Store8(Int32, IntPtr, offset)</dt>
|
|
<dd>Stores a the low byte of the first child into the address computed by adding the
|
|
compile-time 32-bit signed integer offset to the second child. Must use the MemoryValue
|
|
class. May have the Traps flag set. May set a fence range to turn this into a store release.</dd>
|
|
|
|
<dt>Void Store16(Int32, IntPtr, offset)</dt>
|
|
<dd>Stores a the low 16 bits of the first child into the address computed by adding the
|
|
compile-time 32-bit signed integer offset to the second child. Misaligned stores are
|
|
not penalized. Must use the MemoryValue class. May have the Traps flag set. May set a
|
|
fence range to turn this into a store release.</dd>
|
|
|
|
<dt>Void Store(T, IntPtr, offset)</dt>
|
|
<dd>Stores the value in the first child into the address computed by adding the
|
|
compile-time 32-bit signed integer offset to the second child. Misaligned stores are
|
|
not penalized. Must use the MemoryValue class. May have the Traps flag set. May set a
|
|
fence range to turn this into a store release.</dd>
|
|
|
|
<dt>Int32 AtomicWeakCAS(T expectedValue, T newValue, IntPtr ptr)</dt>
|
|
<dd>Performs a weak <a href="https://en.wikipedia.org/wiki/Compare-and-swap">CAS
|
|
(compare-and-swap)</a>. Returns a boolean (0 or 1) to indicate the result (failure or
|
|
success, respectively). May spuriously fail, in which case it will do nothing and return
|
|
0. You can supply a fence range to indicate that the CAS is fenced. Fenced CAS has both
|
|
acquire and release fences, and claims to both read and write the full fence range in
|
|
addition to whatever primary range is supplied for the CAS itself. AtomicWeakCAS only
|
|
has fencing behavior in the case that it succeeds. Atomic operations take a width parameter,
|
|
allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory locations.</dd>
|
|
|
|
<dt>T AtomicStrongCAS(T expectedValue, T newValue, IntPtr ptr)</dt>
|
|
<dd>Performs a strong <a href="https://en.wikipedia.org/wiki/Compare-and-swap">CAS
|
|
(compare-and-swap)</a>. Returns the old value before the CAS. The instruction selector
|
|
is smart about Equal(AtomicStrongCAS(expected, ...), expected) patterns, so this opcode
|
|
is also the best way to do a strong CAS that returns a boolean - just use Equal to compare
|
|
the result to expected. You can supply a fence range to indicate that the CAS is fenced.
|
|
Fenced CAS has both acquire and release fences, and claims to both read and write the full
|
|
fence range in addition to whatever primary range is supplied for the CAS itself.
|
|
AtomicStrongCAS has the same fencing on failure and success. Atomic operations take a width
|
|
parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
|
|
locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>
|
|
|
|
<dt>T AtomicXchgAdd(T, IntPtr)</dt>
|
|
<dd>Atomically adds a value to the memory location and returns the old value. Is allowed to
|
|
have a fence range, which causes it to have acquire/release fencing. Atomic operations take
|
|
a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
|
|
locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>
|
|
|
|
<dt>T AtomicXchgAnd(T, IntPtr)</dt>
|
|
<dd>Atomically bitwise-ands a value to the memory location and returns the old value. Is allowed to
|
|
have a fence range, which causes it to have acquire/release fencing. Atomic operations take
|
|
a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
|
|
locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>
|
|
|
|
<dt>T AtomicXchgOr(T, IntPtr)</dt>
|
|
<dd>Atomically bitwise-ors a value to the memory location and returns the old value. Is allowed to
|
|
have a fence range, which causes it to have acquire/release fencing. Atomic operations take
|
|
a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
|
|
locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>
|
|
|
|
<dt>T AtomicXchgSub(T, IntPtr)</dt>
|
|
<dd>Atomically subtracts a value to the memory location and returns the old value. Is allowed to
|
|
have a fence range, which causes it to have acquire/release fencing. Atomic operations take
|
|
a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
|
|
locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>
|
|
|
|
<dt>T AtomicXchgXor(T, IntPtr)</dt>
|
|
<dd>Atomically bitwise-xors a value to the memory location and returns the old value. Is allowed to
|
|
have a fence range, which causes it to have acquire/release fencing. Atomic operations take
|
|
a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
|
|
locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>
|
|
|
|
<dt>T AtomicXchg(T, IntPtr)</dt>
|
|
<dd>Atomically stores a value to the memory location and returns the old value. Is allowed to
|
|
have a fence range, which causes it to have acquire/release fencing. Atomic operations take
|
|
a width parameter, allowing them to operate over 8-bit, 16-bit, 32-bit, or 64-bit integer memory
|
|
locations. Returns a sign-extended result for 8-bit or 16-bit widths.</dd>
|
|
|
|
<dt>T Depend(T)</dt>
|
|
<dd>Only valid for integer types. This is guaranteed to return zero by xoring the input with
|
|
itself, and the B3 compiler guarantees that it will not leverage this knowledge for any
|
|
optimization. On x86, this is lowered to a Fence with read=Bottom, write=Top (i.e. a load-load
|
|
fence) and then it's constant-folded to zero. On ARM, this is preserved through the compiler
|
|
pipeline all the way to the MacroAssembler, which turns this into a <code>eor</code>. Using
|
|
Depend is the most efficient way to create load-load dependencies, regarldess of platform. It
|
|
allows B3's CSE to work for surrounding loads and it even supports CSEing the dependency
|
|
itself - so two identical load chains with no interleaving effects can be turned into one. On
|
|
ARM, it also avoids emitting an expensive <code>dmb ish</code> load-load fence. Because of the
|
|
benefits for CSE, it's recommended to use Depend for load-load dependencies even if you're only
|
|
targeting x86.</dd>
|
|
|
|
<dt>IntPtr WasmAddress(IntPtr, pinnedGPR)</dt>
|
|
<dd>This is used to compute the address of a wasm memory load from a pinned base GPR.
|
|
Must use the WasmAddressValue class.</dd>
|
|
|
|
<dt>Void Fence()</dt>
|
|
<dd>Abstracts standalone data fences on x86 and ARM. Must use the FenceValue class, which has
|
|
two additional members that configure the precise meaning of the fence:
|
|
<code>HeapRange FenceValue::read</code> and <code>HeapRange FenceValue::write</code>. If the
|
|
<code>write</code> range is empty, this is taken to be a store-store fence, which leads to
|
|
no code generation on x86 and the weaker <code>dmb ishst</code> fence on ARM. If the write
|
|
range is non-empty, this produces <code>mfence</code> on x86 and <code>dmb ish</code> on
|
|
ARM. Within B3 IR, the Fence also reports the read/write in its effects. This allows you to
|
|
scope the fence for the purpose of B3's load elimination. For example, you may use a Fence
|
|
to protect a store from being sunk below a specific load. In that case, you can claim to
|
|
read just that store's range and write that load's range.</dd>
|
|
|
|
<dt>T1 CCall(IntPtr, [T2, [T3, ...]])</dt>
|
|
<dd>Performs a C function call to the function pointed to by the first child. The types
|
|
that the function takes and the type that it returns are determined by the types of the
|
|
children and the type of the CCallValue. Only the first child is mandatory. Must use
|
|
the CCallValue class.</dd>
|
|
|
|
<dt>T1 Patchpoint([T2, [T3, ...]])</dt>
|
|
<dd>
|
|
<p>A Patchpoint is a customizable value. Patchpoints take zero or more values of any
|
|
type and return any type. A Patchpoint's behavior is determined by the generator
|
|
object. The generator is a C++ lambda that gets called during code generation. It gets
|
|
passed an assembler instance (specifically, CCallHelpers&) and an object describing
|
|
where to find all of the input values and where to put the result. Here's an example:</p>
|
|
|
|
<pre><code>PatchpointValue* patchpoint = block->appendNew<PatchpointValue>(proc, Int32, Origin());
|
|
patchpoint->append(ConstrainedValue(arg1, ValueRep::SomeRegister));
|
|
patchpoint->append(ConstrainedValue(arg2, ValueRep::SomeRegister));
|
|
patchpoint->setGenerator(
|
|
[&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
|
|
jit.add32(params[1].gpr(), params[2].gpr(), params[0].gpr());
|
|
});</code></pre>
|
|
|
|
<p>This creates a patchpoint that just adds two numbers. The patchpoint is set to return
|
|
Int32. The two child values, arg1 and arg2, are passed to the patchpoint with the
|
|
SomeRegister constraint, which just requests that they get put in appropriate
|
|
registers (GPR for integer values, FPR for floating point values). The generator uses
|
|
the params object to figure out which registers the inputs are in (params[1] and
|
|
params[2]) and which register to put the result in (params[0]). Many sophisticated
|
|
constraints are possible. You can request that a child gets placed in a specific
|
|
register. You can list additional registers that are
|
|
clobbered - either at the top of the patchpoint (i.e. early) so that the clobbered
|
|
registers interfere with the inputs, or at the bottom of the patchpoint (i.e. late) so
|
|
that the clobbered registers interfere with the output. Patchpoint constraints also
|
|
allow you to force values onto the call argument area of the stack. Patchpoints are
|
|
powerful enough to implement custom calling conventions, inline caches, and
|
|
side-exits.</p>
|
|
|
|
<p>A patchpoint is allowed to "side exit", i.e. abruptly exit from the procedure. If it
|
|
wants to do so by returning, it can use Procedure's API for getting the callee-save
|
|
register layout, unwinding the callee-saves, and then returning. More likely, the
|
|
patchpoint will take some exit state as part of its arguments, and it will manipulate
|
|
the call frame in place to make it look like another execution engine's call frame.
|
|
This is called OSR, and JavaScriptCore does it a lot.</p>
|
|
|
|
<p>A patchpoint can be used as a terminal. Simply set the <code>terminal</code> bit:</p>
|
|
|
|
<pre><code>patchpoint->effects.terminal = true;</code></pre>
|
|
|
|
<p>The generator can determine where to branch by using the StackmapGenerationParams to
|
|
get the set of labels for all successors. You're guaranteed that the number of
|
|
successors of the patchpoint's basic block will be the same as when you created it.
|
|
However, like any value, a patchpoint may be cloned. This means, for example, that if
|
|
you use this to implement a table jump then the jump table must be created inside the
|
|
generator, so that you get one jump table per clone.</p>
|
|
|
|
<p>Must use the PatchpointValue class with the Patchpoint opcode.</p>
|
|
</dd>
|
|
|
|
<dt>T CheckAdd(T, T, [T2, [T3, ...]])</dt>
|
|
<dd>Checked integer addition. Works for T = Int32 or T = Int64. First first two children
|
|
are mandatory. Additional children are optional. All of the Check instructions take a
|
|
generator and value constraints like a Patchpoint. In the case of a CheckAdd, the
|
|
generator runs on the path where the integer addition overflowed. B3 assumes that
|
|
CheckAdd will side-exit upon overflow, so the generator must do some kind of
|
|
termination. Usually, this is used to implement OSR exits on integer overflow and the
|
|
optional arguments to CheckAdd will be the OSR exit state. Must use the CheckValue
|
|
class.</dd>
|
|
|
|
<dt>T CheckSub(T, T, [T2, [T3, ...]])</dt>
|
|
<dd>Checked integer subtraction. Works for T = Int32 or T = Int64. First first two
|
|
children are mandatory. Additional children are optional. All of the Check
|
|
instructions take a generator and value constraints like a Patchpoint. In the case of a
|
|
CheckSub, the generator runs on the path where the integer subtraction overflowed. B3
|
|
assumes that CheckSub will side-exit upon overflow, so the generator must do some kind
|
|
of termination. Usually, this is used to implement OSR exits on integer overflow and
|
|
the optional arguments to CheckSub will be the OSR exit state. You can use CheckSub for
|
|
checked negation by using zero for the first child. B3 will select the native negate
|
|
instruction when you do this. Must use the CheckValue class.</dd>
|
|
|
|
<dt>T CheckMul(T, T, [T2, [T3, ...]])</dt>
|
|
<dd>Checked integer multiplication. Works for T = Int32 or T = Int64. First first two
|
|
children are mandatory. Additional children are optional. All of the Check
|
|
instructions take a generator and value constraints like a Patchpoint. In the case of a
|
|
CheckMul, the generator runs on the path where the integer multiplication overflowed.
|
|
B3 assumes that CheckMul will side-exit upon overflow, so the generator must do some
|
|
kind of termination. Usually, this is used to implement OSR exits on integer overflow
|
|
and the optional arguments to CheckMul will be the OSR exit state. Must use the
|
|
CheckValue class.</dd>
|
|
|
|
<dt>Void Check(T, [T2, [T3, ...]])</dt>
|
|
<dd>Exit check. Works for T = Int32 or T = Int64. This branches on the first child. If
|
|
the first child is zero, this just falls through. If it's non-zero, it goes to the exit
|
|
path generated by the passed generator. Only the first child is mandatory. B3 assumes
|
|
that Check will side-exit when the first child is non-zero, so the generator must do
|
|
some kind of termination. Usually, this is used to implement OSR exit checks and the
|
|
optional arguments to Check will be the OSR exit state. Check supports efficient
|
|
compare/branch fusion, so you can Check for fairly sophisticated predicates. For
|
|
example, Check(Equal(LessThan(@a, @b), 0)) where @a and @b are doubles will be generated
|
|
to an instruction that branches to the exit if @a >= @b or if either @a or @b are
|
|
NaN. Must use the CheckValue class.</dd>
|
|
|
|
<dt>Void WasmBoundsCheck(Int32, pinnedGPR, offset)</dt>
|
|
<dd>Special Wasm opcode. This branches on the first child. If the first child plus the offset
|
|
produces a Int64 less than to the pinnedGPR this falls through. Otherwise, it branches to
|
|
the exit path generated by the passed generator. Unlike the Patch/Check family, the
|
|
generator used by WasmBoundsCheck should be set on the Procedure itself. The GRPReg passed in
|
|
pinnedGPR must also be marked as pinned by calling the Procedure's pinning API, or it must be
|
|
InvalidGPR, in which case the out-of-bounds limit is 4GiB. In the later case a maximum parameter
|
|
can be provided, to further constrain the out-of-bounds limit and help generate smaller
|
|
immediates. B3 assumes the WasmBoundsCheck will side-exit when it branches, so the generator
|
|
must do some kind of termination. In Wasm this is used to trap and unwind back to JS. Must use
|
|
the WasmBoundsCheckValue class.</dd>
|
|
|
|
<dt>Void Upsilon(T, ^phi)</dt>
|
|
<dd>B3 uses SSA. SSA requires that each variable gets assigned to only once anywhere in
|
|
the procedure. But that doesn't work if you have a variable that is supposed to be the
|
|
result of merging two values along control flow. B3 uses Phi values to represent value
|
|
merges, just like SSA compilers usually do. But while other SSA compilers represent the
|
|
inputs to the Phi by listing the control flow edges from which the Phi gets its values,
|
|
B3 uses the Upsilon value. Each Phi behaves as if it has a memory location associated
|
|
with it. Executing the Phi behaves like a load from that memory location.
|
|
Upsilon(@value, ^phi) behaves like a store of @value into the memory location associated
|
|
with @phi. We say "^phi" when we mean that we are writing to the memory location
|
|
associated with the Phi. Must use the UpsilonValue class.</dd>
|
|
|
|
<dt>T Phi()</dt>
|
|
<dd>Works for any type except Void. Represents a local memory location large enough to
|
|
hold T. Loads from that memory location. The only way to store to that location is
|
|
with Upsilon.</dd>
|
|
|
|
<dt>Void Jump(takenBlock)</dt>
|
|
<dd>Jumps to takenBlock. This must appear at the end of the basic block. The basic block
|
|
must have exactly one successor.</dd>
|
|
|
|
<dt>Void Branch(T, takenBlock, notTakenBlock)</dt>
|
|
<dd>Works for T = Int32 or T = Int64. Branches on the child. If the child is non-zero,
|
|
it branches to the takenBlock. Otherwise it branches to the notTakenBlock. Must appear
|
|
at the end of the basic block. The block must have exactly two successors.</dd>
|
|
|
|
<dt>Void Switch(T, cases...)</dt>
|
|
<dd>Works for T = Int32 or T = Int64. Switches on the child. Contains a list of switch
|
|
cases. Each switch case has an integer constant and a target block. The switch value
|
|
also contains a fall-through target in case the child has a value that wasn't mentioned
|
|
in the cases list. Must use the SwitchValue class. Must appear at the end of the basic
|
|
block. The block must have one successor for each case, plus a successor for the
|
|
fall-through (default) case. You can manage the successors of a block containing a Switch
|
|
using API in SwitchValue, like SwitchValue::appendCase() and
|
|
SwitchValue::setFallThrough().</dd>
|
|
|
|
<dt>Void EntrySwitch()</dt>
|
|
<dd>
|
|
<p>B3 supports multiple procedure entrypoints. The way you create multiple entrypoints is
|
|
by placing an EntrySwitch at the end of the root block. The root block must then have a
|
|
successor for each entrypoint. Additionally, you must tell the Procedure how many
|
|
entrypoints you want. For example:</p>
|
|
<pre><code>Procedure proc;
|
|
proc.setNumEntrypoints(3);
|
|
BasicBlock* root = proc.addBlock();
|
|
BasicBlock* firstEntrypoint = proc.addBlock();
|
|
BasicBlock* secondEntrypoint = proc.addBlock();
|
|
BasicBlock* thirdEntrypoint = proc.addBlock();
|
|
root->appendNew<Value>(proc, EntrySwitch, Origin());
|
|
root->appendSuccessor(firstEntrypoint);
|
|
root->appendSuccessor(secondEntrypoint);
|
|
root->appendSuccessor(thirdEntrypoint);</code></pre>
|
|
<p>This is the canonical way to use EntrySwitch, however the semantics are flexible enough
|
|
to allow its use anywhere in the control flow graph. You can have an arbitrary number of
|
|
EntrySwitches. This flexibility ensures that EntrySwitch works even when B3 does
|
|
transformations that move code above the EntrySwitch, duplicate the EntrySwitch itself,
|
|
or do any number of other unexpected things.</p>
|
|
<p>EntrySwitch behaves as if each Procedure has a variable called Entry. Each physical
|
|
entrypoint sets Entry to the index of that entrypoint (so 0, 1, or 2, above) and jumps to
|
|
the root block. EntrySwitch is just a switch on the Entry variable. Transformations over
|
|
code that uses EntrySwitch are valid so long as they don't change the procedure's
|
|
behavior under these semantics.</p>
|
|
<p>EntrySwitch is implemented without any actual variables or switching. Instead, all code
|
|
that lies on some path from the root block to some EntrySwitch is cloned for each
|
|
entrypoint. This lowering is done as a very late phase in Air, so most of the compiler
|
|
does not have to know anything about entrypoints. Most of the compiler treats EntrySwitch
|
|
as an opaque control flow construct. EntrySwitch lowering is allowed to clone an
|
|
arbitrary amount of code. However, normal uses of EntrySwitch will place it at the end of
|
|
an empty root block and B3 will only hoist a handful of things above EntrySwitch. This
|
|
leads to only a small amount of cloned code in practice.</p>
|
|
</dd>
|
|
|
|
<dt>Void Return(T <i>(optional)</i>)</dt>
|
|
<dd>
|
|
<p>
|
|
Returns the control flow to the caller and terminates the procedure.
|
|
Must appear at the end of the basic block. The block must have zero successors.
|
|
</p>
|
|
<p>
|
|
If the node has a child, its value is returned. The type of the child can be any type except Void.
|
|
</p>
|
|
</dd>
|
|
|
|
<dt>Void Oops()</dt>
|
|
<dd>Indicates unreachable code. This may be implemented as either a trap or as a bare
|
|
fall-through, since B3 is allowed to assume that this will never be reached. Must appear
|
|
at the end of the basic block. The block must have zero successors. Note that we also use
|
|
the Oops opcode to mean "no such opcode" in some B3 transformations.</dd>
|
|
</dl>
|
|
|
|
<h2>Flags</h2>
|
|
|
|
<p>This section describes flags. These may be set in
|
|
<a href="http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/b3/B3Kind.h"><code>Kind</code></a>.</p>
|
|
|
|
<dl>
|
|
<dt>Chill</dt>
|
|
<dd><i>Applies to: Div, Mod.</i> You can create a chill Div/Mod by saying
|
|
<code>chill(Div)</code>. This creates a Kind that has the Chill flag set. This can only be
|
|
used with interer types. An operation is said to be chill if it returns a sensible value
|
|
whenever its non-chill form would have had undefined behavior. Chill Div turns x/0 into 0
|
|
and -2<sup>31</sup>/-1 into -2<sup>31</sup>. We recognize this in IR because it's exactly
|
|
the semantics of division on ARM64, and it's also exactly the semantics that JavaScript
|
|
wants for "(x/y)|0". Chill Mod turns x%0 into 0 and -2<sup>31</sup>%-1 into 0. This matches
|
|
the semantics of JavaScript "(x%y)|0".</dd>
|
|
|
|
<dt>Traps</dt>
|
|
<dd><i>Applies to: Load8Z, Load8S, Load16Z, Load16S, Load, Store8, Store16, Store.</i> You can
|
|
create a trapping Kind from an opcode by saying <code>trapping(opcode)</code>. For example,
|
|
<code>trapping(Load)</code>. An operation is said to be trapping if it will cause a page
|
|
fault when used on an invalid pointer and this page fault will be observed. This means that
|
|
the compiler cannot fudge when the page fault happens. This is logically equivalent to what
|
|
B3 calls <code>Effects::exitsSideways</code>, but further implies that if any of the B3
|
|
values used to fuse an Air instruction were trapping, then the Air instruction must have its
|
|
<code>Air::Kind::traps</code> flag set. The compiler won't help you identify where you
|
|
trapped. Even if you use the compiler's origin facility to track down the trap location, you
|
|
may get the origin of any B3 value that was incorporated into the fused instruction that
|
|
caused the trap. For example, "Add(Load<Traps>(ptr), $1)" may claim to trap at the Add
|
|
rather than the Load on x86, because this pattern is a perfect candidate for add-load
|
|
fusion. Nevertheless, you are guaranteed to get the trap and the trap will be observed at
|
|
the point you intended. For example, the compiler will not hoist a trapping load past any
|
|
effects, even those outside of its read range, because the trap is presumed to read top. The
|
|
compiler will not attempt to DCE a trapping load. The compiler will not attempt to sink or
|
|
eliminate any trapping stores, even if they are dead because of a guaranteed subsequent
|
|
store to the same address, because we conservatively assume that the store was done for the
|
|
trap effect. This feature is meant to support high throughput memory safety checks in
|
|
WebAssembly.</dd>
|
|
</dl>
|
|
|
|
</div>
|
|
</body>
|
|
</html>
|
|
|