haikuwebkit/Source/WTF/wtf/LikelyDenseUnsignedIntegerS...

338 lines
11 KiB
C
Raw Permalink Normal View History

Make AirAllocateRegistersByGraphColoring use less memory https://bugs.webkit.org/show_bug.cgi?id=225848 Reviewed by Filip Pizlo. Source/JavaScriptCore: We've had some jetsam problems caused by the main Air register allocator, which caused us to lower Options::maximumTmpsForGraphColoring. Hence this patch tries to improve the memory usage of the allocator. It includes several changes: - Change the datastructure used for representing the interference graph. Before it was effectively a HashSet<std::pair<uint16_t, uint16_t>. Now, it is either a Bitvector (for n < 400 for now, can be tweaked easily), or a Vector<LikelyDenseUnsignedIntegerSet<uint16_t>> otherwise. LikelyDenseUnsignedIntegerSet is a new datastructure introduced by this patch, it is either a HashSet if very sparse, or a BitVector + an amount to shift it by. This is by far the largest memory reduction in this patch, it reduces the maximum memory used for an interference graph in tsf-wasm in JetStream2 from 16MB to 700kB, and in mruby-wasm.aotoki.dev from 262MB to 20MB (the later only happen when we increase Options::maximumTmpsForGraphColoring.. this is the exact function which caused us to lower it). Its effect on smaller functions in JetStream2 is rarely as dramatic but always an improvement, and improvements between 2x and 5x are extremely common (10x to 30x are significantly rarer but do occur). - In order to easily test this change and any further change to this datastructure, the old approach was preserved as InterferenceHashSet, and a template to run two such datastructures in parallel, checking their equivalence was added: InstrumentedInterferenceGraph. Running with it and reportInterferenceGraphMemoryUse set to true was used to compute the numbers given above. - There was already some template parameter to change the size of the tmp indices from unsigned to uint16_t but the code failed to compile unless it was unsigned. I fixed this, made more consistent use of it, and switched to uint16_t in the very common case that we have less than 65k Tmps (we can have more despite the option because of spilling). This halved the memory usage of various other datastructures in the register allocator - unspillableTmps was a HashSet<unsigned>. Since it is often quite dense (often around 20% on JetStream2), I replaced it by a Bitvector instead - m_biases was a HashMap<IndexType, HashSet<IndexType>>. Since it is extremely rare that the sets have more than 8 elements (from looking at some instrumented run of JetStream2), I replaced it by HashMap<IndexType, SmallSet<IndexType>>. This not only significantly reduces memory, but nearly halves the time spent in assignColors(around 80ms -> 40ms in JetStream 2) - UseCounts was needlessly general: it is only used by the register allocator (all other references to UseCounts refer to the completely different B3::UseCounts), so there is no point in it computing, and then storing lots of irrelevant data. A float is also more than enough precision (especially since it is pretty much always 1, 10, 100, or 1000 in practice…). Also, since we only need it indexed by Tmps, we can use a Vector with AbsoluteTmpMapper instead of its HashMap. These changes are not just memory savings, they also make selectSpill way faster (570ms -> 250ms on my machine on JetStream2) - While I was at it, I did a couple of other tweaks to the logic of selectSpill. In particular, instead of having to check for isFastTmp every time, I just put the fast tmps directly in unspillableTmps, which prevents them from getting added to m_spillWorklist in the first place. This + a bit of clean-up (for example putting an early exit instead of setting score to infinity in the case of dead tmps) resulted in a further perf win (to roughly 200ms spent in selectSpill() on JetStream2) All together, this patch reduces the time spent in the register allocator by roughly 15 to 20% in JetStream2 (tested both with the Briggs and the IRC allocators on my MBP 2019). I do not yet have precise performance numbers for this exact patch, but benchmarking a previous version of it (with a less optimized interference graph) resulted in significant RAMification improvements (around 1%), and more surprisingly some JetStream2 improvements on weaker machines (e.g. an iPhone 7 gained > 1%). I believe these gains come either from less trashing of the caches, or less contention caused by the memory traffic. I will try to update the bugzilla with more up-to-date thorough results when I get them. This patch does not increase Options::maximumTmpsForGraphColoring, I intend to do that in a separate patch to make it easier to revert in case of a problem. * b3/B3ReduceLoopStrength.cpp: (JSC::B3::ReduceLoopStrength::reduceByteCopyLoopsToMemcpy): * b3/air/AirAllocateRegistersAndStackByLinearScan.cpp: * b3/air/AirAllocateRegistersByGraphColoring.cpp: (JSC::B3::Air::allocateRegistersByGraphColoring): * b3/air/AirCode.h: (JSC::B3::Air::Code::forEachFastTmp const): * b3/air/AirUseCounts.h: (JSC::B3::Air::UseCounts::UseCounts): (JSC::B3::Air::UseCounts::isConstDef const): (JSC::B3::Air::UseCounts::numWarmUsesAndDefs const): (JSC::B3::Air::UseCounts::dump const): * parser/Nodes.h: Source/WTF: Two changes: the addition of LikelyDenseUnsignedIntegerSet, and various improvements to Small(Ptr)Set. The latter include: - Renaming SmallPtrSet into SmallSet, as it now supports integers as well as pointers. - Reducing its size by sharing the same storage for m_buffer and for m_smallStorage. This is safe to do, because all operations branch on isSmall() which depends purely on m_capacity. - Adding trivial size(), isEmpty() and memoryUse() methods - Adding a comment at the top of the file explaining when to use, and (more importantly) not to use SmallSet. LikelyDenseUnsignedIntegerSet is an even more specialized data structure, that can represent sets of unsigned integers very compactly if they are clustered. Finally I added an outOfLineMemoryUse() method to BitVector, making it more convenient to compare the memory consumption of different data structures in the register allocator. * WTF.xcodeproj/project.pbxproj: * wtf/BitVector.h: * wtf/CMakeLists.txt: * wtf/LikelyDenseUnsignedIntegerSet.cpp: Copied from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/LikelyDenseUnsignedIntegerSet.h: Added. (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::~LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::contains const): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::size const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::iterator): (WTF::LikelyDenseUnsignedIntegerSet::iterator::m_shift): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator++): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator* const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator== const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator!= const): (WTF::LikelyDenseUnsignedIntegerSet::begin const): (WTF::LikelyDenseUnsignedIntegerSet::end const): (WTF::LikelyDenseUnsignedIntegerSet::memoryUse const): (WTF::LikelyDenseUnsignedIntegerSet::validate const): (WTF::LikelyDenseUnsignedIntegerSet::isBitVector const): (WTF::LikelyDenseUnsignedIntegerSet::isValidValue const): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): * wtf/SmallPtrSet.h: Removed. * wtf/SmallSet.cpp: Renamed from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/SmallSet.h: Added. Tools: Simply added some tests for SmallSet. * TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj: * TestWebKitAPI/Tests/WTF/SmallSet.cpp: Added. (TestWebKitAPI::testSmallSetOfUnsigned): (TestWebKitAPI::testSmallSetOfPointers): (TestWebKitAPI::testVectorsOfSmallSetsOfUnsigned): (TestWebKitAPI::TEST): Canonical link: https://commits.webkit.org/237893@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277714 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-19 03:24:14 +00:00
/*
* Copyright (C) 2021 Apple Inc. All Rights Reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
* OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#pragma once
#include <wtf/Assertions.h>
#include <wtf/BitVector.h>
#include <wtf/FastMalloc.h>
#include <wtf/HashFunctions.h>
#include <wtf/HashSet.h>
#include <wtf/Noncopyable.h>
#include <wtf/Variant.h>
namespace WTF {
DECLARE_ALLOCATOR_WITH_HEAP_IDENTIFIER(LikelyDenseUnsignedIntegerSet);
// This is effectively a Variant<HashSet, Pair<BitVector, IndexType>>
// If it is in BitVector mode, it keeps track of the minimum value in the set, and has the bitVector shifted by the same amount.
// So for example {4000, 4002, 4003} would be represented as the bitVector 1101 with a m_min of 4000.
// It shifts between the two modes whenever that would at least halve its memory usage. So it will never use more than twice the optimal amount of memory, and yet should not ping-pong between the two modes too often.
template<typename IndexType>
class LikelyDenseUnsignedIntegerSet {
WTF_MAKE_FAST_ALLOCATED;
WTF_MAKE_NONCOPYABLE(LikelyDenseUnsignedIntegerSet);
static_assert(std::is_unsigned<IndexType>::value);
using Set = HashSet<IndexType, WTF::IntHash<IndexType>, WTF::UnsignedWithZeroKeyHashTraits<IndexType> >;
public:
LikelyDenseUnsignedIntegerSet()
: m_size(0)
, m_min(0)
, m_max(0)
{
AirAllocateStackByGraphColoring should use the optimized interference graphs from AirAllocateRegistersByGraphColoring https://bugs.webkit.org/show_bug.cgi?id=226258 Reviewed by Phil Pizlo. Source/JavaScriptCore: The main change in this patch is that AirAllocateStackByGraphColoring is now using the optimized datastructures in wtf/InterferenceGraph.h. This required templating most of it over the interference graph used (Small/Large/Huge), but I tried keeping some common parts out of the templated class to minimize the impact on compile times and binary size. A consequence of that change is that coalescableMoves and remappedStackSlots now store indices instead of direct pointers to StackSlots, resulting in a memory reduction of about 3x as well. Another consequence is that I had to slightly alter the way that coalescing works: instead of carefully removing the interference edges of the killed slot, we simply use mayClear() which is not guaranteed to remove anything. I believe that this is sound, because every subsequent access to m_interference checks whether a slot has been coalesced first, so dropping these edges is purely a memory saving, but has no logical effect. The new code was tested in a few ways: - running on JetStream2 with asan - running on JetStream2 with TEST_OPTIMIZED_INTERFERENCE_GRAPH - running on JetStream2 and logging the frame sizes at the end of this phase, and comparing to the results of doing the same on ToT (same average frame size) The two functions where this code had the largest memory footprint in JetStream2 were both in tsf-wasm. One has 751 stack slots, and had an interference graph of 2.1MB and a coalescableMoves vector of 440kB The other has 673 stack slots, and had an interference graph of 1.9MB and a coalescableMoves vector of 421kB. With this patch, they respectively use 79kB+146kB and 67kB+140kB The effect on the rest of JetStream2 is less likely to matter as few functions used more than a few dozens of kB in this phase, but in percentages are just as huge. More importantly (and the reason I wrote this patch in the first place), I checked mruby-wasm.aotoki.dev which with a few other pages forced us to lower Options::maximumTmpsForGraphColoring because of jetsams. It has two massive functions that reach this phase if I increase Options::maximumTmpsForGraphColoring: - about 6k stack slots -> 215MB + 6MB (interference graph + coalescableMoves) - about 9k stack slots -> 395MB + 4MB After this patch, they respectively use 4.5MB+2MB and 9MB+1.5MB, or roughly a 40x improvement. Combined with the recent improvements to the register allocator, I hope to be able to increase Options::maximumTmpsForGraphColoring soon (in a different patch for easier bisection if either cause a perf regression). This would be helpful, since its lowering cratered our performance on some other wasm application by 8x. In terms of compile times, this patch lowered the time spent in AllocateStackByGraphColoring over the course of a run of JetStream2 from roughly 350ms to roughly 270ms. This is almost certainly negligible, but at least it guarantees that it did not regress. * b3/air/AirAllocateRegistersByGraphColoring.cpp: * b3/air/AirAllocateStackByGraphColoring.cpp: (JSC::B3::Air::allocateStackByGraphColoring): Source/WTF: I moved the interference graphs datastructures from AirAllocateRegistersByGraphColoring to their own wtf/InterferenceGraph.h file. There are three of them: - SmallInterferenceGraph, best for n < 400 - LargeInterferenceGraph, for n < 2**16 - HugeInterferenceGraph, for n up to 2**32 I also added "Iterable" versions of them, that have an operator[] method whose result you can iterate on to get all the indices which interfere with a given index. SmallIterableInterferenceGraph is the same as the non-iterable version, but the Large and Huge versions are a bit slower than their counterparts and use 2x memory. All of these were tested by running JetStream2 with the TEST_OPTIMIZED_INTERFERENCE_GRAPH set to 1. This flag makes the optimized datastructures run in parallel with a reference implementation, and their results are checked for equality on every method call. There is one small difference allowed: iteration is not guaranteed to go through elements in the same order. I also added a clear() method to LikelyDenseUnsignedIntegerSet, and added the NotNull flag to its various uses of placement new. * WTF.xcodeproj/project.pbxproj: * wtf/CMakeLists.txt: * wtf/HashSet.h: (WTF::W>::memoryUse const): * wtf/InterferenceGraph.h: Added. (WTF::InterferenceBitVector::contains): (WTF::InterferenceBitVector::addAndReturnIsNewEntry): (WTF::InterferenceBitVector::add): (WTF::InterferenceBitVector::clear): (WTF::InterferenceBitVector::mayClear): (WTF::InterferenceBitVector::setMaxIndex): (WTF::InterferenceBitVector::forEach): (WTF::InterferenceBitVector::size const): (WTF::InterferenceBitVector::memoryUse const): (WTF::InterferenceBitVector::dumpMemoryUseInKB const): (WTF::InterferenceBitVector::Iterable::iterator::operator++): (WTF::InterferenceBitVector::Iterable::iterator::operator* const): (WTF::InterferenceBitVector::Iterable::iterator::operator== const): (WTF::InterferenceBitVector::Iterable::iterator::operator!= const): (WTF::InterferenceBitVector::Iterable::begin const): (WTF::InterferenceBitVector::Iterable::end const): (WTF::InterferenceBitVector::operator[] const): (WTF::InterferenceBitVector::index const): (WTF::InterferenceVector::contains): (WTF::InterferenceVector::addAndReturnIsNewEntry): (WTF::InterferenceVector::add): (WTF::InterferenceVector::clear): (WTF::InterferenceVector::mayClear): (WTF::InterferenceVector::setMaxIndex): (WTF::InterferenceVector::forEach): (WTF::InterferenceVector::size const): (WTF::InterferenceVector::memoryUse const): (WTF::InterferenceVector::dumpMemoryUseInKB const): (WTF::InterferenceVector::Iterable::begin const): (WTF::InterferenceVector::Iterable::end const): (WTF::InterferenceVector::operator[] const): (WTF::UndirectedEdgesDuplicatingAdapter::contains): (WTF::UndirectedEdgesDuplicatingAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDuplicatingAdapter::add): (WTF::UndirectedEdgesDuplicatingAdapter::clear): (WTF::UndirectedEdgesDuplicatingAdapter::mayClear): (WTF::UndirectedEdgesDuplicatingAdapter::setMaxIndex): (WTF::UndirectedEdgesDuplicatingAdapter::forEach): (WTF::UndirectedEdgesDuplicatingAdapter::size const): (WTF::UndirectedEdgesDuplicatingAdapter::memoryUse const): (WTF::UndirectedEdgesDuplicatingAdapter::dumpMemoryUseInKB const): (WTF::UndirectedEdgesDuplicatingAdapter::operator[] const): (WTF::UndirectedEdgesDedupAdapter::contains): (WTF::UndirectedEdgesDedupAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDedupAdapter::add): (WTF::UndirectedEdgesDedupAdapter::clear): (WTF::UndirectedEdgesDedupAdapter::mayClear): (WTF::UndirectedEdgesDedupAdapter::setMaxIndex): (WTF::UndirectedEdgesDedupAdapter::forEach): (WTF::UndirectedEdgesDedupAdapter::size const): (WTF::UndirectedEdgesDedupAdapter::memoryUse const): (WTF::UndirectedEdgesDedupAdapter::dumpMemoryUseInKB const): (WTF::InterferenceHashSet::contains): (WTF::InterferenceHashSet::addAndReturnIsNewEntry): (WTF::InterferenceHashSet::add): (WTF::InterferenceHashSet::clear): (WTF::InterferenceHashSet::setMaxIndex): (WTF::InterferenceHashSet::forEach): (WTF::InterferenceHashSet::size const): (WTF::InterferenceHashSet::memoryUse const): (WTF::InterferenceHashSet::dumpMemoryUseInKB const): (WTF::InstrumentedInterferenceGraph::contains): (WTF::InstrumentedInterferenceGraph::addAndReturnIsNewEntry): (WTF::InstrumentedInterferenceGraph::add): (WTF::InstrumentedInterferenceGraph::clear): (WTF::InstrumentedInterferenceGraph::mayClear): (WTF::InstrumentedInterferenceGraph::setMaxIndex): (WTF::InstrumentedInterferenceGraph::forEach): (WTF::InstrumentedInterferenceGraph::size const): (WTF::InstrumentedInterferenceGraph::memoryUse const): (WTF::InstrumentedInterferenceGraph::dumpMemoryUseInKB const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::Iterable): (WTF::InstrumentedIterableInterferenceGraph::Iterable::begin const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::end const): (WTF::InstrumentedIterableInterferenceGraph::operator[] const): * wtf/LikelyDenseUnsignedIntegerSet.h: (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::clear): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::estimateHashSetSize): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): Canonical link: https://commits.webkit.org/238229@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@278186 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-28 02:08:19 +00:00
new (NotNull, &m_inline.bitVector) BitVector;
Make AirAllocateRegistersByGraphColoring use less memory https://bugs.webkit.org/show_bug.cgi?id=225848 Reviewed by Filip Pizlo. Source/JavaScriptCore: We've had some jetsam problems caused by the main Air register allocator, which caused us to lower Options::maximumTmpsForGraphColoring. Hence this patch tries to improve the memory usage of the allocator. It includes several changes: - Change the datastructure used for representing the interference graph. Before it was effectively a HashSet<std::pair<uint16_t, uint16_t>. Now, it is either a Bitvector (for n < 400 for now, can be tweaked easily), or a Vector<LikelyDenseUnsignedIntegerSet<uint16_t>> otherwise. LikelyDenseUnsignedIntegerSet is a new datastructure introduced by this patch, it is either a HashSet if very sparse, or a BitVector + an amount to shift it by. This is by far the largest memory reduction in this patch, it reduces the maximum memory used for an interference graph in tsf-wasm in JetStream2 from 16MB to 700kB, and in mruby-wasm.aotoki.dev from 262MB to 20MB (the later only happen when we increase Options::maximumTmpsForGraphColoring.. this is the exact function which caused us to lower it). Its effect on smaller functions in JetStream2 is rarely as dramatic but always an improvement, and improvements between 2x and 5x are extremely common (10x to 30x are significantly rarer but do occur). - In order to easily test this change and any further change to this datastructure, the old approach was preserved as InterferenceHashSet, and a template to run two such datastructures in parallel, checking their equivalence was added: InstrumentedInterferenceGraph. Running with it and reportInterferenceGraphMemoryUse set to true was used to compute the numbers given above. - There was already some template parameter to change the size of the tmp indices from unsigned to uint16_t but the code failed to compile unless it was unsigned. I fixed this, made more consistent use of it, and switched to uint16_t in the very common case that we have less than 65k Tmps (we can have more despite the option because of spilling). This halved the memory usage of various other datastructures in the register allocator - unspillableTmps was a HashSet<unsigned>. Since it is often quite dense (often around 20% on JetStream2), I replaced it by a Bitvector instead - m_biases was a HashMap<IndexType, HashSet<IndexType>>. Since it is extremely rare that the sets have more than 8 elements (from looking at some instrumented run of JetStream2), I replaced it by HashMap<IndexType, SmallSet<IndexType>>. This not only significantly reduces memory, but nearly halves the time spent in assignColors(around 80ms -> 40ms in JetStream 2) - UseCounts was needlessly general: it is only used by the register allocator (all other references to UseCounts refer to the completely different B3::UseCounts), so there is no point in it computing, and then storing lots of irrelevant data. A float is also more than enough precision (especially since it is pretty much always 1, 10, 100, or 1000 in practice…). Also, since we only need it indexed by Tmps, we can use a Vector with AbsoluteTmpMapper instead of its HashMap. These changes are not just memory savings, they also make selectSpill way faster (570ms -> 250ms on my machine on JetStream2) - While I was at it, I did a couple of other tweaks to the logic of selectSpill. In particular, instead of having to check for isFastTmp every time, I just put the fast tmps directly in unspillableTmps, which prevents them from getting added to m_spillWorklist in the first place. This + a bit of clean-up (for example putting an early exit instead of setting score to infinity in the case of dead tmps) resulted in a further perf win (to roughly 200ms spent in selectSpill() on JetStream2) All together, this patch reduces the time spent in the register allocator by roughly 15 to 20% in JetStream2 (tested both with the Briggs and the IRC allocators on my MBP 2019). I do not yet have precise performance numbers for this exact patch, but benchmarking a previous version of it (with a less optimized interference graph) resulted in significant RAMification improvements (around 1%), and more surprisingly some JetStream2 improvements on weaker machines (e.g. an iPhone 7 gained > 1%). I believe these gains come either from less trashing of the caches, or less contention caused by the memory traffic. I will try to update the bugzilla with more up-to-date thorough results when I get them. This patch does not increase Options::maximumTmpsForGraphColoring, I intend to do that in a separate patch to make it easier to revert in case of a problem. * b3/B3ReduceLoopStrength.cpp: (JSC::B3::ReduceLoopStrength::reduceByteCopyLoopsToMemcpy): * b3/air/AirAllocateRegistersAndStackByLinearScan.cpp: * b3/air/AirAllocateRegistersByGraphColoring.cpp: (JSC::B3::Air::allocateRegistersByGraphColoring): * b3/air/AirCode.h: (JSC::B3::Air::Code::forEachFastTmp const): * b3/air/AirUseCounts.h: (JSC::B3::Air::UseCounts::UseCounts): (JSC::B3::Air::UseCounts::isConstDef const): (JSC::B3::Air::UseCounts::numWarmUsesAndDefs const): (JSC::B3::Air::UseCounts::dump const): * parser/Nodes.h: Source/WTF: Two changes: the addition of LikelyDenseUnsignedIntegerSet, and various improvements to Small(Ptr)Set. The latter include: - Renaming SmallPtrSet into SmallSet, as it now supports integers as well as pointers. - Reducing its size by sharing the same storage for m_buffer and for m_smallStorage. This is safe to do, because all operations branch on isSmall() which depends purely on m_capacity. - Adding trivial size(), isEmpty() and memoryUse() methods - Adding a comment at the top of the file explaining when to use, and (more importantly) not to use SmallSet. LikelyDenseUnsignedIntegerSet is an even more specialized data structure, that can represent sets of unsigned integers very compactly if they are clustered. Finally I added an outOfLineMemoryUse() method to BitVector, making it more convenient to compare the memory consumption of different data structures in the register allocator. * WTF.xcodeproj/project.pbxproj: * wtf/BitVector.h: * wtf/CMakeLists.txt: * wtf/LikelyDenseUnsignedIntegerSet.cpp: Copied from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/LikelyDenseUnsignedIntegerSet.h: Added. (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::~LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::contains const): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::size const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::iterator): (WTF::LikelyDenseUnsignedIntegerSet::iterator::m_shift): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator++): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator* const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator== const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator!= const): (WTF::LikelyDenseUnsignedIntegerSet::begin const): (WTF::LikelyDenseUnsignedIntegerSet::end const): (WTF::LikelyDenseUnsignedIntegerSet::memoryUse const): (WTF::LikelyDenseUnsignedIntegerSet::validate const): (WTF::LikelyDenseUnsignedIntegerSet::isBitVector const): (WTF::LikelyDenseUnsignedIntegerSet::isValidValue const): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): * wtf/SmallPtrSet.h: Removed. * wtf/SmallSet.cpp: Renamed from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/SmallSet.h: Added. Tools: Simply added some tests for SmallSet. * TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj: * TestWebKitAPI/Tests/WTF/SmallSet.cpp: Added. (TestWebKitAPI::testSmallSetOfUnsigned): (TestWebKitAPI::testSmallSetOfPointers): (TestWebKitAPI::testVectorsOfSmallSetsOfUnsigned): (TestWebKitAPI::TEST): Canonical link: https://commits.webkit.org/237893@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277714 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-19 03:24:14 +00:00
}
~LikelyDenseUnsignedIntegerSet()
{
if (isBitVector())
m_inline.bitVector.~BitVector();
else
m_inline.hashSet.~Set();
}
LikelyDenseUnsignedIntegerSet(LikelyDenseUnsignedIntegerSet&& other)
: m_size(other.m_size)
, m_min(other.m_min)
, m_max(other.m_max)
{
if (isBitVector())
AirAllocateStackByGraphColoring should use the optimized interference graphs from AirAllocateRegistersByGraphColoring https://bugs.webkit.org/show_bug.cgi?id=226258 Reviewed by Phil Pizlo. Source/JavaScriptCore: The main change in this patch is that AirAllocateStackByGraphColoring is now using the optimized datastructures in wtf/InterferenceGraph.h. This required templating most of it over the interference graph used (Small/Large/Huge), but I tried keeping some common parts out of the templated class to minimize the impact on compile times and binary size. A consequence of that change is that coalescableMoves and remappedStackSlots now store indices instead of direct pointers to StackSlots, resulting in a memory reduction of about 3x as well. Another consequence is that I had to slightly alter the way that coalescing works: instead of carefully removing the interference edges of the killed slot, we simply use mayClear() which is not guaranteed to remove anything. I believe that this is sound, because every subsequent access to m_interference checks whether a slot has been coalesced first, so dropping these edges is purely a memory saving, but has no logical effect. The new code was tested in a few ways: - running on JetStream2 with asan - running on JetStream2 with TEST_OPTIMIZED_INTERFERENCE_GRAPH - running on JetStream2 and logging the frame sizes at the end of this phase, and comparing to the results of doing the same on ToT (same average frame size) The two functions where this code had the largest memory footprint in JetStream2 were both in tsf-wasm. One has 751 stack slots, and had an interference graph of 2.1MB and a coalescableMoves vector of 440kB The other has 673 stack slots, and had an interference graph of 1.9MB and a coalescableMoves vector of 421kB. With this patch, they respectively use 79kB+146kB and 67kB+140kB The effect on the rest of JetStream2 is less likely to matter as few functions used more than a few dozens of kB in this phase, but in percentages are just as huge. More importantly (and the reason I wrote this patch in the first place), I checked mruby-wasm.aotoki.dev which with a few other pages forced us to lower Options::maximumTmpsForGraphColoring because of jetsams. It has two massive functions that reach this phase if I increase Options::maximumTmpsForGraphColoring: - about 6k stack slots -> 215MB + 6MB (interference graph + coalescableMoves) - about 9k stack slots -> 395MB + 4MB After this patch, they respectively use 4.5MB+2MB and 9MB+1.5MB, or roughly a 40x improvement. Combined with the recent improvements to the register allocator, I hope to be able to increase Options::maximumTmpsForGraphColoring soon (in a different patch for easier bisection if either cause a perf regression). This would be helpful, since its lowering cratered our performance on some other wasm application by 8x. In terms of compile times, this patch lowered the time spent in AllocateStackByGraphColoring over the course of a run of JetStream2 from roughly 350ms to roughly 270ms. This is almost certainly negligible, but at least it guarantees that it did not regress. * b3/air/AirAllocateRegistersByGraphColoring.cpp: * b3/air/AirAllocateStackByGraphColoring.cpp: (JSC::B3::Air::allocateStackByGraphColoring): Source/WTF: I moved the interference graphs datastructures from AirAllocateRegistersByGraphColoring to their own wtf/InterferenceGraph.h file. There are three of them: - SmallInterferenceGraph, best for n < 400 - LargeInterferenceGraph, for n < 2**16 - HugeInterferenceGraph, for n up to 2**32 I also added "Iterable" versions of them, that have an operator[] method whose result you can iterate on to get all the indices which interfere with a given index. SmallIterableInterferenceGraph is the same as the non-iterable version, but the Large and Huge versions are a bit slower than their counterparts and use 2x memory. All of these were tested by running JetStream2 with the TEST_OPTIMIZED_INTERFERENCE_GRAPH set to 1. This flag makes the optimized datastructures run in parallel with a reference implementation, and their results are checked for equality on every method call. There is one small difference allowed: iteration is not guaranteed to go through elements in the same order. I also added a clear() method to LikelyDenseUnsignedIntegerSet, and added the NotNull flag to its various uses of placement new. * WTF.xcodeproj/project.pbxproj: * wtf/CMakeLists.txt: * wtf/HashSet.h: (WTF::W>::memoryUse const): * wtf/InterferenceGraph.h: Added. (WTF::InterferenceBitVector::contains): (WTF::InterferenceBitVector::addAndReturnIsNewEntry): (WTF::InterferenceBitVector::add): (WTF::InterferenceBitVector::clear): (WTF::InterferenceBitVector::mayClear): (WTF::InterferenceBitVector::setMaxIndex): (WTF::InterferenceBitVector::forEach): (WTF::InterferenceBitVector::size const): (WTF::InterferenceBitVector::memoryUse const): (WTF::InterferenceBitVector::dumpMemoryUseInKB const): (WTF::InterferenceBitVector::Iterable::iterator::operator++): (WTF::InterferenceBitVector::Iterable::iterator::operator* const): (WTF::InterferenceBitVector::Iterable::iterator::operator== const): (WTF::InterferenceBitVector::Iterable::iterator::operator!= const): (WTF::InterferenceBitVector::Iterable::begin const): (WTF::InterferenceBitVector::Iterable::end const): (WTF::InterferenceBitVector::operator[] const): (WTF::InterferenceBitVector::index const): (WTF::InterferenceVector::contains): (WTF::InterferenceVector::addAndReturnIsNewEntry): (WTF::InterferenceVector::add): (WTF::InterferenceVector::clear): (WTF::InterferenceVector::mayClear): (WTF::InterferenceVector::setMaxIndex): (WTF::InterferenceVector::forEach): (WTF::InterferenceVector::size const): (WTF::InterferenceVector::memoryUse const): (WTF::InterferenceVector::dumpMemoryUseInKB const): (WTF::InterferenceVector::Iterable::begin const): (WTF::InterferenceVector::Iterable::end const): (WTF::InterferenceVector::operator[] const): (WTF::UndirectedEdgesDuplicatingAdapter::contains): (WTF::UndirectedEdgesDuplicatingAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDuplicatingAdapter::add): (WTF::UndirectedEdgesDuplicatingAdapter::clear): (WTF::UndirectedEdgesDuplicatingAdapter::mayClear): (WTF::UndirectedEdgesDuplicatingAdapter::setMaxIndex): (WTF::UndirectedEdgesDuplicatingAdapter::forEach): (WTF::UndirectedEdgesDuplicatingAdapter::size const): (WTF::UndirectedEdgesDuplicatingAdapter::memoryUse const): (WTF::UndirectedEdgesDuplicatingAdapter::dumpMemoryUseInKB const): (WTF::UndirectedEdgesDuplicatingAdapter::operator[] const): (WTF::UndirectedEdgesDedupAdapter::contains): (WTF::UndirectedEdgesDedupAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDedupAdapter::add): (WTF::UndirectedEdgesDedupAdapter::clear): (WTF::UndirectedEdgesDedupAdapter::mayClear): (WTF::UndirectedEdgesDedupAdapter::setMaxIndex): (WTF::UndirectedEdgesDedupAdapter::forEach): (WTF::UndirectedEdgesDedupAdapter::size const): (WTF::UndirectedEdgesDedupAdapter::memoryUse const): (WTF::UndirectedEdgesDedupAdapter::dumpMemoryUseInKB const): (WTF::InterferenceHashSet::contains): (WTF::InterferenceHashSet::addAndReturnIsNewEntry): (WTF::InterferenceHashSet::add): (WTF::InterferenceHashSet::clear): (WTF::InterferenceHashSet::setMaxIndex): (WTF::InterferenceHashSet::forEach): (WTF::InterferenceHashSet::size const): (WTF::InterferenceHashSet::memoryUse const): (WTF::InterferenceHashSet::dumpMemoryUseInKB const): (WTF::InstrumentedInterferenceGraph::contains): (WTF::InstrumentedInterferenceGraph::addAndReturnIsNewEntry): (WTF::InstrumentedInterferenceGraph::add): (WTF::InstrumentedInterferenceGraph::clear): (WTF::InstrumentedInterferenceGraph::mayClear): (WTF::InstrumentedInterferenceGraph::setMaxIndex): (WTF::InstrumentedInterferenceGraph::forEach): (WTF::InstrumentedInterferenceGraph::size const): (WTF::InstrumentedInterferenceGraph::memoryUse const): (WTF::InstrumentedInterferenceGraph::dumpMemoryUseInKB const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::Iterable): (WTF::InstrumentedIterableInterferenceGraph::Iterable::begin const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::end const): (WTF::InstrumentedIterableInterferenceGraph::operator[] const): * wtf/LikelyDenseUnsignedIntegerSet.h: (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::clear): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::estimateHashSetSize): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): Canonical link: https://commits.webkit.org/238229@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@278186 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-28 02:08:19 +00:00
new (NotNull, &m_inline.bitVector) BitVector(WTFMove(other.m_inline.bitVector));
Make AirAllocateRegistersByGraphColoring use less memory https://bugs.webkit.org/show_bug.cgi?id=225848 Reviewed by Filip Pizlo. Source/JavaScriptCore: We've had some jetsam problems caused by the main Air register allocator, which caused us to lower Options::maximumTmpsForGraphColoring. Hence this patch tries to improve the memory usage of the allocator. It includes several changes: - Change the datastructure used for representing the interference graph. Before it was effectively a HashSet<std::pair<uint16_t, uint16_t>. Now, it is either a Bitvector (for n < 400 for now, can be tweaked easily), or a Vector<LikelyDenseUnsignedIntegerSet<uint16_t>> otherwise. LikelyDenseUnsignedIntegerSet is a new datastructure introduced by this patch, it is either a HashSet if very sparse, or a BitVector + an amount to shift it by. This is by far the largest memory reduction in this patch, it reduces the maximum memory used for an interference graph in tsf-wasm in JetStream2 from 16MB to 700kB, and in mruby-wasm.aotoki.dev from 262MB to 20MB (the later only happen when we increase Options::maximumTmpsForGraphColoring.. this is the exact function which caused us to lower it). Its effect on smaller functions in JetStream2 is rarely as dramatic but always an improvement, and improvements between 2x and 5x are extremely common (10x to 30x are significantly rarer but do occur). - In order to easily test this change and any further change to this datastructure, the old approach was preserved as InterferenceHashSet, and a template to run two such datastructures in parallel, checking their equivalence was added: InstrumentedInterferenceGraph. Running with it and reportInterferenceGraphMemoryUse set to true was used to compute the numbers given above. - There was already some template parameter to change the size of the tmp indices from unsigned to uint16_t but the code failed to compile unless it was unsigned. I fixed this, made more consistent use of it, and switched to uint16_t in the very common case that we have less than 65k Tmps (we can have more despite the option because of spilling). This halved the memory usage of various other datastructures in the register allocator - unspillableTmps was a HashSet<unsigned>. Since it is often quite dense (often around 20% on JetStream2), I replaced it by a Bitvector instead - m_biases was a HashMap<IndexType, HashSet<IndexType>>. Since it is extremely rare that the sets have more than 8 elements (from looking at some instrumented run of JetStream2), I replaced it by HashMap<IndexType, SmallSet<IndexType>>. This not only significantly reduces memory, but nearly halves the time spent in assignColors(around 80ms -> 40ms in JetStream 2) - UseCounts was needlessly general: it is only used by the register allocator (all other references to UseCounts refer to the completely different B3::UseCounts), so there is no point in it computing, and then storing lots of irrelevant data. A float is also more than enough precision (especially since it is pretty much always 1, 10, 100, or 1000 in practice…). Also, since we only need it indexed by Tmps, we can use a Vector with AbsoluteTmpMapper instead of its HashMap. These changes are not just memory savings, they also make selectSpill way faster (570ms -> 250ms on my machine on JetStream2) - While I was at it, I did a couple of other tweaks to the logic of selectSpill. In particular, instead of having to check for isFastTmp every time, I just put the fast tmps directly in unspillableTmps, which prevents them from getting added to m_spillWorklist in the first place. This + a bit of clean-up (for example putting an early exit instead of setting score to infinity in the case of dead tmps) resulted in a further perf win (to roughly 200ms spent in selectSpill() on JetStream2) All together, this patch reduces the time spent in the register allocator by roughly 15 to 20% in JetStream2 (tested both with the Briggs and the IRC allocators on my MBP 2019). I do not yet have precise performance numbers for this exact patch, but benchmarking a previous version of it (with a less optimized interference graph) resulted in significant RAMification improvements (around 1%), and more surprisingly some JetStream2 improvements on weaker machines (e.g. an iPhone 7 gained > 1%). I believe these gains come either from less trashing of the caches, or less contention caused by the memory traffic. I will try to update the bugzilla with more up-to-date thorough results when I get them. This patch does not increase Options::maximumTmpsForGraphColoring, I intend to do that in a separate patch to make it easier to revert in case of a problem. * b3/B3ReduceLoopStrength.cpp: (JSC::B3::ReduceLoopStrength::reduceByteCopyLoopsToMemcpy): * b3/air/AirAllocateRegistersAndStackByLinearScan.cpp: * b3/air/AirAllocateRegistersByGraphColoring.cpp: (JSC::B3::Air::allocateRegistersByGraphColoring): * b3/air/AirCode.h: (JSC::B3::Air::Code::forEachFastTmp const): * b3/air/AirUseCounts.h: (JSC::B3::Air::UseCounts::UseCounts): (JSC::B3::Air::UseCounts::isConstDef const): (JSC::B3::Air::UseCounts::numWarmUsesAndDefs const): (JSC::B3::Air::UseCounts::dump const): * parser/Nodes.h: Source/WTF: Two changes: the addition of LikelyDenseUnsignedIntegerSet, and various improvements to Small(Ptr)Set. The latter include: - Renaming SmallPtrSet into SmallSet, as it now supports integers as well as pointers. - Reducing its size by sharing the same storage for m_buffer and for m_smallStorage. This is safe to do, because all operations branch on isSmall() which depends purely on m_capacity. - Adding trivial size(), isEmpty() and memoryUse() methods - Adding a comment at the top of the file explaining when to use, and (more importantly) not to use SmallSet. LikelyDenseUnsignedIntegerSet is an even more specialized data structure, that can represent sets of unsigned integers very compactly if they are clustered. Finally I added an outOfLineMemoryUse() method to BitVector, making it more convenient to compare the memory consumption of different data structures in the register allocator. * WTF.xcodeproj/project.pbxproj: * wtf/BitVector.h: * wtf/CMakeLists.txt: * wtf/LikelyDenseUnsignedIntegerSet.cpp: Copied from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/LikelyDenseUnsignedIntegerSet.h: Added. (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::~LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::contains const): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::size const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::iterator): (WTF::LikelyDenseUnsignedIntegerSet::iterator::m_shift): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator++): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator* const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator== const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator!= const): (WTF::LikelyDenseUnsignedIntegerSet::begin const): (WTF::LikelyDenseUnsignedIntegerSet::end const): (WTF::LikelyDenseUnsignedIntegerSet::memoryUse const): (WTF::LikelyDenseUnsignedIntegerSet::validate const): (WTF::LikelyDenseUnsignedIntegerSet::isBitVector const): (WTF::LikelyDenseUnsignedIntegerSet::isValidValue const): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): * wtf/SmallPtrSet.h: Removed. * wtf/SmallSet.cpp: Renamed from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/SmallSet.h: Added. Tools: Simply added some tests for SmallSet. * TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj: * TestWebKitAPI/Tests/WTF/SmallSet.cpp: Added. (TestWebKitAPI::testSmallSetOfUnsigned): (TestWebKitAPI::testSmallSetOfPointers): (TestWebKitAPI::testVectorsOfSmallSetsOfUnsigned): (TestWebKitAPI::TEST): Canonical link: https://commits.webkit.org/237893@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277714 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-19 03:24:14 +00:00
else
AirAllocateStackByGraphColoring should use the optimized interference graphs from AirAllocateRegistersByGraphColoring https://bugs.webkit.org/show_bug.cgi?id=226258 Reviewed by Phil Pizlo. Source/JavaScriptCore: The main change in this patch is that AirAllocateStackByGraphColoring is now using the optimized datastructures in wtf/InterferenceGraph.h. This required templating most of it over the interference graph used (Small/Large/Huge), but I tried keeping some common parts out of the templated class to minimize the impact on compile times and binary size. A consequence of that change is that coalescableMoves and remappedStackSlots now store indices instead of direct pointers to StackSlots, resulting in a memory reduction of about 3x as well. Another consequence is that I had to slightly alter the way that coalescing works: instead of carefully removing the interference edges of the killed slot, we simply use mayClear() which is not guaranteed to remove anything. I believe that this is sound, because every subsequent access to m_interference checks whether a slot has been coalesced first, so dropping these edges is purely a memory saving, but has no logical effect. The new code was tested in a few ways: - running on JetStream2 with asan - running on JetStream2 with TEST_OPTIMIZED_INTERFERENCE_GRAPH - running on JetStream2 and logging the frame sizes at the end of this phase, and comparing to the results of doing the same on ToT (same average frame size) The two functions where this code had the largest memory footprint in JetStream2 were both in tsf-wasm. One has 751 stack slots, and had an interference graph of 2.1MB and a coalescableMoves vector of 440kB The other has 673 stack slots, and had an interference graph of 1.9MB and a coalescableMoves vector of 421kB. With this patch, they respectively use 79kB+146kB and 67kB+140kB The effect on the rest of JetStream2 is less likely to matter as few functions used more than a few dozens of kB in this phase, but in percentages are just as huge. More importantly (and the reason I wrote this patch in the first place), I checked mruby-wasm.aotoki.dev which with a few other pages forced us to lower Options::maximumTmpsForGraphColoring because of jetsams. It has two massive functions that reach this phase if I increase Options::maximumTmpsForGraphColoring: - about 6k stack slots -> 215MB + 6MB (interference graph + coalescableMoves) - about 9k stack slots -> 395MB + 4MB After this patch, they respectively use 4.5MB+2MB and 9MB+1.5MB, or roughly a 40x improvement. Combined with the recent improvements to the register allocator, I hope to be able to increase Options::maximumTmpsForGraphColoring soon (in a different patch for easier bisection if either cause a perf regression). This would be helpful, since its lowering cratered our performance on some other wasm application by 8x. In terms of compile times, this patch lowered the time spent in AllocateStackByGraphColoring over the course of a run of JetStream2 from roughly 350ms to roughly 270ms. This is almost certainly negligible, but at least it guarantees that it did not regress. * b3/air/AirAllocateRegistersByGraphColoring.cpp: * b3/air/AirAllocateStackByGraphColoring.cpp: (JSC::B3::Air::allocateStackByGraphColoring): Source/WTF: I moved the interference graphs datastructures from AirAllocateRegistersByGraphColoring to their own wtf/InterferenceGraph.h file. There are three of them: - SmallInterferenceGraph, best for n < 400 - LargeInterferenceGraph, for n < 2**16 - HugeInterferenceGraph, for n up to 2**32 I also added "Iterable" versions of them, that have an operator[] method whose result you can iterate on to get all the indices which interfere with a given index. SmallIterableInterferenceGraph is the same as the non-iterable version, but the Large and Huge versions are a bit slower than their counterparts and use 2x memory. All of these were tested by running JetStream2 with the TEST_OPTIMIZED_INTERFERENCE_GRAPH set to 1. This flag makes the optimized datastructures run in parallel with a reference implementation, and their results are checked for equality on every method call. There is one small difference allowed: iteration is not guaranteed to go through elements in the same order. I also added a clear() method to LikelyDenseUnsignedIntegerSet, and added the NotNull flag to its various uses of placement new. * WTF.xcodeproj/project.pbxproj: * wtf/CMakeLists.txt: * wtf/HashSet.h: (WTF::W>::memoryUse const): * wtf/InterferenceGraph.h: Added. (WTF::InterferenceBitVector::contains): (WTF::InterferenceBitVector::addAndReturnIsNewEntry): (WTF::InterferenceBitVector::add): (WTF::InterferenceBitVector::clear): (WTF::InterferenceBitVector::mayClear): (WTF::InterferenceBitVector::setMaxIndex): (WTF::InterferenceBitVector::forEach): (WTF::InterferenceBitVector::size const): (WTF::InterferenceBitVector::memoryUse const): (WTF::InterferenceBitVector::dumpMemoryUseInKB const): (WTF::InterferenceBitVector::Iterable::iterator::operator++): (WTF::InterferenceBitVector::Iterable::iterator::operator* const): (WTF::InterferenceBitVector::Iterable::iterator::operator== const): (WTF::InterferenceBitVector::Iterable::iterator::operator!= const): (WTF::InterferenceBitVector::Iterable::begin const): (WTF::InterferenceBitVector::Iterable::end const): (WTF::InterferenceBitVector::operator[] const): (WTF::InterferenceBitVector::index const): (WTF::InterferenceVector::contains): (WTF::InterferenceVector::addAndReturnIsNewEntry): (WTF::InterferenceVector::add): (WTF::InterferenceVector::clear): (WTF::InterferenceVector::mayClear): (WTF::InterferenceVector::setMaxIndex): (WTF::InterferenceVector::forEach): (WTF::InterferenceVector::size const): (WTF::InterferenceVector::memoryUse const): (WTF::InterferenceVector::dumpMemoryUseInKB const): (WTF::InterferenceVector::Iterable::begin const): (WTF::InterferenceVector::Iterable::end const): (WTF::InterferenceVector::operator[] const): (WTF::UndirectedEdgesDuplicatingAdapter::contains): (WTF::UndirectedEdgesDuplicatingAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDuplicatingAdapter::add): (WTF::UndirectedEdgesDuplicatingAdapter::clear): (WTF::UndirectedEdgesDuplicatingAdapter::mayClear): (WTF::UndirectedEdgesDuplicatingAdapter::setMaxIndex): (WTF::UndirectedEdgesDuplicatingAdapter::forEach): (WTF::UndirectedEdgesDuplicatingAdapter::size const): (WTF::UndirectedEdgesDuplicatingAdapter::memoryUse const): (WTF::UndirectedEdgesDuplicatingAdapter::dumpMemoryUseInKB const): (WTF::UndirectedEdgesDuplicatingAdapter::operator[] const): (WTF::UndirectedEdgesDedupAdapter::contains): (WTF::UndirectedEdgesDedupAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDedupAdapter::add): (WTF::UndirectedEdgesDedupAdapter::clear): (WTF::UndirectedEdgesDedupAdapter::mayClear): (WTF::UndirectedEdgesDedupAdapter::setMaxIndex): (WTF::UndirectedEdgesDedupAdapter::forEach): (WTF::UndirectedEdgesDedupAdapter::size const): (WTF::UndirectedEdgesDedupAdapter::memoryUse const): (WTF::UndirectedEdgesDedupAdapter::dumpMemoryUseInKB const): (WTF::InterferenceHashSet::contains): (WTF::InterferenceHashSet::addAndReturnIsNewEntry): (WTF::InterferenceHashSet::add): (WTF::InterferenceHashSet::clear): (WTF::InterferenceHashSet::setMaxIndex): (WTF::InterferenceHashSet::forEach): (WTF::InterferenceHashSet::size const): (WTF::InterferenceHashSet::memoryUse const): (WTF::InterferenceHashSet::dumpMemoryUseInKB const): (WTF::InstrumentedInterferenceGraph::contains): (WTF::InstrumentedInterferenceGraph::addAndReturnIsNewEntry): (WTF::InstrumentedInterferenceGraph::add): (WTF::InstrumentedInterferenceGraph::clear): (WTF::InstrumentedInterferenceGraph::mayClear): (WTF::InstrumentedInterferenceGraph::setMaxIndex): (WTF::InstrumentedInterferenceGraph::forEach): (WTF::InstrumentedInterferenceGraph::size const): (WTF::InstrumentedInterferenceGraph::memoryUse const): (WTF::InstrumentedInterferenceGraph::dumpMemoryUseInKB const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::Iterable): (WTF::InstrumentedIterableInterferenceGraph::Iterable::begin const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::end const): (WTF::InstrumentedIterableInterferenceGraph::operator[] const): * wtf/LikelyDenseUnsignedIntegerSet.h: (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::clear): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::estimateHashSetSize): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): Canonical link: https://commits.webkit.org/238229@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@278186 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-28 02:08:19 +00:00
new (NotNull, &m_inline.hashSet) Set(WTFMove(other.m_inline.hashSet));
}
void clear()
{
if (isBitVector())
m_inline.bitVector.~BitVector();
else
m_inline.hashSet.~HashSet();
new (NotNull, &m_inline.bitVector) BitVector;
AirAllocateStackByGraphColoring should use the optimized interference graphs from AirAllocateRegistersByGraphColoring https://bugs.webkit.org/show_bug.cgi?id=226258 Reviewed by Phil Pizlo. Source/JavaScriptCore: The main change in this patch is that AirAllocateStackByGraphColoring is now using the optimized datastructures in wtf/InterferenceGraph.h. This required templating most of it over the interference graph used (Small/Large/Huge), but I tried keeping some common parts out of the templated class to minimize the impact on compile times and binary size. A consequence of that change is that coalescableMoves and remappedStackSlots now store indices instead of direct pointers to StackSlots, resulting in a memory reduction of about 3x as well. Another consequence is that I had to slightly alter the way that coalescing works: instead of carefully removing the interference edges of the killed slot, we simply use mayClear() which is not guaranteed to remove anything. I believe that this is sound, because every subsequent access to m_interference checks whether a slot has been coalesced first, so dropping these edges is purely a memory saving, but has no logical effect. The new code was tested in a few ways: - running on JetStream2 with asan - running on JetStream2 with TEST_OPTIMIZED_INTERFERENCE_GRAPH - running on JetStream2 and logging the frame sizes at the end of this phase, and comparing to the results of doing the same on ToT (same average frame size) The two functions where this code had the largest memory footprint in JetStream2 were both in tsf-wasm. One has 751 stack slots, and had an interference graph of 2.1MB and a coalescableMoves vector of 440kB The other has 673 stack slots, and had an interference graph of 1.9MB and a coalescableMoves vector of 421kB. With this patch, they respectively use 79kB+146kB and 67kB+140kB The effect on the rest of JetStream2 is less likely to matter as few functions used more than a few dozens of kB in this phase, but in percentages are just as huge. More importantly (and the reason I wrote this patch in the first place), I checked mruby-wasm.aotoki.dev which with a few other pages forced us to lower Options::maximumTmpsForGraphColoring because of jetsams. It has two massive functions that reach this phase if I increase Options::maximumTmpsForGraphColoring: - about 6k stack slots -> 215MB + 6MB (interference graph + coalescableMoves) - about 9k stack slots -> 395MB + 4MB After this patch, they respectively use 4.5MB+2MB and 9MB+1.5MB, or roughly a 40x improvement. Combined with the recent improvements to the register allocator, I hope to be able to increase Options::maximumTmpsForGraphColoring soon (in a different patch for easier bisection if either cause a perf regression). This would be helpful, since its lowering cratered our performance on some other wasm application by 8x. In terms of compile times, this patch lowered the time spent in AllocateStackByGraphColoring over the course of a run of JetStream2 from roughly 350ms to roughly 270ms. This is almost certainly negligible, but at least it guarantees that it did not regress. * b3/air/AirAllocateRegistersByGraphColoring.cpp: * b3/air/AirAllocateStackByGraphColoring.cpp: (JSC::B3::Air::allocateStackByGraphColoring): Source/WTF: I moved the interference graphs datastructures from AirAllocateRegistersByGraphColoring to their own wtf/InterferenceGraph.h file. There are three of them: - SmallInterferenceGraph, best for n < 400 - LargeInterferenceGraph, for n < 2**16 - HugeInterferenceGraph, for n up to 2**32 I also added "Iterable" versions of them, that have an operator[] method whose result you can iterate on to get all the indices which interfere with a given index. SmallIterableInterferenceGraph is the same as the non-iterable version, but the Large and Huge versions are a bit slower than their counterparts and use 2x memory. All of these were tested by running JetStream2 with the TEST_OPTIMIZED_INTERFERENCE_GRAPH set to 1. This flag makes the optimized datastructures run in parallel with a reference implementation, and their results are checked for equality on every method call. There is one small difference allowed: iteration is not guaranteed to go through elements in the same order. I also added a clear() method to LikelyDenseUnsignedIntegerSet, and added the NotNull flag to its various uses of placement new. * WTF.xcodeproj/project.pbxproj: * wtf/CMakeLists.txt: * wtf/HashSet.h: (WTF::W>::memoryUse const): * wtf/InterferenceGraph.h: Added. (WTF::InterferenceBitVector::contains): (WTF::InterferenceBitVector::addAndReturnIsNewEntry): (WTF::InterferenceBitVector::add): (WTF::InterferenceBitVector::clear): (WTF::InterferenceBitVector::mayClear): (WTF::InterferenceBitVector::setMaxIndex): (WTF::InterferenceBitVector::forEach): (WTF::InterferenceBitVector::size const): (WTF::InterferenceBitVector::memoryUse const): (WTF::InterferenceBitVector::dumpMemoryUseInKB const): (WTF::InterferenceBitVector::Iterable::iterator::operator++): (WTF::InterferenceBitVector::Iterable::iterator::operator* const): (WTF::InterferenceBitVector::Iterable::iterator::operator== const): (WTF::InterferenceBitVector::Iterable::iterator::operator!= const): (WTF::InterferenceBitVector::Iterable::begin const): (WTF::InterferenceBitVector::Iterable::end const): (WTF::InterferenceBitVector::operator[] const): (WTF::InterferenceBitVector::index const): (WTF::InterferenceVector::contains): (WTF::InterferenceVector::addAndReturnIsNewEntry): (WTF::InterferenceVector::add): (WTF::InterferenceVector::clear): (WTF::InterferenceVector::mayClear): (WTF::InterferenceVector::setMaxIndex): (WTF::InterferenceVector::forEach): (WTF::InterferenceVector::size const): (WTF::InterferenceVector::memoryUse const): (WTF::InterferenceVector::dumpMemoryUseInKB const): (WTF::InterferenceVector::Iterable::begin const): (WTF::InterferenceVector::Iterable::end const): (WTF::InterferenceVector::operator[] const): (WTF::UndirectedEdgesDuplicatingAdapter::contains): (WTF::UndirectedEdgesDuplicatingAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDuplicatingAdapter::add): (WTF::UndirectedEdgesDuplicatingAdapter::clear): (WTF::UndirectedEdgesDuplicatingAdapter::mayClear): (WTF::UndirectedEdgesDuplicatingAdapter::setMaxIndex): (WTF::UndirectedEdgesDuplicatingAdapter::forEach): (WTF::UndirectedEdgesDuplicatingAdapter::size const): (WTF::UndirectedEdgesDuplicatingAdapter::memoryUse const): (WTF::UndirectedEdgesDuplicatingAdapter::dumpMemoryUseInKB const): (WTF::UndirectedEdgesDuplicatingAdapter::operator[] const): (WTF::UndirectedEdgesDedupAdapter::contains): (WTF::UndirectedEdgesDedupAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDedupAdapter::add): (WTF::UndirectedEdgesDedupAdapter::clear): (WTF::UndirectedEdgesDedupAdapter::mayClear): (WTF::UndirectedEdgesDedupAdapter::setMaxIndex): (WTF::UndirectedEdgesDedupAdapter::forEach): (WTF::UndirectedEdgesDedupAdapter::size const): (WTF::UndirectedEdgesDedupAdapter::memoryUse const): (WTF::UndirectedEdgesDedupAdapter::dumpMemoryUseInKB const): (WTF::InterferenceHashSet::contains): (WTF::InterferenceHashSet::addAndReturnIsNewEntry): (WTF::InterferenceHashSet::add): (WTF::InterferenceHashSet::clear): (WTF::InterferenceHashSet::setMaxIndex): (WTF::InterferenceHashSet::forEach): (WTF::InterferenceHashSet::size const): (WTF::InterferenceHashSet::memoryUse const): (WTF::InterferenceHashSet::dumpMemoryUseInKB const): (WTF::InstrumentedInterferenceGraph::contains): (WTF::InstrumentedInterferenceGraph::addAndReturnIsNewEntry): (WTF::InstrumentedInterferenceGraph::add): (WTF::InstrumentedInterferenceGraph::clear): (WTF::InstrumentedInterferenceGraph::mayClear): (WTF::InstrumentedInterferenceGraph::setMaxIndex): (WTF::InstrumentedInterferenceGraph::forEach): (WTF::InstrumentedInterferenceGraph::size const): (WTF::InstrumentedInterferenceGraph::memoryUse const): (WTF::InstrumentedInterferenceGraph::dumpMemoryUseInKB const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::Iterable): (WTF::InstrumentedIterableInterferenceGraph::Iterable::begin const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::end const): (WTF::InstrumentedIterableInterferenceGraph::operator[] const): * wtf/LikelyDenseUnsignedIntegerSet.h: (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::clear): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::estimateHashSetSize): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): Canonical link: https://commits.webkit.org/238229@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@278186 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-28 02:08:19 +00:00
m_size = 0;
m_min = 0;
m_max = 0;
Make AirAllocateRegistersByGraphColoring use less memory https://bugs.webkit.org/show_bug.cgi?id=225848 Reviewed by Filip Pizlo. Source/JavaScriptCore: We've had some jetsam problems caused by the main Air register allocator, which caused us to lower Options::maximumTmpsForGraphColoring. Hence this patch tries to improve the memory usage of the allocator. It includes several changes: - Change the datastructure used for representing the interference graph. Before it was effectively a HashSet<std::pair<uint16_t, uint16_t>. Now, it is either a Bitvector (for n < 400 for now, can be tweaked easily), or a Vector<LikelyDenseUnsignedIntegerSet<uint16_t>> otherwise. LikelyDenseUnsignedIntegerSet is a new datastructure introduced by this patch, it is either a HashSet if very sparse, or a BitVector + an amount to shift it by. This is by far the largest memory reduction in this patch, it reduces the maximum memory used for an interference graph in tsf-wasm in JetStream2 from 16MB to 700kB, and in mruby-wasm.aotoki.dev from 262MB to 20MB (the later only happen when we increase Options::maximumTmpsForGraphColoring.. this is the exact function which caused us to lower it). Its effect on smaller functions in JetStream2 is rarely as dramatic but always an improvement, and improvements between 2x and 5x are extremely common (10x to 30x are significantly rarer but do occur). - In order to easily test this change and any further change to this datastructure, the old approach was preserved as InterferenceHashSet, and a template to run two such datastructures in parallel, checking their equivalence was added: InstrumentedInterferenceGraph. Running with it and reportInterferenceGraphMemoryUse set to true was used to compute the numbers given above. - There was already some template parameter to change the size of the tmp indices from unsigned to uint16_t but the code failed to compile unless it was unsigned. I fixed this, made more consistent use of it, and switched to uint16_t in the very common case that we have less than 65k Tmps (we can have more despite the option because of spilling). This halved the memory usage of various other datastructures in the register allocator - unspillableTmps was a HashSet<unsigned>. Since it is often quite dense (often around 20% on JetStream2), I replaced it by a Bitvector instead - m_biases was a HashMap<IndexType, HashSet<IndexType>>. Since it is extremely rare that the sets have more than 8 elements (from looking at some instrumented run of JetStream2), I replaced it by HashMap<IndexType, SmallSet<IndexType>>. This not only significantly reduces memory, but nearly halves the time spent in assignColors(around 80ms -> 40ms in JetStream 2) - UseCounts was needlessly general: it is only used by the register allocator (all other references to UseCounts refer to the completely different B3::UseCounts), so there is no point in it computing, and then storing lots of irrelevant data. A float is also more than enough precision (especially since it is pretty much always 1, 10, 100, or 1000 in practice…). Also, since we only need it indexed by Tmps, we can use a Vector with AbsoluteTmpMapper instead of its HashMap. These changes are not just memory savings, they also make selectSpill way faster (570ms -> 250ms on my machine on JetStream2) - While I was at it, I did a couple of other tweaks to the logic of selectSpill. In particular, instead of having to check for isFastTmp every time, I just put the fast tmps directly in unspillableTmps, which prevents them from getting added to m_spillWorklist in the first place. This + a bit of clean-up (for example putting an early exit instead of setting score to infinity in the case of dead tmps) resulted in a further perf win (to roughly 200ms spent in selectSpill() on JetStream2) All together, this patch reduces the time spent in the register allocator by roughly 15 to 20% in JetStream2 (tested both with the Briggs and the IRC allocators on my MBP 2019). I do not yet have precise performance numbers for this exact patch, but benchmarking a previous version of it (with a less optimized interference graph) resulted in significant RAMification improvements (around 1%), and more surprisingly some JetStream2 improvements on weaker machines (e.g. an iPhone 7 gained > 1%). I believe these gains come either from less trashing of the caches, or less contention caused by the memory traffic. I will try to update the bugzilla with more up-to-date thorough results when I get them. This patch does not increase Options::maximumTmpsForGraphColoring, I intend to do that in a separate patch to make it easier to revert in case of a problem. * b3/B3ReduceLoopStrength.cpp: (JSC::B3::ReduceLoopStrength::reduceByteCopyLoopsToMemcpy): * b3/air/AirAllocateRegistersAndStackByLinearScan.cpp: * b3/air/AirAllocateRegistersByGraphColoring.cpp: (JSC::B3::Air::allocateRegistersByGraphColoring): * b3/air/AirCode.h: (JSC::B3::Air::Code::forEachFastTmp const): * b3/air/AirUseCounts.h: (JSC::B3::Air::UseCounts::UseCounts): (JSC::B3::Air::UseCounts::isConstDef const): (JSC::B3::Air::UseCounts::numWarmUsesAndDefs const): (JSC::B3::Air::UseCounts::dump const): * parser/Nodes.h: Source/WTF: Two changes: the addition of LikelyDenseUnsignedIntegerSet, and various improvements to Small(Ptr)Set. The latter include: - Renaming SmallPtrSet into SmallSet, as it now supports integers as well as pointers. - Reducing its size by sharing the same storage for m_buffer and for m_smallStorage. This is safe to do, because all operations branch on isSmall() which depends purely on m_capacity. - Adding trivial size(), isEmpty() and memoryUse() methods - Adding a comment at the top of the file explaining when to use, and (more importantly) not to use SmallSet. LikelyDenseUnsignedIntegerSet is an even more specialized data structure, that can represent sets of unsigned integers very compactly if they are clustered. Finally I added an outOfLineMemoryUse() method to BitVector, making it more convenient to compare the memory consumption of different data structures in the register allocator. * WTF.xcodeproj/project.pbxproj: * wtf/BitVector.h: * wtf/CMakeLists.txt: * wtf/LikelyDenseUnsignedIntegerSet.cpp: Copied from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/LikelyDenseUnsignedIntegerSet.h: Added. (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::~LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::contains const): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::size const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::iterator): (WTF::LikelyDenseUnsignedIntegerSet::iterator::m_shift): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator++): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator* const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator== const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator!= const): (WTF::LikelyDenseUnsignedIntegerSet::begin const): (WTF::LikelyDenseUnsignedIntegerSet::end const): (WTF::LikelyDenseUnsignedIntegerSet::memoryUse const): (WTF::LikelyDenseUnsignedIntegerSet::validate const): (WTF::LikelyDenseUnsignedIntegerSet::isBitVector const): (WTF::LikelyDenseUnsignedIntegerSet::isValidValue const): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): * wtf/SmallPtrSet.h: Removed. * wtf/SmallSet.cpp: Renamed from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/SmallSet.h: Added. Tools: Simply added some tests for SmallSet. * TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj: * TestWebKitAPI/Tests/WTF/SmallSet.cpp: Added. (TestWebKitAPI::testSmallSetOfUnsigned): (TestWebKitAPI::testSmallSetOfPointers): (TestWebKitAPI::testVectorsOfSmallSetsOfUnsigned): (TestWebKitAPI::TEST): Canonical link: https://commits.webkit.org/237893@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277714 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-19 03:24:14 +00:00
}
bool contains(IndexType value) const
{
ASSERT(isValidValue(value));
if (isBitVector()) {
if (m_min > value)
return false;
return m_inline.bitVector.get(value - m_min);
}
return m_inline.hashSet.contains(value);
}
// Providing an iterator in this would be possible, but none of our clients need it.
struct AddResult {
bool isNewEntry;
};
AddResult add(IndexType value)
{
ASSERT(isValidValue(value));
if (!m_size) {
ASSERT(isBitVector());
m_min = value;
m_max = value;
m_size = 1;
m_inline.bitVector.quickSet(0);
return { true };
}
if (!isBitVector()) {
bool isNewEntry = m_inline.hashSet.add(value).isNewEntry;
if (!isNewEntry)
return { false };
m_min = std::min(m_min, value);
m_max = std::max(m_max, value);
unsigned hashSetSize = m_inline.hashSet.capacity() * sizeof(IndexType);
unsigned wouldBeBitVectorSize = (m_max - m_min) / 8;
if (wouldBeBitVectorSize * 2 < hashSetSize)
transitionToBitVector();
return { true };
}
if (value >= m_min && value <= m_max) {
bool isNewEntry = !m_inline.bitVector.quickSet(value - m_min);
m_size += isNewEntry;
return { isNewEntry };
}
// We are in BitVector mode, and value is not in the bounds: we will definitely insert it as a new entry.
++m_size;
IndexType newMin = std::min(m_min, value);
IndexType newMax = std::max(m_max, value);
unsigned bitVectorSize = (newMax - newMin) / 8;
AirAllocateStackByGraphColoring should use the optimized interference graphs from AirAllocateRegistersByGraphColoring https://bugs.webkit.org/show_bug.cgi?id=226258 Reviewed by Phil Pizlo. Source/JavaScriptCore: The main change in this patch is that AirAllocateStackByGraphColoring is now using the optimized datastructures in wtf/InterferenceGraph.h. This required templating most of it over the interference graph used (Small/Large/Huge), but I tried keeping some common parts out of the templated class to minimize the impact on compile times and binary size. A consequence of that change is that coalescableMoves and remappedStackSlots now store indices instead of direct pointers to StackSlots, resulting in a memory reduction of about 3x as well. Another consequence is that I had to slightly alter the way that coalescing works: instead of carefully removing the interference edges of the killed slot, we simply use mayClear() which is not guaranteed to remove anything. I believe that this is sound, because every subsequent access to m_interference checks whether a slot has been coalesced first, so dropping these edges is purely a memory saving, but has no logical effect. The new code was tested in a few ways: - running on JetStream2 with asan - running on JetStream2 with TEST_OPTIMIZED_INTERFERENCE_GRAPH - running on JetStream2 and logging the frame sizes at the end of this phase, and comparing to the results of doing the same on ToT (same average frame size) The two functions where this code had the largest memory footprint in JetStream2 were both in tsf-wasm. One has 751 stack slots, and had an interference graph of 2.1MB and a coalescableMoves vector of 440kB The other has 673 stack slots, and had an interference graph of 1.9MB and a coalescableMoves vector of 421kB. With this patch, they respectively use 79kB+146kB and 67kB+140kB The effect on the rest of JetStream2 is less likely to matter as few functions used more than a few dozens of kB in this phase, but in percentages are just as huge. More importantly (and the reason I wrote this patch in the first place), I checked mruby-wasm.aotoki.dev which with a few other pages forced us to lower Options::maximumTmpsForGraphColoring because of jetsams. It has two massive functions that reach this phase if I increase Options::maximumTmpsForGraphColoring: - about 6k stack slots -> 215MB + 6MB (interference graph + coalescableMoves) - about 9k stack slots -> 395MB + 4MB After this patch, they respectively use 4.5MB+2MB and 9MB+1.5MB, or roughly a 40x improvement. Combined with the recent improvements to the register allocator, I hope to be able to increase Options::maximumTmpsForGraphColoring soon (in a different patch for easier bisection if either cause a perf regression). This would be helpful, since its lowering cratered our performance on some other wasm application by 8x. In terms of compile times, this patch lowered the time spent in AllocateStackByGraphColoring over the course of a run of JetStream2 from roughly 350ms to roughly 270ms. This is almost certainly negligible, but at least it guarantees that it did not regress. * b3/air/AirAllocateRegistersByGraphColoring.cpp: * b3/air/AirAllocateStackByGraphColoring.cpp: (JSC::B3::Air::allocateStackByGraphColoring): Source/WTF: I moved the interference graphs datastructures from AirAllocateRegistersByGraphColoring to their own wtf/InterferenceGraph.h file. There are three of them: - SmallInterferenceGraph, best for n < 400 - LargeInterferenceGraph, for n < 2**16 - HugeInterferenceGraph, for n up to 2**32 I also added "Iterable" versions of them, that have an operator[] method whose result you can iterate on to get all the indices which interfere with a given index. SmallIterableInterferenceGraph is the same as the non-iterable version, but the Large and Huge versions are a bit slower than their counterparts and use 2x memory. All of these were tested by running JetStream2 with the TEST_OPTIMIZED_INTERFERENCE_GRAPH set to 1. This flag makes the optimized datastructures run in parallel with a reference implementation, and their results are checked for equality on every method call. There is one small difference allowed: iteration is not guaranteed to go through elements in the same order. I also added a clear() method to LikelyDenseUnsignedIntegerSet, and added the NotNull flag to its various uses of placement new. * WTF.xcodeproj/project.pbxproj: * wtf/CMakeLists.txt: * wtf/HashSet.h: (WTF::W>::memoryUse const): * wtf/InterferenceGraph.h: Added. (WTF::InterferenceBitVector::contains): (WTF::InterferenceBitVector::addAndReturnIsNewEntry): (WTF::InterferenceBitVector::add): (WTF::InterferenceBitVector::clear): (WTF::InterferenceBitVector::mayClear): (WTF::InterferenceBitVector::setMaxIndex): (WTF::InterferenceBitVector::forEach): (WTF::InterferenceBitVector::size const): (WTF::InterferenceBitVector::memoryUse const): (WTF::InterferenceBitVector::dumpMemoryUseInKB const): (WTF::InterferenceBitVector::Iterable::iterator::operator++): (WTF::InterferenceBitVector::Iterable::iterator::operator* const): (WTF::InterferenceBitVector::Iterable::iterator::operator== const): (WTF::InterferenceBitVector::Iterable::iterator::operator!= const): (WTF::InterferenceBitVector::Iterable::begin const): (WTF::InterferenceBitVector::Iterable::end const): (WTF::InterferenceBitVector::operator[] const): (WTF::InterferenceBitVector::index const): (WTF::InterferenceVector::contains): (WTF::InterferenceVector::addAndReturnIsNewEntry): (WTF::InterferenceVector::add): (WTF::InterferenceVector::clear): (WTF::InterferenceVector::mayClear): (WTF::InterferenceVector::setMaxIndex): (WTF::InterferenceVector::forEach): (WTF::InterferenceVector::size const): (WTF::InterferenceVector::memoryUse const): (WTF::InterferenceVector::dumpMemoryUseInKB const): (WTF::InterferenceVector::Iterable::begin const): (WTF::InterferenceVector::Iterable::end const): (WTF::InterferenceVector::operator[] const): (WTF::UndirectedEdgesDuplicatingAdapter::contains): (WTF::UndirectedEdgesDuplicatingAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDuplicatingAdapter::add): (WTF::UndirectedEdgesDuplicatingAdapter::clear): (WTF::UndirectedEdgesDuplicatingAdapter::mayClear): (WTF::UndirectedEdgesDuplicatingAdapter::setMaxIndex): (WTF::UndirectedEdgesDuplicatingAdapter::forEach): (WTF::UndirectedEdgesDuplicatingAdapter::size const): (WTF::UndirectedEdgesDuplicatingAdapter::memoryUse const): (WTF::UndirectedEdgesDuplicatingAdapter::dumpMemoryUseInKB const): (WTF::UndirectedEdgesDuplicatingAdapter::operator[] const): (WTF::UndirectedEdgesDedupAdapter::contains): (WTF::UndirectedEdgesDedupAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDedupAdapter::add): (WTF::UndirectedEdgesDedupAdapter::clear): (WTF::UndirectedEdgesDedupAdapter::mayClear): (WTF::UndirectedEdgesDedupAdapter::setMaxIndex): (WTF::UndirectedEdgesDedupAdapter::forEach): (WTF::UndirectedEdgesDedupAdapter::size const): (WTF::UndirectedEdgesDedupAdapter::memoryUse const): (WTF::UndirectedEdgesDedupAdapter::dumpMemoryUseInKB const): (WTF::InterferenceHashSet::contains): (WTF::InterferenceHashSet::addAndReturnIsNewEntry): (WTF::InterferenceHashSet::add): (WTF::InterferenceHashSet::clear): (WTF::InterferenceHashSet::setMaxIndex): (WTF::InterferenceHashSet::forEach): (WTF::InterferenceHashSet::size const): (WTF::InterferenceHashSet::memoryUse const): (WTF::InterferenceHashSet::dumpMemoryUseInKB const): (WTF::InstrumentedInterferenceGraph::contains): (WTF::InstrumentedInterferenceGraph::addAndReturnIsNewEntry): (WTF::InstrumentedInterferenceGraph::add): (WTF::InstrumentedInterferenceGraph::clear): (WTF::InstrumentedInterferenceGraph::mayClear): (WTF::InstrumentedInterferenceGraph::setMaxIndex): (WTF::InstrumentedInterferenceGraph::forEach): (WTF::InstrumentedInterferenceGraph::size const): (WTF::InstrumentedInterferenceGraph::memoryUse const): (WTF::InstrumentedInterferenceGraph::dumpMemoryUseInKB const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::Iterable): (WTF::InstrumentedIterableInterferenceGraph::Iterable::begin const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::end const): (WTF::InstrumentedIterableInterferenceGraph::operator[] const): * wtf/LikelyDenseUnsignedIntegerSet.h: (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::clear): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::estimateHashSetSize): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): Canonical link: https://commits.webkit.org/238229@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@278186 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-28 02:08:19 +00:00
unsigned wouldBeHashSetSize = estimateHashSetSize(m_size);
Make AirAllocateRegistersByGraphColoring use less memory https://bugs.webkit.org/show_bug.cgi?id=225848 Reviewed by Filip Pizlo. Source/JavaScriptCore: We've had some jetsam problems caused by the main Air register allocator, which caused us to lower Options::maximumTmpsForGraphColoring. Hence this patch tries to improve the memory usage of the allocator. It includes several changes: - Change the datastructure used for representing the interference graph. Before it was effectively a HashSet<std::pair<uint16_t, uint16_t>. Now, it is either a Bitvector (for n < 400 for now, can be tweaked easily), or a Vector<LikelyDenseUnsignedIntegerSet<uint16_t>> otherwise. LikelyDenseUnsignedIntegerSet is a new datastructure introduced by this patch, it is either a HashSet if very sparse, or a BitVector + an amount to shift it by. This is by far the largest memory reduction in this patch, it reduces the maximum memory used for an interference graph in tsf-wasm in JetStream2 from 16MB to 700kB, and in mruby-wasm.aotoki.dev from 262MB to 20MB (the later only happen when we increase Options::maximumTmpsForGraphColoring.. this is the exact function which caused us to lower it). Its effect on smaller functions in JetStream2 is rarely as dramatic but always an improvement, and improvements between 2x and 5x are extremely common (10x to 30x are significantly rarer but do occur). - In order to easily test this change and any further change to this datastructure, the old approach was preserved as InterferenceHashSet, and a template to run two such datastructures in parallel, checking their equivalence was added: InstrumentedInterferenceGraph. Running with it and reportInterferenceGraphMemoryUse set to true was used to compute the numbers given above. - There was already some template parameter to change the size of the tmp indices from unsigned to uint16_t but the code failed to compile unless it was unsigned. I fixed this, made more consistent use of it, and switched to uint16_t in the very common case that we have less than 65k Tmps (we can have more despite the option because of spilling). This halved the memory usage of various other datastructures in the register allocator - unspillableTmps was a HashSet<unsigned>. Since it is often quite dense (often around 20% on JetStream2), I replaced it by a Bitvector instead - m_biases was a HashMap<IndexType, HashSet<IndexType>>. Since it is extremely rare that the sets have more than 8 elements (from looking at some instrumented run of JetStream2), I replaced it by HashMap<IndexType, SmallSet<IndexType>>. This not only significantly reduces memory, but nearly halves the time spent in assignColors(around 80ms -> 40ms in JetStream 2) - UseCounts was needlessly general: it is only used by the register allocator (all other references to UseCounts refer to the completely different B3::UseCounts), so there is no point in it computing, and then storing lots of irrelevant data. A float is also more than enough precision (especially since it is pretty much always 1, 10, 100, or 1000 in practice…). Also, since we only need it indexed by Tmps, we can use a Vector with AbsoluteTmpMapper instead of its HashMap. These changes are not just memory savings, they also make selectSpill way faster (570ms -> 250ms on my machine on JetStream2) - While I was at it, I did a couple of other tweaks to the logic of selectSpill. In particular, instead of having to check for isFastTmp every time, I just put the fast tmps directly in unspillableTmps, which prevents them from getting added to m_spillWorklist in the first place. This + a bit of clean-up (for example putting an early exit instead of setting score to infinity in the case of dead tmps) resulted in a further perf win (to roughly 200ms spent in selectSpill() on JetStream2) All together, this patch reduces the time spent in the register allocator by roughly 15 to 20% in JetStream2 (tested both with the Briggs and the IRC allocators on my MBP 2019). I do not yet have precise performance numbers for this exact patch, but benchmarking a previous version of it (with a less optimized interference graph) resulted in significant RAMification improvements (around 1%), and more surprisingly some JetStream2 improvements on weaker machines (e.g. an iPhone 7 gained > 1%). I believe these gains come either from less trashing of the caches, or less contention caused by the memory traffic. I will try to update the bugzilla with more up-to-date thorough results when I get them. This patch does not increase Options::maximumTmpsForGraphColoring, I intend to do that in a separate patch to make it easier to revert in case of a problem. * b3/B3ReduceLoopStrength.cpp: (JSC::B3::ReduceLoopStrength::reduceByteCopyLoopsToMemcpy): * b3/air/AirAllocateRegistersAndStackByLinearScan.cpp: * b3/air/AirAllocateRegistersByGraphColoring.cpp: (JSC::B3::Air::allocateRegistersByGraphColoring): * b3/air/AirCode.h: (JSC::B3::Air::Code::forEachFastTmp const): * b3/air/AirUseCounts.h: (JSC::B3::Air::UseCounts::UseCounts): (JSC::B3::Air::UseCounts::isConstDef const): (JSC::B3::Air::UseCounts::numWarmUsesAndDefs const): (JSC::B3::Air::UseCounts::dump const): * parser/Nodes.h: Source/WTF: Two changes: the addition of LikelyDenseUnsignedIntegerSet, and various improvements to Small(Ptr)Set. The latter include: - Renaming SmallPtrSet into SmallSet, as it now supports integers as well as pointers. - Reducing its size by sharing the same storage for m_buffer and for m_smallStorage. This is safe to do, because all operations branch on isSmall() which depends purely on m_capacity. - Adding trivial size(), isEmpty() and memoryUse() methods - Adding a comment at the top of the file explaining when to use, and (more importantly) not to use SmallSet. LikelyDenseUnsignedIntegerSet is an even more specialized data structure, that can represent sets of unsigned integers very compactly if they are clustered. Finally I added an outOfLineMemoryUse() method to BitVector, making it more convenient to compare the memory consumption of different data structures in the register allocator. * WTF.xcodeproj/project.pbxproj: * wtf/BitVector.h: * wtf/CMakeLists.txt: * wtf/LikelyDenseUnsignedIntegerSet.cpp: Copied from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/LikelyDenseUnsignedIntegerSet.h: Added. (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::~LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::contains const): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::size const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::iterator): (WTF::LikelyDenseUnsignedIntegerSet::iterator::m_shift): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator++): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator* const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator== const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator!= const): (WTF::LikelyDenseUnsignedIntegerSet::begin const): (WTF::LikelyDenseUnsignedIntegerSet::end const): (WTF::LikelyDenseUnsignedIntegerSet::memoryUse const): (WTF::LikelyDenseUnsignedIntegerSet::validate const): (WTF::LikelyDenseUnsignedIntegerSet::isBitVector const): (WTF::LikelyDenseUnsignedIntegerSet::isValidValue const): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): * wtf/SmallPtrSet.h: Removed. * wtf/SmallSet.cpp: Renamed from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/SmallSet.h: Added. Tools: Simply added some tests for SmallSet. * TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj: * TestWebKitAPI/Tests/WTF/SmallSet.cpp: Added. (TestWebKitAPI::testSmallSetOfUnsigned): (TestWebKitAPI::testSmallSetOfPointers): (TestWebKitAPI::testVectorsOfSmallSetsOfUnsigned): (TestWebKitAPI::TEST): Canonical link: https://commits.webkit.org/237893@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277714 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-19 03:24:14 +00:00
if (wouldBeHashSetSize * 2 < bitVectorSize) {
transitionToHashSet();
auto result = m_inline.hashSet.add(value);
ASSERT_UNUSED(result, result.isNewEntry);
m_min = newMin;
m_max = newMax;
return { true };
}
if (m_min <= value) {
bool isNewEntry = !m_inline.bitVector.set(value - m_min);
ASSERT_UNUSED(isNewEntry, isNewEntry);
ASSERT(newMax == value);
m_max = value;
return { true };
}
BitVector newBitVector;
newBitVector.resize(m_max - value + 1);
IndexType shiftReduction = m_min - value;
for (IndexType oldIndex : m_inline.bitVector)
newBitVector.quickSet(oldIndex + shiftReduction);
newBitVector.quickSet(0);
m_inline.bitVector = WTFMove(newBitVector);
ASSERT(newMin == value);
m_min = value;
return { true };
}
unsigned size() const
{
if (isBitVector())
return m_size;
return m_inline.hashSet.size();
}
class iterator {
WTF_MAKE_FAST_ALLOCATED;
public:
iterator(BitVector::iterator it, IndexType shift)
: m_underlying({ it })
, m_shift(shift)
{
}
iterator(typename Set::iterator it, IndexType shift)
: m_underlying({ it })
, m_shift(shift)
{
}
iterator& operator++()
{
WTF::switchOn(m_underlying,
[](BitVector::iterator& it) { ++it; },
[](typename Set::iterator& it) { ++it; });
return *this;
}
IndexType operator*() const
{
return WTF::switchOn(m_underlying,
[&](const BitVector::iterator& it) { return *it + m_shift; },
[](const typename Set::iterator& it) { return *it; });
}
bool operator==(const iterator& other) const
{
return m_underlying == other.m_underlying && m_shift == other.m_shift;
}
bool operator!=(const iterator& other) const
{
return m_underlying != other.m_underlying || m_shift != other.m_shift;
}
private:
WTF::Variant<BitVector::iterator, typename Set::iterator> m_underlying;
IndexType m_shift;
};
iterator begin() const
{
if (isBitVector())
return { m_inline.bitVector.begin(), m_min };
return { m_inline.hashSet.begin(), m_min };
}
iterator end() const
{
if (isBitVector())
return { m_inline.bitVector.end(), m_min };
return { m_inline.hashSet.end(), m_min };
}
unsigned memoryUse() const
{
unsigned result = sizeof(LikelyDenseUnsignedIntegerSet);
if (isBitVector())
result += m_inline.bitVector.outOfLineMemoryUse();
else
result += m_inline.hashSet.capacity() * sizeof(IndexType);
return result;
}
void validate() const
{
IndexType min = std::numeric_limits<IndexType>::max(), max = 0;
if (isBitVector()) {
unsigned numElements = 0;
for (IndexType shiftedIndex : m_inline.bitVector) {
++numElements;
IndexType correctedIndex = m_min + shiftedIndex;
min = std::min(min, correctedIndex);
max = std::max(max, correctedIndex);
}
RELEASE_ASSERT(m_size == numElements);
} else {
for (IndexType index : m_inline.hashSet) {
min = std::min(min, index);
max = std::max(max, index);
}
}
if (m_size) {
RELEASE_ASSERT(m_min == min);
RELEASE_ASSERT(m_max == max);
}
}
private:
bool isBitVector() const { return m_size != std::numeric_limits<unsigned>::max(); }
bool isValidValue(IndexType value) const
{
return value != UnsignedWithZeroKeyHashTraits<IndexType>::emptyValue()
&& !UnsignedWithZeroKeyHashTraits<IndexType>::isDeletedValue(value);
}
AirAllocateStackByGraphColoring should use the optimized interference graphs from AirAllocateRegistersByGraphColoring https://bugs.webkit.org/show_bug.cgi?id=226258 Reviewed by Phil Pizlo. Source/JavaScriptCore: The main change in this patch is that AirAllocateStackByGraphColoring is now using the optimized datastructures in wtf/InterferenceGraph.h. This required templating most of it over the interference graph used (Small/Large/Huge), but I tried keeping some common parts out of the templated class to minimize the impact on compile times and binary size. A consequence of that change is that coalescableMoves and remappedStackSlots now store indices instead of direct pointers to StackSlots, resulting in a memory reduction of about 3x as well. Another consequence is that I had to slightly alter the way that coalescing works: instead of carefully removing the interference edges of the killed slot, we simply use mayClear() which is not guaranteed to remove anything. I believe that this is sound, because every subsequent access to m_interference checks whether a slot has been coalesced first, so dropping these edges is purely a memory saving, but has no logical effect. The new code was tested in a few ways: - running on JetStream2 with asan - running on JetStream2 with TEST_OPTIMIZED_INTERFERENCE_GRAPH - running on JetStream2 and logging the frame sizes at the end of this phase, and comparing to the results of doing the same on ToT (same average frame size) The two functions where this code had the largest memory footprint in JetStream2 were both in tsf-wasm. One has 751 stack slots, and had an interference graph of 2.1MB and a coalescableMoves vector of 440kB The other has 673 stack slots, and had an interference graph of 1.9MB and a coalescableMoves vector of 421kB. With this patch, they respectively use 79kB+146kB and 67kB+140kB The effect on the rest of JetStream2 is less likely to matter as few functions used more than a few dozens of kB in this phase, but in percentages are just as huge. More importantly (and the reason I wrote this patch in the first place), I checked mruby-wasm.aotoki.dev which with a few other pages forced us to lower Options::maximumTmpsForGraphColoring because of jetsams. It has two massive functions that reach this phase if I increase Options::maximumTmpsForGraphColoring: - about 6k stack slots -> 215MB + 6MB (interference graph + coalescableMoves) - about 9k stack slots -> 395MB + 4MB After this patch, they respectively use 4.5MB+2MB and 9MB+1.5MB, or roughly a 40x improvement. Combined with the recent improvements to the register allocator, I hope to be able to increase Options::maximumTmpsForGraphColoring soon (in a different patch for easier bisection if either cause a perf regression). This would be helpful, since its lowering cratered our performance on some other wasm application by 8x. In terms of compile times, this patch lowered the time spent in AllocateStackByGraphColoring over the course of a run of JetStream2 from roughly 350ms to roughly 270ms. This is almost certainly negligible, but at least it guarantees that it did not regress. * b3/air/AirAllocateRegistersByGraphColoring.cpp: * b3/air/AirAllocateStackByGraphColoring.cpp: (JSC::B3::Air::allocateStackByGraphColoring): Source/WTF: I moved the interference graphs datastructures from AirAllocateRegistersByGraphColoring to their own wtf/InterferenceGraph.h file. There are three of them: - SmallInterferenceGraph, best for n < 400 - LargeInterferenceGraph, for n < 2**16 - HugeInterferenceGraph, for n up to 2**32 I also added "Iterable" versions of them, that have an operator[] method whose result you can iterate on to get all the indices which interfere with a given index. SmallIterableInterferenceGraph is the same as the non-iterable version, but the Large and Huge versions are a bit slower than their counterparts and use 2x memory. All of these were tested by running JetStream2 with the TEST_OPTIMIZED_INTERFERENCE_GRAPH set to 1. This flag makes the optimized datastructures run in parallel with a reference implementation, and their results are checked for equality on every method call. There is one small difference allowed: iteration is not guaranteed to go through elements in the same order. I also added a clear() method to LikelyDenseUnsignedIntegerSet, and added the NotNull flag to its various uses of placement new. * WTF.xcodeproj/project.pbxproj: * wtf/CMakeLists.txt: * wtf/HashSet.h: (WTF::W>::memoryUse const): * wtf/InterferenceGraph.h: Added. (WTF::InterferenceBitVector::contains): (WTF::InterferenceBitVector::addAndReturnIsNewEntry): (WTF::InterferenceBitVector::add): (WTF::InterferenceBitVector::clear): (WTF::InterferenceBitVector::mayClear): (WTF::InterferenceBitVector::setMaxIndex): (WTF::InterferenceBitVector::forEach): (WTF::InterferenceBitVector::size const): (WTF::InterferenceBitVector::memoryUse const): (WTF::InterferenceBitVector::dumpMemoryUseInKB const): (WTF::InterferenceBitVector::Iterable::iterator::operator++): (WTF::InterferenceBitVector::Iterable::iterator::operator* const): (WTF::InterferenceBitVector::Iterable::iterator::operator== const): (WTF::InterferenceBitVector::Iterable::iterator::operator!= const): (WTF::InterferenceBitVector::Iterable::begin const): (WTF::InterferenceBitVector::Iterable::end const): (WTF::InterferenceBitVector::operator[] const): (WTF::InterferenceBitVector::index const): (WTF::InterferenceVector::contains): (WTF::InterferenceVector::addAndReturnIsNewEntry): (WTF::InterferenceVector::add): (WTF::InterferenceVector::clear): (WTF::InterferenceVector::mayClear): (WTF::InterferenceVector::setMaxIndex): (WTF::InterferenceVector::forEach): (WTF::InterferenceVector::size const): (WTF::InterferenceVector::memoryUse const): (WTF::InterferenceVector::dumpMemoryUseInKB const): (WTF::InterferenceVector::Iterable::begin const): (WTF::InterferenceVector::Iterable::end const): (WTF::InterferenceVector::operator[] const): (WTF::UndirectedEdgesDuplicatingAdapter::contains): (WTF::UndirectedEdgesDuplicatingAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDuplicatingAdapter::add): (WTF::UndirectedEdgesDuplicatingAdapter::clear): (WTF::UndirectedEdgesDuplicatingAdapter::mayClear): (WTF::UndirectedEdgesDuplicatingAdapter::setMaxIndex): (WTF::UndirectedEdgesDuplicatingAdapter::forEach): (WTF::UndirectedEdgesDuplicatingAdapter::size const): (WTF::UndirectedEdgesDuplicatingAdapter::memoryUse const): (WTF::UndirectedEdgesDuplicatingAdapter::dumpMemoryUseInKB const): (WTF::UndirectedEdgesDuplicatingAdapter::operator[] const): (WTF::UndirectedEdgesDedupAdapter::contains): (WTF::UndirectedEdgesDedupAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDedupAdapter::add): (WTF::UndirectedEdgesDedupAdapter::clear): (WTF::UndirectedEdgesDedupAdapter::mayClear): (WTF::UndirectedEdgesDedupAdapter::setMaxIndex): (WTF::UndirectedEdgesDedupAdapter::forEach): (WTF::UndirectedEdgesDedupAdapter::size const): (WTF::UndirectedEdgesDedupAdapter::memoryUse const): (WTF::UndirectedEdgesDedupAdapter::dumpMemoryUseInKB const): (WTF::InterferenceHashSet::contains): (WTF::InterferenceHashSet::addAndReturnIsNewEntry): (WTF::InterferenceHashSet::add): (WTF::InterferenceHashSet::clear): (WTF::InterferenceHashSet::setMaxIndex): (WTF::InterferenceHashSet::forEach): (WTF::InterferenceHashSet::size const): (WTF::InterferenceHashSet::memoryUse const): (WTF::InterferenceHashSet::dumpMemoryUseInKB const): (WTF::InstrumentedInterferenceGraph::contains): (WTF::InstrumentedInterferenceGraph::addAndReturnIsNewEntry): (WTF::InstrumentedInterferenceGraph::add): (WTF::InstrumentedInterferenceGraph::clear): (WTF::InstrumentedInterferenceGraph::mayClear): (WTF::InstrumentedInterferenceGraph::setMaxIndex): (WTF::InstrumentedInterferenceGraph::forEach): (WTF::InstrumentedInterferenceGraph::size const): (WTF::InstrumentedInterferenceGraph::memoryUse const): (WTF::InstrumentedInterferenceGraph::dumpMemoryUseInKB const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::Iterable): (WTF::InstrumentedIterableInterferenceGraph::Iterable::begin const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::end const): (WTF::InstrumentedIterableInterferenceGraph::operator[] const): * wtf/LikelyDenseUnsignedIntegerSet.h: (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::clear): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::estimateHashSetSize): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): Canonical link: https://commits.webkit.org/238229@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@278186 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-28 02:08:19 +00:00
unsigned estimateHashSetSize(unsigned n)
{
unsigned hashSetEstimatedOccupancyOverhead = 3; // max occupancy for a large HashSet is 50%
unsigned wouldBeHashSetCapacity = std::max(8u, n) * hashSetEstimatedOccupancyOverhead;
return wouldBeHashSetCapacity * sizeof(IndexType);
}
Make AirAllocateRegistersByGraphColoring use less memory https://bugs.webkit.org/show_bug.cgi?id=225848 Reviewed by Filip Pizlo. Source/JavaScriptCore: We've had some jetsam problems caused by the main Air register allocator, which caused us to lower Options::maximumTmpsForGraphColoring. Hence this patch tries to improve the memory usage of the allocator. It includes several changes: - Change the datastructure used for representing the interference graph. Before it was effectively a HashSet<std::pair<uint16_t, uint16_t>. Now, it is either a Bitvector (for n < 400 for now, can be tweaked easily), or a Vector<LikelyDenseUnsignedIntegerSet<uint16_t>> otherwise. LikelyDenseUnsignedIntegerSet is a new datastructure introduced by this patch, it is either a HashSet if very sparse, or a BitVector + an amount to shift it by. This is by far the largest memory reduction in this patch, it reduces the maximum memory used for an interference graph in tsf-wasm in JetStream2 from 16MB to 700kB, and in mruby-wasm.aotoki.dev from 262MB to 20MB (the later only happen when we increase Options::maximumTmpsForGraphColoring.. this is the exact function which caused us to lower it). Its effect on smaller functions in JetStream2 is rarely as dramatic but always an improvement, and improvements between 2x and 5x are extremely common (10x to 30x are significantly rarer but do occur). - In order to easily test this change and any further change to this datastructure, the old approach was preserved as InterferenceHashSet, and a template to run two such datastructures in parallel, checking their equivalence was added: InstrumentedInterferenceGraph. Running with it and reportInterferenceGraphMemoryUse set to true was used to compute the numbers given above. - There was already some template parameter to change the size of the tmp indices from unsigned to uint16_t but the code failed to compile unless it was unsigned. I fixed this, made more consistent use of it, and switched to uint16_t in the very common case that we have less than 65k Tmps (we can have more despite the option because of spilling). This halved the memory usage of various other datastructures in the register allocator - unspillableTmps was a HashSet<unsigned>. Since it is often quite dense (often around 20% on JetStream2), I replaced it by a Bitvector instead - m_biases was a HashMap<IndexType, HashSet<IndexType>>. Since it is extremely rare that the sets have more than 8 elements (from looking at some instrumented run of JetStream2), I replaced it by HashMap<IndexType, SmallSet<IndexType>>. This not only significantly reduces memory, but nearly halves the time spent in assignColors(around 80ms -> 40ms in JetStream 2) - UseCounts was needlessly general: it is only used by the register allocator (all other references to UseCounts refer to the completely different B3::UseCounts), so there is no point in it computing, and then storing lots of irrelevant data. A float is also more than enough precision (especially since it is pretty much always 1, 10, 100, or 1000 in practice…). Also, since we only need it indexed by Tmps, we can use a Vector with AbsoluteTmpMapper instead of its HashMap. These changes are not just memory savings, they also make selectSpill way faster (570ms -> 250ms on my machine on JetStream2) - While I was at it, I did a couple of other tweaks to the logic of selectSpill. In particular, instead of having to check for isFastTmp every time, I just put the fast tmps directly in unspillableTmps, which prevents them from getting added to m_spillWorklist in the first place. This + a bit of clean-up (for example putting an early exit instead of setting score to infinity in the case of dead tmps) resulted in a further perf win (to roughly 200ms spent in selectSpill() on JetStream2) All together, this patch reduces the time spent in the register allocator by roughly 15 to 20% in JetStream2 (tested both with the Briggs and the IRC allocators on my MBP 2019). I do not yet have precise performance numbers for this exact patch, but benchmarking a previous version of it (with a less optimized interference graph) resulted in significant RAMification improvements (around 1%), and more surprisingly some JetStream2 improvements on weaker machines (e.g. an iPhone 7 gained > 1%). I believe these gains come either from less trashing of the caches, or less contention caused by the memory traffic. I will try to update the bugzilla with more up-to-date thorough results when I get them. This patch does not increase Options::maximumTmpsForGraphColoring, I intend to do that in a separate patch to make it easier to revert in case of a problem. * b3/B3ReduceLoopStrength.cpp: (JSC::B3::ReduceLoopStrength::reduceByteCopyLoopsToMemcpy): * b3/air/AirAllocateRegistersAndStackByLinearScan.cpp: * b3/air/AirAllocateRegistersByGraphColoring.cpp: (JSC::B3::Air::allocateRegistersByGraphColoring): * b3/air/AirCode.h: (JSC::B3::Air::Code::forEachFastTmp const): * b3/air/AirUseCounts.h: (JSC::B3::Air::UseCounts::UseCounts): (JSC::B3::Air::UseCounts::isConstDef const): (JSC::B3::Air::UseCounts::numWarmUsesAndDefs const): (JSC::B3::Air::UseCounts::dump const): * parser/Nodes.h: Source/WTF: Two changes: the addition of LikelyDenseUnsignedIntegerSet, and various improvements to Small(Ptr)Set. The latter include: - Renaming SmallPtrSet into SmallSet, as it now supports integers as well as pointers. - Reducing its size by sharing the same storage for m_buffer and for m_smallStorage. This is safe to do, because all operations branch on isSmall() which depends purely on m_capacity. - Adding trivial size(), isEmpty() and memoryUse() methods - Adding a comment at the top of the file explaining when to use, and (more importantly) not to use SmallSet. LikelyDenseUnsignedIntegerSet is an even more specialized data structure, that can represent sets of unsigned integers very compactly if they are clustered. Finally I added an outOfLineMemoryUse() method to BitVector, making it more convenient to compare the memory consumption of different data structures in the register allocator. * WTF.xcodeproj/project.pbxproj: * wtf/BitVector.h: * wtf/CMakeLists.txt: * wtf/LikelyDenseUnsignedIntegerSet.cpp: Copied from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/LikelyDenseUnsignedIntegerSet.h: Added. (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::~LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::contains const): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::size const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::iterator): (WTF::LikelyDenseUnsignedIntegerSet::iterator::m_shift): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator++): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator* const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator== const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator!= const): (WTF::LikelyDenseUnsignedIntegerSet::begin const): (WTF::LikelyDenseUnsignedIntegerSet::end const): (WTF::LikelyDenseUnsignedIntegerSet::memoryUse const): (WTF::LikelyDenseUnsignedIntegerSet::validate const): (WTF::LikelyDenseUnsignedIntegerSet::isBitVector const): (WTF::LikelyDenseUnsignedIntegerSet::isValidValue const): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): * wtf/SmallPtrSet.h: Removed. * wtf/SmallSet.cpp: Renamed from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/SmallSet.h: Added. Tools: Simply added some tests for SmallSet. * TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj: * TestWebKitAPI/Tests/WTF/SmallSet.cpp: Added. (TestWebKitAPI::testSmallSetOfUnsigned): (TestWebKitAPI::testSmallSetOfPointers): (TestWebKitAPI::testVectorsOfSmallSetsOfUnsigned): (TestWebKitAPI::TEST): Canonical link: https://commits.webkit.org/237893@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277714 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-19 03:24:14 +00:00
void transitionToHashSet()
{
ASSERT(isBitVector());
Set newSet;
newSet.reserveInitialCapacity(m_size + 1);
for (IndexType oldIndex : m_inline.bitVector)
newSet.add(oldIndex + m_min);
m_inline.bitVector.~BitVector();
AirAllocateStackByGraphColoring should use the optimized interference graphs from AirAllocateRegistersByGraphColoring https://bugs.webkit.org/show_bug.cgi?id=226258 Reviewed by Phil Pizlo. Source/JavaScriptCore: The main change in this patch is that AirAllocateStackByGraphColoring is now using the optimized datastructures in wtf/InterferenceGraph.h. This required templating most of it over the interference graph used (Small/Large/Huge), but I tried keeping some common parts out of the templated class to minimize the impact on compile times and binary size. A consequence of that change is that coalescableMoves and remappedStackSlots now store indices instead of direct pointers to StackSlots, resulting in a memory reduction of about 3x as well. Another consequence is that I had to slightly alter the way that coalescing works: instead of carefully removing the interference edges of the killed slot, we simply use mayClear() which is not guaranteed to remove anything. I believe that this is sound, because every subsequent access to m_interference checks whether a slot has been coalesced first, so dropping these edges is purely a memory saving, but has no logical effect. The new code was tested in a few ways: - running on JetStream2 with asan - running on JetStream2 with TEST_OPTIMIZED_INTERFERENCE_GRAPH - running on JetStream2 and logging the frame sizes at the end of this phase, and comparing to the results of doing the same on ToT (same average frame size) The two functions where this code had the largest memory footprint in JetStream2 were both in tsf-wasm. One has 751 stack slots, and had an interference graph of 2.1MB and a coalescableMoves vector of 440kB The other has 673 stack slots, and had an interference graph of 1.9MB and a coalescableMoves vector of 421kB. With this patch, they respectively use 79kB+146kB and 67kB+140kB The effect on the rest of JetStream2 is less likely to matter as few functions used more than a few dozens of kB in this phase, but in percentages are just as huge. More importantly (and the reason I wrote this patch in the first place), I checked mruby-wasm.aotoki.dev which with a few other pages forced us to lower Options::maximumTmpsForGraphColoring because of jetsams. It has two massive functions that reach this phase if I increase Options::maximumTmpsForGraphColoring: - about 6k stack slots -> 215MB + 6MB (interference graph + coalescableMoves) - about 9k stack slots -> 395MB + 4MB After this patch, they respectively use 4.5MB+2MB and 9MB+1.5MB, or roughly a 40x improvement. Combined with the recent improvements to the register allocator, I hope to be able to increase Options::maximumTmpsForGraphColoring soon (in a different patch for easier bisection if either cause a perf regression). This would be helpful, since its lowering cratered our performance on some other wasm application by 8x. In terms of compile times, this patch lowered the time spent in AllocateStackByGraphColoring over the course of a run of JetStream2 from roughly 350ms to roughly 270ms. This is almost certainly negligible, but at least it guarantees that it did not regress. * b3/air/AirAllocateRegistersByGraphColoring.cpp: * b3/air/AirAllocateStackByGraphColoring.cpp: (JSC::B3::Air::allocateStackByGraphColoring): Source/WTF: I moved the interference graphs datastructures from AirAllocateRegistersByGraphColoring to their own wtf/InterferenceGraph.h file. There are three of them: - SmallInterferenceGraph, best for n < 400 - LargeInterferenceGraph, for n < 2**16 - HugeInterferenceGraph, for n up to 2**32 I also added "Iterable" versions of them, that have an operator[] method whose result you can iterate on to get all the indices which interfere with a given index. SmallIterableInterferenceGraph is the same as the non-iterable version, but the Large and Huge versions are a bit slower than their counterparts and use 2x memory. All of these were tested by running JetStream2 with the TEST_OPTIMIZED_INTERFERENCE_GRAPH set to 1. This flag makes the optimized datastructures run in parallel with a reference implementation, and their results are checked for equality on every method call. There is one small difference allowed: iteration is not guaranteed to go through elements in the same order. I also added a clear() method to LikelyDenseUnsignedIntegerSet, and added the NotNull flag to its various uses of placement new. * WTF.xcodeproj/project.pbxproj: * wtf/CMakeLists.txt: * wtf/HashSet.h: (WTF::W>::memoryUse const): * wtf/InterferenceGraph.h: Added. (WTF::InterferenceBitVector::contains): (WTF::InterferenceBitVector::addAndReturnIsNewEntry): (WTF::InterferenceBitVector::add): (WTF::InterferenceBitVector::clear): (WTF::InterferenceBitVector::mayClear): (WTF::InterferenceBitVector::setMaxIndex): (WTF::InterferenceBitVector::forEach): (WTF::InterferenceBitVector::size const): (WTF::InterferenceBitVector::memoryUse const): (WTF::InterferenceBitVector::dumpMemoryUseInKB const): (WTF::InterferenceBitVector::Iterable::iterator::operator++): (WTF::InterferenceBitVector::Iterable::iterator::operator* const): (WTF::InterferenceBitVector::Iterable::iterator::operator== const): (WTF::InterferenceBitVector::Iterable::iterator::operator!= const): (WTF::InterferenceBitVector::Iterable::begin const): (WTF::InterferenceBitVector::Iterable::end const): (WTF::InterferenceBitVector::operator[] const): (WTF::InterferenceBitVector::index const): (WTF::InterferenceVector::contains): (WTF::InterferenceVector::addAndReturnIsNewEntry): (WTF::InterferenceVector::add): (WTF::InterferenceVector::clear): (WTF::InterferenceVector::mayClear): (WTF::InterferenceVector::setMaxIndex): (WTF::InterferenceVector::forEach): (WTF::InterferenceVector::size const): (WTF::InterferenceVector::memoryUse const): (WTF::InterferenceVector::dumpMemoryUseInKB const): (WTF::InterferenceVector::Iterable::begin const): (WTF::InterferenceVector::Iterable::end const): (WTF::InterferenceVector::operator[] const): (WTF::UndirectedEdgesDuplicatingAdapter::contains): (WTF::UndirectedEdgesDuplicatingAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDuplicatingAdapter::add): (WTF::UndirectedEdgesDuplicatingAdapter::clear): (WTF::UndirectedEdgesDuplicatingAdapter::mayClear): (WTF::UndirectedEdgesDuplicatingAdapter::setMaxIndex): (WTF::UndirectedEdgesDuplicatingAdapter::forEach): (WTF::UndirectedEdgesDuplicatingAdapter::size const): (WTF::UndirectedEdgesDuplicatingAdapter::memoryUse const): (WTF::UndirectedEdgesDuplicatingAdapter::dumpMemoryUseInKB const): (WTF::UndirectedEdgesDuplicatingAdapter::operator[] const): (WTF::UndirectedEdgesDedupAdapter::contains): (WTF::UndirectedEdgesDedupAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDedupAdapter::add): (WTF::UndirectedEdgesDedupAdapter::clear): (WTF::UndirectedEdgesDedupAdapter::mayClear): (WTF::UndirectedEdgesDedupAdapter::setMaxIndex): (WTF::UndirectedEdgesDedupAdapter::forEach): (WTF::UndirectedEdgesDedupAdapter::size const): (WTF::UndirectedEdgesDedupAdapter::memoryUse const): (WTF::UndirectedEdgesDedupAdapter::dumpMemoryUseInKB const): (WTF::InterferenceHashSet::contains): (WTF::InterferenceHashSet::addAndReturnIsNewEntry): (WTF::InterferenceHashSet::add): (WTF::InterferenceHashSet::clear): (WTF::InterferenceHashSet::setMaxIndex): (WTF::InterferenceHashSet::forEach): (WTF::InterferenceHashSet::size const): (WTF::InterferenceHashSet::memoryUse const): (WTF::InterferenceHashSet::dumpMemoryUseInKB const): (WTF::InstrumentedInterferenceGraph::contains): (WTF::InstrumentedInterferenceGraph::addAndReturnIsNewEntry): (WTF::InstrumentedInterferenceGraph::add): (WTF::InstrumentedInterferenceGraph::clear): (WTF::InstrumentedInterferenceGraph::mayClear): (WTF::InstrumentedInterferenceGraph::setMaxIndex): (WTF::InstrumentedInterferenceGraph::forEach): (WTF::InstrumentedInterferenceGraph::size const): (WTF::InstrumentedInterferenceGraph::memoryUse const): (WTF::InstrumentedInterferenceGraph::dumpMemoryUseInKB const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::Iterable): (WTF::InstrumentedIterableInterferenceGraph::Iterable::begin const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::end const): (WTF::InstrumentedIterableInterferenceGraph::operator[] const): * wtf/LikelyDenseUnsignedIntegerSet.h: (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::clear): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::estimateHashSetSize): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): Canonical link: https://commits.webkit.org/238229@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@278186 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-28 02:08:19 +00:00
new (NotNull, &m_inline.hashSet) Set(WTFMove(newSet));
Make AirAllocateRegistersByGraphColoring use less memory https://bugs.webkit.org/show_bug.cgi?id=225848 Reviewed by Filip Pizlo. Source/JavaScriptCore: We've had some jetsam problems caused by the main Air register allocator, which caused us to lower Options::maximumTmpsForGraphColoring. Hence this patch tries to improve the memory usage of the allocator. It includes several changes: - Change the datastructure used for representing the interference graph. Before it was effectively a HashSet<std::pair<uint16_t, uint16_t>. Now, it is either a Bitvector (for n < 400 for now, can be tweaked easily), or a Vector<LikelyDenseUnsignedIntegerSet<uint16_t>> otherwise. LikelyDenseUnsignedIntegerSet is a new datastructure introduced by this patch, it is either a HashSet if very sparse, or a BitVector + an amount to shift it by. This is by far the largest memory reduction in this patch, it reduces the maximum memory used for an interference graph in tsf-wasm in JetStream2 from 16MB to 700kB, and in mruby-wasm.aotoki.dev from 262MB to 20MB (the later only happen when we increase Options::maximumTmpsForGraphColoring.. this is the exact function which caused us to lower it). Its effect on smaller functions in JetStream2 is rarely as dramatic but always an improvement, and improvements between 2x and 5x are extremely common (10x to 30x are significantly rarer but do occur). - In order to easily test this change and any further change to this datastructure, the old approach was preserved as InterferenceHashSet, and a template to run two such datastructures in parallel, checking their equivalence was added: InstrumentedInterferenceGraph. Running with it and reportInterferenceGraphMemoryUse set to true was used to compute the numbers given above. - There was already some template parameter to change the size of the tmp indices from unsigned to uint16_t but the code failed to compile unless it was unsigned. I fixed this, made more consistent use of it, and switched to uint16_t in the very common case that we have less than 65k Tmps (we can have more despite the option because of spilling). This halved the memory usage of various other datastructures in the register allocator - unspillableTmps was a HashSet<unsigned>. Since it is often quite dense (often around 20% on JetStream2), I replaced it by a Bitvector instead - m_biases was a HashMap<IndexType, HashSet<IndexType>>. Since it is extremely rare that the sets have more than 8 elements (from looking at some instrumented run of JetStream2), I replaced it by HashMap<IndexType, SmallSet<IndexType>>. This not only significantly reduces memory, but nearly halves the time spent in assignColors(around 80ms -> 40ms in JetStream 2) - UseCounts was needlessly general: it is only used by the register allocator (all other references to UseCounts refer to the completely different B3::UseCounts), so there is no point in it computing, and then storing lots of irrelevant data. A float is also more than enough precision (especially since it is pretty much always 1, 10, 100, or 1000 in practice…). Also, since we only need it indexed by Tmps, we can use a Vector with AbsoluteTmpMapper instead of its HashMap. These changes are not just memory savings, they also make selectSpill way faster (570ms -> 250ms on my machine on JetStream2) - While I was at it, I did a couple of other tweaks to the logic of selectSpill. In particular, instead of having to check for isFastTmp every time, I just put the fast tmps directly in unspillableTmps, which prevents them from getting added to m_spillWorklist in the first place. This + a bit of clean-up (for example putting an early exit instead of setting score to infinity in the case of dead tmps) resulted in a further perf win (to roughly 200ms spent in selectSpill() on JetStream2) All together, this patch reduces the time spent in the register allocator by roughly 15 to 20% in JetStream2 (tested both with the Briggs and the IRC allocators on my MBP 2019). I do not yet have precise performance numbers for this exact patch, but benchmarking a previous version of it (with a less optimized interference graph) resulted in significant RAMification improvements (around 1%), and more surprisingly some JetStream2 improvements on weaker machines (e.g. an iPhone 7 gained > 1%). I believe these gains come either from less trashing of the caches, or less contention caused by the memory traffic. I will try to update the bugzilla with more up-to-date thorough results when I get them. This patch does not increase Options::maximumTmpsForGraphColoring, I intend to do that in a separate patch to make it easier to revert in case of a problem. * b3/B3ReduceLoopStrength.cpp: (JSC::B3::ReduceLoopStrength::reduceByteCopyLoopsToMemcpy): * b3/air/AirAllocateRegistersAndStackByLinearScan.cpp: * b3/air/AirAllocateRegistersByGraphColoring.cpp: (JSC::B3::Air::allocateRegistersByGraphColoring): * b3/air/AirCode.h: (JSC::B3::Air::Code::forEachFastTmp const): * b3/air/AirUseCounts.h: (JSC::B3::Air::UseCounts::UseCounts): (JSC::B3::Air::UseCounts::isConstDef const): (JSC::B3::Air::UseCounts::numWarmUsesAndDefs const): (JSC::B3::Air::UseCounts::dump const): * parser/Nodes.h: Source/WTF: Two changes: the addition of LikelyDenseUnsignedIntegerSet, and various improvements to Small(Ptr)Set. The latter include: - Renaming SmallPtrSet into SmallSet, as it now supports integers as well as pointers. - Reducing its size by sharing the same storage for m_buffer and for m_smallStorage. This is safe to do, because all operations branch on isSmall() which depends purely on m_capacity. - Adding trivial size(), isEmpty() and memoryUse() methods - Adding a comment at the top of the file explaining when to use, and (more importantly) not to use SmallSet. LikelyDenseUnsignedIntegerSet is an even more specialized data structure, that can represent sets of unsigned integers very compactly if they are clustered. Finally I added an outOfLineMemoryUse() method to BitVector, making it more convenient to compare the memory consumption of different data structures in the register allocator. * WTF.xcodeproj/project.pbxproj: * wtf/BitVector.h: * wtf/CMakeLists.txt: * wtf/LikelyDenseUnsignedIntegerSet.cpp: Copied from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/LikelyDenseUnsignedIntegerSet.h: Added. (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::~LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::contains const): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::size const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::iterator): (WTF::LikelyDenseUnsignedIntegerSet::iterator::m_shift): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator++): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator* const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator== const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator!= const): (WTF::LikelyDenseUnsignedIntegerSet::begin const): (WTF::LikelyDenseUnsignedIntegerSet::end const): (WTF::LikelyDenseUnsignedIntegerSet::memoryUse const): (WTF::LikelyDenseUnsignedIntegerSet::validate const): (WTF::LikelyDenseUnsignedIntegerSet::isBitVector const): (WTF::LikelyDenseUnsignedIntegerSet::isValidValue const): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): * wtf/SmallPtrSet.h: Removed. * wtf/SmallSet.cpp: Renamed from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/SmallSet.h: Added. Tools: Simply added some tests for SmallSet. * TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj: * TestWebKitAPI/Tests/WTF/SmallSet.cpp: Added. (TestWebKitAPI::testSmallSetOfUnsigned): (TestWebKitAPI::testSmallSetOfPointers): (TestWebKitAPI::testVectorsOfSmallSetsOfUnsigned): (TestWebKitAPI::TEST): Canonical link: https://commits.webkit.org/237893@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277714 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-19 03:24:14 +00:00
m_size = std::numeric_limits<unsigned>::max();
ASSERT(!isBitVector());
}
void transitionToBitVector()
{
ASSERT(!isBitVector());
BitVector newBitVector;
newBitVector.ensureSize(m_max - m_min + 1);
m_size = 0;
for (IndexType oldValue : m_inline.hashSet) {
newBitVector.quickSet(oldValue - m_min);
++m_size;
}
ASSERT(newBitVector.quickGet(0));
m_inline.hashSet.~Set();
AirAllocateStackByGraphColoring should use the optimized interference graphs from AirAllocateRegistersByGraphColoring https://bugs.webkit.org/show_bug.cgi?id=226258 Reviewed by Phil Pizlo. Source/JavaScriptCore: The main change in this patch is that AirAllocateStackByGraphColoring is now using the optimized datastructures in wtf/InterferenceGraph.h. This required templating most of it over the interference graph used (Small/Large/Huge), but I tried keeping some common parts out of the templated class to minimize the impact on compile times and binary size. A consequence of that change is that coalescableMoves and remappedStackSlots now store indices instead of direct pointers to StackSlots, resulting in a memory reduction of about 3x as well. Another consequence is that I had to slightly alter the way that coalescing works: instead of carefully removing the interference edges of the killed slot, we simply use mayClear() which is not guaranteed to remove anything. I believe that this is sound, because every subsequent access to m_interference checks whether a slot has been coalesced first, so dropping these edges is purely a memory saving, but has no logical effect. The new code was tested in a few ways: - running on JetStream2 with asan - running on JetStream2 with TEST_OPTIMIZED_INTERFERENCE_GRAPH - running on JetStream2 and logging the frame sizes at the end of this phase, and comparing to the results of doing the same on ToT (same average frame size) The two functions where this code had the largest memory footprint in JetStream2 were both in tsf-wasm. One has 751 stack slots, and had an interference graph of 2.1MB and a coalescableMoves vector of 440kB The other has 673 stack slots, and had an interference graph of 1.9MB and a coalescableMoves vector of 421kB. With this patch, they respectively use 79kB+146kB and 67kB+140kB The effect on the rest of JetStream2 is less likely to matter as few functions used more than a few dozens of kB in this phase, but in percentages are just as huge. More importantly (and the reason I wrote this patch in the first place), I checked mruby-wasm.aotoki.dev which with a few other pages forced us to lower Options::maximumTmpsForGraphColoring because of jetsams. It has two massive functions that reach this phase if I increase Options::maximumTmpsForGraphColoring: - about 6k stack slots -> 215MB + 6MB (interference graph + coalescableMoves) - about 9k stack slots -> 395MB + 4MB After this patch, they respectively use 4.5MB+2MB and 9MB+1.5MB, or roughly a 40x improvement. Combined with the recent improvements to the register allocator, I hope to be able to increase Options::maximumTmpsForGraphColoring soon (in a different patch for easier bisection if either cause a perf regression). This would be helpful, since its lowering cratered our performance on some other wasm application by 8x. In terms of compile times, this patch lowered the time spent in AllocateStackByGraphColoring over the course of a run of JetStream2 from roughly 350ms to roughly 270ms. This is almost certainly negligible, but at least it guarantees that it did not regress. * b3/air/AirAllocateRegistersByGraphColoring.cpp: * b3/air/AirAllocateStackByGraphColoring.cpp: (JSC::B3::Air::allocateStackByGraphColoring): Source/WTF: I moved the interference graphs datastructures from AirAllocateRegistersByGraphColoring to their own wtf/InterferenceGraph.h file. There are three of them: - SmallInterferenceGraph, best for n < 400 - LargeInterferenceGraph, for n < 2**16 - HugeInterferenceGraph, for n up to 2**32 I also added "Iterable" versions of them, that have an operator[] method whose result you can iterate on to get all the indices which interfere with a given index. SmallIterableInterferenceGraph is the same as the non-iterable version, but the Large and Huge versions are a bit slower than their counterparts and use 2x memory. All of these were tested by running JetStream2 with the TEST_OPTIMIZED_INTERFERENCE_GRAPH set to 1. This flag makes the optimized datastructures run in parallel with a reference implementation, and their results are checked for equality on every method call. There is one small difference allowed: iteration is not guaranteed to go through elements in the same order. I also added a clear() method to LikelyDenseUnsignedIntegerSet, and added the NotNull flag to its various uses of placement new. * WTF.xcodeproj/project.pbxproj: * wtf/CMakeLists.txt: * wtf/HashSet.h: (WTF::W>::memoryUse const): * wtf/InterferenceGraph.h: Added. (WTF::InterferenceBitVector::contains): (WTF::InterferenceBitVector::addAndReturnIsNewEntry): (WTF::InterferenceBitVector::add): (WTF::InterferenceBitVector::clear): (WTF::InterferenceBitVector::mayClear): (WTF::InterferenceBitVector::setMaxIndex): (WTF::InterferenceBitVector::forEach): (WTF::InterferenceBitVector::size const): (WTF::InterferenceBitVector::memoryUse const): (WTF::InterferenceBitVector::dumpMemoryUseInKB const): (WTF::InterferenceBitVector::Iterable::iterator::operator++): (WTF::InterferenceBitVector::Iterable::iterator::operator* const): (WTF::InterferenceBitVector::Iterable::iterator::operator== const): (WTF::InterferenceBitVector::Iterable::iterator::operator!= const): (WTF::InterferenceBitVector::Iterable::begin const): (WTF::InterferenceBitVector::Iterable::end const): (WTF::InterferenceBitVector::operator[] const): (WTF::InterferenceBitVector::index const): (WTF::InterferenceVector::contains): (WTF::InterferenceVector::addAndReturnIsNewEntry): (WTF::InterferenceVector::add): (WTF::InterferenceVector::clear): (WTF::InterferenceVector::mayClear): (WTF::InterferenceVector::setMaxIndex): (WTF::InterferenceVector::forEach): (WTF::InterferenceVector::size const): (WTF::InterferenceVector::memoryUse const): (WTF::InterferenceVector::dumpMemoryUseInKB const): (WTF::InterferenceVector::Iterable::begin const): (WTF::InterferenceVector::Iterable::end const): (WTF::InterferenceVector::operator[] const): (WTF::UndirectedEdgesDuplicatingAdapter::contains): (WTF::UndirectedEdgesDuplicatingAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDuplicatingAdapter::add): (WTF::UndirectedEdgesDuplicatingAdapter::clear): (WTF::UndirectedEdgesDuplicatingAdapter::mayClear): (WTF::UndirectedEdgesDuplicatingAdapter::setMaxIndex): (WTF::UndirectedEdgesDuplicatingAdapter::forEach): (WTF::UndirectedEdgesDuplicatingAdapter::size const): (WTF::UndirectedEdgesDuplicatingAdapter::memoryUse const): (WTF::UndirectedEdgesDuplicatingAdapter::dumpMemoryUseInKB const): (WTF::UndirectedEdgesDuplicatingAdapter::operator[] const): (WTF::UndirectedEdgesDedupAdapter::contains): (WTF::UndirectedEdgesDedupAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDedupAdapter::add): (WTF::UndirectedEdgesDedupAdapter::clear): (WTF::UndirectedEdgesDedupAdapter::mayClear): (WTF::UndirectedEdgesDedupAdapter::setMaxIndex): (WTF::UndirectedEdgesDedupAdapter::forEach): (WTF::UndirectedEdgesDedupAdapter::size const): (WTF::UndirectedEdgesDedupAdapter::memoryUse const): (WTF::UndirectedEdgesDedupAdapter::dumpMemoryUseInKB const): (WTF::InterferenceHashSet::contains): (WTF::InterferenceHashSet::addAndReturnIsNewEntry): (WTF::InterferenceHashSet::add): (WTF::InterferenceHashSet::clear): (WTF::InterferenceHashSet::setMaxIndex): (WTF::InterferenceHashSet::forEach): (WTF::InterferenceHashSet::size const): (WTF::InterferenceHashSet::memoryUse const): (WTF::InterferenceHashSet::dumpMemoryUseInKB const): (WTF::InstrumentedInterferenceGraph::contains): (WTF::InstrumentedInterferenceGraph::addAndReturnIsNewEntry): (WTF::InstrumentedInterferenceGraph::add): (WTF::InstrumentedInterferenceGraph::clear): (WTF::InstrumentedInterferenceGraph::mayClear): (WTF::InstrumentedInterferenceGraph::setMaxIndex): (WTF::InstrumentedInterferenceGraph::forEach): (WTF::InstrumentedInterferenceGraph::size const): (WTF::InstrumentedInterferenceGraph::memoryUse const): (WTF::InstrumentedInterferenceGraph::dumpMemoryUseInKB const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::Iterable): (WTF::InstrumentedIterableInterferenceGraph::Iterable::begin const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::end const): (WTF::InstrumentedIterableInterferenceGraph::operator[] const): * wtf/LikelyDenseUnsignedIntegerSet.h: (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::clear): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::estimateHashSetSize): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): Canonical link: https://commits.webkit.org/238229@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@278186 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-28 02:08:19 +00:00
new (NotNull, &m_inline.bitVector) BitVector(WTFMove(newBitVector));
Make AirAllocateRegistersByGraphColoring use less memory https://bugs.webkit.org/show_bug.cgi?id=225848 Reviewed by Filip Pizlo. Source/JavaScriptCore: We've had some jetsam problems caused by the main Air register allocator, which caused us to lower Options::maximumTmpsForGraphColoring. Hence this patch tries to improve the memory usage of the allocator. It includes several changes: - Change the datastructure used for representing the interference graph. Before it was effectively a HashSet<std::pair<uint16_t, uint16_t>. Now, it is either a Bitvector (for n < 400 for now, can be tweaked easily), or a Vector<LikelyDenseUnsignedIntegerSet<uint16_t>> otherwise. LikelyDenseUnsignedIntegerSet is a new datastructure introduced by this patch, it is either a HashSet if very sparse, or a BitVector + an amount to shift it by. This is by far the largest memory reduction in this patch, it reduces the maximum memory used for an interference graph in tsf-wasm in JetStream2 from 16MB to 700kB, and in mruby-wasm.aotoki.dev from 262MB to 20MB (the later only happen when we increase Options::maximumTmpsForGraphColoring.. this is the exact function which caused us to lower it). Its effect on smaller functions in JetStream2 is rarely as dramatic but always an improvement, and improvements between 2x and 5x are extremely common (10x to 30x are significantly rarer but do occur). - In order to easily test this change and any further change to this datastructure, the old approach was preserved as InterferenceHashSet, and a template to run two such datastructures in parallel, checking their equivalence was added: InstrumentedInterferenceGraph. Running with it and reportInterferenceGraphMemoryUse set to true was used to compute the numbers given above. - There was already some template parameter to change the size of the tmp indices from unsigned to uint16_t but the code failed to compile unless it was unsigned. I fixed this, made more consistent use of it, and switched to uint16_t in the very common case that we have less than 65k Tmps (we can have more despite the option because of spilling). This halved the memory usage of various other datastructures in the register allocator - unspillableTmps was a HashSet<unsigned>. Since it is often quite dense (often around 20% on JetStream2), I replaced it by a Bitvector instead - m_biases was a HashMap<IndexType, HashSet<IndexType>>. Since it is extremely rare that the sets have more than 8 elements (from looking at some instrumented run of JetStream2), I replaced it by HashMap<IndexType, SmallSet<IndexType>>. This not only significantly reduces memory, but nearly halves the time spent in assignColors(around 80ms -> 40ms in JetStream 2) - UseCounts was needlessly general: it is only used by the register allocator (all other references to UseCounts refer to the completely different B3::UseCounts), so there is no point in it computing, and then storing lots of irrelevant data. A float is also more than enough precision (especially since it is pretty much always 1, 10, 100, or 1000 in practice…). Also, since we only need it indexed by Tmps, we can use a Vector with AbsoluteTmpMapper instead of its HashMap. These changes are not just memory savings, they also make selectSpill way faster (570ms -> 250ms on my machine on JetStream2) - While I was at it, I did a couple of other tweaks to the logic of selectSpill. In particular, instead of having to check for isFastTmp every time, I just put the fast tmps directly in unspillableTmps, which prevents them from getting added to m_spillWorklist in the first place. This + a bit of clean-up (for example putting an early exit instead of setting score to infinity in the case of dead tmps) resulted in a further perf win (to roughly 200ms spent in selectSpill() on JetStream2) All together, this patch reduces the time spent in the register allocator by roughly 15 to 20% in JetStream2 (tested both with the Briggs and the IRC allocators on my MBP 2019). I do not yet have precise performance numbers for this exact patch, but benchmarking a previous version of it (with a less optimized interference graph) resulted in significant RAMification improvements (around 1%), and more surprisingly some JetStream2 improvements on weaker machines (e.g. an iPhone 7 gained > 1%). I believe these gains come either from less trashing of the caches, or less contention caused by the memory traffic. I will try to update the bugzilla with more up-to-date thorough results when I get them. This patch does not increase Options::maximumTmpsForGraphColoring, I intend to do that in a separate patch to make it easier to revert in case of a problem. * b3/B3ReduceLoopStrength.cpp: (JSC::B3::ReduceLoopStrength::reduceByteCopyLoopsToMemcpy): * b3/air/AirAllocateRegistersAndStackByLinearScan.cpp: * b3/air/AirAllocateRegistersByGraphColoring.cpp: (JSC::B3::Air::allocateRegistersByGraphColoring): * b3/air/AirCode.h: (JSC::B3::Air::Code::forEachFastTmp const): * b3/air/AirUseCounts.h: (JSC::B3::Air::UseCounts::UseCounts): (JSC::B3::Air::UseCounts::isConstDef const): (JSC::B3::Air::UseCounts::numWarmUsesAndDefs const): (JSC::B3::Air::UseCounts::dump const): * parser/Nodes.h: Source/WTF: Two changes: the addition of LikelyDenseUnsignedIntegerSet, and various improvements to Small(Ptr)Set. The latter include: - Renaming SmallPtrSet into SmallSet, as it now supports integers as well as pointers. - Reducing its size by sharing the same storage for m_buffer and for m_smallStorage. This is safe to do, because all operations branch on isSmall() which depends purely on m_capacity. - Adding trivial size(), isEmpty() and memoryUse() methods - Adding a comment at the top of the file explaining when to use, and (more importantly) not to use SmallSet. LikelyDenseUnsignedIntegerSet is an even more specialized data structure, that can represent sets of unsigned integers very compactly if they are clustered. Finally I added an outOfLineMemoryUse() method to BitVector, making it more convenient to compare the memory consumption of different data structures in the register allocator. * WTF.xcodeproj/project.pbxproj: * wtf/BitVector.h: * wtf/CMakeLists.txt: * wtf/LikelyDenseUnsignedIntegerSet.cpp: Copied from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/LikelyDenseUnsignedIntegerSet.h: Added. (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::~LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::contains const): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::size const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::iterator): (WTF::LikelyDenseUnsignedIntegerSet::iterator::m_shift): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator++): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator* const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator== const): (WTF::LikelyDenseUnsignedIntegerSet::iterator::operator!= const): (WTF::LikelyDenseUnsignedIntegerSet::begin const): (WTF::LikelyDenseUnsignedIntegerSet::end const): (WTF::LikelyDenseUnsignedIntegerSet::memoryUse const): (WTF::LikelyDenseUnsignedIntegerSet::validate const): (WTF::LikelyDenseUnsignedIntegerSet::isBitVector const): (WTF::LikelyDenseUnsignedIntegerSet::isValidValue const): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): * wtf/SmallPtrSet.h: Removed. * wtf/SmallSet.cpp: Renamed from Source/WTF/wtf/SmallPtrSet.cpp. * wtf/SmallSet.h: Added. Tools: Simply added some tests for SmallSet. * TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj: * TestWebKitAPI/Tests/WTF/SmallSet.cpp: Added. (TestWebKitAPI::testSmallSetOfUnsigned): (TestWebKitAPI::testSmallSetOfPointers): (TestWebKitAPI::testVectorsOfSmallSetsOfUnsigned): (TestWebKitAPI::TEST): Canonical link: https://commits.webkit.org/237893@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277714 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-19 03:24:14 +00:00
ASSERT(isBitVector());
}
union U {
BitVector bitVector;
Set hashSet;
U() { }
~U() { }
} m_inline;
unsigned m_size; // Only updated in BitVector mode, std::numeric_limits<unsigned>::max() to indicate HashSet mode
IndexType m_min;
IndexType m_max;
};
} // namespace WTF
using WTF::LikelyDenseUnsignedIntegerSet;