haikuwebkit/Source/WTF/wtf/ConcurrentPtrHashSet.cpp

/*
 * Copyright (C) 2017-2021 Apple Inc. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
 * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */
#include "config.h"
#include <wtf/ConcurrentPtrHashSet.h>
namespace WTF {
ConcurrentPtrHashSet::ConcurrentPtrHashSet()
{
initialize();
}
ConcurrentPtrHashSet::~ConcurrentPtrHashSet()
{
}
void ConcurrentPtrHashSet::deleteOldTables()
{
    // Taking the lock here is just a precaution. It does not make it OK for other threads to call
    // add() concurrently, but it might prevent some bad crashes if we ever made that mistake.
Locker locker { m_lock };
ASSERT(m_table.loadRelaxed() != &m_stubTable);
m_allTables.removeAllMatching(
[&] (std::unique_ptr<Table>& table) -> bool {
return table.get() != m_table.loadRelaxed();
});
}
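// A minimal usage sketch (illustrative only, not part of this file): add() and contains() may be
// called from many threads at once, but deleteOldTables() and clear() assume that adds have
// quiesced. The GC's opaque-root set is the main client of this class; the function and parameter
// names below are hypothetical.
#if 0
static void exampleOpaqueRootPhase(ConcurrentPtrHashSet& roots, void** candidates, size_t count)
{
    // Phase 1: many marker threads record roots concurrently. add() returns true only for the
    // thread that inserts a given pointer first.
    for (size_t i = 0; i < count; ++i)
        roots.add(candidates[i]);

    // Phase 2: once the marker threads are quiescent, a single thread frees the tables that
    // earlier resizes left behind.
    roots.deleteOldTables();
}
#endif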
void ConcurrentPtrHashSet::clear()
{
    // Taking the lock here is just a precaution. It does not make it OK for other threads to call
    // add() concurrently, but it might prevent some bad crashes if we ever made that mistake.
Locker locker { m_lock };
m_allTables.clear();
initialize();
}
void ConcurrentPtrHashSet::initialize()
{
constexpr unsigned initialSize = 32;
std::unique_ptr<Table> table = Table::create(initialSize);
m_table.storeRelaxed(table.get());
m_allTables.append(WTFMove(table));
m_stubTable.initializeStub();
}
bool ConcurrentPtrHashSet::addSlow(Table* table, unsigned mask, unsigned startIndex, unsigned index, void* ptr)
{
if (table->load.exchangeAdd(1) >= table->maxLoad())
return resizeAndAdd(ptr);
for (;;) {
void* oldEntry = table->array[index].compareExchangeStrong(nullptr, ptr);
if (!oldEntry) {
if (m_table.load() != table) {
// We added an entry to an old table! We need to reexecute the add on the new table.
return add(ptr);
}
return true;
}
if (oldEntry == ptr)
return false;
index = (index + 1) & mask;
RELEASE_ASSERT(index != startIndex);
}
}
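For orientation, here is a hedged usage sketch of the data structure whose slow path appears above (modeled on, but not copied from, the TestWebKitAPI coverage mentioned in the annotations): add() reports whether the pointer was newly inserted, a repeated add() of the same pointer returns false, and contains()/size() behave as for an ordinary set. The function name exampleUsage and the local variables are illustrative only.

    #include <wtf/Assertions.h>
    #include <wtf/ConcurrentPtrHashSet.h>

    static void exampleUsage()
    {
        WTF::ConcurrentPtrHashSet set;
        int a, b;

        bool addedA = set.add(&a);      // expected true: &a was not present
        bool addedAgain = set.add(&a);  // expected false: &a is already present
        bool addedB = set.add(&b);      // expected true

        RELEASE_ASSERT(addedA);
        RELEASE_ASSERT(!addedAgain);
        RELEASE_ASSERT(addedB);
        RELEASE_ASSERT(set.contains(&a) && set.contains(&b));
        RELEASE_ASSERT(set.size() == 2);
    }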
Fix race condition in ConcurrentPtrHashSet.
https://bugs.webkit.org/show_bug.cgi?id=223241
rdar://74637896

Reviewed by Yusuke Suzuki.

JSTests:

* stress/race-to-add-opaque-roots-in-ConcurrentPtrHashSet.js: Added.

Source/WTF:

There exists a race condition where ConcurrentPtrHashSet::resizeIfNecessary() may not
capture an entry added by ConcurrentPtrHashSet::addSlow() concurrently.

ConcurrentPtrHashSet::addSlow() currently does the following:

    {
        if (table->load.exchangeAdd(1) >= table->maxLoad()) // (a1)
            return resizeAndAdd(ptr); // (a2)

        for (;;) {
            void* oldEntry = table->array[index].compareExchangeStrong(nullptr, ptr); // (a3)
            if (!oldEntry) {
                if (m_table.load() != table) { // (a4)
                    // We added an entry to an old table! We need to reexecute the add on the new table.
                    return add(ptr); // (a5)
                }
                return true; // (a6)
            }
            if (oldEntry == ptr)
                return false;
            ... // set index to next entry slot to try.
        }
    }

ConcurrentPtrHashSet::resizeIfNecessary() currently does the following:

    {
        auto locker = holdLock(m_lock); // (r1)
        Table* table = m_table.loadRelaxed();
        if (table->load.loadRelaxed() < table->maxLoad())
            return; // (r2)

        std::unique_ptr<Table> newTable = Table::create(table->size * 2);
        ...
        for (unsigned i = 0; i < table->size; ++i) { // (r3)
            void* ptr = table->array[i].loadRelaxed();
            if (!ptr)
                continue;
            ... // copy ptr to newTable. // (r4)
        }
        ...
        m_table.store(newTable.get()); // (r5)
        ...
    }

Let's say thread T1 is executing addSlow(), and thread T2 is concurrently executing
resizeIfNecessary(). Consider the following scenario (in chronological order):

1. T2 has arrived at just before (r5) i.e. it is already done copying the entries in the old m_table.
2. T1 executes (a3) and writes a new entry into m_table.
3. T1 checks that the table hasn't been replaced at (a4), and sees that it has not.
4. T1 returns at (a6), thinking that its new entry is committed.
5. T2 sets the new m_table at (r5), thereby discarding the new entry that T1 has just written.

The fix is to set m_table to a newly introduced m_stubTable at (r2). m_stubTable is set up
with a size of 0, and load value of 10. This means it is always full. With this, the following
scenarios can play out:

Scenario 1: T2 installs m_stubTable before T1 reaches (a1)

1. At (a1), T1 sees that m_table (which is m_stubTable) is full.
2. T1 calls resizeAndAdd() at (a2), which ends up calling resizeIfNecessary() and blocking on the lock at (r1).

Scenario 2: T2 installs m_stubTable after T1 reaches just before (a3)

1. T1 writes the new entry at (a3).
2. T1 checks m_table at (a4), and sees that it has changed (now pointing to m_stubTable).
3. T1 calls add() again at (a5) to redo the operation, and ends with scenario 1.

Scenario 3: T2 installs m_stubTable after T1 reaches (a3), but before (a4)

1. The new entry has already been added, but we don't know if it made the cut off for T2 to copy it or not. But, it doesn't matter because ...
2. T1 checks m_table at (a4), and sees that it has changed (now pointing to m_stubTable).
3. T1 calls add() again at (a5) to redo the operation, and ends with scenario 1.

Scenario 4: T2 installs m_stubTable after T1 reaches (a4)

1. The new entry has already been added.
2. T1 checks m_table at (a4), and sees that it has NOT changed (because T2 hasn't installed m_stubTable yet). This means T2's copy loop is guaranteed to not have started yet i.e. the new entry will definitely be picked up by the copy loop.
3. T1 returns at (a6), and all is well.

* wtf/ConcurrentPtrHashSet.cpp:
(WTF::ConcurrentPtrHashSet::deleteOldTables):
(WTF::ConcurrentPtrHashSet::initialize):
(WTF::ConcurrentPtrHashSet::containsImplSlow const):
(WTF::ConcurrentPtrHashSet::sizeSlow const):
(WTF::ConcurrentPtrHashSet::resizeIfNecessary):
(WTF::ConcurrentPtrHashSet::Table::initializeStub):
* wtf/ConcurrentPtrHashSet.h:

Canonical link: https://commits.webkit.org/235445@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@274608 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-03-18 02:40:00 +00:00
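To make the fix concrete, here is a rough sketch of the shape it takes, assuming the member and helper names given in the ChangeLog above (m_stubTable, Table::initializeStub()); everything else in the bodies below is illustrative rather than the actual WebKit implementation:

    void ConcurrentPtrHashSet::Table::initializeStub()
    {
        // A zero-capacity table whose load already exceeds any maxLoad(), so every
        // concurrent add() that observes it takes the resize path at (a1)/(a2).
        size = 0;
        mask = 0;
        load.storeRelaxed(10);
    }

    void ConcurrentPtrHashSet::resizeIfNecessary()
    {
        Locker locker { m_lock };
        Table* table = m_table.loadRelaxed();
        if (table->load.loadRelaxed() < table->maxLoad())
            return;

        // Publish the always-full stub before copying: an adder racing with the copy
        // loop either sees a full table and blocks on m_lock via resizeAndAdd(), or
        // notices the table swap at (a4) and redoes its add() on the new table.
        m_table.store(&m_stubTable);

        std::unique_ptr<Table> newTable = Table::create(table->size * 2);
        for (unsigned i = 0; i < table->size; ++i) {
            void* ptr = table->array[i].loadRelaxed();
            if (!ptr)
                continue;
            // ... copy ptr into newTable (probing and load bookkeeping omitted).
        }

        m_table.store(newTable.get());
        // ... retain newTable (and the old table) so deleteOldTables() can reclaim them.
    }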
bool ConcurrentPtrHashSet::containsImplSlow(void* ptr) const
{
Stop using holdLock() in WTF as it is not compatible with Clang thread safety analysis https://bugs.webkit.org/show_bug.cgi?id=226117 Reviewed by Darin Adler. Stop using holdLock() in WTF as it is not compatible with Clang thread safety analysis (WTF::CheckedLock) and use the Locker constructor instead. This is a step towards getting rid of holdLock() completely. * benchmarks/ConditionSpeedTest.cpp: * wtf/ConcurrentPtrHashSet.cpp: (WTF::ConcurrentPtrHashSet::deleteOldTables): (WTF::ConcurrentPtrHashSet::clear): (WTF::ConcurrentPtrHashSet::containsImplSlow const): (WTF::ConcurrentPtrHashSet::sizeSlow const): (WTF::ConcurrentPtrHashSet::resizeIfNecessary): * wtf/CountingLock.h: * wtf/HashTable.cpp: (WTF::HashTableStats::recordCollisionAtCount): (WTF::HashTableStats::dumpStats): * wtf/HashTable.h: (WTF::invalidateIterators): (WTF::addIterator): (WTF::removeIterator): * wtf/LockedPrintStream.cpp: (WTF::LockedPrintStream::vprintf): (WTF::LockedPrintStream::flush): * wtf/MetaAllocator.cpp: (WTF::MetaAllocatorHandle::~MetaAllocatorHandle): * wtf/MetaAllocator.h: (WTF::MetaAllocator::allocate): (WTF::MetaAllocator::currentStatistics): * wtf/ReadWriteLock.h: * wtf/StackShotProfiler.h: (WTF::StackShotProfiler::profile): (WTF::StackShotProfiler::run): * wtf/StackStats.cpp: (WTF::StackStats::CheckPoint::CheckPoint): (WTF::StackStats::CheckPoint::~CheckPoint): (WTF::StackStats::probe): (WTF::StackStats::LayoutCheckPoint::LayoutCheckPoint): (WTF::StackStats::LayoutCheckPoint::~LayoutCheckPoint): * wtf/ThreadGroup.cpp: (WTF::ThreadGroup::~ThreadGroup): (WTF::ThreadGroup::add): * wtf/ThreadMessage.cpp: (WTF::sendMessageScoped): * wtf/Threading.cpp: (WTF::Thread::didExit): (WTF::Thread::addToThreadGroup): (WTF::Thread::removeFromThreadGroup): (WTF::Thread::numberOfThreadGroups): * wtf/TimingScope.cpp: * wtf/WTFConfig.cpp: (WTF::Config::permanentlyFreeze): * wtf/WTFSemaphore.h: * wtf/posix/ThreadingPOSIX.cpp: (WTF::Thread::changePriority): (WTF::Thread::waitForCompletion): (WTF::Thread::detach): (WTF::Thread::signal): (WTF::Thread::establishPlatformSpecificHandle): * wtf/threads/BinarySemaphore.cpp: (WTF::BinarySemaphore::signal): (WTF::BinarySemaphore::waitUntil): * wtf/threads/Signals.cpp: (WTF::SignalHandlers::add): (WTF::registerThreadForMachExceptionHandling): (WTF::activateSignalHandlersFor): * wtf/win/LanguageWin.cpp: (WTF::platformLanguage): * wtf/win/ThreadingWin.cpp: (WTF::Thread::changePriority): (WTF::Thread::waitForCompletion): (WTF::Thread::detach): (WTF::Thread::establishPlatformSpecificHandle): Canonical link: https://commits.webkit.org/238033@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@277900 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2021-05-22 00:11:37 +00:00
Locker locker { m_lock };
ASSERT(m_table.loadRelaxed() != &m_stubTable);
return containsImpl(ptr);
}
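The Locker pattern used in the slow paths above is the replacement for holdLock() that the annotation describes. A minimal standalone illustration (exampleLock and both functions are hypothetical, not part of this file):

    #include <wtf/Lock.h>
    #include <wtf/Locker.h>

    static WTF::Lock exampleLock;

    void oldStyle()
    {
        auto locker = WTF::holdLock(exampleLock); // former pattern; opaque to Clang thread safety analysis
        // ... critical section ...
    }

    void newStyle()
    {
        WTF::Locker locker { exampleLock };       // pattern this file now uses
        // ... critical section ...
    }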
size_t ConcurrentPtrHashSet::sizeSlow() const
{
Locker locker { m_lock };
ASSERT(m_table.loadRelaxed() != &m_stubTable);
return size();
}
void ConcurrentPtrHashSet::resizeIfNecessary()
{
Locker locker { m_lock };
Table* table = m_table.loadRelaxed();
ASSERT(table != &m_stubTable);
if (table->load.loadRelaxed() < table->maxLoad())
return;
// Stubbing out m_table with m_stubTable here is necessary to ensure that
// we don't miss copying any entries that may be concurrently added.
//
// If addSlow() completes before this stubbing, the new entry is guaranteed
// to be copied below.
//
// If addSlow() completes after this stubbing, addSlow() will check m_table
// before it finishes, and detect that its newly added entry may not have
// made it in. As a result, it will try to re-add the entry, and end up
// blocking on resizeIfNecessary() until the resizing is done. This is
// because m_stubTable makes addSlow() think that the table is out of
// space and needs to be resized. NOTE: m_stubTable always says it is out
// of space.
m_table.store(&m_stubTable);
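// Create a new table of double the size and re-insert every entry from the
// old table into it, using linear probing (hash, then scan forward for an
// empty slot). Count the entries copied so the new table's load can be set
// once the copy is complete.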
std::unique_ptr<Table> newTable = Table::create(table->size * 2);
unsigned mask = newTable->mask;
unsigned load = 0;
for (unsigned i = 0; i < table->size; ++i) {
void* ptr = table->array[i].loadRelaxed();
if (!ptr)
continue;
unsigned startIndex = hash(ptr) & mask;
unsigned index = startIndex;
for (;;) {
Atomic<void*>& entryRef = newTable->array[index];
void* entry = entryRef.loadRelaxed();
if (!entry) {
entryRef.storeRelaxed(ptr);
break;
}
RELEASE_ASSERT(entry != ptr);
index = (index + 1) & mask;
RELEASE_ASSERT(index != startIndex);
}
load++;
}
newTable->load.storeRelaxed(load);
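// Publish the new table, replacing the stub table installed above. From this
// point on, concurrent add()s operate on the new table.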
m_table.store(newTable.get());
// addSlow() always starts by exchangeAdd'ing 1 to the current m_table's
// load value before checking if it exceeds the max allowed load. For the
// real m_table, this is not an issue because it accumulates at most N
// extra adds above the max load, where N is the number of threads
// concurrently adding entries.
//
// However, m_table may be replaced with m_stubTable for each resize
// operation. As a result, the cumulative error on the stub's load value
// may far exceed N. To fix this, we always reset it here to prevent an
// overflow. Note: a load of stubDefaultLoadValue means that m_stubTable
// is full, since its size is 0.
//
// In practice, this won't matter because it is very unlikely that we will
// ever do enough resize operations for the stub's load to overflow.
// However, since resizing is not on the fast path, let's be pedantic
// and reset it for correctness.
m_stubTable.load.store(Table::stubDefaultLoadValue);

    m_allTables.append(WTFMove(newTable));
}

bool ConcurrentPtrHashSet::resizeAndAdd(void* ptr)
{
    resizeIfNecessary();
    return add(ptr);
}

std::unique_ptr<ConcurrentPtrHashSet::Table> ConcurrentPtrHashSet::Table::create(unsigned size)
{
    std::unique_ptr<ConcurrentPtrHashSet::Table> result(new (fastMalloc(OBJECT_OFFSETOF(Table, array) + sizeof(Atomic<void*>) * size)) Table());
    result->size = size;
    result->mask = size - 1;
    result->load.storeRelaxed(0);
    for (unsigned i = 0; i < size; ++i)
        result->array[i].storeRelaxed(nullptr);
    return result;
}

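Table::create() computes mask = size - 1, which only works as a wrap-around mask when size is a power of two; that holds here as long as the initial table size is a power of two, since resizing always doubles the size. The standalone snippet below is only an illustrative sketch of that indexing trick: toyHash() is a made-up stand-in, not the actual WTF::ConcurrentPtrHashSet::hash(), and the probe values are arbitrary.

    #include <cstdint>
    #include <cstdio>

    // Stand-in pointer hash for illustration only; not WTF::ConcurrentPtrHashSet::hash().
    static uint64_t toyHash(const void* ptr)
    {
        return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(ptr)) * 0x9e3779b97f4a7c15ull;
    }

    int main()
    {
        const unsigned size = 8;        // must be a power of two, as in Table::create()
        const unsigned mask = size - 1; // the same mask computation as above

        int probes[3];
        for (int& probe : probes) {
            // For a power-of-two size, (hash & mask) equals (hash % size), so the
            // open-addressing scan can wrap around with a cheap AND instead of a modulo.
            unsigned slot = static_cast<unsigned>(toyHash(&probe) & mask);
            std::printf("%p -> slot %u\n", static_cast<void*>(&probe), slot);
        }
        return 0;
    }
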
Fix race condition in ConcurrentPtrHashSet.
https://bugs.webkit.org/show_bug.cgi?id=223241
rdar://74637896

Reviewed by Yusuke Suzuki.

JSTests:

* stress/race-to-add-opaque-roots-in-ConcurrentPtrHashSet.js: Added.

Source/WTF:

There exists a race condition where ConcurrentPtrHashSet::resizeIfNecessary() may not
capture an entry added by ConcurrentPtrHashSet::addSlow() concurrently.

ConcurrentPtrHashSet::addSlow() currently does the following:

    {
        if (table->load.exchangeAdd(1) >= table->maxLoad()) // (a1)
            return resizeAndAdd(ptr); // (a2)

        for (;;) {
            void* oldEntry = table->array[index].compareExchangeStrong(nullptr, ptr); // (a3)
            if (!oldEntry) {
                if (m_table.load() != table) { // (a4)
                    // We added an entry to an old table! We need to reexecute the add on the new table.
                    return add(ptr); // (a5)
                }
                return true; // (a6)
            }
            if (oldEntry == ptr)
                return false;
            ... // set index to next entry slot to try.
        }
    }

ConcurrentPtrHashSet::resizeIfNecessary() currently does the following:

    {
        auto locker = holdLock(m_lock); // (r1)
        Table* table = m_table.loadRelaxed();
        if (table->load.loadRelaxed() < table->maxLoad())
            return; // (r2)

        std::unique_ptr<Table> newTable = Table::create(table->size * 2);
        ...
        for (unsigned i = 0; i < table->size; ++i) { // (r3)
            void* ptr = table->array[i].loadRelaxed();
            if (!ptr)
                continue;
            ... // copy ptr to newTable. // (r4)
        }
        ...
        m_table.store(newTable.get()); // (r5)
        ...
    }

Let's say thread T1 is executing addSlow(), and thread T2 is concurrently executing
resizeIfNecessary(). Consider the following scenario (in chronological order):

1. T2 has arrived at just before (r5), i.e. it is already done copying the entries in
   the old m_table.
2. T1 executes (a3) and writes a new entry into m_table.
3. T1 checks that the table hasn't been replaced at (a4), and sees that it has not.
4. T1 returns at (a6), thinking that its new entry is committed.
5. T2 sets the new m_table at (r5), thereby discarding the new entry that T1 has just
   written.

The fix is to set m_table to a newly introduced m_stubTable at (r2). m_stubTable is set
up with a size of 0 and a load value of 10. This means it is always full.

With this, the following scenarios can play out:

Scenario 1: T2 installs m_stubTable before T1 reaches (a1)

1. At (a1), T1 sees that m_table (which is m_stubTable) is full.
2. T1 calls resizeAndAdd() at (a2), which ends up calling resizeIfNecessary() and
   blocking on the lock at (r1).

Scenario 2: T2 installs m_stubTable after T1 reaches just before (a3)

1. T1 writes the new entry at (a3).
2. T1 checks m_table at (a4), and sees that it has changed (now pointing to m_stubTable).
3. T1 calls add() again at (a5) to redo the operation, and ends with scenario 1.

Scenario 3: T2 installs m_stubTable after T1 reaches (a3), but before (a4)

1. The new entry has already been added, but we don't know whether it made the cutoff
   for T2 to copy it or not. But it doesn't matter, because ...
2. T1 checks m_table at (a4), and sees that it has changed (now pointing to m_stubTable).
3. T1 calls add() again at (a5) to redo the operation, and ends with scenario 1.

Scenario 4: T2 installs m_stubTable after T1 reaches (a4)

1. The new entry has already been added.
2. T1 checks m_table at (a4), and sees that it has NOT changed (because T2 hasn't
   installed m_stubTable yet). This means T2's copy loop is guaranteed to not have
   started yet, i.e. the new entry will definitely be picked up by the copy loop.
3. T1 returns at (a6), and all is well.
void ConcurrentPtrHashSet::Table::initializeStub()
{
    // The stub table is set up to look like it is already filled up. This is
    // so that it can be used during resizing to force all attempts to add to
    // be routed to resizeAndAdd() where it will block until the resizing is
    // done.
    size = 0;
    mask = 0;
    load.storeRelaxed(stubDefaultLoadValue);
    array[0].storeRelaxed(nullptr);
}

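The guarantee the stub table provides (no add() is silently lost while a resize is in flight) can be exercised with a stress harness along the lines of the sketch below. This is a hypothetical illustration, not the regression test that landed with the fix: it assumes only the public add()/contains() API declared in wtf/ConcurrentPtrHashSet.h, assumes the host process has already initialized WTF, and uses std::thread with arbitrary thread and iteration counts purely for demonstration.

    #include <wtf/Assertions.h>
    #include <wtf/ConcurrentPtrHashSet.h>
    #include <thread>
    #include <vector>

    static void stressAddsDuringResize()
    {
        WTF::ConcurrentPtrHashSet set;
        constexpr size_t threadCount = 4;
        constexpr size_t addsPerThread = 10000;

        // Distinct addresses for every add; enough of them to force several
        // resizes while other threads are in the middle of addSlow().
        static char backing[threadCount * addsPerThread];

        std::vector<std::thread> threads;
        for (size_t t = 0; t < threadCount; ++t) {
            threads.emplace_back([&set, t] {
                for (size_t i = 0; i < addsPerThread; ++i)
                    set.add(&backing[t * addsPerThread + i]);
            });
        }
        for (auto& thread : threads)
            thread.join();

        // Before the stub-table fix, an add racing with resizeIfNecessary() could
        // be dropped; with the fix, every pointer must still be present.
        for (size_t i = 0; i < threadCount * addsPerThread; ++i)
            RELEASE_ASSERT(set.contains(&backing[i]));
    }
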
} // namespace WTF