GC constraint solving should be parallel
https://bugs.webkit.org/show_bug.cgi?id=179934
Reviewed by JF Bastien.
PerformanceTests:
Added a version of splay that measures latency in a way that run-jsc-benchmarks groks.
* Octane/splay.js: Added.
(this.Setup.setup.setup):
(this.TearDown.tearDown.tearDown):
(Benchmark):
(BenchmarkResult):
(BenchmarkResult.prototype.valueOf):
(BenchmarkSuite):
(alert):
(Math.random):
(BenchmarkSuite.ResetRNG):
(RunStep):
(BenchmarkSuite.RunSuites):
(BenchmarkSuite.CountBenchmarks):
(BenchmarkSuite.GeometricMean):
(BenchmarkSuite.GeometricMeanTime):
(BenchmarkSuite.AverageAbovePercentile):
(BenchmarkSuite.GeometricMeanLatency):
(BenchmarkSuite.FormatScore):
(BenchmarkSuite.prototype.NotifyStep):
(BenchmarkSuite.prototype.NotifyResult):
(BenchmarkSuite.prototype.NotifyError):
(BenchmarkSuite.prototype.RunSingleBenchmark):
(RunNextSetup):
(RunNextBenchmark):
(RunNextTearDown):
(BenchmarkSuite.prototype.RunStep):
(GeneratePayloadTree):
(GenerateKey):
(SplayUpdateStats):
(InsertNewNode):
(SplaySetup):
(SplayTearDown):
(SplayRun):
(SplayTree):
(SplayTree.prototype.isEmpty):
(SplayTree.prototype.insert):
(SplayTree.prototype.remove):
(SplayTree.prototype.find):
(SplayTree.prototype.findMax):
(SplayTree.prototype.findGreatestLessThan):
(SplayTree.prototype.exportKeys):
(SplayTree.prototype.splay_):
(SplayTree.Node):
(SplayTree.Node.prototype.traverse_):
(report):
(start):
Source/JavaScriptCore:
This makes it possible to do constraint solving in parallel. This looks like a 1% Speedometer
speed-up. It's more than 1% on trunk-Speedometer.
The constraint solver supports running constraints in parallel in two different ways:
- Run multiple constraints in parallel to each other. This only works for constraints that can
tolerate other constraints running concurrently to them (constraint.concurrency() ==
ConstraintConcurrency::Concurrent). This is the most basic kind of parallelism that the
constraint solver supports. All constraints except the JSC SPI constraints are concurrent. We
could probably make them concurrent, but I'm playing it safe for now.
- A constraint can create parallel work for itself, which the constraint solver will interleave
with other stuff. A constraint can report that it has parallel work by returning
ConstraintParallelism::Parallel from its executeImpl() function. Then the solver will allow that
constraint's doParallelWorkImpl() function to run on as many GC marker threads as are available,
for as long as that function wants to run.
It's not possible to have a non-concurrent constraint that creates parallel work.
The parallelism is implemented in terms of the existing GC marker threads. This turns out to be
most natural for two reasons:
- No need to start any other threads.
- The constraints all want to be passed a SlotVisitor. Running on the marker threads means having
access to those threads' SlotVisitors. Also, it means less load balancing. The solver will
create work on each marking thread's SlotVisitor. When the solver is done "stealing" a marker
thread, that thread will have work it can start doing immediately. Before this change, we had to
contribute the work found by the constraint solver to the global worklist so that it could be
distributed to the marker threads by load balancing. This change probably helps to avoid that
load balancing step.
A lot of this change is about making it easy to iterate GC data structures in parallel. This
change makes almost all constraints parallel-enabled, but only the DOM's output constraint uses
the parallel work API. That constraint iterates the marked cells in two subspaces. This change
makes it very easy to compose parallel iterators over subspaces, allocators, blocks, and cells.
The marked cell parallel iterator is composed out of parallel iterators for the others. A parallel
iterator is just an iterator that can do an atomic next() very quickly. We abstract them using
RefPtr<SharedTask<...()>>, where ... is the type returned from the iterator. We know it's done
when it returns a falsish version of ... (in the current code, that's always a pointer type, so
done is indicated by null).
* API/JSMarkingConstraintPrivate.cpp:
(JSContextGroupAddMarkingConstraint):
* API/JSVirtualMachine.mm:
(scanExternalObjectGraph):
(scanExternalRememberedSet):
* JavaScriptCore.xcodeproj/project.pbxproj:
* Sources.txt:
* bytecode/AccessCase.cpp:
(JSC::AccessCase::propagateTransitions const):
* bytecode/CodeBlock.cpp:
(JSC::CodeBlock::visitWeakly):
(JSC::CodeBlock::shouldJettisonDueToOldAge):
(JSC::shouldMarkTransition):
(JSC::CodeBlock::propagateTransitions):
(JSC::CodeBlock::determineLiveness):
* dfg/DFGWorklist.cpp:
* ftl/FTLCompile.cpp:
(JSC::FTL::compile):
* heap/ConstraintParallelism.h: Added.
(WTF::printInternal):
* heap/Heap.cpp:
(JSC::Heap::Heap):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::runFixpointPhase):
(JSC::Heap::stopThePeriphery):
(JSC::Heap::resumeThePeriphery):
(JSC::Heap::addCoreConstraints):
(JSC::Heap::setBonusVisitorTask):
(JSC::Heap::runTaskInParallel):
(JSC::Heap::forEachSlotVisitor): Deleted.
* heap/Heap.h:
(JSC::Heap::worldIsRunning const):
(JSC::Heap::runFunctionInParallel):
* heap/HeapInlines.h:
(JSC::Heap::worldIsStopped const):
(JSC::Heap::isMarked):
(JSC::Heap::incrementDeferralDepth):
(JSC::Heap::decrementDeferralDepth):
(JSC::Heap::decrementDeferralDepthAndGCIfNeeded):
(JSC::Heap::forEachSlotVisitor):
(JSC::Heap::collectorBelievesThatTheWorldIsStopped const): Deleted.
(JSC::Heap::isMarkedConcurrently): Deleted.
* heap/HeapSnapshotBuilder.cpp:
(JSC::HeapSnapshotBuilder::appendNode):
* heap/LargeAllocation.h:
(JSC::LargeAllocation::isMarked):
(JSC::LargeAllocation::isMarkedConcurrently): Deleted.
* heap/LockDuringMarking.h:
(JSC::lockDuringMarking):
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::parallelNotEmptyBlockSource):
* heap/MarkedAllocator.h:
* heap/MarkedBlock.h:
(JSC::MarkedBlock::aboutToMark):
(JSC::MarkedBlock::isMarked):
(JSC::MarkedBlock::areMarksStaleWithDependency): Deleted.
(JSC::MarkedBlock::isMarkedConcurrently): Deleted.
* heap/MarkedSpace.h:
(JSC::MarkedSpace::activeWeakSetsBegin):
(JSC::MarkedSpace::activeWeakSetsEnd):
(JSC::MarkedSpace::newActiveWeakSetsBegin):
(JSC::MarkedSpace::newActiveWeakSetsEnd):
* heap/MarkingConstraint.cpp:
(JSC::MarkingConstraint::MarkingConstraint):
(JSC::MarkingConstraint::execute):
(JSC::MarkingConstraint::quickWorkEstimate):
(JSC::MarkingConstraint::workEstimate):
(JSC::MarkingConstraint::doParallelWork):
(JSC::MarkingConstraint::finishParallelWork):
(JSC::MarkingConstraint::doParallelWorkImpl):
(JSC::MarkingConstraint::finishParallelWorkImpl):
* heap/MarkingConstraint.h:
(JSC::MarkingConstraint::lastExecuteParallelism const):
(JSC::MarkingConstraint::parallelism const):
(JSC::MarkingConstraint::quickWorkEstimate): Deleted.
(JSC::MarkingConstraint::workEstimate): Deleted.
* heap/MarkingConstraintSet.cpp:
(JSC::MarkingConstraintSet::MarkingConstraintSet):
(JSC::MarkingConstraintSet::add):
(JSC::MarkingConstraintSet::executeConvergence):
(JSC::MarkingConstraintSet::executeConvergenceImpl):
(JSC::MarkingConstraintSet::executeAll):
(JSC::MarkingConstraintSet::ExecutionContext::ExecutionContext): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didVisitSomething const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::shouldTimeOut const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::drain): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didExecute const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::execute): Deleted.
(): Deleted.
* heap/MarkingConstraintSet.h:
* heap/MarkingConstraintSolver.cpp: Added.
(JSC::MarkingConstraintSolver::MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::~MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::didVisitSomething const):
(JSC::MarkingConstraintSolver::execute):
(JSC::MarkingConstraintSolver::drain):
(JSC::MarkingConstraintSolver::converge):
(JSC::MarkingConstraintSolver::runExecutionThread):
(JSC::MarkingConstraintSolver::didExecute):
* heap/MarkingConstraintSolver.h: Added.
* heap/OpaqueRootSet.h: Removed.
* heap/ParallelSourceAdapter.h: Added.
(JSC::ParallelSourceAdapter::ParallelSourceAdapter):
(JSC::createParallelSourceAdapter):
* heap/SimpleMarkingConstraint.cpp: Added.
(JSC::SimpleMarkingConstraint::SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::~SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::quickWorkEstimate):
(JSC::SimpleMarkingConstraint::executeImpl):
* heap/SimpleMarkingConstraint.h: Added.
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::didStartMarking):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::updateMutatorIsStopped):
(JSC::SlotVisitor::mutatorIsStoppedIsUpToDate const):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::performIncrementOfDraining):
(JSC::SlotVisitor::didReachTermination):
(JSC::SlotVisitor::hasWork):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::drainInParallelPassively):
(JSC::SlotVisitor::waitForTermination):
(JSC::SlotVisitor::addOpaqueRoot): Deleted.
(JSC::SlotVisitor::containsOpaqueRoot const): Deleted.
(JSC::SlotVisitor::containsOpaqueRootTriState const): Deleted.
(JSC::SlotVisitor::mergeIfNecessary): Deleted.
(JSC::SlotVisitor::mergeOpaqueRootsIfProfitable): Deleted.
(JSC::SlotVisitor::mergeOpaqueRoots): Deleted.
* heap/SlotVisitor.h:
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::addOpaqueRoot):
(JSC::SlotVisitor::containsOpaqueRoot const):
(JSC::SlotVisitor::vm):
(JSC::SlotVisitor::vm const):
* heap/Subspace.cpp:
(JSC::Subspace::parallelAllocatorSource):
(JSC::Subspace::parallelNotEmptyMarkedBlockSource):
* heap/Subspace.h:
* heap/SubspaceInlines.h:
(JSC::Subspace::forEachMarkedCellInParallel):
* heap/VisitCounter.h: Added.
(JSC::VisitCounter::VisitCounter):
(JSC::VisitCounter::visitCount const):
* heap/VisitingTimeout.h: Removed.
* heap/WeakBlock.cpp:
(JSC::WeakBlock::specializedVisit):
* runtime/Structure.cpp:
(JSC::Structure::isCheapDuringGC):
(JSC::Structure::markIfCheap):
Source/WebCore:
No new tests because no change in behavior. This change is best tested using DOM-GC-intensive
benchmarks like Speedometer and Dromaeo.
This parallelizes the DOM's output constraint, and makes some small changes to make this more
scalable.
* ForwardingHeaders/heap/SimpleMarkingConstraint.h: Added.
* ForwardingHeaders/heap/VisitingTimeout.h: Removed.
* Sources.txt:
* WebCore.xcodeproj/project.pbxproj:
* bindings/js/DOMGCOutputConstraint.cpp: Added.
(WebCore::DOMGCOutputConstraint::DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::~DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::executeImpl):
(WebCore::DOMGCOutputConstraint::doParallelWorkImpl):
(WebCore::DOMGCOutputConstraint::finishParallelWorkImpl):
* bindings/js/DOMGCOutputConstraint.h: Added.
* bindings/js/WebCoreJSClientData.cpp:
(WebCore::JSVMClientData::initNormalWorld):
* dom/Node.cpp:
(WebCore::Node::eventTargetDataConcurrently):
(WebCore::Node::ensureEventTargetData):
(WebCore::Node::clearEventTargetData):
Source/WTF:
This does some changes to make it easier to do parallel constraint solving:
- I finally removed dependencyWith. This was a silly construct whose only purpose is to confuse
people about what it means to have a dependency chain. I took that as an opportunity to grealy
simplify the GC's use of dependency chaining.
- Added more logic to Deque<>, since I use it for part of the load balancer.
- Made it possible to profile lock contention. See
https://bugs.webkit.org/show_bug.cgi?id=180250#c0 for some preliminary measurements.
- Introduced holdLockIf, which makes it easy to perform predicated lock acquisition. We use that
to pick a lock in WebCore.
- Introduced CountingLock. It's like WTF::Lock except it also enables optimistic read transactions
sorta like Java's StampedLock.
* WTF.xcodeproj/project.pbxproj:
* wtf/Atomics.h:
(WTF::dependency):
(WTF::DependencyWith::DependencyWith): Deleted.
(WTF::dependencyWith): Deleted.
* wtf/BitVector.h:
(WTF::BitVector::iterator::operator++):
* wtf/CMakeLists.txt:
* wtf/ConcurrentPtrHashSet.cpp: Added.
(WTF::ConcurrentPtrHashSet::ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::~ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::deleteOldTables):
(WTF::ConcurrentPtrHashSet::clear):
(WTF::ConcurrentPtrHashSet::initialize):
(WTF::ConcurrentPtrHashSet::addSlow):
(WTF::ConcurrentPtrHashSet::resizeIfNecessary):
(WTF::ConcurrentPtrHashSet::resizeAndAdd):
(WTF::ConcurrentPtrHashSet::Table::create):
* wtf/ConcurrentPtrHashSet.h: Added.
(WTF::ConcurrentPtrHashSet::contains):
(WTF::ConcurrentPtrHashSet::add):
(WTF::ConcurrentPtrHashSet::size const):
(WTF::ConcurrentPtrHashSet::Table::maxLoad const):
(WTF::ConcurrentPtrHashSet::hash):
(WTF::ConcurrentPtrHashSet::cast):
(WTF::ConcurrentPtrHashSet::containsImpl const):
(WTF::ConcurrentPtrHashSet::addImpl):
* wtf/Deque.h:
(WTF::inlineCapacity>::takeFirst):
* wtf/FastMalloc.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
* wtf/Locker.h:
(WTF::holdLockIf):
* wtf/ScopedLambda.h:
* wtf/SharedTask.h:
(WTF::SharedTask<PassedResultType):
(WTF::SharedTask<ResultType): Deleted.
* wtf/StackShot.h: Added.
(WTF::StackShot::StackShot):
(WTF::StackShot::operator=):
(WTF::StackShot::array const):
(WTF::StackShot::size const):
(WTF::StackShot::operator bool const):
(WTF::StackShot::operator== const):
(WTF::StackShot::hash const):
(WTF::StackShot::isHashTableDeletedValue const):
(WTF::StackShot::operator> const):
(WTF::StackShot::deletedValueArray):
(WTF::StackShotHash::hash):
(WTF::StackShotHash::equal):
* wtf/StackShotProfiler.h: Added.
(WTF::StackShotProfiler::StackShotProfiler):
(WTF::StackShotProfiler::profile):
(WTF::StackShotProfiler::run):
Tools:
* Scripts/run-jsc-benchmarks: Add splay-latency test, since this change needed to be carefully validated with that benchmark.
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/ConcurrentPtrHashSet.cpp: Added. This has unit tests of the new concurrent data structure. The tests focus on correctness under serial execution, which appears to be enough for now (it's so easy to catch a concurrency bug by just running the GC).
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/196360@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@225524 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2017-12-05 17:53:57 +00:00
|
|
|
/*
|
|
|
|
* Copyright (C) 2017 Apple Inc. All rights reserved.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
|
|
|
|
* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
|
|
|
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR
|
|
|
|
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
|
|
|
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
|
|
|
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
|
|
|
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
|
|
|
|
* OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
|
|
|
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
|
|
|
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
|
|
*/
|
|
|
|
|
|
|
|
#pragma once
|
|
|
|
|
|
|
|
#include <wtf/DataLog.h>
|
|
|
|
#include <wtf/LockAlgorithm.h>
|
|
|
|
|
|
|
|
namespace WTF {
|
|
|
|
|
|
|
|
// This is mostly just a word-sized WTF::Lock. It supports basically everything that lock supports. But as
|
|
|
|
// a bonus, it atomically counts lock() calls and allows you to perform an optimistic read transaction by
|
|
|
|
// comparing the count before and after the transaction. If at the start of the transaction the lock is
|
|
|
|
// not held and the count remains the same throughout the transaction, then you know that nobody could
|
|
|
|
// have modified your data structure while you ran. You can even use this to optimistically read pointers
|
|
|
|
// that could become dangling under concurrent writes, if you just revalidate the count every time you're
|
|
|
|
// about to do something dangerous.
|
|
|
|
//
|
|
|
|
// This is largely inspired by StampedLock from Java:
|
|
|
|
// https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/CountingLock.html
|
|
|
|
//
|
|
|
|
// This is simplified a lot compared to StampedLock. Unlike StampedLock, it uses an exclusive lock as a
|
|
|
|
// fallback. There is no way to acquire a CountingLock for read. The only read access is via optimistic
|
|
|
|
// read transactions.
|
|
|
|
//
|
|
|
|
// CountingLock provides two ways of doing optimistic reads:
|
|
|
|
//
|
|
|
|
// - The easy way, where CountingLock does all of the fencing for you. That fencing is free on x86 but
|
|
|
|
// somewhat expensive on ARM.
|
|
|
|
// - The hard way, where you do fencing yourself using Dependency. This allows you to be fenceless on both
|
|
|
|
// x86 and ARM.
|
|
|
|
//
|
|
|
|
// The latter is important for us because some GC paths are known to be sensitive to fences on ARM.
|
|
|
|
|
2019-08-12 20:57:15 +00:00
|
|
|
class CountingLock final {
|
2017-12-07 03:52:09 +00:00
|
|
|
WTF_MAKE_NONCOPYABLE(CountingLock);
|
|
|
|
WTF_MAKE_FAST_ALLOCATED;
|
|
|
|
|
GC constraint solving should be parallel
https://bugs.webkit.org/show_bug.cgi?id=179934
Reviewed by JF Bastien.
PerformanceTests:
Added a version of splay that measures latency in a way that run-jsc-benchmarks groks.
* Octane/splay.js: Added.
(this.Setup.setup.setup):
(this.TearDown.tearDown.tearDown):
(Benchmark):
(BenchmarkResult):
(BenchmarkResult.prototype.valueOf):
(BenchmarkSuite):
(alert):
(Math.random):
(BenchmarkSuite.ResetRNG):
(RunStep):
(BenchmarkSuite.RunSuites):
(BenchmarkSuite.CountBenchmarks):
(BenchmarkSuite.GeometricMean):
(BenchmarkSuite.GeometricMeanTime):
(BenchmarkSuite.AverageAbovePercentile):
(BenchmarkSuite.GeometricMeanLatency):
(BenchmarkSuite.FormatScore):
(BenchmarkSuite.prototype.NotifyStep):
(BenchmarkSuite.prototype.NotifyResult):
(BenchmarkSuite.prototype.NotifyError):
(BenchmarkSuite.prototype.RunSingleBenchmark):
(RunNextSetup):
(RunNextBenchmark):
(RunNextTearDown):
(BenchmarkSuite.prototype.RunStep):
(GeneratePayloadTree):
(GenerateKey):
(SplayUpdateStats):
(InsertNewNode):
(SplaySetup):
(SplayTearDown):
(SplayRun):
(SplayTree):
(SplayTree.prototype.isEmpty):
(SplayTree.prototype.insert):
(SplayTree.prototype.remove):
(SplayTree.prototype.find):
(SplayTree.prototype.findMax):
(SplayTree.prototype.findGreatestLessThan):
(SplayTree.prototype.exportKeys):
(SplayTree.prototype.splay_):
(SplayTree.Node):
(SplayTree.Node.prototype.traverse_):
(report):
(start):
Source/JavaScriptCore:
This makes it possible to do constraint solving in parallel. This looks like a 1% Speedometer
speed-up. It's more than 1% on trunk-Speedometer.
The constraint solver supports running constraints in parallel in two different ways:
- Run multiple constraints in parallel to each other. This only works for constraints that can
tolerate other constraints running concurrently to them (constraint.concurrency() ==
ConstraintConcurrency::Concurrent). This is the most basic kind of parallelism that the
constraint solver supports. All constraints except the JSC SPI constraints are concurrent. We
could probably make them concurrent, but I'm playing it safe for now.
- A constraint can create parallel work for itself, which the constraint solver will interleave
with other stuff. A constraint can report that it has parallel work by returning
ConstraintParallelism::Parallel from its executeImpl() function. Then the solver will allow that
constraint's doParallelWorkImpl() function to run on as many GC marker threads as are available,
for as long as that function wants to run.
It's not possible to have a non-concurrent constraint that creates parallel work.
The parallelism is implemented in terms of the existing GC marker threads. This turns out to be
most natural for two reasons:
- No need to start any other threads.
- The constraints all want to be passed a SlotVisitor. Running on the marker threads means having
access to those threads' SlotVisitors. Also, it means less load balancing. The solver will
create work on each marking thread's SlotVisitor. When the solver is done "stealing" a marker
thread, that thread will have work it can start doing immediately. Before this change, we had to
contribute the work found by the constraint solver to the global worklist so that it could be
distributed to the marker threads by load balancing. This change probably helps to avoid that
load balancing step.
A lot of this change is about making it easy to iterate GC data structures in parallel. This
change makes almost all constraints parallel-enabled, but only the DOM's output constraint uses
the parallel work API. That constraint iterates the marked cells in two subspaces. This change
makes it very easy to compose parallel iterators over subspaces, allocators, blocks, and cells.
The marked cell parallel iterator is composed out of parallel iterators for the others. A parallel
iterator is just an iterator that can do an atomic next() very quickly. We abstract them using
RefPtr<SharedTask<...()>>, where ... is the type returned from the iterator. We know it's done
when it returns a falsish version of ... (in the current code, that's always a pointer type, so
done is indicated by null).
* API/JSMarkingConstraintPrivate.cpp:
(JSContextGroupAddMarkingConstraint):
* API/JSVirtualMachine.mm:
(scanExternalObjectGraph):
(scanExternalRememberedSet):
* JavaScriptCore.xcodeproj/project.pbxproj:
* Sources.txt:
* bytecode/AccessCase.cpp:
(JSC::AccessCase::propagateTransitions const):
* bytecode/CodeBlock.cpp:
(JSC::CodeBlock::visitWeakly):
(JSC::CodeBlock::shouldJettisonDueToOldAge):
(JSC::shouldMarkTransition):
(JSC::CodeBlock::propagateTransitions):
(JSC::CodeBlock::determineLiveness):
* dfg/DFGWorklist.cpp:
* ftl/FTLCompile.cpp:
(JSC::FTL::compile):
* heap/ConstraintParallelism.h: Added.
(WTF::printInternal):
* heap/Heap.cpp:
(JSC::Heap::Heap):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::runFixpointPhase):
(JSC::Heap::stopThePeriphery):
(JSC::Heap::resumeThePeriphery):
(JSC::Heap::addCoreConstraints):
(JSC::Heap::setBonusVisitorTask):
(JSC::Heap::runTaskInParallel):
(JSC::Heap::forEachSlotVisitor): Deleted.
* heap/Heap.h:
(JSC::Heap::worldIsRunning const):
(JSC::Heap::runFunctionInParallel):
* heap/HeapInlines.h:
(JSC::Heap::worldIsStopped const):
(JSC::Heap::isMarked):
(JSC::Heap::incrementDeferralDepth):
(JSC::Heap::decrementDeferralDepth):
(JSC::Heap::decrementDeferralDepthAndGCIfNeeded):
(JSC::Heap::forEachSlotVisitor):
(JSC::Heap::collectorBelievesThatTheWorldIsStopped const): Deleted.
(JSC::Heap::isMarkedConcurrently): Deleted.
* heap/HeapSnapshotBuilder.cpp:
(JSC::HeapSnapshotBuilder::appendNode):
* heap/LargeAllocation.h:
(JSC::LargeAllocation::isMarked):
(JSC::LargeAllocation::isMarkedConcurrently): Deleted.
* heap/LockDuringMarking.h:
(JSC::lockDuringMarking):
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::parallelNotEmptyBlockSource):
* heap/MarkedAllocator.h:
* heap/MarkedBlock.h:
(JSC::MarkedBlock::aboutToMark):
(JSC::MarkedBlock::isMarked):
(JSC::MarkedBlock::areMarksStaleWithDependency): Deleted.
(JSC::MarkedBlock::isMarkedConcurrently): Deleted.
* heap/MarkedSpace.h:
(JSC::MarkedSpace::activeWeakSetsBegin):
(JSC::MarkedSpace::activeWeakSetsEnd):
(JSC::MarkedSpace::newActiveWeakSetsBegin):
(JSC::MarkedSpace::newActiveWeakSetsEnd):
* heap/MarkingConstraint.cpp:
(JSC::MarkingConstraint::MarkingConstraint):
(JSC::MarkingConstraint::execute):
(JSC::MarkingConstraint::quickWorkEstimate):
(JSC::MarkingConstraint::workEstimate):
(JSC::MarkingConstraint::doParallelWork):
(JSC::MarkingConstraint::finishParallelWork):
(JSC::MarkingConstraint::doParallelWorkImpl):
(JSC::MarkingConstraint::finishParallelWorkImpl):
* heap/MarkingConstraint.h:
(JSC::MarkingConstraint::lastExecuteParallelism const):
(JSC::MarkingConstraint::parallelism const):
(JSC::MarkingConstraint::quickWorkEstimate): Deleted.
(JSC::MarkingConstraint::workEstimate): Deleted.
* heap/MarkingConstraintSet.cpp:
(JSC::MarkingConstraintSet::MarkingConstraintSet):
(JSC::MarkingConstraintSet::add):
(JSC::MarkingConstraintSet::executeConvergence):
(JSC::MarkingConstraintSet::executeConvergenceImpl):
(JSC::MarkingConstraintSet::executeAll):
(JSC::MarkingConstraintSet::ExecutionContext::ExecutionContext): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didVisitSomething const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::shouldTimeOut const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::drain): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didExecute const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::execute): Deleted.
(): Deleted.
* heap/MarkingConstraintSet.h:
* heap/MarkingConstraintSolver.cpp: Added.
(JSC::MarkingConstraintSolver::MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::~MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::didVisitSomething const):
(JSC::MarkingConstraintSolver::execute):
(JSC::MarkingConstraintSolver::drain):
(JSC::MarkingConstraintSolver::converge):
(JSC::MarkingConstraintSolver::runExecutionThread):
(JSC::MarkingConstraintSolver::didExecute):
* heap/MarkingConstraintSolver.h: Added.
* heap/OpaqueRootSet.h: Removed.
* heap/ParallelSourceAdapter.h: Added.
(JSC::ParallelSourceAdapter::ParallelSourceAdapter):
(JSC::createParallelSourceAdapter):
* heap/SimpleMarkingConstraint.cpp: Added.
(JSC::SimpleMarkingConstraint::SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::~SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::quickWorkEstimate):
(JSC::SimpleMarkingConstraint::executeImpl):
* heap/SimpleMarkingConstraint.h: Added.
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::didStartMarking):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::updateMutatorIsStopped):
(JSC::SlotVisitor::mutatorIsStoppedIsUpToDate const):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::performIncrementOfDraining):
(JSC::SlotVisitor::didReachTermination):
(JSC::SlotVisitor::hasWork):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::drainInParallelPassively):
(JSC::SlotVisitor::waitForTermination):
(JSC::SlotVisitor::addOpaqueRoot): Deleted.
(JSC::SlotVisitor::containsOpaqueRoot const): Deleted.
(JSC::SlotVisitor::containsOpaqueRootTriState const): Deleted.
(JSC::SlotVisitor::mergeIfNecessary): Deleted.
(JSC::SlotVisitor::mergeOpaqueRootsIfProfitable): Deleted.
(JSC::SlotVisitor::mergeOpaqueRoots): Deleted.
* heap/SlotVisitor.h:
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::addOpaqueRoot):
(JSC::SlotVisitor::containsOpaqueRoot const):
(JSC::SlotVisitor::vm):
(JSC::SlotVisitor::vm const):
* heap/Subspace.cpp:
(JSC::Subspace::parallelAllocatorSource):
(JSC::Subspace::parallelNotEmptyMarkedBlockSource):
* heap/Subspace.h:
* heap/SubspaceInlines.h:
(JSC::Subspace::forEachMarkedCellInParallel):
* heap/VisitCounter.h: Added.
(JSC::VisitCounter::VisitCounter):
(JSC::VisitCounter::visitCount const):
* heap/VisitingTimeout.h: Removed.
* heap/WeakBlock.cpp:
(JSC::WeakBlock::specializedVisit):
* runtime/Structure.cpp:
(JSC::Structure::isCheapDuringGC):
(JSC::Structure::markIfCheap):
Source/WebCore:
No new tests because no change in behavior. This change is best tested using DOM-GC-intensive
benchmarks like Speedometer and Dromaeo.
This parallelizes the DOM's output constraint, and makes some small changes to make this more
scalable.
* ForwardingHeaders/heap/SimpleMarkingConstraint.h: Added.
* ForwardingHeaders/heap/VisitingTimeout.h: Removed.
* Sources.txt:
* WebCore.xcodeproj/project.pbxproj:
* bindings/js/DOMGCOutputConstraint.cpp: Added.
(WebCore::DOMGCOutputConstraint::DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::~DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::executeImpl):
(WebCore::DOMGCOutputConstraint::doParallelWorkImpl):
(WebCore::DOMGCOutputConstraint::finishParallelWorkImpl):
* bindings/js/DOMGCOutputConstraint.h: Added.
* bindings/js/WebCoreJSClientData.cpp:
(WebCore::JSVMClientData::initNormalWorld):
* dom/Node.cpp:
(WebCore::Node::eventTargetDataConcurrently):
(WebCore::Node::ensureEventTargetData):
(WebCore::Node::clearEventTargetData):
Source/WTF:
This does some changes to make it easier to do parallel constraint solving:
- I finally removed dependencyWith. This was a silly construct whose only purpose is to confuse
people about what it means to have a dependency chain. I took that as an opportunity to grealy
simplify the GC's use of dependency chaining.
- Added more logic to Deque<>, since I use it for part of the load balancer.
- Made it possible to profile lock contention. See
https://bugs.webkit.org/show_bug.cgi?id=180250#c0 for some preliminary measurements.
- Introduced holdLockIf, which makes it easy to perform predicated lock acquisition. We use that
to pick a lock in WebCore.
- Introduced CountingLock. It's like WTF::Lock except it also enables optimistic read transactions
sorta like Java's StampedLock.
* WTF.xcodeproj/project.pbxproj:
* wtf/Atomics.h:
(WTF::dependency):
(WTF::DependencyWith::DependencyWith): Deleted.
(WTF::dependencyWith): Deleted.
* wtf/BitVector.h:
(WTF::BitVector::iterator::operator++):
* wtf/CMakeLists.txt:
* wtf/ConcurrentPtrHashSet.cpp: Added.
(WTF::ConcurrentPtrHashSet::ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::~ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::deleteOldTables):
(WTF::ConcurrentPtrHashSet::clear):
(WTF::ConcurrentPtrHashSet::initialize):
(WTF::ConcurrentPtrHashSet::addSlow):
(WTF::ConcurrentPtrHashSet::resizeIfNecessary):
(WTF::ConcurrentPtrHashSet::resizeAndAdd):
(WTF::ConcurrentPtrHashSet::Table::create):
* wtf/ConcurrentPtrHashSet.h: Added.
(WTF::ConcurrentPtrHashSet::contains):
(WTF::ConcurrentPtrHashSet::add):
(WTF::ConcurrentPtrHashSet::size const):
(WTF::ConcurrentPtrHashSet::Table::maxLoad const):
(WTF::ConcurrentPtrHashSet::hash):
(WTF::ConcurrentPtrHashSet::cast):
(WTF::ConcurrentPtrHashSet::containsImpl const):
(WTF::ConcurrentPtrHashSet::addImpl):
* wtf/Deque.h:
(WTF::inlineCapacity>::takeFirst):
* wtf/FastMalloc.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
* wtf/Locker.h:
(WTF::holdLockIf):
* wtf/ScopedLambda.h:
* wtf/SharedTask.h:
(WTF::SharedTask<PassedResultType):
(WTF::SharedTask<ResultType): Deleted.
* wtf/StackShot.h: Added.
(WTF::StackShot::StackShot):
(WTF::StackShot::operator=):
(WTF::StackShot::array const):
(WTF::StackShot::size const):
(WTF::StackShot::operator bool const):
(WTF::StackShot::operator== const):
(WTF::StackShot::hash const):
(WTF::StackShot::isHashTableDeletedValue const):
(WTF::StackShot::operator> const):
(WTF::StackShot::deletedValueArray):
(WTF::StackShotHash::hash):
(WTF::StackShotHash::equal):
* wtf/StackShotProfiler.h: Added.
(WTF::StackShotProfiler::StackShotProfiler):
(WTF::StackShotProfiler::profile):
(WTF::StackShotProfiler::run):
Tools:
* Scripts/run-jsc-benchmarks: Add splay-latency test, since this change needed to be carefully validated with that benchmark.
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/ConcurrentPtrHashSet.cpp: Added. This has unit tests of the new concurrent data structure. The tests focus on correctness under serial execution, which appears to be enough for now (it's so easy to catch a concurrency bug by just running the GC).
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/196360@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@225524 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2017-12-05 17:53:57 +00:00
|
|
|
typedef unsigned LockType;
|
|
|
|
|
|
|
|
static constexpr LockType isHeldBit = 1;
|
|
|
|
static constexpr LockType hasParkedBit = 2;
|
|
|
|
static constexpr LockType mask = isHeldBit | hasParkedBit;
|
|
|
|
static constexpr LockType shift = 2;
|
|
|
|
static constexpr LockType countUnit = 4;
|
|
|
|
|
|
|
|
struct LockHooks {
|
|
|
|
static LockType lockHook(LockType value)
|
|
|
|
{
|
|
|
|
return value + countUnit;
|
|
|
|
}
|
|
|
|
|
|
|
|
static LockType unlockHook(LockType value) { return value; }
|
|
|
|
static LockType parkHook(LockType value) { return value; }
|
|
|
|
static LockType handoffHook(LockType value) { return value; }
|
|
|
|
};
|
|
|
|
|
|
|
|
typedef LockAlgorithm<LockType, isHeldBit, hasParkedBit, LockHooks> ExclusiveAlgorithm;
|
|
|
|
|
|
|
|
public:
|
2017-12-07 03:52:09 +00:00
|
|
|
CountingLock() = default;
|
GC constraint solving should be parallel
https://bugs.webkit.org/show_bug.cgi?id=179934
Reviewed by JF Bastien.
PerformanceTests:
Added a version of splay that measures latency in a way that run-jsc-benchmarks groks.
* Octane/splay.js: Added.
(this.Setup.setup.setup):
(this.TearDown.tearDown.tearDown):
(Benchmark):
(BenchmarkResult):
(BenchmarkResult.prototype.valueOf):
(BenchmarkSuite):
(alert):
(Math.random):
(BenchmarkSuite.ResetRNG):
(RunStep):
(BenchmarkSuite.RunSuites):
(BenchmarkSuite.CountBenchmarks):
(BenchmarkSuite.GeometricMean):
(BenchmarkSuite.GeometricMeanTime):
(BenchmarkSuite.AverageAbovePercentile):
(BenchmarkSuite.GeometricMeanLatency):
(BenchmarkSuite.FormatScore):
(BenchmarkSuite.prototype.NotifyStep):
(BenchmarkSuite.prototype.NotifyResult):
(BenchmarkSuite.prototype.NotifyError):
(BenchmarkSuite.prototype.RunSingleBenchmark):
(RunNextSetup):
(RunNextBenchmark):
(RunNextTearDown):
(BenchmarkSuite.prototype.RunStep):
(GeneratePayloadTree):
(GenerateKey):
(SplayUpdateStats):
(InsertNewNode):
(SplaySetup):
(SplayTearDown):
(SplayRun):
(SplayTree):
(SplayTree.prototype.isEmpty):
(SplayTree.prototype.insert):
(SplayTree.prototype.remove):
(SplayTree.prototype.find):
(SplayTree.prototype.findMax):
(SplayTree.prototype.findGreatestLessThan):
(SplayTree.prototype.exportKeys):
(SplayTree.prototype.splay_):
(SplayTree.Node):
(SplayTree.Node.prototype.traverse_):
(report):
(start):
Source/JavaScriptCore:
This makes it possible to do constraint solving in parallel. This looks like a 1% Speedometer
speed-up. It's more than 1% on trunk-Speedometer.
The constraint solver supports running constraints in parallel in two different ways:
- Run multiple constraints in parallel to each other. This only works for constraints that can
tolerate other constraints running concurrently to them (constraint.concurrency() ==
ConstraintConcurrency::Concurrent). This is the most basic kind of parallelism that the
constraint solver supports. All constraints except the JSC SPI constraints are concurrent. We
could probably make them concurrent, but I'm playing it safe for now.
- A constraint can create parallel work for itself, which the constraint solver will interleave
with other stuff. A constraint can report that it has parallel work by returning
ConstraintParallelism::Parallel from its executeImpl() function. Then the solver will allow that
constraint's doParallelWorkImpl() function to run on as many GC marker threads as are available,
for as long as that function wants to run.
It's not possible to have a non-concurrent constraint that creates parallel work.
The parallelism is implemented in terms of the existing GC marker threads. This turns out to be
most natural for two reasons:
- No need to start any other threads.
- The constraints all want to be passed a SlotVisitor. Running on the marker threads means having
access to those threads' SlotVisitors. Also, it means less load balancing. The solver will
create work on each marking thread's SlotVisitor. When the solver is done "stealing" a marker
thread, that thread will have work it can start doing immediately. Before this change, we had to
contribute the work found by the constraint solver to the global worklist so that it could be
distributed to the marker threads by load balancing. This change probably helps to avoid that
load balancing step.
A lot of this change is about making it easy to iterate GC data structures in parallel. This
change makes almost all constraints parallel-enabled, but only the DOM's output constraint uses
the parallel work API. That constraint iterates the marked cells in two subspaces. This change
makes it very easy to compose parallel iterators over subspaces, allocators, blocks, and cells.
The marked cell parallel iterator is composed out of parallel iterators for the others. A parallel
iterator is just an iterator that can do an atomic next() very quickly. We abstract them using
RefPtr<SharedTask<...()>>, where ... is the type returned from the iterator. We know it's done
when it returns a falsish version of ... (in the current code, that's always a pointer type, so
done is indicated by null).
* API/JSMarkingConstraintPrivate.cpp:
(JSContextGroupAddMarkingConstraint):
* API/JSVirtualMachine.mm:
(scanExternalObjectGraph):
(scanExternalRememberedSet):
* JavaScriptCore.xcodeproj/project.pbxproj:
* Sources.txt:
* bytecode/AccessCase.cpp:
(JSC::AccessCase::propagateTransitions const):
* bytecode/CodeBlock.cpp:
(JSC::CodeBlock::visitWeakly):
(JSC::CodeBlock::shouldJettisonDueToOldAge):
(JSC::shouldMarkTransition):
(JSC::CodeBlock::propagateTransitions):
(JSC::CodeBlock::determineLiveness):
* dfg/DFGWorklist.cpp:
* ftl/FTLCompile.cpp:
(JSC::FTL::compile):
* heap/ConstraintParallelism.h: Added.
(WTF::printInternal):
* heap/Heap.cpp:
(JSC::Heap::Heap):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::runFixpointPhase):
(JSC::Heap::stopThePeriphery):
(JSC::Heap::resumeThePeriphery):
(JSC::Heap::addCoreConstraints):
(JSC::Heap::setBonusVisitorTask):
(JSC::Heap::runTaskInParallel):
(JSC::Heap::forEachSlotVisitor): Deleted.
* heap/Heap.h:
(JSC::Heap::worldIsRunning const):
(JSC::Heap::runFunctionInParallel):
* heap/HeapInlines.h:
(JSC::Heap::worldIsStopped const):
(JSC::Heap::isMarked):
(JSC::Heap::incrementDeferralDepth):
(JSC::Heap::decrementDeferralDepth):
(JSC::Heap::decrementDeferralDepthAndGCIfNeeded):
(JSC::Heap::forEachSlotVisitor):
(JSC::Heap::collectorBelievesThatTheWorldIsStopped const): Deleted.
(JSC::Heap::isMarkedConcurrently): Deleted.
* heap/HeapSnapshotBuilder.cpp:
(JSC::HeapSnapshotBuilder::appendNode):
* heap/LargeAllocation.h:
(JSC::LargeAllocation::isMarked):
(JSC::LargeAllocation::isMarkedConcurrently): Deleted.
* heap/LockDuringMarking.h:
(JSC::lockDuringMarking):
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::parallelNotEmptyBlockSource):
* heap/MarkedAllocator.h:
* heap/MarkedBlock.h:
(JSC::MarkedBlock::aboutToMark):
(JSC::MarkedBlock::isMarked):
(JSC::MarkedBlock::areMarksStaleWithDependency): Deleted.
(JSC::MarkedBlock::isMarkedConcurrently): Deleted.
* heap/MarkedSpace.h:
(JSC::MarkedSpace::activeWeakSetsBegin):
(JSC::MarkedSpace::activeWeakSetsEnd):
(JSC::MarkedSpace::newActiveWeakSetsBegin):
(JSC::MarkedSpace::newActiveWeakSetsEnd):
* heap/MarkingConstraint.cpp:
(JSC::MarkingConstraint::MarkingConstraint):
(JSC::MarkingConstraint::execute):
(JSC::MarkingConstraint::quickWorkEstimate):
(JSC::MarkingConstraint::workEstimate):
(JSC::MarkingConstraint::doParallelWork):
(JSC::MarkingConstraint::finishParallelWork):
(JSC::MarkingConstraint::doParallelWorkImpl):
(JSC::MarkingConstraint::finishParallelWorkImpl):
* heap/MarkingConstraint.h:
(JSC::MarkingConstraint::lastExecuteParallelism const):
(JSC::MarkingConstraint::parallelism const):
(JSC::MarkingConstraint::quickWorkEstimate): Deleted.
(JSC::MarkingConstraint::workEstimate): Deleted.
* heap/MarkingConstraintSet.cpp:
(JSC::MarkingConstraintSet::MarkingConstraintSet):
(JSC::MarkingConstraintSet::add):
(JSC::MarkingConstraintSet::executeConvergence):
(JSC::MarkingConstraintSet::executeConvergenceImpl):
(JSC::MarkingConstraintSet::executeAll):
(JSC::MarkingConstraintSet::ExecutionContext::ExecutionContext): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didVisitSomething const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::shouldTimeOut const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::drain): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didExecute const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::execute): Deleted.
(): Deleted.
* heap/MarkingConstraintSet.h:
* heap/MarkingConstraintSolver.cpp: Added.
(JSC::MarkingConstraintSolver::MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::~MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::didVisitSomething const):
(JSC::MarkingConstraintSolver::execute):
(JSC::MarkingConstraintSolver::drain):
(JSC::MarkingConstraintSolver::converge):
(JSC::MarkingConstraintSolver::runExecutionThread):
(JSC::MarkingConstraintSolver::didExecute):
* heap/MarkingConstraintSolver.h: Added.
* heap/OpaqueRootSet.h: Removed.
* heap/ParallelSourceAdapter.h: Added.
(JSC::ParallelSourceAdapter::ParallelSourceAdapter):
(JSC::createParallelSourceAdapter):
* heap/SimpleMarkingConstraint.cpp: Added.
(JSC::SimpleMarkingConstraint::SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::~SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::quickWorkEstimate):
(JSC::SimpleMarkingConstraint::executeImpl):
* heap/SimpleMarkingConstraint.h: Added.
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::didStartMarking):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::updateMutatorIsStopped):
(JSC::SlotVisitor::mutatorIsStoppedIsUpToDate const):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::performIncrementOfDraining):
(JSC::SlotVisitor::didReachTermination):
(JSC::SlotVisitor::hasWork):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::drainInParallelPassively):
(JSC::SlotVisitor::waitForTermination):
(JSC::SlotVisitor::addOpaqueRoot): Deleted.
(JSC::SlotVisitor::containsOpaqueRoot const): Deleted.
(JSC::SlotVisitor::containsOpaqueRootTriState const): Deleted.
(JSC::SlotVisitor::mergeIfNecessary): Deleted.
(JSC::SlotVisitor::mergeOpaqueRootsIfProfitable): Deleted.
(JSC::SlotVisitor::mergeOpaqueRoots): Deleted.
* heap/SlotVisitor.h:
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::addOpaqueRoot):
(JSC::SlotVisitor::containsOpaqueRoot const):
(JSC::SlotVisitor::vm):
(JSC::SlotVisitor::vm const):
* heap/Subspace.cpp:
(JSC::Subspace::parallelAllocatorSource):
(JSC::Subspace::parallelNotEmptyMarkedBlockSource):
* heap/Subspace.h:
* heap/SubspaceInlines.h:
(JSC::Subspace::forEachMarkedCellInParallel):
* heap/VisitCounter.h: Added.
(JSC::VisitCounter::VisitCounter):
(JSC::VisitCounter::visitCount const):
* heap/VisitingTimeout.h: Removed.
* heap/WeakBlock.cpp:
(JSC::WeakBlock::specializedVisit):
* runtime/Structure.cpp:
(JSC::Structure::isCheapDuringGC):
(JSC::Structure::markIfCheap):
Source/WebCore:
No new tests because no change in behavior. This change is best tested using DOM-GC-intensive
benchmarks like Speedometer and Dromaeo.
This parallelizes the DOM's output constraint, and makes some small changes to make this more
scalable.
* ForwardingHeaders/heap/SimpleMarkingConstraint.h: Added.
* ForwardingHeaders/heap/VisitingTimeout.h: Removed.
* Sources.txt:
* WebCore.xcodeproj/project.pbxproj:
* bindings/js/DOMGCOutputConstraint.cpp: Added.
(WebCore::DOMGCOutputConstraint::DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::~DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::executeImpl):
(WebCore::DOMGCOutputConstraint::doParallelWorkImpl):
(WebCore::DOMGCOutputConstraint::finishParallelWorkImpl):
* bindings/js/DOMGCOutputConstraint.h: Added.
* bindings/js/WebCoreJSClientData.cpp:
(WebCore::JSVMClientData::initNormalWorld):
* dom/Node.cpp:
(WebCore::Node::eventTargetDataConcurrently):
(WebCore::Node::ensureEventTargetData):
(WebCore::Node::clearEventTargetData):
Source/WTF:
This does some changes to make it easier to do parallel constraint solving:
- I finally removed dependencyWith. This was a silly construct whose only purpose is to confuse
people about what it means to have a dependency chain. I took that as an opportunity to grealy
simplify the GC's use of dependency chaining.
- Added more logic to Deque<>, since I use it for part of the load balancer.
- Made it possible to profile lock contention. See
https://bugs.webkit.org/show_bug.cgi?id=180250#c0 for some preliminary measurements.
- Introduced holdLockIf, which makes it easy to perform predicated lock acquisition. We use that
to pick a lock in WebCore.
- Introduced CountingLock. It's like WTF::Lock except it also enables optimistic read transactions
sorta like Java's StampedLock.
* WTF.xcodeproj/project.pbxproj:
* wtf/Atomics.h:
(WTF::dependency):
(WTF::DependencyWith::DependencyWith): Deleted.
(WTF::dependencyWith): Deleted.
* wtf/BitVector.h:
(WTF::BitVector::iterator::operator++):
* wtf/CMakeLists.txt:
* wtf/ConcurrentPtrHashSet.cpp: Added.
(WTF::ConcurrentPtrHashSet::ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::~ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::deleteOldTables):
(WTF::ConcurrentPtrHashSet::clear):
(WTF::ConcurrentPtrHashSet::initialize):
(WTF::ConcurrentPtrHashSet::addSlow):
(WTF::ConcurrentPtrHashSet::resizeIfNecessary):
(WTF::ConcurrentPtrHashSet::resizeAndAdd):
(WTF::ConcurrentPtrHashSet::Table::create):
* wtf/ConcurrentPtrHashSet.h: Added.
(WTF::ConcurrentPtrHashSet::contains):
(WTF::ConcurrentPtrHashSet::add):
(WTF::ConcurrentPtrHashSet::size const):
(WTF::ConcurrentPtrHashSet::Table::maxLoad const):
(WTF::ConcurrentPtrHashSet::hash):
(WTF::ConcurrentPtrHashSet::cast):
(WTF::ConcurrentPtrHashSet::containsImpl const):
(WTF::ConcurrentPtrHashSet::addImpl):
* wtf/Deque.h:
(WTF::inlineCapacity>::takeFirst):
* wtf/FastMalloc.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
* wtf/Locker.h:
(WTF::holdLockIf):
* wtf/ScopedLambda.h:
* wtf/SharedTask.h:
(WTF::SharedTask<PassedResultType):
(WTF::SharedTask<ResultType): Deleted.
* wtf/StackShot.h: Added.
(WTF::StackShot::StackShot):
(WTF::StackShot::operator=):
(WTF::StackShot::array const):
(WTF::StackShot::size const):
(WTF::StackShot::operator bool const):
(WTF::StackShot::operator== const):
(WTF::StackShot::hash const):
(WTF::StackShot::isHashTableDeletedValue const):
(WTF::StackShot::operator> const):
(WTF::StackShot::deletedValueArray):
(WTF::StackShotHash::hash):
(WTF::StackShotHash::equal):
* wtf/StackShotProfiler.h: Added.
(WTF::StackShotProfiler::StackShotProfiler):
(WTF::StackShotProfiler::profile):
(WTF::StackShotProfiler::run):
Tools:
* Scripts/run-jsc-benchmarks: Add splay-latency test, since this change needed to be carefully validated with that benchmark.
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/ConcurrentPtrHashSet.cpp: Added. This has unit tests of the new concurrent data structure. The tests focus on correctness under serial execution, which appears to be enough for now (it's so easy to catch a concurrency bug by just running the GC).
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/196360@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@225524 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2017-12-05 17:53:57 +00:00
|
|
|
|
|
|
|
bool tryLock()
|
|
|
|
{
|
|
|
|
return ExclusiveAlgorithm::tryLock(m_word);
|
|
|
|
}
|
|
|
|
|
|
|
|
void lock()
|
|
|
|
{
|
|
|
|
if (UNLIKELY(!ExclusiveAlgorithm::lockFast(m_word)))
|
|
|
|
lockSlow();
|
|
|
|
}
|
|
|
|
|
|
|
|
void unlock()
|
|
|
|
{
|
|
|
|
if (UNLIKELY(!ExclusiveAlgorithm::unlockFast(m_word)))
|
|
|
|
unlockSlow();
|
|
|
|
}
|
|
|
|
|
|
|
|
bool isHeld() const
|
|
|
|
{
|
|
|
|
return ExclusiveAlgorithm::isLocked(m_word);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool isLocked() const
|
|
|
|
{
|
|
|
|
return isHeld();
|
|
|
|
}
|
|
|
|
|
|
|
|
// The only thing you're allowed to infer from this value is that if it's zero, then you didn't get
|
|
|
|
// a real count.
|
|
|
|
class Count {
|
|
|
|
public:
|
|
|
|
explicit operator bool() const { return !!m_value; }
|
|
|
|
|
|
|
|
bool operator==(const Count& other) const { return m_value == other.m_value; }
|
|
|
|
bool operator!=(const Count& other) const { return m_value != other.m_value; }
|
|
|
|
|
|
|
|
private:
|
|
|
|
friend class CountingLock;
|
|
|
|
|
|
|
|
LockType m_value { 0 };
|
|
|
|
};
|
|
|
|
|
|
|
|
// Example of how to use this:
|
|
|
|
//
|
|
|
|
// int read()
|
|
|
|
// {
|
|
|
|
// if (CountingLock::Count count = m_lock.tryOptimisticRead()) {
|
|
|
|
// int value = m_things;
|
|
|
|
// if (m_lock.validate(count))
|
|
|
|
// return value; // success!
|
|
|
|
// }
|
2021-05-22 00:11:37 +00:00
|
|
|
// Locker locker { m_lock };
|
GC constraint solving should be parallel
https://bugs.webkit.org/show_bug.cgi?id=179934
Reviewed by JF Bastien.
PerformanceTests:
Added a version of splay that measures latency in a way that run-jsc-benchmarks groks.
* Octane/splay.js: Added.
(this.Setup.setup.setup):
(this.TearDown.tearDown.tearDown):
(Benchmark):
(BenchmarkResult):
(BenchmarkResult.prototype.valueOf):
(BenchmarkSuite):
(alert):
(Math.random):
(BenchmarkSuite.ResetRNG):
(RunStep):
(BenchmarkSuite.RunSuites):
(BenchmarkSuite.CountBenchmarks):
(BenchmarkSuite.GeometricMean):
(BenchmarkSuite.GeometricMeanTime):
(BenchmarkSuite.AverageAbovePercentile):
(BenchmarkSuite.GeometricMeanLatency):
(BenchmarkSuite.FormatScore):
(BenchmarkSuite.prototype.NotifyStep):
(BenchmarkSuite.prototype.NotifyResult):
(BenchmarkSuite.prototype.NotifyError):
(BenchmarkSuite.prototype.RunSingleBenchmark):
(RunNextSetup):
(RunNextBenchmark):
(RunNextTearDown):
(BenchmarkSuite.prototype.RunStep):
(GeneratePayloadTree):
(GenerateKey):
(SplayUpdateStats):
(InsertNewNode):
(SplaySetup):
(SplayTearDown):
(SplayRun):
(SplayTree):
(SplayTree.prototype.isEmpty):
(SplayTree.prototype.insert):
(SplayTree.prototype.remove):
(SplayTree.prototype.find):
(SplayTree.prototype.findMax):
(SplayTree.prototype.findGreatestLessThan):
(SplayTree.prototype.exportKeys):
(SplayTree.prototype.splay_):
(SplayTree.Node):
(SplayTree.Node.prototype.traverse_):
(report):
(start):
Source/JavaScriptCore:
This makes it possible to do constraint solving in parallel. This looks like a 1% Speedometer
speed-up. It's more than 1% on trunk-Speedometer.
The constraint solver supports running constraints in parallel in two different ways:
- Run multiple constraints in parallel to each other. This only works for constraints that can
tolerate other constraints running concurrently to them (constraint.concurrency() ==
ConstraintConcurrency::Concurrent). This is the most basic kind of parallelism that the
constraint solver supports. All constraints except the JSC SPI constraints are concurrent. We
could probably make them concurrent, but I'm playing it safe for now.
- A constraint can create parallel work for itself, which the constraint solver will interleave
with other stuff. A constraint can report that it has parallel work by returning
ConstraintParallelism::Parallel from its executeImpl() function. Then the solver will allow that
constraint's doParallelWorkImpl() function to run on as many GC marker threads as are available,
for as long as that function wants to run.
It's not possible to have a non-concurrent constraint that creates parallel work.
The parallelism is implemented in terms of the existing GC marker threads. This turns out to be
most natural for two reasons:
- No need to start any other threads.
- The constraints all want to be passed a SlotVisitor. Running on the marker threads means having
access to those threads' SlotVisitors. Also, it means less load balancing. The solver will
create work on each marking thread's SlotVisitor. When the solver is done "stealing" a marker
thread, that thread will have work it can start doing immediately. Before this change, we had to
contribute the work found by the constraint solver to the global worklist so that it could be
distributed to the marker threads by load balancing. This change probably helps to avoid that
load balancing step.
A lot of this change is about making it easy to iterate GC data structures in parallel. This
change makes almost all constraints parallel-enabled, but only the DOM's output constraint uses
the parallel work API. That constraint iterates the marked cells in two subspaces. This change
makes it very easy to compose parallel iterators over subspaces, allocators, blocks, and cells.
The marked cell parallel iterator is composed out of parallel iterators for the others. A parallel
iterator is just an iterator that can do an atomic next() very quickly. We abstract them using
RefPtr<SharedTask<...()>>, where ... is the type returned from the iterator. We know it's done
when it returns a falsish version of ... (in the current code, that's always a pointer type, so
done is indicated by null).
* API/JSMarkingConstraintPrivate.cpp:
(JSContextGroupAddMarkingConstraint):
* API/JSVirtualMachine.mm:
(scanExternalObjectGraph):
(scanExternalRememberedSet):
* JavaScriptCore.xcodeproj/project.pbxproj:
* Sources.txt:
* bytecode/AccessCase.cpp:
(JSC::AccessCase::propagateTransitions const):
* bytecode/CodeBlock.cpp:
(JSC::CodeBlock::visitWeakly):
(JSC::CodeBlock::shouldJettisonDueToOldAge):
(JSC::shouldMarkTransition):
(JSC::CodeBlock::propagateTransitions):
(JSC::CodeBlock::determineLiveness):
* dfg/DFGWorklist.cpp:
* ftl/FTLCompile.cpp:
(JSC::FTL::compile):
* heap/ConstraintParallelism.h: Added.
(WTF::printInternal):
* heap/Heap.cpp:
(JSC::Heap::Heap):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::runFixpointPhase):
(JSC::Heap::stopThePeriphery):
(JSC::Heap::resumeThePeriphery):
(JSC::Heap::addCoreConstraints):
(JSC::Heap::setBonusVisitorTask):
(JSC::Heap::runTaskInParallel):
(JSC::Heap::forEachSlotVisitor): Deleted.
* heap/Heap.h:
(JSC::Heap::worldIsRunning const):
(JSC::Heap::runFunctionInParallel):
* heap/HeapInlines.h:
(JSC::Heap::worldIsStopped const):
(JSC::Heap::isMarked):
(JSC::Heap::incrementDeferralDepth):
(JSC::Heap::decrementDeferralDepth):
(JSC::Heap::decrementDeferralDepthAndGCIfNeeded):
(JSC::Heap::forEachSlotVisitor):
(JSC::Heap::collectorBelievesThatTheWorldIsStopped const): Deleted.
(JSC::Heap::isMarkedConcurrently): Deleted.
* heap/HeapSnapshotBuilder.cpp:
(JSC::HeapSnapshotBuilder::appendNode):
* heap/LargeAllocation.h:
(JSC::LargeAllocation::isMarked):
(JSC::LargeAllocation::isMarkedConcurrently): Deleted.
* heap/LockDuringMarking.h:
(JSC::lockDuringMarking):
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::parallelNotEmptyBlockSource):
* heap/MarkedAllocator.h:
* heap/MarkedBlock.h:
(JSC::MarkedBlock::aboutToMark):
(JSC::MarkedBlock::isMarked):
(JSC::MarkedBlock::areMarksStaleWithDependency): Deleted.
(JSC::MarkedBlock::isMarkedConcurrently): Deleted.
* heap/MarkedSpace.h:
(JSC::MarkedSpace::activeWeakSetsBegin):
(JSC::MarkedSpace::activeWeakSetsEnd):
(JSC::MarkedSpace::newActiveWeakSetsBegin):
(JSC::MarkedSpace::newActiveWeakSetsEnd):
* heap/MarkingConstraint.cpp:
(JSC::MarkingConstraint::MarkingConstraint):
(JSC::MarkingConstraint::execute):
(JSC::MarkingConstraint::quickWorkEstimate):
(JSC::MarkingConstraint::workEstimate):
(JSC::MarkingConstraint::doParallelWork):
(JSC::MarkingConstraint::finishParallelWork):
(JSC::MarkingConstraint::doParallelWorkImpl):
(JSC::MarkingConstraint::finishParallelWorkImpl):
* heap/MarkingConstraint.h:
(JSC::MarkingConstraint::lastExecuteParallelism const):
(JSC::MarkingConstraint::parallelism const):
(JSC::MarkingConstraint::quickWorkEstimate): Deleted.
(JSC::MarkingConstraint::workEstimate): Deleted.
* heap/MarkingConstraintSet.cpp:
(JSC::MarkingConstraintSet::MarkingConstraintSet):
(JSC::MarkingConstraintSet::add):
(JSC::MarkingConstraintSet::executeConvergence):
(JSC::MarkingConstraintSet::executeConvergenceImpl):
(JSC::MarkingConstraintSet::executeAll):
(JSC::MarkingConstraintSet::ExecutionContext::ExecutionContext): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didVisitSomething const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::shouldTimeOut const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::drain): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didExecute const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::execute): Deleted.
(): Deleted.
* heap/MarkingConstraintSet.h:
* heap/MarkingConstraintSolver.cpp: Added.
(JSC::MarkingConstraintSolver::MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::~MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::didVisitSomething const):
(JSC::MarkingConstraintSolver::execute):
(JSC::MarkingConstraintSolver::drain):
(JSC::MarkingConstraintSolver::converge):
(JSC::MarkingConstraintSolver::runExecutionThread):
(JSC::MarkingConstraintSolver::didExecute):
* heap/MarkingConstraintSolver.h: Added.
* heap/OpaqueRootSet.h: Removed.
* heap/ParallelSourceAdapter.h: Added.
(JSC::ParallelSourceAdapter::ParallelSourceAdapter):
(JSC::createParallelSourceAdapter):
* heap/SimpleMarkingConstraint.cpp: Added.
(JSC::SimpleMarkingConstraint::SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::~SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::quickWorkEstimate):
(JSC::SimpleMarkingConstraint::executeImpl):
* heap/SimpleMarkingConstraint.h: Added.
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::didStartMarking):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::updateMutatorIsStopped):
(JSC::SlotVisitor::mutatorIsStoppedIsUpToDate const):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::performIncrementOfDraining):
(JSC::SlotVisitor::didReachTermination):
(JSC::SlotVisitor::hasWork):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::drainInParallelPassively):
(JSC::SlotVisitor::waitForTermination):
(JSC::SlotVisitor::addOpaqueRoot): Deleted.
(JSC::SlotVisitor::containsOpaqueRoot const): Deleted.
(JSC::SlotVisitor::containsOpaqueRootTriState const): Deleted.
(JSC::SlotVisitor::mergeIfNecessary): Deleted.
(JSC::SlotVisitor::mergeOpaqueRootsIfProfitable): Deleted.
(JSC::SlotVisitor::mergeOpaqueRoots): Deleted.
* heap/SlotVisitor.h:
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::addOpaqueRoot):
(JSC::SlotVisitor::containsOpaqueRoot const):
(JSC::SlotVisitor::vm):
(JSC::SlotVisitor::vm const):
* heap/Subspace.cpp:
(JSC::Subspace::parallelAllocatorSource):
(JSC::Subspace::parallelNotEmptyMarkedBlockSource):
* heap/Subspace.h:
* heap/SubspaceInlines.h:
(JSC::Subspace::forEachMarkedCellInParallel):
* heap/VisitCounter.h: Added.
(JSC::VisitCounter::VisitCounter):
(JSC::VisitCounter::visitCount const):
* heap/VisitingTimeout.h: Removed.
* heap/WeakBlock.cpp:
(JSC::WeakBlock::specializedVisit):
* runtime/Structure.cpp:
(JSC::Structure::isCheapDuringGC):
(JSC::Structure::markIfCheap):
Source/WebCore:
No new tests because no change in behavior. This change is best tested using DOM-GC-intensive
benchmarks like Speedometer and Dromaeo.
This parallelizes the DOM's output constraint, and makes some small changes to make this more
scalable.
* ForwardingHeaders/heap/SimpleMarkingConstraint.h: Added.
* ForwardingHeaders/heap/VisitingTimeout.h: Removed.
* Sources.txt:
* WebCore.xcodeproj/project.pbxproj:
* bindings/js/DOMGCOutputConstraint.cpp: Added.
(WebCore::DOMGCOutputConstraint::DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::~DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::executeImpl):
(WebCore::DOMGCOutputConstraint::doParallelWorkImpl):
(WebCore::DOMGCOutputConstraint::finishParallelWorkImpl):
* bindings/js/DOMGCOutputConstraint.h: Added.
* bindings/js/WebCoreJSClientData.cpp:
(WebCore::JSVMClientData::initNormalWorld):
* dom/Node.cpp:
(WebCore::Node::eventTargetDataConcurrently):
(WebCore::Node::ensureEventTargetData):
(WebCore::Node::clearEventTargetData):
Source/WTF:
This does some changes to make it easier to do parallel constraint solving:
- I finally removed dependencyWith. This was a silly construct whose only purpose is to confuse
people about what it means to have a dependency chain. I took that as an opportunity to grealy
simplify the GC's use of dependency chaining.
- Added more logic to Deque<>, since I use it for part of the load balancer.
- Made it possible to profile lock contention. See
https://bugs.webkit.org/show_bug.cgi?id=180250#c0 for some preliminary measurements.
- Introduced holdLockIf, which makes it easy to perform predicated lock acquisition. We use that
to pick a lock in WebCore.
- Introduced CountingLock. It's like WTF::Lock except it also enables optimistic read transactions
sorta like Java's StampedLock.
* WTF.xcodeproj/project.pbxproj:
* wtf/Atomics.h:
(WTF::dependency):
(WTF::DependencyWith::DependencyWith): Deleted.
(WTF::dependencyWith): Deleted.
* wtf/BitVector.h:
(WTF::BitVector::iterator::operator++):
* wtf/CMakeLists.txt:
* wtf/ConcurrentPtrHashSet.cpp: Added.
(WTF::ConcurrentPtrHashSet::ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::~ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::deleteOldTables):
(WTF::ConcurrentPtrHashSet::clear):
(WTF::ConcurrentPtrHashSet::initialize):
(WTF::ConcurrentPtrHashSet::addSlow):
(WTF::ConcurrentPtrHashSet::resizeIfNecessary):
(WTF::ConcurrentPtrHashSet::resizeAndAdd):
(WTF::ConcurrentPtrHashSet::Table::create):
* wtf/ConcurrentPtrHashSet.h: Added.
(WTF::ConcurrentPtrHashSet::contains):
(WTF::ConcurrentPtrHashSet::add):
(WTF::ConcurrentPtrHashSet::size const):
(WTF::ConcurrentPtrHashSet::Table::maxLoad const):
(WTF::ConcurrentPtrHashSet::hash):
(WTF::ConcurrentPtrHashSet::cast):
(WTF::ConcurrentPtrHashSet::containsImpl const):
(WTF::ConcurrentPtrHashSet::addImpl):
* wtf/Deque.h:
(WTF::inlineCapacity>::takeFirst):
* wtf/FastMalloc.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
* wtf/Locker.h:
(WTF::holdLockIf):
* wtf/ScopedLambda.h:
* wtf/SharedTask.h:
(WTF::SharedTask<PassedResultType):
(WTF::SharedTask<ResultType): Deleted.
* wtf/StackShot.h: Added.
(WTF::StackShot::StackShot):
(WTF::StackShot::operator=):
(WTF::StackShot::array const):
(WTF::StackShot::size const):
(WTF::StackShot::operator bool const):
(WTF::StackShot::operator== const):
(WTF::StackShot::hash const):
(WTF::StackShot::isHashTableDeletedValue const):
(WTF::StackShot::operator> const):
(WTF::StackShot::deletedValueArray):
(WTF::StackShotHash::hash):
(WTF::StackShotHash::equal):
* wtf/StackShotProfiler.h: Added.
(WTF::StackShotProfiler::StackShotProfiler):
(WTF::StackShotProfiler::profile):
(WTF::StackShotProfiler::run):
Tools:
* Scripts/run-jsc-benchmarks: Add splay-latency test, since this change needed to be carefully validated with that benchmark.
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/ConcurrentPtrHashSet.cpp: Added. This has unit tests of the new concurrent data structure. The tests focus on correctness under serial execution, which appears to be enough for now (it's so easy to catch a concurrency bug by just running the GC).
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/196360@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@225524 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2017-12-05 17:53:57 +00:00
|
|
|
// int value = m_things;
|
|
|
|
// return value;
|
|
|
|
// }
|
|
|
|
//
|
|
|
|
// If tryOptimisitcRead() runs when the lock is not held, this thread will run a critical section
|
|
|
|
// without ever writing to memory. However, on ARM, this requires fencing. We use a load-acquire for
|
|
|
|
// tryOptimisticRead(). We have no choice but to use the more expensive `dmb ish` in validate(). If
|
|
|
|
// you want to avoid that, you could try to use tryOptimisticFencelessRead().
|
|
|
|
Count tryOptimisticRead()
|
|
|
|
{
|
|
|
|
LockType currentValue = m_word.load();
|
|
|
|
// FIXME: We could eliminate this check, if we think it's OK to proceed with the optimistic read
|
|
|
|
// path even after knowing that it must fail. That's probably good for perf since we expect
|
|
|
|
// failure to be super rare. We would get rid of this check and instead of calling getCount below,
|
|
|
|
// we would return currentValue ^ mask. If the lock state was empty to begin with, the result
|
|
|
|
// would be a properly blessed count (both low bits set). If the lock state was anything else, we
|
|
|
|
// would get an improperly blessed count that would not possibly succeed in validate. We could
|
|
|
|
// actually do something like "return (currentValue | hasParkedBit) ^ isHeldBit", which would mean
|
|
|
|
// that we allow parked-but-not-held-locks through.
|
|
|
|
// https://bugs.webkit.org/show_bug.cgi?id=180394
|
|
|
|
if (currentValue & isHeldBit)
|
|
|
|
return Count();
|
|
|
|
return getCount(currentValue);
|
|
|
|
}
|
|
|
|
|
|
|
|
bool validate(Count count)
|
|
|
|
{
|
|
|
|
WTF::loadLoadFence();
|
|
|
|
LockType currentValue = m_word.loadRelaxed();
|
|
|
|
return getCount(currentValue) == count;
|
|
|
|
}
|
|
|
|
|
|
|
|
// Example of how to use this:
|
|
|
|
//
|
|
|
|
// int read()
|
|
|
|
// {
|
|
|
|
// return m_lock.doOptimizedRead(
|
|
|
|
// [&] () -> int {
|
|
|
|
// int value = m_things;
|
|
|
|
// return value;
|
|
|
|
// });
|
|
|
|
// }
|
|
|
|
template<typename Func>
|
|
|
|
auto doOptimizedRead(const Func& func)
|
|
|
|
{
|
|
|
|
Count count = tryOptimisticRead();
|
|
|
|
if (count) {
|
|
|
|
auto result = func();
|
|
|
|
if (validate(count))
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
lock();
|
|
|
|
auto result = func();
|
|
|
|
unlock();
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
// Example of how to use this:
|
|
|
|
//
|
|
|
|
// int read()
|
|
|
|
// {
|
|
|
|
// auto result = m_lock.tryOptimisticFencelessRead();
|
|
|
|
// if (CountingLock::Count count = result.value) {
|
|
|
|
// Dependency fenceBefore = Dependency::fence(result.input);
|
|
|
|
// auto* fencedThis = fenceBefore.consume(this);
|
|
|
|
// int value = fencedThis->m_things;
|
|
|
|
// if (m_lock.fencelessValidate(count, Dependency::fence(value)))
|
|
|
|
// return value; // success!
|
|
|
|
// }
|
2021-05-22 00:11:37 +00:00
|
|
|
// Locker locker { m_lock };
|
GC constraint solving should be parallel
https://bugs.webkit.org/show_bug.cgi?id=179934
Reviewed by JF Bastien.
PerformanceTests:
Added a version of splay that measures latency in a way that run-jsc-benchmarks groks.
* Octane/splay.js: Added.
(this.Setup.setup.setup):
(this.TearDown.tearDown.tearDown):
(Benchmark):
(BenchmarkResult):
(BenchmarkResult.prototype.valueOf):
(BenchmarkSuite):
(alert):
(Math.random):
(BenchmarkSuite.ResetRNG):
(RunStep):
(BenchmarkSuite.RunSuites):
(BenchmarkSuite.CountBenchmarks):
(BenchmarkSuite.GeometricMean):
(BenchmarkSuite.GeometricMeanTime):
(BenchmarkSuite.AverageAbovePercentile):
(BenchmarkSuite.GeometricMeanLatency):
(BenchmarkSuite.FormatScore):
(BenchmarkSuite.prototype.NotifyStep):
(BenchmarkSuite.prototype.NotifyResult):
(BenchmarkSuite.prototype.NotifyError):
(BenchmarkSuite.prototype.RunSingleBenchmark):
(RunNextSetup):
(RunNextBenchmark):
(RunNextTearDown):
(BenchmarkSuite.prototype.RunStep):
(GeneratePayloadTree):
(GenerateKey):
(SplayUpdateStats):
(InsertNewNode):
(SplaySetup):
(SplayTearDown):
(SplayRun):
(SplayTree):
(SplayTree.prototype.isEmpty):
(SplayTree.prototype.insert):
(SplayTree.prototype.remove):
(SplayTree.prototype.find):
(SplayTree.prototype.findMax):
(SplayTree.prototype.findGreatestLessThan):
(SplayTree.prototype.exportKeys):
(SplayTree.prototype.splay_):
(SplayTree.Node):
(SplayTree.Node.prototype.traverse_):
(report):
(start):
Source/JavaScriptCore:
This makes it possible to do constraint solving in parallel. This looks like a 1% Speedometer
speed-up. It's more than 1% on trunk-Speedometer.
The constraint solver supports running constraints in parallel in two different ways:
- Run multiple constraints in parallel to each other. This only works for constraints that can
tolerate other constraints running concurrently to them (constraint.concurrency() ==
ConstraintConcurrency::Concurrent). This is the most basic kind of parallelism that the
constraint solver supports. All constraints except the JSC SPI constraints are concurrent. We
could probably make them concurrent, but I'm playing it safe for now.
- A constraint can create parallel work for itself, which the constraint solver will interleave
with other stuff. A constraint can report that it has parallel work by returning
ConstraintParallelism::Parallel from its executeImpl() function. Then the solver will allow that
constraint's doParallelWorkImpl() function to run on as many GC marker threads as are available,
for as long as that function wants to run.
It's not possible to have a non-concurrent constraint that creates parallel work.
The parallelism is implemented in terms of the existing GC marker threads. This turns out to be
most natural for two reasons:
- No need to start any other threads.
- The constraints all want to be passed a SlotVisitor. Running on the marker threads means having
access to those threads' SlotVisitors. Also, it means less load balancing. The solver will
create work on each marking thread's SlotVisitor. When the solver is done "stealing" a marker
thread, that thread will have work it can start doing immediately. Before this change, we had to
contribute the work found by the constraint solver to the global worklist so that it could be
distributed to the marker threads by load balancing. This change probably helps to avoid that
load balancing step.
A lot of this change is about making it easy to iterate GC data structures in parallel. This
change makes almost all constraints parallel-enabled, but only the DOM's output constraint uses
the parallel work API. That constraint iterates the marked cells in two subspaces. This change
makes it very easy to compose parallel iterators over subspaces, allocators, blocks, and cells.
The marked cell parallel iterator is composed out of parallel iterators for the others. A parallel
iterator is just an iterator that can do an atomic next() very quickly. We abstract them using
RefPtr<SharedTask<...()>>, where ... is the type returned from the iterator. We know it's done
when it returns a falsish version of ... (in the current code, that's always a pointer type, so
done is indicated by null).
* API/JSMarkingConstraintPrivate.cpp:
(JSContextGroupAddMarkingConstraint):
* API/JSVirtualMachine.mm:
(scanExternalObjectGraph):
(scanExternalRememberedSet):
* JavaScriptCore.xcodeproj/project.pbxproj:
* Sources.txt:
* bytecode/AccessCase.cpp:
(JSC::AccessCase::propagateTransitions const):
* bytecode/CodeBlock.cpp:
(JSC::CodeBlock::visitWeakly):
(JSC::CodeBlock::shouldJettisonDueToOldAge):
(JSC::shouldMarkTransition):
(JSC::CodeBlock::propagateTransitions):
(JSC::CodeBlock::determineLiveness):
* dfg/DFGWorklist.cpp:
* ftl/FTLCompile.cpp:
(JSC::FTL::compile):
* heap/ConstraintParallelism.h: Added.
(WTF::printInternal):
* heap/Heap.cpp:
(JSC::Heap::Heap):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::runFixpointPhase):
(JSC::Heap::stopThePeriphery):
(JSC::Heap::resumeThePeriphery):
(JSC::Heap::addCoreConstraints):
(JSC::Heap::setBonusVisitorTask):
(JSC::Heap::runTaskInParallel):
(JSC::Heap::forEachSlotVisitor): Deleted.
* heap/Heap.h:
(JSC::Heap::worldIsRunning const):
(JSC::Heap::runFunctionInParallel):
* heap/HeapInlines.h:
(JSC::Heap::worldIsStopped const):
(JSC::Heap::isMarked):
(JSC::Heap::incrementDeferralDepth):
(JSC::Heap::decrementDeferralDepth):
(JSC::Heap::decrementDeferralDepthAndGCIfNeeded):
(JSC::Heap::forEachSlotVisitor):
(JSC::Heap::collectorBelievesThatTheWorldIsStopped const): Deleted.
(JSC::Heap::isMarkedConcurrently): Deleted.
* heap/HeapSnapshotBuilder.cpp:
(JSC::HeapSnapshotBuilder::appendNode):
* heap/LargeAllocation.h:
(JSC::LargeAllocation::isMarked):
(JSC::LargeAllocation::isMarkedConcurrently): Deleted.
* heap/LockDuringMarking.h:
(JSC::lockDuringMarking):
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::parallelNotEmptyBlockSource):
* heap/MarkedAllocator.h:
* heap/MarkedBlock.h:
(JSC::MarkedBlock::aboutToMark):
(JSC::MarkedBlock::isMarked):
(JSC::MarkedBlock::areMarksStaleWithDependency): Deleted.
(JSC::MarkedBlock::isMarkedConcurrently): Deleted.
* heap/MarkedSpace.h:
(JSC::MarkedSpace::activeWeakSetsBegin):
(JSC::MarkedSpace::activeWeakSetsEnd):
(JSC::MarkedSpace::newActiveWeakSetsBegin):
(JSC::MarkedSpace::newActiveWeakSetsEnd):
* heap/MarkingConstraint.cpp:
(JSC::MarkingConstraint::MarkingConstraint):
(JSC::MarkingConstraint::execute):
(JSC::MarkingConstraint::quickWorkEstimate):
(JSC::MarkingConstraint::workEstimate):
(JSC::MarkingConstraint::doParallelWork):
(JSC::MarkingConstraint::finishParallelWork):
(JSC::MarkingConstraint::doParallelWorkImpl):
(JSC::MarkingConstraint::finishParallelWorkImpl):
* heap/MarkingConstraint.h:
(JSC::MarkingConstraint::lastExecuteParallelism const):
(JSC::MarkingConstraint::parallelism const):
(JSC::MarkingConstraint::quickWorkEstimate): Deleted.
(JSC::MarkingConstraint::workEstimate): Deleted.
* heap/MarkingConstraintSet.cpp:
(JSC::MarkingConstraintSet::MarkingConstraintSet):
(JSC::MarkingConstraintSet::add):
(JSC::MarkingConstraintSet::executeConvergence):
(JSC::MarkingConstraintSet::executeConvergenceImpl):
(JSC::MarkingConstraintSet::executeAll):
(JSC::MarkingConstraintSet::ExecutionContext::ExecutionContext): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didVisitSomething const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::shouldTimeOut const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::drain): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didExecute const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::execute): Deleted.
(): Deleted.
* heap/MarkingConstraintSet.h:
* heap/MarkingConstraintSolver.cpp: Added.
(JSC::MarkingConstraintSolver::MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::~MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::didVisitSomething const):
(JSC::MarkingConstraintSolver::execute):
(JSC::MarkingConstraintSolver::drain):
(JSC::MarkingConstraintSolver::converge):
(JSC::MarkingConstraintSolver::runExecutionThread):
(JSC::MarkingConstraintSolver::didExecute):
* heap/MarkingConstraintSolver.h: Added.
* heap/OpaqueRootSet.h: Removed.
* heap/ParallelSourceAdapter.h: Added.
(JSC::ParallelSourceAdapter::ParallelSourceAdapter):
(JSC::createParallelSourceAdapter):
* heap/SimpleMarkingConstraint.cpp: Added.
(JSC::SimpleMarkingConstraint::SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::~SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::quickWorkEstimate):
(JSC::SimpleMarkingConstraint::executeImpl):
* heap/SimpleMarkingConstraint.h: Added.
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::didStartMarking):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::updateMutatorIsStopped):
(JSC::SlotVisitor::mutatorIsStoppedIsUpToDate const):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::performIncrementOfDraining):
(JSC::SlotVisitor::didReachTermination):
(JSC::SlotVisitor::hasWork):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::drainInParallelPassively):
(JSC::SlotVisitor::waitForTermination):
(JSC::SlotVisitor::addOpaqueRoot): Deleted.
(JSC::SlotVisitor::containsOpaqueRoot const): Deleted.
(JSC::SlotVisitor::containsOpaqueRootTriState const): Deleted.
(JSC::SlotVisitor::mergeIfNecessary): Deleted.
(JSC::SlotVisitor::mergeOpaqueRootsIfProfitable): Deleted.
(JSC::SlotVisitor::mergeOpaqueRoots): Deleted.
* heap/SlotVisitor.h:
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::addOpaqueRoot):
(JSC::SlotVisitor::containsOpaqueRoot const):
(JSC::SlotVisitor::vm):
(JSC::SlotVisitor::vm const):
* heap/Subspace.cpp:
(JSC::Subspace::parallelAllocatorSource):
(JSC::Subspace::parallelNotEmptyMarkedBlockSource):
* heap/Subspace.h:
* heap/SubspaceInlines.h:
(JSC::Subspace::forEachMarkedCellInParallel):
* heap/VisitCounter.h: Added.
(JSC::VisitCounter::VisitCounter):
(JSC::VisitCounter::visitCount const):
* heap/VisitingTimeout.h: Removed.
* heap/WeakBlock.cpp:
(JSC::WeakBlock::specializedVisit):
* runtime/Structure.cpp:
(JSC::Structure::isCheapDuringGC):
(JSC::Structure::markIfCheap):
Source/WebCore:
No new tests because no change in behavior. This change is best tested using DOM-GC-intensive
benchmarks like Speedometer and Dromaeo.
This parallelizes the DOM's output constraint, and makes some small changes to make this more
scalable.
* ForwardingHeaders/heap/SimpleMarkingConstraint.h: Added.
* ForwardingHeaders/heap/VisitingTimeout.h: Removed.
* Sources.txt:
* WebCore.xcodeproj/project.pbxproj:
* bindings/js/DOMGCOutputConstraint.cpp: Added.
(WebCore::DOMGCOutputConstraint::DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::~DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::executeImpl):
(WebCore::DOMGCOutputConstraint::doParallelWorkImpl):
(WebCore::DOMGCOutputConstraint::finishParallelWorkImpl):
* bindings/js/DOMGCOutputConstraint.h: Added.
* bindings/js/WebCoreJSClientData.cpp:
(WebCore::JSVMClientData::initNormalWorld):
* dom/Node.cpp:
(WebCore::Node::eventTargetDataConcurrently):
(WebCore::Node::ensureEventTargetData):
(WebCore::Node::clearEventTargetData):
Source/WTF:
This does some changes to make it easier to do parallel constraint solving:
- I finally removed dependencyWith. This was a silly construct whose only purpose is to confuse
people about what it means to have a dependency chain. I took that as an opportunity to grealy
simplify the GC's use of dependency chaining.
- Added more logic to Deque<>, since I use it for part of the load balancer.
- Made it possible to profile lock contention. See
https://bugs.webkit.org/show_bug.cgi?id=180250#c0 for some preliminary measurements.
- Introduced holdLockIf, which makes it easy to perform predicated lock acquisition. We use that
to pick a lock in WebCore.
- Introduced CountingLock. It's like WTF::Lock except it also enables optimistic read transactions
sorta like Java's StampedLock.
* WTF.xcodeproj/project.pbxproj:
* wtf/Atomics.h:
(WTF::dependency):
(WTF::DependencyWith::DependencyWith): Deleted.
(WTF::dependencyWith): Deleted.
* wtf/BitVector.h:
(WTF::BitVector::iterator::operator++):
* wtf/CMakeLists.txt:
* wtf/ConcurrentPtrHashSet.cpp: Added.
(WTF::ConcurrentPtrHashSet::ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::~ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::deleteOldTables):
(WTF::ConcurrentPtrHashSet::clear):
(WTF::ConcurrentPtrHashSet::initialize):
(WTF::ConcurrentPtrHashSet::addSlow):
(WTF::ConcurrentPtrHashSet::resizeIfNecessary):
(WTF::ConcurrentPtrHashSet::resizeAndAdd):
(WTF::ConcurrentPtrHashSet::Table::create):
* wtf/ConcurrentPtrHashSet.h: Added.
(WTF::ConcurrentPtrHashSet::contains):
(WTF::ConcurrentPtrHashSet::add):
(WTF::ConcurrentPtrHashSet::size const):
(WTF::ConcurrentPtrHashSet::Table::maxLoad const):
(WTF::ConcurrentPtrHashSet::hash):
(WTF::ConcurrentPtrHashSet::cast):
(WTF::ConcurrentPtrHashSet::containsImpl const):
(WTF::ConcurrentPtrHashSet::addImpl):
* wtf/Deque.h:
(WTF::inlineCapacity>::takeFirst):
* wtf/FastMalloc.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
* wtf/Locker.h:
(WTF::holdLockIf):
* wtf/ScopedLambda.h:
* wtf/SharedTask.h:
(WTF::SharedTask<PassedResultType):
(WTF::SharedTask<ResultType): Deleted.
* wtf/StackShot.h: Added.
(WTF::StackShot::StackShot):
(WTF::StackShot::operator=):
(WTF::StackShot::array const):
(WTF::StackShot::size const):
(WTF::StackShot::operator bool const):
(WTF::StackShot::operator== const):
(WTF::StackShot::hash const):
(WTF::StackShot::isHashTableDeletedValue const):
(WTF::StackShot::operator> const):
(WTF::StackShot::deletedValueArray):
(WTF::StackShotHash::hash):
(WTF::StackShotHash::equal):
* wtf/StackShotProfiler.h: Added.
(WTF::StackShotProfiler::StackShotProfiler):
(WTF::StackShotProfiler::profile):
(WTF::StackShotProfiler::run):
Tools:
* Scripts/run-jsc-benchmarks: Add splay-latency test, since this change needed to be carefully validated with that benchmark.
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/ConcurrentPtrHashSet.cpp: Added. This has unit tests of the new concurrent data structure. The tests focus on correctness under serial execution, which appears to be enough for now (it's so easy to catch a concurrency bug by just running the GC).
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/196360@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@225524 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2017-12-05 17:53:57 +00:00
|
|
|
// int value = m_things;
|
|
|
|
// return value;
|
|
|
|
// }
|
|
|
|
//
|
|
|
|
// Use this to create a read transaction using dependency chains only. You have to be careful to
|
|
|
|
// thread the dependency input (the `input` field that the returns) through a Dependency, and then
|
|
|
|
// thread that Dependency into every load (except for loads that are chasing pointers loaded from
|
|
|
|
// loads that already uses that dependency). Then, to validate the read transaction, you have to pass
|
|
|
|
// both the count and another Dependency that is based on whatever loads you used to produce the
|
|
|
|
// output.
|
|
|
|
//
|
|
|
|
// On non-ARM platforms, the Dependency objects don't do anything except for Dependency::fence, which
|
|
|
|
// is a load-load fence. The idiom above does the right thing on both ARM and TSO.
|
|
|
|
//
|
|
|
|
// WARNING: This can be hard to get right. Please only use this for very short critical sections that
|
|
|
|
// are known to be sufficiently perf-critical to justify the risk.
|
|
|
|
InputAndValue<LockType, Count> tryOptimisticFencelessRead()
|
|
|
|
{
|
|
|
|
LockType currentValue = m_word.loadRelaxed();
|
|
|
|
if (currentValue & isHeldBit)
|
|
|
|
return inputAndValue(currentValue, Count());
|
|
|
|
return inputAndValue(currentValue, getCount(currentValue));
|
|
|
|
}
|
|
|
|
|
|
|
|
bool fencelessValidate(Count count, Dependency dependency)
|
|
|
|
{
|
|
|
|
LockType currentValue = dependency.consume(this)->m_word.loadRelaxed();
|
|
|
|
return getCount(currentValue) == count;
|
|
|
|
}
|
|
|
|
|
|
|
|
template<typename OptimisticFunc, typename Func>
|
|
|
|
auto doOptimizedFencelessRead(const OptimisticFunc& optimisticFunc, const Func& func)
|
|
|
|
{
|
|
|
|
auto count = tryOptimisticFencelessRead();
|
|
|
|
if (count.value) {
|
|
|
|
Dependency dependency = Dependency::fence(count.input);
|
|
|
|
auto result = optimisticFunc(dependency, count.value);
|
|
|
|
if (fencelessValidate(count.value, dependency))
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
lock();
|
|
|
|
auto result = func();
|
|
|
|
unlock();
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
WTF_EXPORT_PRIVATE void lockSlow();
|
|
|
|
WTF_EXPORT_PRIVATE void unlockSlow();
|
|
|
|
|
|
|
|
Count getCount(LockType value)
|
|
|
|
{
|
|
|
|
Count result;
|
|
|
|
result.m_value = value | mask;
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
2017-12-07 03:52:09 +00:00
|
|
|
Atomic<LockType> m_word { 0 };
|
GC constraint solving should be parallel
https://bugs.webkit.org/show_bug.cgi?id=179934
Reviewed by JF Bastien.
PerformanceTests:
Added a version of splay that measures latency in a way that run-jsc-benchmarks groks.
* Octane/splay.js: Added.
(this.Setup.setup.setup):
(this.TearDown.tearDown.tearDown):
(Benchmark):
(BenchmarkResult):
(BenchmarkResult.prototype.valueOf):
(BenchmarkSuite):
(alert):
(Math.random):
(BenchmarkSuite.ResetRNG):
(RunStep):
(BenchmarkSuite.RunSuites):
(BenchmarkSuite.CountBenchmarks):
(BenchmarkSuite.GeometricMean):
(BenchmarkSuite.GeometricMeanTime):
(BenchmarkSuite.AverageAbovePercentile):
(BenchmarkSuite.GeometricMeanLatency):
(BenchmarkSuite.FormatScore):
(BenchmarkSuite.prototype.NotifyStep):
(BenchmarkSuite.prototype.NotifyResult):
(BenchmarkSuite.prototype.NotifyError):
(BenchmarkSuite.prototype.RunSingleBenchmark):
(RunNextSetup):
(RunNextBenchmark):
(RunNextTearDown):
(BenchmarkSuite.prototype.RunStep):
(GeneratePayloadTree):
(GenerateKey):
(SplayUpdateStats):
(InsertNewNode):
(SplaySetup):
(SplayTearDown):
(SplayRun):
(SplayTree):
(SplayTree.prototype.isEmpty):
(SplayTree.prototype.insert):
(SplayTree.prototype.remove):
(SplayTree.prototype.find):
(SplayTree.prototype.findMax):
(SplayTree.prototype.findGreatestLessThan):
(SplayTree.prototype.exportKeys):
(SplayTree.prototype.splay_):
(SplayTree.Node):
(SplayTree.Node.prototype.traverse_):
(report):
(start):
Source/JavaScriptCore:
This makes it possible to do constraint solving in parallel. This looks like a 1% Speedometer
speed-up. It's more than 1% on trunk-Speedometer.
The constraint solver supports running constraints in parallel in two different ways:
- Run multiple constraints in parallel to each other. This only works for constraints that can
tolerate other constraints running concurrently to them (constraint.concurrency() ==
ConstraintConcurrency::Concurrent). This is the most basic kind of parallelism that the
constraint solver supports. All constraints except the JSC SPI constraints are concurrent. We
could probably make them concurrent, but I'm playing it safe for now.
- A constraint can create parallel work for itself, which the constraint solver will interleave
with other stuff. A constraint can report that it has parallel work by returning
ConstraintParallelism::Parallel from its executeImpl() function. Then the solver will allow that
constraint's doParallelWorkImpl() function to run on as many GC marker threads as are available,
for as long as that function wants to run.
It's not possible to have a non-concurrent constraint that creates parallel work.
The parallelism is implemented in terms of the existing GC marker threads. This turns out to be
most natural for two reasons:
- No need to start any other threads.
- The constraints all want to be passed a SlotVisitor. Running on the marker threads means having
access to those threads' SlotVisitors. Also, it means less load balancing. The solver will
create work on each marking thread's SlotVisitor. When the solver is done "stealing" a marker
thread, that thread will have work it can start doing immediately. Before this change, we had to
contribute the work found by the constraint solver to the global worklist so that it could be
distributed to the marker threads by load balancing. This change probably helps to avoid that
load balancing step.
A lot of this change is about making it easy to iterate GC data structures in parallel. This
change makes almost all constraints parallel-enabled, but only the DOM's output constraint uses
the parallel work API. That constraint iterates the marked cells in two subspaces. This change
makes it very easy to compose parallel iterators over subspaces, allocators, blocks, and cells.
The marked cell parallel iterator is composed out of parallel iterators for the others. A parallel
iterator is just an iterator that can do an atomic next() very quickly. We abstract them using
RefPtr<SharedTask<...()>>, where ... is the type returned from the iterator. We know it's done
when it returns a falsish version of ... (in the current code, that's always a pointer type, so
done is indicated by null).
* API/JSMarkingConstraintPrivate.cpp:
(JSContextGroupAddMarkingConstraint):
* API/JSVirtualMachine.mm:
(scanExternalObjectGraph):
(scanExternalRememberedSet):
* JavaScriptCore.xcodeproj/project.pbxproj:
* Sources.txt:
* bytecode/AccessCase.cpp:
(JSC::AccessCase::propagateTransitions const):
* bytecode/CodeBlock.cpp:
(JSC::CodeBlock::visitWeakly):
(JSC::CodeBlock::shouldJettisonDueToOldAge):
(JSC::shouldMarkTransition):
(JSC::CodeBlock::propagateTransitions):
(JSC::CodeBlock::determineLiveness):
* dfg/DFGWorklist.cpp:
* ftl/FTLCompile.cpp:
(JSC::FTL::compile):
* heap/ConstraintParallelism.h: Added.
(WTF::printInternal):
* heap/Heap.cpp:
(JSC::Heap::Heap):
(JSC::Heap::addToRememberedSet):
(JSC::Heap::runFixpointPhase):
(JSC::Heap::stopThePeriphery):
(JSC::Heap::resumeThePeriphery):
(JSC::Heap::addCoreConstraints):
(JSC::Heap::setBonusVisitorTask):
(JSC::Heap::runTaskInParallel):
(JSC::Heap::forEachSlotVisitor): Deleted.
* heap/Heap.h:
(JSC::Heap::worldIsRunning const):
(JSC::Heap::runFunctionInParallel):
* heap/HeapInlines.h:
(JSC::Heap::worldIsStopped const):
(JSC::Heap::isMarked):
(JSC::Heap::incrementDeferralDepth):
(JSC::Heap::decrementDeferralDepth):
(JSC::Heap::decrementDeferralDepthAndGCIfNeeded):
(JSC::Heap::forEachSlotVisitor):
(JSC::Heap::collectorBelievesThatTheWorldIsStopped const): Deleted.
(JSC::Heap::isMarkedConcurrently): Deleted.
* heap/HeapSnapshotBuilder.cpp:
(JSC::HeapSnapshotBuilder::appendNode):
* heap/LargeAllocation.h:
(JSC::LargeAllocation::isMarked):
(JSC::LargeAllocation::isMarkedConcurrently): Deleted.
* heap/LockDuringMarking.h:
(JSC::lockDuringMarking):
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::parallelNotEmptyBlockSource):
* heap/MarkedAllocator.h:
* heap/MarkedBlock.h:
(JSC::MarkedBlock::aboutToMark):
(JSC::MarkedBlock::isMarked):
(JSC::MarkedBlock::areMarksStaleWithDependency): Deleted.
(JSC::MarkedBlock::isMarkedConcurrently): Deleted.
* heap/MarkedSpace.h:
(JSC::MarkedSpace::activeWeakSetsBegin):
(JSC::MarkedSpace::activeWeakSetsEnd):
(JSC::MarkedSpace::newActiveWeakSetsBegin):
(JSC::MarkedSpace::newActiveWeakSetsEnd):
* heap/MarkingConstraint.cpp:
(JSC::MarkingConstraint::MarkingConstraint):
(JSC::MarkingConstraint::execute):
(JSC::MarkingConstraint::quickWorkEstimate):
(JSC::MarkingConstraint::workEstimate):
(JSC::MarkingConstraint::doParallelWork):
(JSC::MarkingConstraint::finishParallelWork):
(JSC::MarkingConstraint::doParallelWorkImpl):
(JSC::MarkingConstraint::finishParallelWorkImpl):
* heap/MarkingConstraint.h:
(JSC::MarkingConstraint::lastExecuteParallelism const):
(JSC::MarkingConstraint::parallelism const):
(JSC::MarkingConstraint::quickWorkEstimate): Deleted.
(JSC::MarkingConstraint::workEstimate): Deleted.
* heap/MarkingConstraintSet.cpp:
(JSC::MarkingConstraintSet::MarkingConstraintSet):
(JSC::MarkingConstraintSet::add):
(JSC::MarkingConstraintSet::executeConvergence):
(JSC::MarkingConstraintSet::executeConvergenceImpl):
(JSC::MarkingConstraintSet::executeAll):
(JSC::MarkingConstraintSet::ExecutionContext::ExecutionContext): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didVisitSomething const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::shouldTimeOut const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::drain): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::didExecute const): Deleted.
(JSC::MarkingConstraintSet::ExecutionContext::execute): Deleted.
(): Deleted.
* heap/MarkingConstraintSet.h:
* heap/MarkingConstraintSolver.cpp: Added.
(JSC::MarkingConstraintSolver::MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::~MarkingConstraintSolver):
(JSC::MarkingConstraintSolver::didVisitSomething const):
(JSC::MarkingConstraintSolver::execute):
(JSC::MarkingConstraintSolver::drain):
(JSC::MarkingConstraintSolver::converge):
(JSC::MarkingConstraintSolver::runExecutionThread):
(JSC::MarkingConstraintSolver::didExecute):
* heap/MarkingConstraintSolver.h: Added.
* heap/OpaqueRootSet.h: Removed.
* heap/ParallelSourceAdapter.h: Added.
(JSC::ParallelSourceAdapter::ParallelSourceAdapter):
(JSC::createParallelSourceAdapter):
* heap/SimpleMarkingConstraint.cpp: Added.
(JSC::SimpleMarkingConstraint::SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::~SimpleMarkingConstraint):
(JSC::SimpleMarkingConstraint::quickWorkEstimate):
(JSC::SimpleMarkingConstraint::executeImpl):
* heap/SimpleMarkingConstraint.h: Added.
* heap/SlotVisitor.cpp:
(JSC::SlotVisitor::didStartMarking):
(JSC::SlotVisitor::reset):
(JSC::SlotVisitor::appendToMarkStack):
(JSC::SlotVisitor::visitChildren):
(JSC::SlotVisitor::updateMutatorIsStopped):
(JSC::SlotVisitor::mutatorIsStoppedIsUpToDate const):
(JSC::SlotVisitor::drain):
(JSC::SlotVisitor::performIncrementOfDraining):
(JSC::SlotVisitor::didReachTermination):
(JSC::SlotVisitor::hasWork):
(JSC::SlotVisitor::drainFromShared):
(JSC::SlotVisitor::drainInParallelPassively):
(JSC::SlotVisitor::waitForTermination):
(JSC::SlotVisitor::addOpaqueRoot): Deleted.
(JSC::SlotVisitor::containsOpaqueRoot const): Deleted.
(JSC::SlotVisitor::containsOpaqueRootTriState const): Deleted.
(JSC::SlotVisitor::mergeIfNecessary): Deleted.
(JSC::SlotVisitor::mergeOpaqueRootsIfProfitable): Deleted.
(JSC::SlotVisitor::mergeOpaqueRoots): Deleted.
* heap/SlotVisitor.h:
* heap/SlotVisitorInlines.h:
(JSC::SlotVisitor::addOpaqueRoot):
(JSC::SlotVisitor::containsOpaqueRoot const):
(JSC::SlotVisitor::vm):
(JSC::SlotVisitor::vm const):
* heap/Subspace.cpp:
(JSC::Subspace::parallelAllocatorSource):
(JSC::Subspace::parallelNotEmptyMarkedBlockSource):
* heap/Subspace.h:
* heap/SubspaceInlines.h:
(JSC::Subspace::forEachMarkedCellInParallel):
* heap/VisitCounter.h: Added.
(JSC::VisitCounter::VisitCounter):
(JSC::VisitCounter::visitCount const):
* heap/VisitingTimeout.h: Removed.
* heap/WeakBlock.cpp:
(JSC::WeakBlock::specializedVisit):
* runtime/Structure.cpp:
(JSC::Structure::isCheapDuringGC):
(JSC::Structure::markIfCheap):
Source/WebCore:
No new tests because no change in behavior. This change is best tested using DOM-GC-intensive
benchmarks like Speedometer and Dromaeo.
This parallelizes the DOM's output constraint, and makes some small changes to make this more
scalable.
* ForwardingHeaders/heap/SimpleMarkingConstraint.h: Added.
* ForwardingHeaders/heap/VisitingTimeout.h: Removed.
* Sources.txt:
* WebCore.xcodeproj/project.pbxproj:
* bindings/js/DOMGCOutputConstraint.cpp: Added.
(WebCore::DOMGCOutputConstraint::DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::~DOMGCOutputConstraint):
(WebCore::DOMGCOutputConstraint::executeImpl):
(WebCore::DOMGCOutputConstraint::doParallelWorkImpl):
(WebCore::DOMGCOutputConstraint::finishParallelWorkImpl):
* bindings/js/DOMGCOutputConstraint.h: Added.
* bindings/js/WebCoreJSClientData.cpp:
(WebCore::JSVMClientData::initNormalWorld):
* dom/Node.cpp:
(WebCore::Node::eventTargetDataConcurrently):
(WebCore::Node::ensureEventTargetData):
(WebCore::Node::clearEventTargetData):
Source/WTF:
This does some changes to make it easier to do parallel constraint solving:
- I finally removed dependencyWith. This was a silly construct whose only purpose is to confuse
people about what it means to have a dependency chain. I took that as an opportunity to grealy
simplify the GC's use of dependency chaining.
- Added more logic to Deque<>, since I use it for part of the load balancer.
- Made it possible to profile lock contention. See
https://bugs.webkit.org/show_bug.cgi?id=180250#c0 for some preliminary measurements.
- Introduced holdLockIf, which makes it easy to perform predicated lock acquisition. We use that
to pick a lock in WebCore.
- Introduced CountingLock. It's like WTF::Lock except it also enables optimistic read transactions
sorta like Java's StampedLock.
* WTF.xcodeproj/project.pbxproj:
* wtf/Atomics.h:
(WTF::dependency):
(WTF::DependencyWith::DependencyWith): Deleted.
(WTF::dependencyWith): Deleted.
* wtf/BitVector.h:
(WTF::BitVector::iterator::operator++):
* wtf/CMakeLists.txt:
* wtf/ConcurrentPtrHashSet.cpp: Added.
(WTF::ConcurrentPtrHashSet::ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::~ConcurrentPtrHashSet):
(WTF::ConcurrentPtrHashSet::deleteOldTables):
(WTF::ConcurrentPtrHashSet::clear):
(WTF::ConcurrentPtrHashSet::initialize):
(WTF::ConcurrentPtrHashSet::addSlow):
(WTF::ConcurrentPtrHashSet::resizeIfNecessary):
(WTF::ConcurrentPtrHashSet::resizeAndAdd):
(WTF::ConcurrentPtrHashSet::Table::create):
* wtf/ConcurrentPtrHashSet.h: Added.
(WTF::ConcurrentPtrHashSet::contains):
(WTF::ConcurrentPtrHashSet::add):
(WTF::ConcurrentPtrHashSet::size const):
(WTF::ConcurrentPtrHashSet::Table::maxLoad const):
(WTF::ConcurrentPtrHashSet::hash):
(WTF::ConcurrentPtrHashSet::cast):
(WTF::ConcurrentPtrHashSet::containsImpl const):
(WTF::ConcurrentPtrHashSet::addImpl):
* wtf/Deque.h:
(WTF::inlineCapacity>::takeFirst):
* wtf/FastMalloc.h:
* wtf/Lock.cpp:
(WTF::LockBase::lockSlow):
* wtf/Locker.h:
(WTF::holdLockIf):
* wtf/ScopedLambda.h:
* wtf/SharedTask.h:
(WTF::SharedTask<PassedResultType):
(WTF::SharedTask<ResultType): Deleted.
* wtf/StackShot.h: Added.
(WTF::StackShot::StackShot):
(WTF::StackShot::operator=):
(WTF::StackShot::array const):
(WTF::StackShot::size const):
(WTF::StackShot::operator bool const):
(WTF::StackShot::operator== const):
(WTF::StackShot::hash const):
(WTF::StackShot::isHashTableDeletedValue const):
(WTF::StackShot::operator> const):
(WTF::StackShot::deletedValueArray):
(WTF::StackShotHash::hash):
(WTF::StackShotHash::equal):
* wtf/StackShotProfiler.h: Added.
(WTF::StackShotProfiler::StackShotProfiler):
(WTF::StackShotProfiler::profile):
(WTF::StackShotProfiler::run):
Tools:
* Scripts/run-jsc-benchmarks: Add splay-latency test, since this change needed to be carefully validated with that benchmark.
* TestWebKitAPI/CMakeLists.txt:
* TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* TestWebKitAPI/Tests/WTF/ConcurrentPtrHashSet.cpp: Added. This has unit tests of the new concurrent data structure. The tests focus on correctness under serial execution, which appears to be enough for now (it's so easy to catch a concurrency bug by just running the GC).
(TestWebKitAPI::TEST):
Canonical link: https://commits.webkit.org/196360@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@225524 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2017-12-05 17:53:57 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
} // namespace WTF
|
|
|
|
|
|
|
|
using WTF::CountingLock;
|
|
|
|
|