haikuwebkit/Source/WTF/wtf/Lock.cpp

/*
* Copyright (C) 2015-2017 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
* EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
* OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
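// WTF::Lock is an adaptive mutex: an inlined compare-and-swap fast path for the
// uncontended case, a short burst of spinning for microcontention, and a queue of
// parked threads when the lock stays held for a long time. This file holds the
// out-of-line slow paths (lockSlow() and unlockSlow()); the fast paths are inlined
// from Lock.h.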
#include "config.h"
#include <wtf/Lock.h>
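// The meat of the lock algorithm lives in LockAlgorithmInlines.h, which is
// deliberately included by only a couple of translation units; the slow paths
// below are instantiations of that generic algorithm.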
#include <wtf/LockAlgorithmInlines.h>
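// StackShotProfiler is used for optional profiling of lock contention; see
// https://bugs.webkit.org/show_bug.cgi?id=180250#c0 for some measurements.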
#include <wtf/StackShotProfiler.h>
#if OS(WINDOWS)
#include <windows.h>
#else
#include <unistd.h>
#endif
namespace WTF {
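
// Set to true to profile lock contention: each contended acquisition that reaches
// lockSlow() then records a stack shot via STACK_SHOT_PROFILE below.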
static constexpr bool profileLockContention = false;
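
// Out-of-line slow path for Lock::lock(). The inlined fast path is a single CAS; contended
// acquisitions fall through to here, where the lock algorithm (see LockAlgorithmInlines.h)
// spins for a short while and then parks the thread until the holder unlocks.
//
// A minimal usage sketch (illustrative only; callers never invoke lockSlow() directly):
//
//     Lock lock;
//     {
//         Locker locker { lock }; // uncontended: fast-path CAS; contended: Lock::lockSlow()
//         // ... critical section ...
//     }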
void Lock::lockSlow()
{
    if (profileLockContention)
        STACK_SHOT_PROFILE(4, 2, 5);
    // Heap allocations are forbidden on certain threads (e.g. audio rendering thread) for performance reasons, so we need to
    // explicitly allow the following allocation(s). In some rare cases, the lockSlow() algorithm may cause allocations.
    DisableMallocRestrictionsForCurrentThreadScope disableMallocRestrictions;
2016-11-15 01:49:22 +00:00
DefaultLockAlgorithm::lockSlow(m_byte);
}
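The change description above factors the byte-sized lock state machine into a LockAlgorithm template parameterized by the lock type and the isHeldBit/hasParkedBit values, with static methods that operate on a caller-owned Atomic<LockType>& field. Below is a minimal, self-contained sketch of that shape under stated assumptions: it uses std::atomic and a yield-based slow path as stand-ins for WTF::Atomic and ParkingLot, and ToyLockAlgorithm/ToyLock are illustrative names rather than WTF's actual API.

#include <atomic>
#include <cstdint>
#include <thread>

template<typename LockType, LockType isHeldBit, LockType hasParkedBit>
struct ToyLockAlgorithm {
    static bool tryLock(std::atomic<LockType>& lock)
    {
        LockType expected = 0;
        // Fast path: acquire only if the lock byte is completely clear.
        return lock.compare_exchange_strong(expected, isHeldBit, std::memory_order_acquire);
    }

    static void lock(std::atomic<LockType>& lock)
    {
        while (!tryLock(lock)) {
            // Slow-path stand-in: a real implementation would spin a bounded number of
            // times, then set hasParkedBit and park the thread (WTF uses ParkingLot).
            std::this_thread::yield();
        }
    }

    static void unlock(std::atomic<LockType>& lock)
    {
        // Stand-in for the real unlock: clear everything. A real implementation would
        // check hasParkedBit and take an unlockSlow() path that unparks one waiter.
        lock.store(0, std::memory_order_release);
    }

    static bool isLocked(std::atomic<LockType>& lock)
    {
        return lock.load(std::memory_order_acquire) & isHeldBit;
    }
};

// A byte-sized lock is then just a field plus the algorithm projected onto it.
using ToyByteLockAlgorithm = ToyLockAlgorithm<uint8_t, 1, 2>;

struct ToyLock {
    void lock() { ToyByteLockAlgorithm::lock(m_byte); }
    void unlock() { ToyByteLockAlgorithm::unlock(m_byte); }
    bool isHeld() { return ToyByteLockAlgorithm::isLocked(m_byte); }

    std::atomic<uint8_t> m_byte { 0 };
};

The point of the static-method shape is that any existing byte-sized field can host the lock: the wrapper at the bottom is just a byte plus the algorithm projected onto it, which is how the same machinery can be reused for the two spare bits in the JSCell header.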
void Lock::unlockSlow()
WTF::Lock should be fair eventually https://bugs.webkit.org/show_bug.cgi?id=159384 Reviewed by Geoffrey Garen. Source/WTF: In https://webkit.org/blog/6161/locking-in-webkit/ we showed how relaxing the fairness of locks makes them fast. That post presented lock fairness as a trade-off between two extremes: - Barging. A barging lock, like WTF::Lock, releases the lock in unlock() even if there was a thread on the queue. If there was a thread on the queue, the lock is released and that thread is made runnable. That thread may then grab the lock, or some other thread may grab the lock first (it may barge). Usually, the barging thread is the thread that released the lock in the first place. This maximizes throughput but hurts fairness. There is no good theoretical bound on how unfair the lock may become, but empirical data suggests that it's fair enough for the cases we previously measured. - FIFO. A FIFO lock, like HandoffLock in ToyLocks.h, does not release the lock in unlock() if there is a thread waiting. If there is a thread waiting, unlock() will make that thread runnable and inform it that it now holds the lock. This ensures perfect round-robin fairness and allows us to reason theoretically about how long it may take for a thread to grab the lock. For example, if we know that only N threads are running and each one may contend on a critical section, and each one may hold the lock for at most S seconds, then the time it takes to grab the lock is N * S. Unfortunately, FIFO locks perform very badly in most cases. This is because for the common case of short critical sections, they force a context switch after each critical section if the lock is contended. This change makes WTF::Lock almost as fair as FIFO while still being as fast as barging. Thanks to this new algorithm, you can now have both of these things at the same time. This change makes WTF::Lock eventually fair. We can almost (more on the caveats below) guarantee that the time it takes to grab a lock is N * max(1ms, S). In other words, critical sections that are longer than 1ms are always fair. For shorter critical sections, the amount of time that any thread waits is 1ms times the number of threads. There are some caveats that arise from our use of randomness, but even then, in the limit as the critical section length goes to infinity, the lock becomes fair. The corner cases are unlikely to happen; our experiments show that the lock becomes exactly as fair as a FIFO lock for any critical section that is 1ms or longer. The fairness mechanism is broken into two parts. WTF::Lock can now choose to unlock a lock fairly or unfairly thanks to the new ParkingLot token mechanism. WTF::Lock knows when to use fair unlocking based on a timeout mechanism in ParkingLot called timeToBeFair. ParkingLot::unparkOne() and ParkingLot::parkConditionally() can now communicate with each other via a token. unparkOne() can pass a token, which parkConditionally() will return. This change also makes parkConditionally() a lot more precise about when it was unparked due to a call to unparkOne(). If unparkOne() is told that a thread was unparked then this thread is guaranteed to report that it was unparked rather than timing out, and that thread is guaranteed to get the token that unparkOne() passed. The token is an intptr_t. We use it as a boolean variable in WTF::Lock, but you could use it to pass arbitrary data structures. By default, the token is zero. WTF::Lock's unlock() will pass 1 as the token if it is doing fair unlocking. 
In that case, unlock() will not release the lock, and lock() will know that it holds the lock as soon as parkConditionally() returns. Note that this algorithm relies on unparkOne() invoking WTF::Lock's callback while the queue lock is held, so that WTF::Lock can make a decision about unlock strategy and inject a token while it has complete knowledge over the state of the queue. As such, it's not immediately obvious how to implement this algorithm on top of futexes. You really need ParkingLot! WTF::Lock does not use fair unlocking every time. We expose a new API, Lock::unlockFairly(), which forces the fair unlocking behavior. Additionally, ParkingLot now maintains a per-bucket stochastic fairness timeout. When the timeout fires, the unparkOne() callback sees UnparkResult::timeToBeFair = true. This timeout is set to be anywhere from 0ms to 1ms at random. When a dequeue happens and there are threads that actually get dequeued, we check if the time since the last unfair unlock (the last time timeToBeFair was set to true) is more than the timeout amount. If so, then we set timeToBeFair to true and reset the timeout. This means that in the absence of ParkingLot collisions, unfair unlocking is guaranteed to happen at least once per millisecond. It will happen at 2 KHz on average. If there are collisions, then each collision adds one millisecond to the worst case (and 0.5 ms to the average case). The reason why we don't just use a fixed 1ms timeout is that we want to avoid resonance. Imagine a program in which some thread acquires a lock at 1 KHz in-phase with the timeToBeFair timeout. Then this thread would be the benefactor of fairness to the detriment of everyone else. Randomness ensures that we aren't too fair to any one thread. Empirically, this is neutral on our major benchmarks like JetStream but it's an enormous improvement in LockFairnessTest. It's common for an unfair lock (either our BargingLock, the old WTF::Lock, any of the other futex-based locks that barge, or new os_unfair_lock) to allow only one thread to hold the lock during a whole second in which each thread is holding the lock for 1ms at a time. This is because in a barging lock, releasing a lock after holding it for 1ms and then reacquiring it immediately virtually ensures that none of the other threads can wake up in time to grab it before it's relocked. But the new WTF::Lock handles this case like a champ: each thread gets equal turns. Here's some data. If we launch 10 threads and have each of them run for 1 second while repeatedly holding a critical section for 1ms, then here's how many times each thread gets to hold the lock using the old WTF::Lock algorithm: 799, 6, 1, 1, 1, 1, 1, 1, 1, 1 One thread hogged the lock for almost the whole time! With the new WTF::Lock, the lock becomes totally fair: 80, 79, 79, 79, 79, 79, 79, 80, 80, 79 I don't know of anyone creating such an automatically-fair adaptive lock before, so I think that this is a pretty awesome advancement to the state of the art! This change is good for three reasons: - We do have long critical sections in WebKit and we don't want to have to worry about starvation. This reduces the likelihood that we will see starvation due to our lock strategy. - I was talking to ggaren about bmalloc's locking needs, and he wanted unlockFairly() or lockFairly() or some moral equivalent for the scavenger thread. - If we use a WTF::Lock to manage heap access in a multithreaded GC, we'll need the ability to unlock and relock without barging. 
* benchmarks/LockFairnessTest.cpp: (main): * benchmarks/ToyLocks.h: * wtf/Condition.h: (WTF::ConditionBase::waitUntil): (WTF::ConditionBase::notifyOne): * wtf/Lock.cpp: (WTF::LockBase::lockSlow): (WTF::LockBase::unlockSlow): (WTF::LockBase::unlockFairlySlow): (WTF::LockBase::unlockSlowImpl): * wtf/Lock.h: (WTF::LockBase::try_lock): (WTF::LockBase::unlock): (WTF::LockBase::unlockFairly): (WTF::LockBase::isHeld): (WTF::LockBase::isFullyReset): * wtf/ParkingLot.cpp: (WTF::ParkingLot::parkConditionallyImpl): (WTF::ParkingLot::unparkOne): (WTF::ParkingLot::unparkOneImpl): (WTF::ParkingLot::unparkAll): * wtf/ParkingLot.h: (WTF::ParkingLot::parkConditionally): (WTF::ParkingLot::compareAndPark): (WTF::ParkingLot::unparkOne): Tools: * TestWebKitAPI/Tests/WTF/ParkingLot.cpp: Canonical link: https://commits.webkit.org/178039@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@203350 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-07-18 18:32:52 +00:00
{
// Heap allocations are forbidden on certain threads (e.g. audio rendering thread) for performance reasons so we need to
// explicitly allow the following allocation(s). In some rare cases, the unlockSlow() algorithm may cause allocations.
DisableMallocRestrictionsForCurrentThreadScope disableMallocRestrictions;
DefaultLockAlgorithm::unlockSlow(m_byte, DefaultLockAlgorithm::Unfair);
}
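The fairness mechanism described in the "WTF::Lock should be fair eventually" change above can be illustrated with a toy lock: the unlocking thread passes a token to the thread it wakes, and a token of 1 means the lock was never released, so the woken thread owns it outright. The sketch below is a self-contained model under stated assumptions: ToyFairLock and its members are invented names, std::condition_variable stands in for ParkingLot, and the fair/unfair choice is taken as an explicit boolean here, whereas the real WTF::Lock decides inside unparkOne()'s callback based on ParkingLot's timeToBeFair signal.

#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>

class ToyFairLock {
public:
    void lock()
    {
        bool expected = false;
        if (m_isHeld.compare_exchange_strong(expected, true))
            return; // Uncontended fast path.

        std::unique_lock<std::mutex> queueLock(m_queueMutex);
        for (;;) {
            expected = false;
            if (m_isHeld.compare_exchange_strong(expected, true))
                return; // Barged in without having to park.

            Waiter waiter;
            m_queue.push_back(&waiter);
            waiter.condition.wait(queueLock, [&] { return waiter.unparked; });
            if (waiter.token == 1)
                return; // Fair handoff: the unlocker never released the lock.
            // Token 0 means an unfair wakeup: loop and compete for the lock again.
        }
    }

    void unlock(bool fairly)
    {
        std::unique_lock<std::mutex> queueLock(m_queueMutex);
        if (m_queue.empty() || !fairly) {
            // Barging unlock: release the lock, then wake one waiter to come compete.
            m_isHeld.store(false);
            wakeOne(0);
            return;
        }
        // Fair unlock: do NOT release m_isHeld; pass token 1 so the woken thread
        // knows it already owns the lock the moment it wakes up.
        wakeOne(1);
    }

private:
    struct Waiter {
        std::condition_variable condition;
        std::intptr_t token { 0 };
        bool unparked { false };
    };

    void wakeOne(std::intptr_t token)
    {
        if (m_queue.empty())
            return;
        Waiter* waiter = m_queue.front();
        m_queue.pop_front();
        waiter->token = token;
        waiter->unparked = true;
        waiter->condition.notify_one();
    }

    std::atomic<bool> m_isHeld { false };
    std::mutex m_queueMutex;
    std::deque<Waiter*> m_queue;
};

The design point the change argues for is visible here: the barging branch releases the lock before waking anyone, so an already-running thread can grab it without a context switch, while the fair branch trades some of that throughput for a bounded wait by handing the lock over directly.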
void Lock::unlockFairlySlow()
WTF::Lock should be fair eventually https://bugs.webkit.org/show_bug.cgi?id=159384 Reviewed by Geoffrey Garen. Source/WTF: In https://webkit.org/blog/6161/locking-in-webkit/ we showed how relaxing the fairness of locks makes them fast. That post presented lock fairness as a trade-off between two extremes: - Barging. A barging lock, like WTF::Lock, releases the lock in unlock() even if there was a thread on the queue. If there was a thread on the queue, the lock is released and that thread is made runnable. That thread may then grab the lock, or some other thread may grab the lock first (it may barge). Usually, the barging thread is the thread that released the lock in the first place. This maximizes throughput but hurts fairness. There is no good theoretical bound on how unfair the lock may become, but empirical data suggests that it's fair enough for the cases we previously measured. - FIFO. A FIFO lock, like HandoffLock in ToyLocks.h, does not release the lock in unlock() if there is a thread waiting. If there is a thread waiting, unlock() will make that thread runnable and inform it that it now holds the lock. This ensures perfect round-robin fairness and allows us to reason theoretically about how long it may take for a thread to grab the lock. For example, if we know that only N threads are running and each one may contend on a critical section, and each one may hold the lock for at most S seconds, then the time it takes to grab the lock is N * S. Unfortunately, FIFO locks perform very badly in most cases. This is because for the common case of short critical sections, they force a context switch after each critical section if the lock is contended. This change makes WTF::Lock almost as fair as FIFO while still being as fast as barging. Thanks to this new algorithm, you can now have both of these things at the same time. This change makes WTF::Lock eventually fair. We can almost (more on the caveats below) guarantee that the time it takes to grab a lock is N * max(1ms, S). In other words, critical sections that are longer than 1ms are always fair. For shorter critical sections, the amount of time that any thread waits is 1ms times the number of threads. There are some caveats that arise from our use of randomness, but even then, in the limit as the critical section length goes to infinity, the lock becomes fair. The corner cases are unlikely to happen; our experiments show that the lock becomes exactly as fair as a FIFO lock for any critical section that is 1ms or longer. The fairness mechanism is broken into two parts. WTF::Lock can now choose to unlock a lock fairly or unfairly thanks to the new ParkingLot token mechanism. WTF::Lock knows when to use fair unlocking based on a timeout mechanism in ParkingLot called timeToBeFair. ParkingLot::unparkOne() and ParkingLot::parkConditionally() can now communicate with each other via a token. unparkOne() can pass a token, which parkConditionally() will return. This change also makes parkConditionally() a lot more precise about when it was unparked due to a call to unparkOne(). If unparkOne() is told that a thread was unparked then this thread is guaranteed to report that it was unparked rather than timing out, and that thread is guaranteed to get the token that unparkOne() passed. The token is an intptr_t. We use it as a boolean variable in WTF::Lock, but you could use it to pass arbitrary data structures. By default, the token is zero. WTF::Lock's unlock() will pass 1 as the token if it is doing fair unlocking. 
In that case, unlock() will not release the lock, and lock() will know that it holds the lock as soon as parkConditionally() returns. Note that this algorithm relies on unparkOne() invoking WTF::Lock's callback while the queue lock is held, so that WTF::Lock can make a decision about unlock strategy and inject a token while it has complete knowledge over the state of the queue. As such, it's not immediately obvious how to implement this algorithm on top of futexes. You really need ParkingLot! WTF::Lock does not use fair unlocking every time. We expose a new API, Lock::unlockFairly(), which forces the fair unlocking behavior. Additionally, ParkingLot now maintains a per-bucket stochastic fairness timeout. When the timeout fires, the unparkOne() callback sees UnparkResult::timeToBeFair = true. This timeout is set to be anywhere from 0ms to 1ms at random. When a dequeue happens and there are threads that actually get dequeued, we check if the time since the last unfair unlock (the last time timeToBeFair was set to true) is more than the timeout amount. If so, then we set timeToBeFair to true and reset the timeout. This means that in the absence of ParkingLot collisions, unfair unlocking is guaranteed to happen at least once per millisecond. It will happen at 2 KHz on average. If there are collisions, then each collision adds one millisecond to the worst case (and 0.5 ms to the average case). The reason why we don't just use a fixed 1ms timeout is that we want to avoid resonance. Imagine a program in which some thread acquires a lock at 1 KHz in-phase with the timeToBeFair timeout. Then this thread would be the benefactor of fairness to the detriment of everyone else. Randomness ensures that we aren't too fair to any one thread. Empirically, this is neutral on our major benchmarks like JetStream but it's an enormous improvement in LockFairnessTest. It's common for an unfair lock (either our BargingLock, the old WTF::Lock, any of the other futex-based locks that barge, or new os_unfair_lock) to allow only one thread to hold the lock during a whole second in which each thread is holding the lock for 1ms at a time. This is because in a barging lock, releasing a lock after holding it for 1ms and then reacquiring it immediately virtually ensures that none of the other threads can wake up in time to grab it before it's relocked. But the new WTF::Lock handles this case like a champ: each thread gets equal turns. Here's some data. If we launch 10 threads and have each of them run for 1 second while repeatedly holding a critical section for 1ms, then here's how many times each thread gets to hold the lock using the old WTF::Lock algorithm: 799, 6, 1, 1, 1, 1, 1, 1, 1, 1 One thread hogged the lock for almost the whole time! With the new WTF::Lock, the lock becomes totally fair: 80, 79, 79, 79, 79, 79, 79, 80, 80, 79 I don't know of anyone creating such an automatically-fair adaptive lock before, so I think that this is a pretty awesome advancement to the state of the art! This change is good for three reasons: - We do have long critical sections in WebKit and we don't want to have to worry about starvation. This reduces the likelihood that we will see starvation due to our lock strategy. - I was talking to ggaren about bmalloc's locking needs, and he wanted unlockFairly() or lockFairly() or some moral equivalent for the scavenger thread. - If we use a WTF::Lock to manage heap access in a multithreaded GC, we'll need the ability to unlock and relock without barging. 
* benchmarks/LockFairnessTest.cpp: (main): * benchmarks/ToyLocks.h: * wtf/Condition.h: (WTF::ConditionBase::waitUntil): (WTF::ConditionBase::notifyOne): * wtf/Lock.cpp: (WTF::LockBase::lockSlow): (WTF::LockBase::unlockSlow): (WTF::LockBase::unlockFairlySlow): (WTF::LockBase::unlockSlowImpl): * wtf/Lock.h: (WTF::LockBase::try_lock): (WTF::LockBase::unlock): (WTF::LockBase::unlockFairly): (WTF::LockBase::isHeld): (WTF::LockBase::isFullyReset): * wtf/ParkingLot.cpp: (WTF::ParkingLot::parkConditionallyImpl): (WTF::ParkingLot::unparkOne): (WTF::ParkingLot::unparkOneImpl): (WTF::ParkingLot::unparkAll): * wtf/ParkingLot.h: (WTF::ParkingLot::parkConditionally): (WTF::ParkingLot::compareAndPark): (WTF::ParkingLot::unparkOne): Tools: * TestWebKitAPI/Tests/WTF/ParkingLot.cpp: Canonical link: https://commits.webkit.org/178039@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@203350 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-07-18 18:32:52 +00:00
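To make the timeToBeFair heuristic described above concrete, here is a minimal, self-contained sketch of the per-bucket fairness timer it describes: a random 0ms-1ms interval that, once elapsed, tells the unlocking side to do one fair handoff and then re-randomizes. Everything here (the FairnessTimer name, the use of std::chrono and std::mt19937) is an illustrative assumption, not the actual ParkingLot code.

#include <chrono>
#include <random>

// Illustrative stand-in for ParkingLot's per-bucket fairness state (assumed names).
class FairnessTimer {
public:
    // Called when a dequeue actually wakes a thread. Returns true roughly once per
    // random interval in [0ms, 1ms), which the unlocking side treats as "do a fair
    // handoff this time instead of a barging unlock".
    bool timeToBeFair()
    {
        auto now = std::chrono::steady_clock::now();
        if (now - m_lastFairTime < m_interval)
            return false;
        m_lastFairTime = now;
        // Re-randomize the interval so we do not resonate with a lock that happens
        // to be acquired at a fixed frequency.
        std::uniform_real_distribution<double> distribution(0.0, 1.0);
        m_interval = std::chrono::duration<double, std::milli>(distribution(m_generator));
        return true;
    }

private:
    std::chrono::steady_clock::time_point m_lastFairTime { std::chrono::steady_clock::now() };
    std::chrono::duration<double, std::milli> m_interval { 0.5 };
    std::mt19937 m_generator { std::random_device{}() };
};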
{
// Heap allocations are forbidden on certain threads (e.g. the audio rendering thread) for performance reasons, so we need to
// explicitly allow the following allocation(s). In some rare cases, the unlockSlow() algorithm may allocate.
DisableMallocRestrictionsForCurrentThreadScope disableMallocRestrictions;
The GC should be optionally concurrent and disabled by default https://bugs.webkit.org/show_bug.cgi?id=164454 Reviewed by Geoffrey Garen. Source/JavaScriptCore: This started out as a patch to have the GC scan the stack at the end, and then the outage happened and I decided to pick a more aggresive target: give the GC a concurrent mode that can be enabled at runtime, and whose only effect is that it turns on the ResumeTheWorldScope. This gives our GC a really intuitive workflow: by default, the GC thread is running solo with the world stopped and the parallel markers converged and waiting. We have a parallel work scope to enable the parallel markers and now we have a ResumeTheWorldScope that will optionally resume the world and then stop it again. It's easy to make a concurrent GC that always instantly crashes. I can't promise that this one won't do that when you run it. I set a specific goal: I wanted to do >10 concurrent GCs in debug mode with generations, optimizing JITs, and parallel marking disabled. To reach this milestone, I needed to do a bunch of stuff: - The mutator needs a separate mark stack for the barrier, since it will mutate this stack concurrently to the collector's slot visitors. - The use of CellState to indicate whether an object is being scanned the first time or a subsequent time was racy. It fails spectacularly when a barrier is fired at the same time as visitChildren is running or if the barrier runs at the same time as the GC marks the same object. So, I split SlotVisitor's mark stacks. It's now the case that you know why you're being scanned by looking at which stack you came off of. - All of root marking must be in the collector fixpoint. I renamed markRoots to markToFixpoint. They say concurrency is hard, but the collector looks more intuitive this way. We never gained anything from forcing people to make a choice between scanning something in the fixpoint versus outside of it. Because root scanning is cheap, we can afford to do it repeatedly, which means all root scanning can now do constraint-based marking (like: I'll mark you if that thing is marked). - JSObject::visitChildren's scanning of the butterfly raced with property additions, indexed storage transitions and resizing, and a bunch of miscellaneous dirty butterfly reshaping functions - like the one that flattens a dictionary and some sneaky ArrayStorage transformations. Many of these can be fixed by using store-store fences in the mutator and load-load fences in the collector. I've adopted the rule that the collector must always see either a butterfly and structure that match or a newer butterfly with an older structure, where their age is just one transition apart. This can be achieved with fences. For the cases where it breaks down, I added a lock to every JSCell. This is a full-fledged WTF lock that we sneak into two available bits in the indexingType. See the WTF ChangeLog for details. The mutator fencing rules are as follows: - Store-store fence before and after setting the butterfly. - Store-store fence before setting structure if you had changed the shape of the butterfly. - Store-store fence after initializing all fields in an allocation. - A dictionary Structure can change in strange ways while the GC is trying to scan it. So, JSObject::visitChildren will now grab the object's structure's lock if the object's structure is a dictionary. 
Dictionary structures are 1:1 with their object, so this does not reduce GC parallelism (super unlikely that the GC will simultaneously scan an object from two threads). - The GC can blow away a Structure's property table at any time. As a small consolation, it's now holding the Structure's lock when it does so. But there was tons of code in Structure that uses DeferGC to prevent the GC from blowing away the property table. This doesn't work with concurrent GC, since DeferGC only means that the GC won't run its safepoint (i.e. stop-the-world code) in the DeferGC region. It will still do marking and it was the Structure::visitChildren that would delete the table. It turns out that Structure's reliance on the property table not being deleted was the product of code rot. We already had functions that would materialize the table on demand. We were simply making the mistake of saying: structure->materializePropertyMap(); ... structure->propertyTable()->things Instead of saying: PropertyTable* table = structure->ensurePropertyTable(); ... table->things Switching the code to use the latter idiom allowed me to simplify the code a lot while fixing the race. - The LLInt's get_by_val handling was broken because the indexing shape constants were wrong. Once I started putting more things into the IndexingType, that started causing crashes for me. So I fixed LLInt. That turned out to be a lot of work, since that code had rotted in subtle ways. This is a speed-up in SunSpider, probably because of the LLInt fix. This is neutral on Octane and Kraken. It's a smaller slow-down on LongSpider, but I think we can ignore that (we don't view LongSpider as an official benchmark). By default, the concurrent GC is disabled: in all of the places where it would have resumed the world to run marking concurrently to the mutator, it will just skip the resume step. When you enable concurrent GC (--useConcurrentGC=true), it can sometimes run Octane/splay to completion. It seems to perform quite well: on my machine, it improves both splay-throughput and splay-latency. It's probably unstable for other programs. 
* API/JSVirtualMachine.mm: (-[JSVirtualMachine isOldExternalObject:]): * assembler/MacroAssemblerARMv7.h: (JSC::MacroAssemblerARMv7::storeFence): * bytecode/InlineAccess.cpp: (JSC::InlineAccess::dumpCacheSizesAndCrash): (JSC::InlineAccess::generateSelfPropertyAccess): (JSC::InlineAccess::generateArrayLength): * bytecode/ObjectAllocationProfile.h: (JSC::ObjectAllocationProfile::offsetOfInlineCapacity): (JSC::ObjectAllocationProfile::ObjectAllocationProfile): (JSC::ObjectAllocationProfile::initialize): (JSC::ObjectAllocationProfile::inlineCapacity): (JSC::ObjectAllocationProfile::clear): * bytecode/PolymorphicAccess.cpp: (JSC::AccessCase::generateWithGuard): (JSC::AccessCase::generateImpl): * dfg/DFGArrayifySlowPathGenerator.h: * dfg/DFGClobberize.h: (JSC::DFG::clobberize): * dfg/DFGOSRExitCompiler32_64.cpp: (JSC::DFG::OSRExitCompiler::compileExit): * dfg/DFGOSRExitCompiler64.cpp: (JSC::DFG::OSRExitCompiler::compileExit): * dfg/DFGOperations.cpp: * dfg/DFGPlan.cpp: (JSC::DFG::Plan::markCodeBlocks): (JSC::DFG::Plan::rememberCodeBlocks): * dfg/DFGPlan.h: * dfg/DFGSpeculativeJIT.cpp: (JSC::DFG::SpeculativeJIT::emitAllocateRawObject): (JSC::DFG::SpeculativeJIT::checkArray): (JSC::DFG::SpeculativeJIT::arrayify): (JSC::DFG::SpeculativeJIT::compileMakeRope): (JSC::DFG::SpeculativeJIT::compileNewFunctionCommon): (JSC::DFG::SpeculativeJIT::compileCreateActivation): (JSC::DFG::SpeculativeJIT::compileCreateDirectArguments): (JSC::DFG::SpeculativeJIT::compileSpread): (JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage): (JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage): (JSC::DFG::SpeculativeJIT::compileNewStringObject): (JSC::DFG::SpeculativeJIT::compileNewTypedArray): (JSC::DFG::SpeculativeJIT::compileStoreBarrier): * dfg/DFGSpeculativeJIT64.cpp: (JSC::DFG::SpeculativeJIT::compile): (JSC::DFG::SpeculativeJIT::compileAllocateNewArrayWithSize): * dfg/DFGTierUpCheckInjectionPhase.cpp: (JSC::DFG::TierUpCheckInjectionPhase::run): * dfg/DFGWorklist.cpp: (JSC::DFG::Worklist::markCodeBlocks): (JSC::DFG::Worklist::rememberCodeBlocks): (JSC::DFG::markCodeBlocks): (JSC::DFG::completeAllPlansForVM): (JSC::DFG::rememberCodeBlocks): * dfg/DFGWorklist.h: * ftl/FTLAbstractHeapRepository.cpp: (JSC::FTL::AbstractHeapRepository::AbstractHeapRepository): (JSC::FTL::AbstractHeapRepository::computeRangesAndDecorateInstructions): * ftl/FTLAbstractHeapRepository.h: * ftl/FTLJITCode.cpp: (JSC::FTL::JITCode::~JITCode): * ftl/FTLLowerDFGToB3.cpp: (JSC::FTL::DFG::LowerDFGToB3::compilePutStructure): (JSC::FTL::DFG::LowerDFGToB3::compileCreateActivation): (JSC::FTL::DFG::LowerDFGToB3::compileNewFunction): (JSC::FTL::DFG::LowerDFGToB3::compileCreateDirectArguments): (JSC::FTL::DFG::LowerDFGToB3::compileCreateRest): (JSC::FTL::DFG::LowerDFGToB3::compileNewObject): (JSC::FTL::DFG::LowerDFGToB3::compileNewArray): (JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSpread): (JSC::FTL::DFG::LowerDFGToB3::compileSpread): (JSC::FTL::DFG::LowerDFGToB3::compileNewArrayBuffer): (JSC::FTL::DFG::LowerDFGToB3::compileNewArrayWithSize): (JSC::FTL::DFG::LowerDFGToB3::compileNewTypedArray): (JSC::FTL::DFG::LowerDFGToB3::compileMakeRope): (JSC::FTL::DFG::LowerDFGToB3::compileMultiPutByOffset): (JSC::FTL::DFG::LowerDFGToB3::compileMaterializeNewObject): (JSC::FTL::DFG::LowerDFGToB3::compileMaterializeCreateActivation): (JSC::FTL::DFG::LowerDFGToB3::splatWords): (JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage): (JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage): (JSC::FTL::DFG::LowerDFGToB3::allocateObject): 
(JSC::FTL::DFG::LowerDFGToB3::isArrayType): (JSC::FTL::DFG::LowerDFGToB3::emitStoreBarrier): (JSC::FTL::DFG::LowerDFGToB3::mutatorFence): (JSC::FTL::DFG::LowerDFGToB3::setButterfly): * ftl/FTLOSRExitCompiler.cpp: (JSC::FTL::compileStub): * ftl/FTLOutput.cpp: (JSC::FTL::Output::signExt32ToPtr): (JSC::FTL::Output::fence): * ftl/FTLOutput.h: * heap/CellState.h: * heap/GCSegmentedArray.h: * heap/Heap.cpp: (JSC::Heap::ResumeTheWorldScope::ResumeTheWorldScope): (JSC::Heap::ResumeTheWorldScope::~ResumeTheWorldScope): (JSC::Heap::Heap): (JSC::Heap::~Heap): (JSC::Heap::harvestWeakReferences): (JSC::Heap::finalizeUnconditionalFinalizers): (JSC::Heap::completeAllJITPlans): (JSC::Heap::markToFixpoint): (JSC::Heap::gatherStackRoots): (JSC::Heap::beginMarking): (JSC::Heap::visitConservativeRoots): (JSC::Heap::visitCompilerWorklistWeakReferences): (JSC::Heap::updateObjectCounts): (JSC::Heap::endMarking): (JSC::Heap::addToRememberedSet): (JSC::Heap::collectInThread): (JSC::Heap::stopTheWorld): (JSC::Heap::resumeTheWorld): (JSC::Heap::setGCDidJIT): (JSC::Heap::setNeedFinalize): (JSC::Heap::setMutatorWaiting): (JSC::Heap::clearMutatorWaiting): (JSC::Heap::finalize): (JSC::Heap::flushWriteBarrierBuffer): (JSC::Heap::writeBarrierSlowPath): (JSC::Heap::canCollect): (JSC::Heap::reportExtraMemoryVisited): (JSC::Heap::reportExternalMemoryVisited): (JSC::Heap::notifyIsSafeToCollect): (JSC::Heap::markRoots): Deleted. (JSC::Heap::visitExternalRememberedSet): Deleted. (JSC::Heap::visitSmallStrings): Deleted. (JSC::Heap::visitProtectedObjects): Deleted. (JSC::Heap::visitArgumentBuffers): Deleted. (JSC::Heap::visitException): Deleted. (JSC::Heap::visitStrongHandles): Deleted. (JSC::Heap::visitHandleStack): Deleted. (JSC::Heap::visitSamplingProfiler): Deleted. (JSC::Heap::visitTypeProfiler): Deleted. (JSC::Heap::visitShadowChicken): Deleted. (JSC::Heap::traceCodeBlocksAndJITStubRoutines): Deleted. (JSC::Heap::visitWeakHandles): Deleted. (JSC::Heap::flushOldStructureIDTables): Deleted. (JSC::Heap::stopAllocation): Deleted. * heap/Heap.h: (JSC::Heap::collectorSlotVisitor): (JSC::Heap::mutatorMarkStack): (JSC::Heap::mutatorShouldBeFenced): (JSC::Heap::addressOfMutatorShouldBeFenced): (JSC::Heap::slotVisitor): Deleted. (JSC::Heap::notifyIsSafeToCollect): Deleted. (JSC::Heap::barrierShouldBeFenced): Deleted. (JSC::Heap::addressOfBarrierShouldBeFenced): Deleted. * heap/MarkStack.cpp: (JSC::MarkStackArray::transferTo): * heap/MarkStack.h: * heap/MarkedAllocator.cpp: (JSC::MarkedAllocator::tryAllocateIn): * heap/MarkedBlock.cpp: (JSC::MarkedBlock::MarkedBlock): (JSC::MarkedBlock::Handle::specializedSweep): (JSC::MarkedBlock::Handle::sweep): (JSC::MarkedBlock::Handle::sweepHelperSelectMarksMode): (JSC::MarkedBlock::Handle::stopAllocating): (JSC::MarkedBlock::Handle::resumeAllocating): (JSC::MarkedBlock::aboutToMarkSlow): (JSC::MarkedBlock::Handle::didConsumeFreeList): (JSC::SetNewlyAllocatedFunctor::SetNewlyAllocatedFunctor): Deleted. (JSC::SetNewlyAllocatedFunctor::operator()): Deleted. 
* heap/MarkedBlock.h: * heap/MarkedSpace.cpp: (JSC::MarkedSpace::resumeAllocating): * heap/SlotVisitor.cpp: (JSC::SlotVisitor::SlotVisitor): (JSC::SlotVisitor::~SlotVisitor): (JSC::SlotVisitor::reset): (JSC::SlotVisitor::clearMarkStacks): (JSC::SlotVisitor::appendJSCellOrAuxiliary): (JSC::SlotVisitor::setMarkedAndAppendToMarkStack): (JSC::SlotVisitor::appendToMarkStack): (JSC::SlotVisitor::appendToMutatorMarkStack): (JSC::SlotVisitor::visitChildren): (JSC::SlotVisitor::donateKnownParallel): (JSC::SlotVisitor::drain): (JSC::SlotVisitor::drainFromShared): (JSC::SlotVisitor::containsOpaqueRoot): (JSC::SlotVisitor::donateAndDrain): (JSC::SlotVisitor::mergeOpaqueRoots): (JSC::SlotVisitor::dump): (JSC::SlotVisitor::clearMarkStack): Deleted. (JSC::SlotVisitor::opaqueRootCount): Deleted. * heap/SlotVisitor.h: (JSC::SlotVisitor::collectorMarkStack): (JSC::SlotVisitor::mutatorMarkStack): (JSC::SlotVisitor::isEmpty): (JSC::SlotVisitor::bytesVisited): (JSC::SlotVisitor::markStack): Deleted. (JSC::SlotVisitor::bytesCopied): Deleted. * heap/SlotVisitorInlines.h: (JSC::SlotVisitor::reportExtraMemoryVisited): (JSC::SlotVisitor::reportExternalMemoryVisited): * jit/AssemblyHelpers.cpp: (JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo): * jit/AssemblyHelpers.h: (JSC::AssemblyHelpers::emitStoreStructureWithTypeInfo): (JSC::AssemblyHelpers::barrierStoreLoadFence): (JSC::AssemblyHelpers::mutatorFence): (JSC::AssemblyHelpers::storeButterfly): (JSC::AssemblyHelpers::jumpIfMutatorFenceNotNeeded): (JSC::AssemblyHelpers::emitInitializeInlineStorage): (JSC::AssemblyHelpers::emitInitializeOutOfLineStorage): (JSC::AssemblyHelpers::jumpIfBarrierStoreLoadFenceNotNeeded): Deleted. * jit/JITInlines.h: (JSC::JIT::emitArrayProfilingSiteWithCell): * jit/JITOperations.cpp: * jit/JITPropertyAccess.cpp: (JSC::JIT::emit_op_put_to_scope): (JSC::JIT::emit_op_put_to_arguments): * llint/LLIntData.cpp: (JSC::LLInt::Data::performAssertions): * llint/LowLevelInterpreter.asm: * llint/LowLevelInterpreter64.asm: * runtime/ButterflyInlines.h: (JSC::Butterfly::create): (JSC::Butterfly::createOrGrowPropertyStorage): * runtime/ConcurrentJITLock.h: (JSC::GCSafeConcurrentJITLocker::NoDefer::NoDefer): Deleted. * runtime/GenericArgumentsInlines.h: (JSC::GenericArguments<Type>::getOwnPropertySlotByIndex): (JSC::GenericArguments<Type>::putByIndex): * runtime/IndexingType.h: * runtime/JSArray.cpp: (JSC::JSArray::unshiftCountSlowCase): (JSC::JSArray::unshiftCountWithArrayStorage): * runtime/JSCell.h: (JSC::JSCell::InternalLocker::InternalLocker): (JSC::JSCell::InternalLocker::~InternalLocker): (JSC::JSCell::atomicCompareExchangeCellStateWeakRelaxed): (JSC::JSCell::atomicCompareExchangeCellStateStrong): (JSC::JSCell::indexingTypeAndMiscOffset): (JSC::JSCell::indexingTypeOffset): Deleted. 
* runtime/JSCellInlines.h: (JSC::JSCell::JSCell): (JSC::JSCell::finishCreation): (JSC::JSCell::indexingTypeAndMisc): (JSC::JSCell::indexingType): (JSC::JSCell::setStructure): (JSC::JSCell::callDestructor): (JSC::JSCell::lockInternalLock): (JSC::JSCell::unlockInternalLock): * runtime/JSObject.cpp: (JSC::JSObject::visitButterfly): (JSC::JSObject::visitChildren): (JSC::JSFinalObject::visitChildren): (JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists): (JSC::JSObject::createInitialUndecided): (JSC::JSObject::createInitialInt32): (JSC::JSObject::createInitialDouble): (JSC::JSObject::createInitialContiguous): (JSC::JSObject::createArrayStorage): (JSC::JSObject::convertUndecidedToArrayStorage): (JSC::JSObject::convertInt32ToArrayStorage): (JSC::JSObject::convertDoubleToArrayStorage): (JSC::JSObject::convertContiguousToArrayStorage): (JSC::JSObject::deleteProperty): (JSC::JSObject::defineOwnIndexedProperty): (JSC::JSObject::increaseVectorLength): (JSC::JSObject::ensureLengthSlow): (JSC::JSObject::reallocateAndShrinkButterfly): (JSC::JSObject::allocateMoreOutOfLineStorage): (JSC::JSObject::shiftButterflyAfterFlattening): (JSC::JSObject::growOutOfLineStorage): Deleted. * runtime/JSObject.h: (JSC::JSFinalObject::JSFinalObject): (JSC::JSObject::setButterfly): (JSC::JSObject::getOwnNonIndexPropertySlot): (JSC::JSObject::fillCustomGetterPropertySlot): (JSC::JSObject::getOwnPropertySlot): (JSC::JSObject::getPropertySlot): (JSC::JSObject::setStructureAndButterfly): Deleted. (JSC::JSObject::setButterflyWithoutChangingStructure): Deleted. (JSC::JSObject::putDirectInternal): Deleted. (JSC::JSObject::putDirectWithoutTransition): Deleted. * runtime/JSObjectInlines.h: (JSC::JSObject::getPropertySlot): (JSC::JSObject::getNonIndexPropertySlot): (JSC::JSObject::putDirectWithoutTransition): (JSC::JSObject::putDirectInternal): * runtime/Options.h: * runtime/SparseArrayValueMap.h: * runtime/Structure.cpp: (JSC::Structure::dumpStatistics): (JSC::Structure::findStructuresAndMapForMaterialization): (JSC::Structure::materializePropertyTable): (JSC::Structure::addNewPropertyTransition): (JSC::Structure::changePrototypeTransition): (JSC::Structure::attributeChangeTransition): (JSC::Structure::toDictionaryTransition): (JSC::Structure::takePropertyTableOrCloneIfPinned): (JSC::Structure::nonPropertyTransition): (JSC::Structure::isSealed): (JSC::Structure::isFrozen): (JSC::Structure::flattenDictionaryStructure): (JSC::Structure::pin): (JSC::Structure::pinForCaching): (JSC::Structure::willStoreValueSlow): (JSC::Structure::copyPropertyTableForPinning): (JSC::Structure::add): (JSC::Structure::remove): (JSC::Structure::getPropertyNamesFromStructure): (JSC::Structure::visitChildren): (JSC::Structure::materializePropertyMap): Deleted. (JSC::Structure::addPropertyWithoutTransition): Deleted. (JSC::Structure::removePropertyWithoutTransition): Deleted. (JSC::Structure::copyPropertyTable): Deleted. (JSC::Structure::createPropertyMap): Deleted. (JSC::PropertyTable::checkConsistency): Deleted. (JSC::Structure::checkConsistency): Deleted. * runtime/Structure.h: * runtime/StructureIDBlob.h: (JSC::StructureIDBlob::StructureIDBlob): (JSC::StructureIDBlob::indexingTypeIncludingHistory): (JSC::StructureIDBlob::setIndexingTypeIncludingHistory): (JSC::StructureIDBlob::indexingTypeIncludingHistoryOffset): (JSC::StructureIDBlob::indexingType): Deleted. (JSC::StructureIDBlob::setIndexingType): Deleted. (JSC::StructureIDBlob::indexingTypeOffset): Deleted. 
* runtime/StructureInlines.h: (JSC::Structure::get): (JSC::Structure::checkOffsetConsistency): (JSC::Structure::checkConsistency): (JSC::Structure::add): (JSC::Structure::remove): (JSC::Structure::addPropertyWithoutTransition): (JSC::Structure::removePropertyWithoutTransition): (JSC::Structure::setPropertyTable): (JSC::Structure::putWillGrowOutOfLineStorage): Deleted. (JSC::Structure::propertyTable): Deleted. (JSC::Structure::suggestedNewOutOfLineStorageCapacity): Deleted. Source/WTF: The reason why I went to such great pains to make WTF::Lock fit in two bits is that I knew that I would eventually need to stuff one into some miscellaneous bits of the JSCell header. That time has come, because the concurrent GC has numerous race conditions in visitChildren that can be trivially fixed if each object just has an internal lock. Some cell types might use it to simply protect their entire visitChildren function and anything that mutates the fields it touches, while other cell types might use it as a "lock of last resort" to handle corner cases of an otherwise wait-free or lock-free algorithm. Right now, it's used to protect certain transformations involving indexing storage. To make this happen, I factored the WTF::Lock algorithm into a LockAlgorithm struct that is templatized on lock type (uint8_t for WTF::Lock), the isHeldBit value (1 for WTF::Lock), and the hasParkedBit value (2 for WTF::Lock). This could have been done as a templatized Lock class that basically contains Atomic<LockType>. You could then make any field into a lock by bitwise_casting it to TemplateLock<field type, bit1, bit2>. But this felt too dirty, so instead, LockAlgorithm has static methods that take Atomic<LockType>& as their first argument. I think that this makes it more natural to project a LockAlgorithm onto an existing Atomic<> field. Sadly, some places have to cast their non-Atomic<> field to Atomic<> in order for this to work. Like so many other things we do, this just shows that the C++ style of labeling fields that are subject to atomic ops as atomic is counterproductive. Maybe some day I'll change LockAlgorithm to use our other Atomics API, which does not require Atomic<>. WTF::Lock now uses LockAlgorithm. The slow paths are still outlined. I don't feel too bad about the LockAlgorithm.h header being included in so many places because we change that algorithm so infrequently. Also, I added a hasElapsed(time) function. This function makes it so much more natural to write timeslicing code, which the concurrent GC has to do a lot of. * WTF.xcodeproj/project.pbxproj: * wtf/CMakeLists.txt: * wtf/ListDump.h: * wtf/Lock.cpp: (WTF::LockBase::lockSlow): (WTF::LockBase::unlockSlow): (WTF::LockBase::unlockFairlySlow): (WTF::LockBase::unlockSlowImpl): Deleted. * wtf/Lock.h: (WTF::LockBase::lock): (WTF::LockBase::tryLock): (WTF::LockBase::unlock): (WTF::LockBase::unlockFairly): (WTF::LockBase::isHeld): (): Deleted. * wtf/LockAlgorithm.h: Added. 
(WTF::LockAlgorithm::lockFastAssumingZero): (WTF::LockAlgorithm::lockFast): (WTF::LockAlgorithm::lock): (WTF::LockAlgorithm::tryLock): (WTF::LockAlgorithm::unlockFastAssumingZero): (WTF::LockAlgorithm::unlockFast): (WTF::LockAlgorithm::unlock): (WTF::LockAlgorithm::unlockFairly): (WTF::LockAlgorithm::isLocked): (WTF::LockAlgorithm::lockSlow): (WTF::LockAlgorithm::unlockSlow): * wtf/TimeWithDynamicClockType.cpp: (WTF::hasElapsed): * wtf/TimeWithDynamicClockType.h: Canonical link: https://commits.webkit.org/182434@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@208720 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-11-15 01:49:22 +00:00
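As a concrete illustration of the LockAlgorithm factoring described above, here is a sketch of projecting the algorithm onto a spare byte in some other object. The TinyLockedThing type is hypothetical, and the header names, constructor syntax, and template-parameter order are assumptions for illustration, not verified against the real tree.

#include <cstdint>
#include <wtf/Atomics.h>
#include <wtf/LockAlgorithm.h>

// Hypothetical client type: the two low bits of m_byte serve as a lock
// (bit 1 = isHeld, bit 2 = hasParked), matching the values WTF::Lock uses.
struct TinyLockedThing {
    WTF::Atomic<uint8_t> m_byte { 0 };

    using Algorithm = WTF::LockAlgorithm<uint8_t, 1, 2>;

    void lock() { Algorithm::lock(m_byte); }
    void unlock() { Algorithm::unlock(m_byte); }
    void unlockFairly() { Algorithm::unlockFairly(m_byte); }
};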
DefaultLockAlgorithm::unlockSlow(m_byte, DefaultLockAlgorithm::Fair);
}
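A usage sketch (not code from this file) for the fair-unlock entry point defined above: a thread that needs the lock for many consecutive slices of work can hand it off fairly between slices, so waiters are guaranteed a turn rather than being barged past. The scavengerStyleLoop function is hypothetical.

#include <wtf/Lock.h>

void scavengerStyleLoop(WTF::Lock& lock)
{
    lock.lock();
    for (int slice = 0; slice < 100; ++slice) {
        // ... one slice of work that must happen under the lock ...
        lock.unlockFairly(); // if anyone is parked, hand the lock straight to them
        lock.lock();         // then get back in line for the next slice
    }
    lock.unlock();
}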
void Lock::safepointSlow()
PerformanceTests: Concurrent GC should be stable enough to land enabled https://bugs.webkit.org/show_bug.cgi?id=164990 Reviewed by Geoffrey Garen. Made CDjs more configurable and refined the "large.js" configuration. I was using that one and the new "long.js" configuration to tune concurrent eden GCs. Added a new way of running Splay in browser, which using chartjs to plot the execution times of 2000 iterations. This includes the minified chartjs. * JetStream/Octane2/splay-detail.html: Added. * JetStream/cdjs/benchmark.js: (benchmarkImpl): (benchmark): * JetStream/cdjs/long.js: Added. Source/JavaScriptCore: Concurrent GC should be stable enough to land enabled on X86_64 https://bugs.webkit.org/show_bug.cgi?id=164990 Reviewed by Geoffrey Garen. This fixes a ton of performance and correctness bugs revealed by getting the concurrent GC to be stable enough to land enabled. I had to redo the JSObject::visitChildren concurrency protocol again. This time I think it's even more correct than ever! This is an enormous win on JetStream/splay-latency and Octane/SplayLatency. It looks to be mostly neutral on everything else, though Speedometer is showing statistically weak signs of a slight regression. * API/JSAPIWrapperObject.mm: Added locking. (JSC::JSAPIWrapperObject::visitChildren): * API/JSCallbackObject.h: Added locking. (JSC::JSCallbackObjectData::visitChildren): (JSC::JSCallbackObjectData::JSPrivatePropertyMap::setPrivateProperty): (JSC::JSCallbackObjectData::JSPrivatePropertyMap::deletePrivateProperty): (JSC::JSCallbackObjectData::JSPrivatePropertyMap::visitChildren): * CMakeLists.txt: * JavaScriptCore.xcodeproj/project.pbxproj: * bytecode/CodeBlock.cpp: (JSC::CodeBlock::UnconditionalFinalizer::finalizeUnconditionally): This had a TOCTOU race on shouldJettisonDueToOldAge. (JSC::EvalCodeCache::visitAggregate): Moved to EvalCodeCache.cpp. * bytecode/DirectEvalCodeCache.cpp: Added. Outlined some functions and made them use locks. (JSC::DirectEvalCodeCache::setSlow): (JSC::DirectEvalCodeCache::clear): (JSC::DirectEvalCodeCache::visitAggregate): * bytecode/DirectEvalCodeCache.h: (JSC::DirectEvalCodeCache::set): (JSC::DirectEvalCodeCache::clear): Deleted. * bytecode/UnlinkedCodeBlock.cpp: Added locking. (JSC::UnlinkedCodeBlock::visitChildren): (JSC::UnlinkedCodeBlock::setInstructions): (JSC::UnlinkedCodeBlock::shrinkToFit): * bytecode/UnlinkedCodeBlock.h: Added locking. (JSC::UnlinkedCodeBlock::addRegExp): (JSC::UnlinkedCodeBlock::addConstant): (JSC::UnlinkedCodeBlock::addFunctionDecl): (JSC::UnlinkedCodeBlock::addFunctionExpr): (JSC::UnlinkedCodeBlock::createRareDataIfNecessary): (JSC::UnlinkedCodeBlock::shrinkToFit): Deleted. * debugger/Debugger.cpp: Use the right delete API. (JSC::Debugger::recompileAllJSFunctions): * dfg/DFGAbstractInterpreterInlines.h: (JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects): Fix a pre-existing bug in ToFunction constant folding. * dfg/DFGClobberize.h: Add support for nuking. (JSC::DFG::clobberize): * dfg/DFGClobbersExitState.cpp: Add support for nuking. (JSC::DFG::clobbersExitState): * dfg/DFGFixupPhase.cpp: Add support for nuking. (JSC::DFG::FixupPhase::fixupNode): (JSC::DFG::FixupPhase::indexForChecks): (JSC::DFG::FixupPhase::originForCheck): (JSC::DFG::FixupPhase::speculateForBarrier): (JSC::DFG::FixupPhase::insertCheck): (JSC::DFG::FixupPhase::fixupChecksInBlock): * dfg/DFGSpeculativeJIT.cpp: Add support for nuking. 
(JSC::DFG::SpeculativeJIT::compileAllocatePropertyStorage): (JSC::DFG::SpeculativeJIT::compileReallocatePropertyStorage): * ftl/FTLLowerDFGToB3.cpp: Add support for nuking. (JSC::FTL::DFG::LowerDFGToB3::allocatePropertyStorage): (JSC::FTL::DFG::LowerDFGToB3::reallocatePropertyStorage): (JSC::FTL::DFG::LowerDFGToB3::mutatorFence): (JSC::FTL::DFG::LowerDFGToB3::nukeStructureAndSetButterfly): (JSC::FTL::DFG::LowerDFGToB3::setButterfly): Deleted. * heap/CodeBlockSet.cpp: We need to be more careful about the CodeBlockSet workflow during GC, since we will allocate CodeBlocks in eden while collecting. (JSC::CodeBlockSet::clearMarksForFullCollection): (JSC::CodeBlockSet::deleteUnmarkedAndUnreferenced): * heap/Heap.cpp: Added code to measure max pauses. Added a better collectContinuously mode. (JSC::Heap::lastChanceToFinalize): Stop the collectContinuously thread. (JSC::Heap::harvestWeakReferences): Inline SlotVisitor::harvestWeakReferences. (JSC::Heap::finalizeUnconditionalFinalizers): Inline SlotVisitor::finalizeUnconditionalReferences. (JSC::Heap::markToFixpoint): We need to do some MarkedSpace stuff before every conservative scan, rather than just at the start of marking, so we now call prepareForConservativeScan() before each conservative scan. Also call a less-parallel version of drainInParallel when the mutator is running. (JSC::Heap::collectInThread): Inline Heap::prepareForAllocation(). (JSC::Heap::stopIfNecessarySlow): We need to be more careful about ensuring that we run finalization before and after stopping. Also, we should sanitize stack when stopping the world. (JSC::Heap::acquireAccessSlow): Add some optional debug prints. (JSC::Heap::handleNeedFinalize): Assert that we are running this when the world is not stopped. (JSC::Heap::finalize): Remove the old collectContinuously code. (JSC::Heap::requestCollection): We don't need to sanitize stack here anymore. (JSC::Heap::notifyIsSafeToCollect): Start the collectContinuously thread. It will request collection 1 KHz. (JSC::Heap::prepareForAllocation): Deleted. (JSC::Heap::preventCollection): Prevent any new concurrent GCs from being initiated. (JSC::Heap::allowCollection): (JSC::Heap::forEachSlotVisitor): Allows us to safely iterate slot visitors. * heap/Heap.h: * heap/HeapInlines.h: (JSC::Heap::writeBarrier): If the 'to' cell is not NewWhite then it could be AnthraciteOrBlack. During a full collection, objects may be AnthraciteOrBlack from a previous GC. Turns out, we don't benefit from this optimization so we can just kill it. * heap/HeapSnapshotBuilder.cpp: (JSC::HeapSnapshotBuilder::buildSnapshot): This needs to use PreventCollectionScope to ensure snapshot soundness. * heap/ListableHandler.h: (JSC::ListableHandler::isOnList): Useful helper. * heap/LockDuringMarking.h: (JSC::lockDuringMarking): It's a locker that only locks while we're marking. * heap/MarkedAllocator.cpp: (JSC::MarkedAllocator::addBlock): Hold the bitvector lock while resizing. * heap/MarkedBlock.cpp: Hold the bitvector lock while accessing the bitvectors while the mutator is running. * heap/MarkedSpace.cpp: (JSC::MarkedSpace::prepareForConservativeScan): We used to do this in prepareForMarking, but we need to do it before each conservative scan not just before marking. (JSC::MarkedSpace::prepareForMarking): Remove the logic moved to prepareForConservativeScan. * heap/MarkedSpace.h: * heap/PreventCollectionScope.h: Added. * heap/SlotVisitor.cpp: Refactored drainFromShared so that we can write a similar function called drainInParallelPassively. 
(JSC::SlotVisitor::updateMutatorIsStopped): Update whether we can use "fast" scanning. (JSC::SlotVisitor::mutatorIsStoppedIsUpToDate): (JSC::SlotVisitor::didReachTermination): (JSC::SlotVisitor::hasWork): (JSC::SlotVisitor::drain): This now uses the rightToRun lock to allow the main GC thread to safepoint the workers. (JSC::SlotVisitor::drainFromShared): (JSC::SlotVisitor::drainInParallelPassively): This runs marking with one fewer threads than normal. It's useful for when we have resumed the mutator, since then the mutator has a better chance of getting on a core. (JSC::SlotVisitor::addWeakReferenceHarvester): (JSC::SlotVisitor::addUnconditionalFinalizer): (JSC::SlotVisitor::harvestWeakReferences): Deleted. (JSC::SlotVisitor::finalizeUnconditionalFinalizers): Deleted. * heap/SlotVisitor.h: * heap/SlotVisitorInlines.h: Outline stuff. (JSC::SlotVisitor::addWeakReferenceHarvester): Deleted. (JSC::SlotVisitor::addUnconditionalFinalizer): Deleted. * runtime/InferredType.cpp: This needed thread safety. (JSC::InferredType::visitChildren): This needs to keep its structure finalizer alive until it runs. (JSC::InferredType::set): (JSC::InferredType::InferredStructureFinalizer::finalizeUnconditionally): * runtime/InferredType.h: * runtime/InferredValue.cpp: This needed thread safety. (JSC::InferredValue::visitChildren): (JSC::InferredValue::ValueCleanup::finalizeUnconditionally): * runtime/JSArray.cpp: (JSC::JSArray::unshiftCountSlowCase): Update to use new butterfly API. (JSC::JSArray::unshiftCountWithArrayStorage): Update to use new butterfly API. * runtime/JSArrayBufferView.cpp: (JSC::JSArrayBufferView::visitChildren): Thread safety. * runtime/JSCell.h: (JSC::JSCell::setStructureIDDirectly): This is used for nuking the structure. (JSC::JSCell::InternalLocker::InternalLocker): Deleted. The cell is now the lock. (JSC::JSCell::InternalLocker::~InternalLocker): Deleted. The cell is now the lock. * runtime/JSCellInlines.h: (JSC::JSCell::structure): Clean this up. (JSC::JSCell::lock): The cell is now the lock. (JSC::JSCell::tryLock): (JSC::JSCell::unlock): (JSC::JSCell::isLocked): (JSC::JSCell::lockInternalLock): Deleted. (JSC::JSCell::unlockInternalLock): Deleted. * runtime/JSFunction.cpp: (JSC::JSFunction::visitChildren): Thread safety. * runtime/JSGenericTypedArrayViewInlines.h: (JSC::JSGenericTypedArrayView<Adaptor>::visitChildren): Thread safety. (JSC::JSGenericTypedArrayView<Adaptor>::slowDownAndWasteMemory): Thread safety. * runtime/JSObject.cpp: (JSC::JSObject::markAuxiliaryAndVisitOutOfLineProperties): Factor out this "easy" step of butterfly visiting. (JSC::JSObject::visitButterfly): Make this achieve 100% precision about structure-butterfly relationships. This relies on the mutator "nuking" the structure prior to "locked" structure-butterfly transitions. (JSC::JSObject::visitChildren): Use the new, nicer API. (JSC::JSFinalObject::visitChildren): Use the new, nicer API. (JSC::JSObject::enterDictionaryIndexingModeWhenArrayStorageAlreadyExists): Use the new butterfly API. (JSC::JSObject::createInitialUndecided): Use the new butterfly API. (JSC::JSObject::createInitialInt32): Use the new butterfly API. (JSC::JSObject::createInitialDouble): Use the new butterfly API. (JSC::JSObject::createInitialContiguous): Use the new butterfly API. (JSC::JSObject::createArrayStorage): Use the new butterfly API. (JSC::JSObject::convertUndecidedToContiguous): Use the new butterfly API. (JSC::JSObject::convertUndecidedToArrayStorage): Use the new butterfly API. 
(JSC::JSObject::convertInt32ToArrayStorage): Use the new butterfly API. (JSC::JSObject::convertDoubleToContiguous): Use the new butterfly API. (JSC::JSObject::convertDoubleToArrayStorage): Use the new butterfly API. (JSC::JSObject::convertContiguousToArrayStorage): Use the new butterfly API. (JSC::JSObject::increaseVectorLength): Use the new butterfly API. (JSC::JSObject::shiftButterflyAfterFlattening): Use the new butterfly API. * runtime/JSObject.h: (JSC::JSObject::setButterfly): This now does all of the fences. Only use this when you are not also transitioning the structure or the structure's lastOffset. (JSC::JSObject::nukeStructureAndSetButterfly): Use this when doing locked structure-butterfly transitions. * runtime/JSObjectInlines.h: (JSC::JSObject::putDirectWithoutTransition): Use the newly factored out API. (JSC::JSObject::prepareToPutDirectWithoutTransition): Factor this out! (JSC::JSObject::putDirectInternal): Use the newly factored out API. * runtime/JSPropertyNameEnumerator.cpp: (JSC::JSPropertyNameEnumerator::finishCreation): Locks! (JSC::JSPropertyNameEnumerator::visitChildren): Locks! * runtime/JSSegmentedVariableObject.cpp: (JSC::JSSegmentedVariableObject::visitChildren): Locks! * runtime/JSString.cpp: (JSC::JSString::visitChildren): Thread safety. * runtime/ModuleProgramExecutable.cpp: (JSC::ModuleProgramExecutable::visitChildren): Thread safety. * runtime/Options.cpp: For now we disable concurrent GC on not-X86_64. (JSC::recomputeDependentOptions): * runtime/Options.h: Change the default max GC parallelism to 8. I don't know why it was still 7. * runtime/SamplingProfiler.cpp: (JSC::SamplingProfiler::stackTracesAsJSON): This needs to defer GC before grabbing its lock. * runtime/SparseArrayValueMap.cpp: This needed thread safety. (JSC::SparseArrayValueMap::add): (JSC::SparseArrayValueMap::remove): (JSC::SparseArrayValueMap::visitChildren): * runtime/SparseArrayValueMap.h: * runtime/Structure.cpp: This had a race between addNewPropertyTransition and visitChildren. (JSC::Structure::Structure): (JSC::Structure::materializePropertyTable): (JSC::Structure::addNewPropertyTransition): (JSC::Structure::flattenDictionaryStructure): (JSC::Structure::add): Help out with nuking support - the m_offset needs to play along. (JSC::Structure::visitChildren): * runtime/Structure.h: Make some useful things public - like the notion of a lastOffset. * runtime/StructureChain.cpp: (JSC::StructureChain::visitChildren): Thread safety! * runtime/StructureChain.h: Thread safety! * runtime/StructureIDTable.cpp: (JSC::StructureIDTable::allocateID): Ensure that we don't get nuked IDs. * runtime/StructureIDTable.h: Add the notion of a nuked ID! It's a bit that the runtime never sees except during specific shady actions like locked structure-butterfly transitions. "Nuking" tells the GC to steer clear and rescan once we fire the barrier. (JSC::nukedStructureIDBit): (JSC::nuke): (JSC::isNuked): (JSC::decontaminate): * runtime/StructureInlines.h: (JSC::Structure::hasIndexingHeader): Better API. (JSC::Structure::add): * runtime/VM.cpp: Better GC interaction. (JSC::VM::ensureWatchdog): (JSC::VM::deleteAllLinkedCode): (JSC::VM::deleteAllCode): * runtime/VM.h: (JSC::VM::getStructure): Why wasn't this always an API! * runtime/WebAssemblyExecutable.cpp: (JSC::WebAssemblyExecutable::visitChildren): Thread safety. Source/WebCore: Concurrent GC should be stable enough to land enabled on X86_64 https://bugs.webkit.org/show_bug.cgi?id=164990 Reviewed by Geoffrey Garen. 
Made WebCore down with concurrent marking by adding some locking and adapting to some new API. This has new test modes in run-sjc-stress-tests. Also, the way that LayoutTests run is already a fantastic GC test. * ForwardingHeaders/heap/DeleteAllCodeEffort.h: Added. * ForwardingHeaders/heap/LockDuringMarking.h: Added. * bindings/js/GCController.cpp: (WebCore::GCController::deleteAllCode): (WebCore::GCController::deleteAllLinkedCode): * bindings/js/GCController.h: * bindings/js/JSDOMBinding.cpp: (WebCore::getCachedDOMStructure): (WebCore::cacheDOMStructure): * bindings/js/JSDOMGlobalObject.cpp: (WebCore::JSDOMGlobalObject::addBuiltinGlobals): (WebCore::JSDOMGlobalObject::visitChildren): * bindings/js/JSDOMGlobalObject.h: (WebCore::getDOMConstructor): * bindings/js/JSDOMPromise.cpp: (WebCore::DeferredPromise::DeferredPromise): (WebCore::DeferredPromise::clear): * bindings/js/JSXPathResultCustom.cpp: (WebCore::JSXPathResult::visitAdditionalChildren): * dom/EventListenerMap.cpp: (WebCore::EventListenerMap::clear): (WebCore::EventListenerMap::replace): (WebCore::EventListenerMap::add): (WebCore::EventListenerMap::remove): (WebCore::EventListenerMap::find): (WebCore::EventListenerMap::removeFirstEventListenerCreatedFromMarkup): (WebCore::EventListenerMap::copyEventListenersNotCreatedFromMarkupToTarget): (WebCore::EventListenerIterator::EventListenerIterator): * dom/EventListenerMap.h: (WebCore::EventListenerMap::lock): * dom/EventTarget.cpp: (WebCore::EventTarget::visitJSEventListeners): * dom/EventTarget.h: (WebCore::EventTarget::visitJSEventListeners): Deleted. * dom/Node.cpp: (WebCore::Node::eventTargetDataConcurrently): (WebCore::Node::ensureEventTargetData): (WebCore::Node::clearEventTargetData): * dom/Node.h: * page/MemoryRelease.cpp: (WebCore::releaseCriticalMemory): * page/cocoa/MemoryReleaseCocoa.mm: (WebCore::jettisonExpensiveObjectsOnTopLevelNavigation): (WebCore::registerMemoryReleaseNotifyCallbacks): Source/WTF: Concurrent GC should be stable enough to land enabled on X86_64 https://bugs.webkit.org/show_bug.cgi?id=164990 Reviewed by Geoffrey Garen. Adds the ability to say: auto locker = holdLock(any type of lock) Instead of having to say: Locker<LockType> locker(locks of type LockType) I think that we should use "auto locker = holdLock(lock)" as the default way that we acquire locks unless we need to use a special locker type. This also adds the ability to safepoint a lock. Safepointing a lock is basically a super fast way of unlocking it fairly and then immediately relocking it - i.e. letting anyone who is waiting to run without losing steam of there is noone waiting. * wtf/Lock.cpp: (WTF::LockBase::safepointSlow): * wtf/Lock.h: (WTF::LockBase::safepoint): * wtf/LockAlgorithm.h: (WTF::LockAlgorithm::safepointFast): (WTF::LockAlgorithm::safepoint): (WTF::LockAlgorithm::safepointSlow): * wtf/Locker.h: (WTF::AbstractLocker::AbstractLocker): (WTF::Locker::tryLock): (WTF::Locker::operator bool): (WTF::Locker::Locker): (WTF::Locker::operator=): (WTF::holdLock): (WTF::tryHoldLock): Tools: Concurrent GC should be stable enough to land enabled https://bugs.webkit.org/show_bug.cgi?id=164990 Reviewed by Geoffrey Garen. Add a new mode that runs GC continuously. Also made eager modes run GC continuously. It's clear that this works just fine in release, but I'm still trying to figure out if it's safe for debug. It might be too slow for debug. 
* Scripts/run-jsc-stress-tests: Canonical link: https://commits.webkit.org/183229@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@209570 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2016-12-08 22:14:50 +00:00
{
DefaultLockAlgorithm::safepointSlow(m_byte);
}
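A usage sketch (illustrative, not code from this file) for safepoint(): a loop that legitimately owns the lock for a long time can poll safepoint() at convenient points. When nobody is waiting the call is nearly free; when someone is parked, the lock is handed over fairly and then reacquired. The longRunningWork function and its use of holdLock() are assumptions about a typical call site, not an existing caller.

#include <wtf/Lock.h>
#include <wtf/Locker.h>

void longRunningWork(WTF::Lock& lock)
{
    auto locker = holdLock(lock); // WTF::holdLock, found via argument-dependent lookup
    for (int step = 0; step < 10000; ++step) {
        // ... one step of work that must stay under the lock ...
        lock.safepoint(); // let any parked thread take a fair turn, then continue
    }
}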
bool Lock::tryLockWithTimeout(Seconds timeout)
{
// This function may be called from a signal handler (e.g. via visit()). Hence,
// it should only use APIs that are safe to call from signal handlers. This is
// why we use unistd.h's sleep() instead of its alternatives.
// We sleep for 1 second between tries below, so the number of retries needed to cover the timeout is simply the timeout in seconds.
unsigned maxRetries = (timeout < Seconds::infinity()) ? timeout.value() : std::numeric_limits<unsigned>::max();
unsigned tryCount = 0;
while (!tryLock() && tryCount++ <= maxRetries) {
#if OS(WINDOWS)
Sleep(1000);
#else
::sleep(1);
#endif
}
return isHeld();
}
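A usage sketch (illustrative) for the bounded acquire above, for callers that must not block indefinitely, such as a sampling or crash-reporting path. The dumpStateIfPossible function is a hypothetical caller, not code from this file.

#include <wtf/Lock.h>

void dumpStateIfPossible(WTF::Lock& lock)
{
    if (!lock.tryLockWithTimeout(WTF::Seconds(2))) {
        // Could not get the lock within roughly two seconds of retries; bail out
        // rather than risk hanging.
        return;
    }
    // ... inspect or dump the protected state ...
    lock.unlock();
}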
} // namespace WTF